An Abstract Argumentation Framework for Supporting Agreements
Definition 5 (Conflict-free). A set of arguments ARG ⊆ A is conflict-free_ag1 for an agent ag1 in the society St if ∄a1, a2 ∈ ARG such that (attacks(a1, a2) ∨ attacks(a2, a1)) ∨ ((val(ag1, a1) <^St_ag1 val(ag2, a2) ∉ Valpref_ag1) ∧ (Role(ag1) <^St_Pow Role(ag2) ∨ Role(ag1) <^St_Auth Role(ag2) ∉ Dependency_St)). That is, there is no pair of arguments that attack each other or, otherwise, there is a value preference relation and a dependency relation that invalidates the attack.

Definition 6 (Acceptability). An argument a1 ∈ A is acceptable_ag in a society St with respect to a set of arguments ARG ⊆ A iff ∀a2 ∈ A, defeats_ag(a2, a1) → ∃a3 ∈ ARG such that defeats_ag(a3, a2). That is, if the argument is defeated_ag by another argument of A, some argument of the subset ARG defeats_ag this other argument.

Definition 7 (Admissibility). A conflict-free set of arguments ARG ⊆ A is admissible for an agent ag iff every a ∈ ARG is acceptable_ag with respect to ARG.

Definition 8 (Preferred Extension). A set of arguments ARG ⊆ A is a preferred-extension_ag for an agent ag if it is a maximal (with respect to set inclusion) admissible_ag subset of A.

Then, for any AAFAS = <Ag, Rl, D, G, N, A, R, V, Role, Dependency_St, Group, Values, val, Valpref_agi> there is a corresponding AFAS = <A, R, St>, where R = defeats_agi. Thus, each attack relation of AFAS has a corresponding agent-specific defeats_agi relation in AAFAS. These properties are illustrated in the example of the next section.
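To make the defeat-based notions concrete, the following sketch (our own illustration in Python, with hypothetical argument names; not part of the formalism above) checks conflict-freeness, acceptability and admissibility over an explicit defeats relation:

```python
from itertools import combinations

def conflict_free(args, defeats):
    # Definition 5 (simplified): no pair of arguments in the set
    # defeats one another.
    return not any((a, b) in defeats or (b, a) in defeats
                   for a, b in combinations(sorted(args), 2))

def acceptable(a, args, defeats):
    # Definition 6: every defeater of `a` is itself defeated
    # by some argument of `args`.
    return all(any((c, b) in defeats for c in args)
               for (b, target) in defeats if target == a)

def admissible(args, defeats):
    # Definition 7: conflict-free and every member is acceptable.
    return conflict_free(args, defeats) and \
           all(acceptable(a, args, defeats) for a in args)

defeats = {("b", "a"), ("c", "b")}      # c defeats b, b defeats a
print(admissible({"a", "c"}, defeats))  # → True ("c" defends "a")
print(admissible({"a"}, defeats))       # → False (nobody defends "a")
```

A maximal admissible set under this relation would then be a preferred extension in the sense of Definition 8.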
4
Application of the Framework to the Management of Water-Right Transfer Agreements
To exemplify our framework, let us propose a scenario of an open MAS that represents a water market [13], where agents are users of a river basin; they belong to a society St and they can enter or leave the system to buy and sell water-rights. A water-right is a contract with the basin administrator that specifies the volume that can be spent, the water price, the district where the water is settled, etc. Here, suppose that two agents that play the role of farmers, F1 and F2, in the river basin RB (group) are arguing to decide over a water-right transfer agreement, and a basin administrator BA must control the process and make a final decision. The basin has a set of norms N_RB and establishes dependency relations of charity (Ch) between two farmers and of power (Pow) between a basin administrator and a farmer. In addition, farmers prefer to reach an agreement before taking legal action, to avoid the intervention of a jury (J). Also, F1 prefers economy over solidarity (SO <^St_F1 J <^St_F1 EC), F2 prefers solidarity over economy (EC <^St_F2 J <^St_F2 SO) and, by default, BA has the value preference order of the basin, which is (EC <^St_BA SO <^St_BA J).
S. Heras, V. Botti, and V. Julián
In this scenario, F1 puts forward the argument "I should be the beneficiary of the transfer because my land is adjacent to the owner's land". Here, we suppose that the closer the lands, the cheaper the transfers between them, and hence this argument could promote economy. However, F2 replies with the argument "I should be the beneficiary of the transfer because there is a drought and my land is almost dry". In this argument, we assume that crops are lost in dry lands and that helping people to avoid losing crops promotes solidarity. In addition, the BA knows that the jury will interfere if the agreement violates the value preferences of the river basin. Then, the agents can also put forward the following arguments: "F2 should allow me (F1) to be the beneficiary of the water-right transfer to avoid the intervention of a jury (J)", "F1 should allow me (F2) to be the beneficiary of the water-right transfer to avoid the intervention of a jury (J)" and "F1 should allow F2 to be the beneficiary of the water-right transfer to avoid the intervention of a jury (J)". In view of this context, BA could generate an AFAS = <A, R, St> as an extension of abstract argumentation frameworks AF = <A, R>.
Thus, we have the following arguments in A = {A1, A2, A3, A4, A5, A6} (which are all possible solutions for the water-right transfer agreement process): A1 (posed by F1): F1 should be the beneficiary of the water transfer (F1w) to promote economy (EC); A2 (posed by F2): F1 should not be the beneficiary of the water transfer (F1nw) to promote solidarity (SO); A3 (posed by F2): F2 should be the beneficiary of the water transfer (F2w) to promote solidarity (SO); A4 (posed by F1): F2 should not be the beneficiary of the water transfer (F2nw) to promote economy (EC); A5 (posed by F1): F2 should allow F1 to be the beneficiary of the water transfer (F1w&F2nw) to avoid the intervention of a jury (J); and A6 (posed by F2 and BA): F1 should allow F2 to be the beneficiary of the water transfer (F1nw&F2w) to avoid the intervention of a jury (J).

The BA cannot decide the water transfer in favour of both water users, so attacks(A1, A3) and vice versa, and we assume that it must take a decision favouring at least one party, so attacks(A2, A4) and vice versa. In addition, attacks(A5, A2), attacks(A5, A3) and attacks(A5, A6), and all these arguments attack A5; likewise, attacks(A6, A1), attacks(A6, A4) and attacks(A6, A5), and all these arguments attack A6. Then, R = {attacks(A1, A3), attacks(A3, A1), attacks(A2, A4), attacks(A4, A2), attacks(A5, A2), attacks(A5, A3), attacks(A5, A6), attacks(A2, A5), attacks(A3, A5), attacks(A6, A5), attacks(A6, A1), attacks(A6, A4), attacks(A1, A6), attacks(A4, A6)} and St = <Ag, Rl, D, G, N, V, Role, Dependency_St, Group, Values, Valpref_agi>, where Ag = {F1, F2, BA}; Rl = {Farmer, BasinAdministrator}; D = {Power, Charity}; G = {RB}; N = N_RB; V = {EC, SO, J}; Role(F1) = Role(F2) = Farmer; Role(BA) = BasinAdministrator; Farmer <^St_Pow BasinAdministrator; Farmer <^St_Ch Farmer; Group(F1) = Group(F2) = Group(BA) = RB; Values(F1) = Values(F2) = Values(BA) = {EC, SO, J}; Valpref(F1) = {SO <^St_F1 J <^St_F1 EC}; Valpref(F2) = {EC <^St_F2 J <^St_F2 SO}; Valpref(BA) = {EC <^St_BA SO <^St_BA J}. Therefore, taking into account that F1 and F2 have a charity dependency relation between them, the AFAS for this example is shown in Figure 1a.
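As a quick sanity check (our own illustration, not part of the paper), the relation R can be encoded as a set of ordered pairs and tested for the mutual attacks described above:

```python
# Arguments are numbered 1..6; a pair (i, j) means attacks(Ai, Aj).
R = {(1, 3), (3, 1), (2, 4), (4, 2),
     (5, 2), (2, 5), (5, 3), (3, 5), (5, 6), (6, 5),
     (6, 1), (1, 6), (6, 4), (4, 6)}

# Every attack in this example has its symmetric counterpart:
# each pair of incompatible outcomes attacks the other.
assert all((j, i) in R for (i, j) in R)
print(len(R))  # → 14
```

The 14 attacks thus correspond to 7 mutual conflicts between incompatible outcomes.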
[Figure 1 graphs: nodes A1 (F1w, EC), A2 (F1nw, SO), A3 (F2w, SO), A4 (F2nw, EC), A5 (F1w&F2nw, J), A6 (F1nw&F2w, J)]
Fig. 1. a) Example AFAS; b) Example AFAS_F2
Fig. 2. a) Example AFAS_F1; b) AFAS_F1 modified
Now, let us consider what happens with specific agents by creating their AAFAS. For instance, recalling that F1 prefers economy to the other values and gives solidarity the least value (SO <^St_F1 J <^St_F1 EC), we have that AAFAS_F1 is the following: AAFAS_F1 = <Ag, Rl, D, G, N, A, R, V, Role, Dependency_St, Group, Values, val, Valpref_F1>. Then, eliminating the unsuccessful attacks (due to the value preferences of F1), we have the equivalent AFAS_F1 for AAFAS_F1 as AFAS_F1 = <A, {attacks(A1, A3), attacks(A2, A4), attacks(A5, A2), attacks(A5, A3), attacks(A6, A5), attacks(A6, A1), attacks(A6, A4)}, St>, which is shown in the graph of Figure 2a. This graph has the preferred extension PE_F1 = {A6}, meaning that F2 should be the beneficiary of the water-right transfer to promote solidarity and avoid the intervention of a jury. This demonstrates how the power dependency relation of the BA prevails over the farmers and their arguments. Otherwise, if we change the environment and set a charity dependency relation of basin administrators over farmers (Farmer <^St_Ch BasinAdministrator), the preferences of F1 would prevail and the graph would be the one of Figure 2b. In this case, the preferred extension would be PE_F1-modified = {A1, A4, A5}, which would defend F1 as the beneficiary of the transfer agreement. In its turn, F2 gives the highest value to solidarity, but prefers to avoid a jury over economy (EC <^St_F2 J <^St_F2 SO). Therefore, its associated AAFAS_F2 would be the following: AAFAS_F2 = <Ag, Rl, D, G, N, A, R, V, Role, Dependency_St, Group, Values, val, Valpref_F2>. Then, eliminating the unsuccessful attacks we have the equivalent AFAS_F2 for AAFAS_F2 as AFAS_F2 = <A, {attacks(A3, A1), attacks(A2, A4), attacks(A2, A5), attacks(A3, A5), attacks(A6, A5), attacks(A6, A1), attacks(A6, A4)}, St>, which is shown in the graph of
Figure 1b. This graph has the preferred extension PE_F2 = {A2, A3, A6}, which means that F2 defends its position as beneficiary of the water transfer.
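The preferred extension reported for AFAS_F2 can be recomputed with a naive Dung-style enumeration. The following sketch (our own illustration, with hypothetical function names; feasible only for tiny frameworks) recovers PE_F2 from the attack relation given above:

```python
from itertools import combinations

def preferred_extensions(args, attacks):
    # Naive enumeration of all admissible subsets.
    def conflict_free(S):
        return not any((a, b) in attacks for a in S for b in S)
    def acceptable(a, S):
        return all(any((c, b) in attacks for c in S)
                   for (b, t) in attacks if t == a)
    def admissible(S):
        return conflict_free(S) and all(acceptable(a, S) for a in S)
    adm = [set(c) for r in range(len(args) + 1)
           for c in combinations(sorted(args), r) if admissible(set(c))]
    # Preferred extensions are the maximal admissible sets (w.r.t. inclusion).
    return [S for S in adm if not any(S < T for T in adm)]

# Attack relation of AFAS_F2 after eliminating unsuccessful attacks.
R_F2 = {(3, 1), (2, 4), (2, 5), (3, 5), (6, 5), (6, 1), (6, 4)}
print(preferred_extensions({1, 2, 3, 4, 5, 6}, R_F2))  # → [{2, 3, 6}]
```

The result matches PE_F2 = {A2, A3, A6}: A2, A3 and A6 are unattacked in this graph and jointly defeat A1, A4 and A5.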
5
Conclusion
In this paper we have presented an abstract argumentation framework to help reach agreements in agent societies. After defining our concept of agent society, we have provided the formal definition of our argumentation framework. This is an extension of Dung's framework [11] that includes agents' values, value preference orders and dependency relations. The framework has been illustrated in a real scenario of a water-rights transfer market.

Acknowledgements. This work is supported by the Spanish government grants CONSOLIDER INGENIO 2010 CSD2007-00022, TIN2008-04446 and TIN2009-13839-C03-01, and by the GVA project PROMETEO 2008/051.
References
1. Rahwan, I.: Argumentation in multi-agent systems. Autonomous Agents and Multiagent Systems, Guest Editorial 11(2), 115–125 (2006)
2. Bench-Capon, T., Dunne, P.: Argumentation in artificial intelligence. Artificial Intelligence 171(10-15), 619–938 (2007)
3. Ferber, J., Gutknecht, O., Michel, F.: From agents to organizations: an organizational view of multi-agent systems. In: Giorgini, P., Müller, J.P., Odell, J.J. (eds.) AOSE 2003. LNCS, vol. 2935, pp. 214–230. Springer, Heidelberg (2004)
4. Oliva, E., McBurney, P., Omicini, A.: Co-argumentation artifact for agent societies. In: Rahwan, I., Moraitis, P. (eds.) Argumentation in Multi-Agent Systems. LNCS (LNAI), vol. 5384. Springer, Heidelberg (2009)
5. Bench-Capon, T., Atkinson, K.: Abstract argumentation and values. In: Argumentation in Artificial Intelligence, pp. 45–64 (2009)
6. Dignum, V.: A model for organizational interaction: based on agents, founded in logic. PhD thesis (2003)
7. Artikis, A., Sergot, M., Pitt, J.: Specifying norm-governed computational societies. ACM Transactions on Computational Logic 10(1) (2009)
8. Criado, N., Argente, E., Botti, V.: A normative model for open agent organizations. In: International Conference on Artificial Intelligence, ICAI 2009 (2009)
9. Perelman, C., Olbrechts-Tyteca, L.: The New Rhetoric: A Treatise on Argumentation (1969)
10. Searle, J.R.: Rationality in Action (2001)
11. Dung, P.M.: On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming, and n-person games. Artificial Intelligence 77, 321–357 (1995)
12. Baroni, P., Giacomin, M.: Semantics of abstract argument systems. In: Argumentation in Artificial Intelligence, pp. 25–44. Springer, Heidelberg (2009)
13. Botti, V., Garrido, A., Giret, A., Noriega, P.: Managing water demand as a regulated open MAS. In: Workshop on Coordination, Organization, Institutions and Norms in agent systems in on-line communities, COIN 2009, vol. 494, pp. 1–10 (2009)
Reaching a Common Agreement Discourse Universe on Multi-Agent Planning
Alejandro Torreño, Eva Onaindia, and Oscar Sapena
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain
{atorreno,onaindia,osapena}@dsic.upv.es
Abstract. Multi-Agent Planning (MAP) is the problem of having a group of agents working together to solve a problem that requires a collective effort. When coordination in a MAP system is done through negotiation, agents must share a common ontology. In this paper we propose a mechanism to reach a shared ontology through the definition of a common information model.
1
Introduction
E.S. Corchado Rodríguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 185–192, 2010. © Springer-Verlag Berlin Heidelberg 2010

The term Multi-Agent Planning (MAP) refers to any kind of planning in multi-agent environments. MAP is concerned with planning by multiple agents, i.e. distributed planning, or planning for multiple agents, i.e. planning for multi-agent execution. In general, MAP accounts for the problem of planning in domains where several agents plan and act together. In our MAP approach, we have multiple independent agents jointly devising a global plan in an environment of which they have different views. To do so, agents are capable of sharing some of their planning information and negotiating to build plans. Hence, in this context, it is assumed that agents share a common agreement discourse universe [1], i.e. agents have to know a common set of concepts in order to negotiate. Some approaches to MAP, such as Cooperative Distributed Planning (CDP) [5], put the emphasis on extending planning to a distributed environment with fully cooperative agents. Others consider self-interested agents, defining MAP as the problem of finding a plan for each agent that achieves its private goals, such that these plans together are coordinated and the global goals are also met [3]. Under this perspective, the emphasis is on how to manage the interdependencies between the agents' plans and how to solve the coordination problem [2,4]. In our MAP approach, agents are heterogeneous, i.e. they have different visions of the environment, they can have private goals and they manage different ontologies. Coordination among agents is achieved through a negotiation process, for which agents must share an agreement discourse universe. In this paper, we define a set of mechanisms to build a common information model on top of each agent's local model. This common model provides agents with a shared ontology based on PDDL (the Planning Domain Definition
A. Torreño, E. Onaindia, and O. Sapena
Language [7]), adapting the representation of the information contained in the agent's local model in such a way that coherence between both models is ensured. The paper is organized as follows: the next section presents the MAP model; Section 3 presents the notion of heterogeneous planning agents and the issues they raise; Section 4 defines the mechanisms to design the common information model; next, we show an application example; and the last section concludes.
2
Multi-Agent Planning Model
The informal definition of a MAP problem can be stated as follows: given a description of the initial state, a set of global goals, a set of (at least) two agents, and, for each agent, a set of its capabilities and (possibly) its private goals, find a single competent plan that achieves the global and private goals. This definition contains two key aspects: a collaborative task, by which all the agents should cooperate to attain the global goals, and an individual task that guides agents' proposals towards the resolution of their own private goals. Therefore, a MAP problem can be seen as a Cooperative Distributed Planning (CDP) task while, in addition, agents have their own planning task to solve. First, we will focus on the CDP problem, and then on the individual problem of each agent.

Definition 1. A CDP task is a tuple T = <AG, Θ, P, I, G, F>, where AG = {1 . . . n} is a finite, non-empty set of planning agents, Θ is the set of actions that describe the state changes in the domain, P is a finite set of propositional state variables, I ⊆ P is the initial state, G ⊆ P are the problem goals and F is a utility function to select a plan when several choices are available.

When solving a CDP task, the agents in AG aim at finding an executable plan (if such a plan exists) which, applied to the initial state, leads to a state in which all problem goals hold. However, each agent has a different planning task to solve, since agents may have private goals and the knowledge of the problem's overall state is distributed, each individual having only a partial view. Consequently, agents will have different initial states and most likely different capabilities, so they will actually be solving different planning problems. On the other hand, when having common goals, agents must have some limited knowledge of the models of the rest of the individuals, since this information may be potentially significant for coordination.
In our framework, the planning model of an agent encodes partial information on the capabilities of each other agent in order to promote a coherent coordination towards a joint plan. A CDP task can thus be interpreted as solving as many planning tasks as agents in AG. A different planning task Ti = <Θi, Ii, Gi, Fi> is associated to each agent i ∈ AG, such that solving T implies solving Ti for every i ∈ AG:
– Θi ⊆ Θ represents the set of actions in the model of agent i. This set includes some limited knowledge of the abilities of the other agents. Formally, we define Θi = Γi ∪ Δi, where Γi denotes the actions executable by agent i, and Δi denotes the set of actions executable by any other agent j, j ≠ i.
– Ii ⊆ I denotes the local knowledge of agent i, its partial perspective of the environment. Formally, we can define I = ∪_{i∈AG} Ii.
– Gi = G ∪ PGi, where G are the goals of the CDP task, and PGi ⊆ P is the set of agent i's private goals. PGi is an optional parameter.
– Fi is the utility function of the agent, which can be different from the global one, F. F will be used by all the participants to discuss the CDP solution in the same terms.
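The task decomposition above can be sketched as a small data model (our own illustration with hypothetical Python names, not the authors' implementation):

```python
from dataclasses import dataclass

@dataclass
class AgentTask:
    # Ti = <Θi, Ii, Gi, Fi> for agent i (utility Fi omitted for brevity).
    own_actions: frozenset      # Γi: actions agent i can execute itself
    known_actions: frozenset    # Δi: limited knowledge of other agents' actions
    init: frozenset             # Ii ⊆ I: the agent's partial view of the state
    private_goals: frozenset = frozenset()  # PGi (optional)

    @property
    def actions(self):          # Θi = Γi ∪ Δi
        return self.own_actions | self.known_actions

@dataclass
class CDPTask:
    # T = <AG, Θ, P, I, G, F>, with I recovered as the union of local views.
    agents: dict                # agent id -> AgentTask
    global_goals: frozenset     # G

    @property
    def init(self):             # I = ∪_{i ∈ AG} Ii
        return frozenset().union(*(t.init for t in self.agents.values()))

    def goals(self, i):         # Gi = G ∪ PGi
        return self.global_goals | self.agents[i].private_goals
```

For example, two agents with overlapping partial views would contribute the union of their local states as the global initial state I.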
3
Planning Agents Information Model
In our MAP approach, planning agents have an information model which defines their view of the environment. As stated in the MAP definition, this model includes a description of the initial state, the set of actions that can be performed by the agent, the private and global goals, the agent's utility function, and some information on the environment and on the abilities of the rest of the agents. The participants must use a common ontology in order to jointly build and negotiate over problem solutions. However, our MAP framework allows the presence of heterogeneous agents, which do not share a common ontology. This fact implies that the participants do not share an agreement discourse universe. As our MAP approach achieves coordination among agents through negotiation, the absence of a common discourse universe prevents the agents from negotiating and from reaching an agreement on the design of the joint plan. Consequently, a mechanism to establish a common ontology is required in order to tackle this issue. In our MAP framework, each agent initially receives a PDDL-based information model. More precisely, these models are based on PDDL2.1 [7], one of the most popular extensions of PDDL. The PDDL-based ontologies of heterogeneous agents may present differences in the following aspects:
– Actions. Agents can have some shared abilities (actions), but it is not assured that they share the exact same representation of these abilities. To integrate knowledge about actions executable by others into an agent's model, it is necessary to have homogeneous representations of these abilities.
– Objects. Similarly to the actions, heterogeneous agents can have different internal representations of the objects. Again, a common representation is required for the agents to successfully build and discuss common plans.
Our approach to solving this issue is based on the definition of a new information model, which uses an ontology shared by all the individuals.
This common information model acts as an upper layer that overlays the original one, allowing the agents to communicate appropriately. Hence, planning agents will manage both a common and a local information model. Both models have to be coherent with each other, so that the information in the local layer can be migrated to the common one. Coherence between models is a key aspect of our design, since it is necessary to establish a mapping between them. The next section presents the mechanisms that have been designed to define the common information model.
4
Defining a Common Information Model
To design a common ontology shared by all the planning agents, we have defined two different mechanisms. On the one hand, we have defined a set of design techniques that allow the generation of a common PDDL-based planning model, which contains homogeneous descriptions of the objects and the abilities of the agents, and is coherent with the agents’ underlying local models. On the other hand, we have extended the PDDL2.1 language with a set of constructs that allow the agents to translate information between both planning layers. The following sections detail the design techniques and the language extensions. 4.1
Modeling Techniques
Since some agents can have more detailed descriptions of the objects and the operators than others, the design techniques aim to homogenize these descriptions by reducing their level of detail. The purpose of the design techniques is to create an information model that includes the simplest description of the objects and operators among the local models, thus ensuring coherence between them. Hence, the common model will contain, in general, simpler representations of the objects and operators than the local ones. According to their effect, it is possible to classify the design techniques into two groups:
– Generalization techniques. Objects and operators included in the common model may be modeled as groupings of objects and operators in the agents' local models. An object in the common model can be translated into a grouping of local objects, or multiple groupings of different local objects. It can also be a direct mapping of a local object. Similarly, an operator can be translated as a set of local actions or directly mapped to a local action.
– Detail reduction techniques. Certain objects and operators may only be included in some of the local models. This information is considered not to be relevant for the common model, since it does not have a correspondence with all of the local models. Instead of applying a generalization technique, this information is directly discarded from the common model.
Objects generalization. At the common layer, objects must have a correspondence with the local ones. An object in the common layer can be seen as the composition of one or more instances of one or more local objects. Hence, it is possible to distinguish three different ways to group objects:
– Direct mapping. The simplest possible grouping consists in using an object at the common level as it is in a local model.
– Simple grouping. This grouping technique involves a common-model object being the composition of several instances of a local object.
– Multiple grouping. Unlike simple grouping, this technique allows the grouping of several instances of different local objects into a common-model object.
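These three grouping techniques can be sketched as a simple table from common-model objects to their local constituents (all names here are hypothetical illustrations, not taken from the paper):

```python
# Each common-model object maps to (local object, instance count) pairs.
object_groupings = {
    "obj_a":   [("local_a", 1)],                   # direct mapping
    "group_b": [("local_b", 3)],                   # simple grouping: 3 instances of one object
    "group_c": [("local_b", 2), ("local_c", 1)],   # multiple grouping: mixed local objects
}

def local_constituents(common_obj):
    """Expand a common-model object into its local instances."""
    return [name for name, n in object_groupings[common_obj]
            for _ in range(n)]

print(local_constituents("group_c"))  # → ['local_b', 'local_b', 'local_c']
```

A common-model object is only admissible if such an expansion exists for every agent's local model, which is exactly the coherence condition stated below.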
It is not possible to define any object in the common model unless it has the proper correspondence in all of the local models, thus guaranteeing that the objects of the two information layers of each agent are coherent.
Operators generalization. Operators in the common model also have to be coherent with their local counterparts. The design of these operators is a process similar to the definition of Hierarchical Task Networks (HTN) [6]. It is possible, then, to consider the local operators to be primitive actions, while the common-model operators can be translated into networks of primitive actions, or directly mapped to a single one. Hence, we can distinguish two ways of defining these operators:
– Direct mapping. A local operator may be included in the common model as it is, replacing only its local objects with common-model ones.
– Hierarchical network. This technique, based on HTN, maps common-model operators to networks of local ones, relating them in this way to the local model, since they are compositions of primitive (local) operators.
Therefore, common-model operators must necessarily be equivalent to single local operators or sequences of them, in order to preserve the coherence between both information levels.
Detail reduction. Besides the generalization techniques, it is possible to apply a detail reduction by discarding from the common model elements that are irrelevant from a common perspective. We consider an object or an operator to be irrelevant to the common model if it is not reflected in the local models of all the individuals. In this situation, it is not necessary to create entities in the common model to be mapped to these objects or operators, so they are directly discarded. This way, coherence between the common information model and the participants' local models is preserved.
4.2
Planning Language Extensions
Once we have defined a common information model, it is necessary to provide the agents with a mechanism to translate information between this layer and the underlying local model. To do so, we have included an extension in our planning language that establishes the relationship between both models and how to translate the information from one layer to the other. More precisely, this information is specified in the mapping section, which is included in the local model. The mapping construct models the correspondence between the two information models, defined through predicates. Within the section defined by a mapping construct, it is possible to include several implies sentences that associate common and local predicates. A mapping construct uses the following BNF (Backus-Naur Form) syntax:
<mapping-def> ::= (:mapping <implies-def>+)
<implies-def> ::= (implies <com-pred>+ <loc-pred>+)
<com-pred>    ::= <pred-def>
<loc-pred>    ::= <pred-def>
As the syntax description states, it is possible to introduce multiple implies sentences within a mapping section. Both com-pred and loc-pred are defined as a pred-def, which can be a single predicate or a conjunction or a disjunction of them. These predicates may be totally or partially instantiated. Each implies sentence can be seen as a double implication of the form (com-pred ⇔ loc-pred). The predicates in these sentences act as patterns; if a common-model literal matches the pattern defined by com-pred, then it is possible to infer the right-hand side of the sentence, i.e. the loc-pred. Since the implication is bidirectional, the left-hand side can be inferred by having a local predicate that fits the pattern established by loc-pred. This makes the exchange of information between the two layers possible in both directions.
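For a single-predicate implies sentence, this bidirectional pattern reading can be sketched as follows (our own illustration with hypothetical predicate names; the real pred-def also admits conjunctions and disjunctions, which this sketch omits):

```python
def match(pattern, literal, binding=None):
    # Unify a predicate pattern (variables start with '?') with a ground literal.
    binding = dict(binding or {})
    if len(pattern) != len(literal) or pattern[0] != literal[0]:
        return None
    for p, l in zip(pattern[1:], literal[1:]):
        if p.startswith("?"):
            if binding.setdefault(p, l) != l:
                return None          # same variable bound to two values
        elif p != l:
            return None              # constant mismatch
    return binding

def translate(literal, rules, direction="common_to_local"):
    # Each rule is a (com-pred, loc-pred) pair read as a double implication.
    out = []
    for com, loc in rules:
        src, dst = (com, loc) if direction == "common_to_local" else (loc, com)
        b = match(src, literal)
        if b is not None:
            out.append(tuple(b.get(t, t) for t in dst))
    return out

rules = [(("at", "?d", "?a"), ("located", "?d", "?a"))]  # hypothetical rule
print(translate(("at", "d1", "aSpain"), rules))
# → [('located', 'd1', 'aSpain')]
print(translate(("located", "d1", "aSpain"), rules, "local_to_common"))
# → [('at', 'd1', 'aSpain')]
```

The same matching step works in either direction, which is the double implication (com-pred ⇔ loc-pred) described above.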
5
Application Example
In order to illustrate the specified mechanisms, this section presents a simple example. The example models a transport domain, in which agents act as transport agencies that use their trucks to deliver packages to a certain city. Agents work in a particular geographic area, so they will have to interact with other agents in order to deliver packages outside their area. The example includes two heterogeneous agents, Ag1 and Ag2, each having its own representation of the objects and actions. Ag1's model contains the objects Truck, Package, City and Agent, which represent the objects of the environment accurately. Ag2, however, manages a less detailed representation of the objects. Its model contains the objects Package, Agent, Delivery and Area. Area refers to the geographical areas in which the cities are placed, and Delivery is a representation of a set of trucks, each of them carrying one or more packages, i.e. each Delivery is a multiple grouping that includes n trucks and n packages. Regarding the operators, Ag1 has the actions Load, to load a package into a truck; Unload, to unload a package from a truck; and Drive, to drive a truck from one city to another. Ag2's model also includes three actions: Collect, to start the transportation of a delivery; Deliver, to hand over a delivery; and Move, to move a delivery from one geographical area to another.
From this initial situation, the modeling techniques have been applied to build a common planning model. The resulting model shares the same objects with Ag2, since its model offers the simplest representation of the objects in the environment. The following list shows the techniques applied to Ag1's model, excluding the direct mappings:
• City → Area - Simple grouping: Each Area object constitutes a simple grouping that encloses n different City objects.
• Delivery - Multiple grouping: As previously defined, a Delivery object is a multiple grouping that includes n trucks and n packages.
• Truck - Detail reduction: Since the Delivery object includes the notion of truck, and the Truck object is not shared by both agents, it is not necessary to include it in the common model.
The common-model operators also coincide with Ag2's. The following list summarizes the techniques applied to Ag1's model:
• Load → Collect - Direct mapping: Collect is a direct mapping of Ag1's Load action. The only differences between them lie in their parameters.
• Unload → Deliver - Direct mapping: Deliver is a direct mapping of Ag1's Unload action. Variations stem only from the parameters of each operator.
• Drive → Move - Hierarchical network: A Move action can imply driving a truck to a city, unloading it, and loading the cargo into another truck. It is possible to repeat that pattern n times until the destination of the action is reached. Hence, the action can be seen as a hierarchical network.
To illustrate the use of the :mapping section, let us focus on Ag1's model, since Ag2's :mapping section is more straightforward, given that all the objects in the common model have been directly mapped from its local model. Ag1 works in the area aSpain, which includes the City objects Madrid, Barcelona and Valencia.
Let us also consider the common-model predicates (at ?d - Delivery ?a - Area), which states that a delivery ?d is placed at an area ?a, and (in ?p - Package ?d - Delivery), which indicates that a package ?p is included in a delivery ?d. Ag1's local versions of these predicates state that a package ?p is in a truck ?t and that a truck ?t is placed at a city ?c. The mapping section of Ag1's model is defined as follows:

(:mapping
  (implies (and (at ?d - delivery aSpain)
                (in ?p - package ?d))
           (and (in ?p ?t - truck)
                (or (at ?t Madrid) (at ?t Barcelona) (at ?t Valencia)))))

The section states that having a package in a delivery placed at the geographical area aSpain implies that the package is in a truck located at Madrid, Barcelona or Valencia. This way, common-model information about deliveries and areas can be translated in terms of Ag1's local objects, like cities and trucks.
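Under the :mapping above, the double implication can be sketched in Python (the helper names are our own; only the city names and predicate shapes come from the example):

```python
SPANISH_CITIES = ("Madrid", "Barcelona", "Valencia")  # cities grouped into aSpain

def common_to_local(delivery, package, truck):
    # (at ?d aSpain) & (in ?p ?d)  =>  (in ?p ?t) & (or (at ?t <city>) ...)
    return [("in", package, truck),
            ("or",) + tuple(("at", truck, c) for c in SPANISH_CITIES)]

def local_to_common(package, truck, city, delivery):
    # The implication is bidirectional: a truck in any of the grouped
    # cities places its delivery at the area aSpain.
    if city in SPANISH_CITIES:
        return [("at", delivery, "aSpain"), ("in", package, delivery)]
    return []

print(local_to_common("p1", "t1", "Valencia", "d1"))
# → [('at', 'd1', 'aSpain'), ('in', 'p1', 'd1')]
```

A truck located outside the grouped cities produces no common-model literals, which mirrors the pattern-matching behaviour of the implies sentence.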
6
Conclusions
This paper presents a mechanism to achieve a common agreement discourse universe among heterogeneous agents that do not share a common ontology. The problem of finding a common discourse universe constitutes a major issue in the context of a MAP system in which agents coordinate through negotiation. To tackle this issue, a PDDL-based information model, built coherently on top of the individual models of the agents, is defined. This new model introduces a novel approach for handling information in MAP: each agent now handles a two-layered information model that allows it to negotiate with the rest of the participants while maintaining its original model. The model is built through a set of design techniques, the aim of which is to create a model that includes the simplest description of the objects and operators among the local models, thus ensuring coherence between them. Hence, the common model will contain the simplest representations of the objects and the operators among the local models. Coherence is a key aspect of the defined method. The common model's objects and operators must have a correspondence with the ones included in the local models. Although the modeling techniques assure this condition by themselves, some extensions have been introduced into our planning language to establish a mapping between both models, which allows the agents to translate information between both layers. The exposed method allows the agents to use a common ontology while preserving their original models. The main drawback of the method lies in the fact that the modeling of the common model is currently performed by hand, requiring the intervention of an expert. Hence, future work will focus on fully or partially automating the process.

Acknowledgments. This work has been supported by the Spanish MICINN under projects TIN2008-06701-C03-03 and Consolider Ingenio 2010 CSD2007-00022, and the Valencian Prometeo Project 2008/051.
Integrating Information Extraction Agents into a Tourism Recommender System
Sergio Esparcia, Víctor Sánchez-Anguix, Estefanía Argente, Ana García-Fornes, and Vicente Julián
Grupo de Tecnología Informática - Inteligencia Artificial
Departamento de Sistemas Informáticos y Computación
Universidad Politécnica de Valencia
Camino de Vera, s/n, 46022 - Valencia, Spain
{sesparcia,sanguix,eargente,agarcia,vinglada}@dsic.upv.es
Abstract. Recommender systems face some problems. On the one hand, information needs to be kept up to date, which can be a costly task if it is not performed automatically. On the other hand, it may be interesting to include third-party services in the recommendation, since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques to automatically extract and classify information from the Web. Its goal is to keep the system updated and to obtain information about third-party services that are not offered by service providers inside the system. Keywords: recommender systems, information agents.
1
Introduction
Over the last few years, the Web has become the greatest source of available information. A goal for researchers is to find optimal ways of recovering specific information from the Web. One of the most important information filtering techniques is recommender systems, whose goal is to present information items that could be interesting to the user. Recommender systems attempt to reduce information overload by selecting subsets of items based on user preferences. More specifically, their aim is to offer new services that are adapted to the personal preferences of their users. One of the industries where recommender systems have been applied with success is the tourism industry [1,2]. Most of these systems present tour plans according to the personal preferences of the tourists. There are two main approaches to recommender systems: content-based and collaborative filtering. On the one hand, content-based algorithms use item content, such as the name and other features. TripleHop Technologies' TripMatcher [1], DIETORECS [3,4], and Vacation Coach's Me-Print are three examples of e-commerce implementations that use content-based algorithms. In this kind of system, recommendations depend on the item description. On the other hand,
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 193–200, 2010. © Springer-Verlag Berlin Heidelberg 2010
194
S. Esparcia et al.
collaborative recommender systems use algorithms that give recommendations based on user behavior (other users' experiences and opinions). Some examples can be found in [2,5,6]. Collaborative filtering is currently the most widely used technique in recommender systems. One example of a collaborative recommender system is the Social-Net Tourism Recommender System (STRS) [7], a tourism application that helps tourists make their visits to a city more profitable and adapted to their preferences. It uses mobile devices, such as a phone or a PDA, allowing tourists to make reservations in restaurants or cinemas, and providing a tour plan for a day. STRS is based on multi-agent system technology and employs social networks as a mechanism to give recommendations.
However, some problems arise in STRS and most tourism recommender systems: keeping business information up to date is a costly task, and they do not offer information about third-party services that could enhance recommendations. For instance, a tourism system where no user offers information about films could lose the opportunity to satisfy users that are highly interested in cinema. Even though third parties are not part of the system, it should be noted that the final goal of the system is to satisfy tourists and, consequently, increase the benefits of those parties that are part of the system. Both types of information may be available on the Web. An automated update mechanism is convenient in order to cope with the dynamic content that can be found on the Web.
In this work, an add-on for the STRS that retrieves information from third parties and tourist service providers is presented. It is based on information agent technologies and voting processes that allow information to be accurately extracted and classified. The remainder of this paper is organized as follows. Section 2 provides an overview of the STRS architecture.
Section 3 describes the add-on: the information extraction and information classification agents, and the extended architecture. Section 4 presents the experiments used to test the system, showing the classification accuracy of the system and an experiment in which a simple update rule for the voting power is used. Finally, Section 5 presents some conclusions.
2
The Social-Net Tourism Recommender System Architecture
The Social-Net Tourism Recommender System (STRS) [7] is a tourism application that offers different services to tourists. Its goal is to improve their stay in a city, spending their time in the most efficient way, by means of the generation of tour plans. Tourists can find two kinds of information, according to their interest: places of personal interest (restaurants, cinemas, museums, theaters...) and places of general interest (monuments, churches, beaches, parks...). Users can set their preferences using a mobile phone or a PDA. Tourists can make a reservation in a restaurant, buy tickets for a film or a concert, and so forth.
Integrating IE Agents into a Tourism Recommender System
195
STRS integrates multi-agent technology and a recommender system based on social network analysis. It uses social networks to model communities of users, trying to identify the relations among them and to identify similar users that help to recommend items. STRS is formed by two subsystems that cooperate to provide comprehensive and accurate tourism recommendations: the Multi-Agent Tourism System (MATS) and the Multi-Agent Social Recommender System (MASR):
– MASR is formed by four types of agents: (i) user agent: an interface between the user and the MASR; (ii) data agent: responsible for managing a database with users' data; (iii) recommender agent: receives all recommendation and user registration queries; and (iv) social-net agent: adds a node to the social network when a user joins the platform and determines the new user's similarity with regard to the other profiles.
– MATS is also composed of four agents: (i) broker agent: in charge of establishing communication between the user and sight agents; (ii) sight agent: manages all the information regarding the characteristics and activities of a specific place of interest in the city; (iii) user agent: allows tourists to use the different services by means of a GUI on their mobile devices; and (iv) plan agent: establishes and manages the whole planning process offered by the system, taking preferences and searches into account.
As stated above, one of the main problems of this proposal is to keep information up to date and to integrate information that could be interesting for tourists but is located outside our system. Therefore, the next section presents an add-on for the STRS that is capable of keeping the information updated and of introducing new information that is not supplied by any of the system's service providers.
3
The Information Agents Add-on in the STRS
The proposed add-on is based on information agents that use natural language processing techniques to retrieve information from the Web. More specifically, there are two different types of main agents that collaborate to accomplish this task: information extraction agents (IE agents) and information classification agents (IC agents). The first extract the information from the Web sources, whereas the second classify the extracted information according to its service category (e.g., concerts, cinema, theater plays, etc.). Additionally, there are two types of auxiliary agents that help IC agents: contact agents and a trusted mediator. In the following subsections, we describe both types of main agents and their integration with the STRS. 3.1
Information Extraction Agents
As stated before, IE agents extract information from the required websites. According to classical information agent classifications, they can be
categorized as wrapper agents [8,9]. They look for specific text patterns that point to information relevant to the STRS, such as service name, address, price, time, duration, etc. It must be noted that one IE agent is required per website that needs to be analyzed. The general steps that our IE agents carry out to obtain the required information are: (i) send an HTTP request to the target website and wait for its response; (ii) analyze the HTML code of the response; (iii) look for specific patterns in the HTML code that point to the desired information; (iv) extract and process the information; (v) send the service description to the IC agents. 3.2
Information Classification Agents
Since the service information extracted from the Web may not be explicitly categorized, it is necessary to provide an additional mechanism that performs this categorization task. Each IC agent is specialized in scoring one specific event category (e.g., one agent specializes in concerts, another agent specializes in theater plays, etc.). This classification is carried out by means of matching rules based on Natural Language Processing (NLP) knowledge that are applied to the received service descriptions. We propose two different types of rules for IC agents:
1. Term Strength rules: Term Strength (TS) [10,11] is a measure of how relevant a word/lemma is with respect to a specific category. The TS of words with respect to a specific category is precalculated using a corpus. We propose the following mechanism for TS rules: TS rules look for lemmas whose TS value has been precalculated during the training phase. If a match is found, the matched TS rule r_j produces a score vote SC_TS equal to the precalculated TS of the word.
2. Hyperonym rules: These rules are based on hyperonym trees found in WordNet [12]. Hyperonym trees represent the semantic relation between a specific word (root) and more general related concepts (intermediate nodes). Each branch represents a different sense, ordered by frequency (the first branch represents the most common sense). We propose a rule based on the matching of specific patterns (usually provided by an expert) in hyperonym trees. If the pattern contained in rule r_j is found, then the rule produces a score vote SC_H that is equal to:

SC_H(w_i) = \frac{|S(w_i)| - (i - 1)}{\sum_{k=1}^{|S(w_i)|} k} \qquad (1)

where w_i is the word/lemma analyzed, |S(w_i)| is the number of senses of w_i, and i is the index of the sense where the pattern was found. This way, less common senses score lower than the most frequent senses. The final score vote associated with a word w_i is equal to the score of the matching rule, if any, that produced the maximum score vote for w_i. Therefore, the final
score vote for a service description is equal to the sum of the final scores produced by the words that are part of the description. Each IC agent has its own set of rules that is specialized in scoring a specific service category. Ideally, an agent should produce high scores for descriptions that belong to its expertise category and low scores for other categories. IC agents form a mediated agent organization, which has two advantages: IE agents only need to know the contact agents of the organization, and mediators can govern the classification process. The contact agents offer the classification service, which is invoked by IE agents. The service call requires a service description to be provided by the IE agent. This service description is broadcast by the contact agents to all of the IC agents and a trusted mediator. The trusted mediator starts a voting process in which every IC agent must emit a vote that reflects whether it believes that the service description can be categorized into its expertise area. After all of the IC agents have voted, the trusted mediator assigns to the service description a service category equal to the expertise area of the agent whose vote scored highest. This expertise area is sent back by the contact agent to the service invoker. The mediator can regulate the voting process by adjusting the voting power (vp_{a_i}) of each IC agent according to past experiences. This voting process can be formalized as follows:

Category(W) = Expertise\big(\arg\max_{a_i \in ICS} vp_{a_i} \cdot SC_{a_i}(W)\big) \qquad (2)
where ICS is the set of IC agents and vp_{a_i} is the voting power that the mediator grants to agent a_i. 3.3
Integrating the Add-on in the STRS
The integration of the add-on in the STRS, and the way it works, can be summarized as follows:
1. An IE agent extracts a service description from the Web.
2. This agent requests the classification service from the IC organization. The argument of the call is the service description previously extracted.
3. The contact agent of the organization receives the service call and broadcasts it to all of the IC agents and the trusted mediator.
4. The trusted mediator starts the voting process and waits for the votes. IC agents send their votes and their expertise categories to the trusted mediator.
5. The mediator decides which category to assign to the service description, based on the highest score and each agent's voting power.
6. The classification result is sent back to the invoking IE agent.
7. This IE agent sends a message with the extracted information and its associated category to the corresponding sight agent in the STRS. Sight agents manage all the information regarding the characteristics and activities of a specific place of interest in the city.
The complete architecture of the Social-Net Tourism Recommender System and the designed add-on can be found in Fig. 1. It shows the whole process of information extraction and information classification, and its integration in the STRS system.
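The mediated vote of Eq. (2) and steps 2–6 above can be illustrated with a minimal sketch. The agent names and score functions below are invented for illustration; the real IC agents use the TS and hyperonym rule sets of Section 3.2:

```python
# Hypothetical sketch of the mediated classification (Eq. 2): each IC agent
# scores the description for its own expertise category, the mediator weights
# each score by the agent's voting power, and the winning agent's expertise
# area becomes the assigned category.
def classify(description, ic_agents):
    """ic_agents: list of (expertise, voting_power, score_fn) triples."""
    votes = [(vp * score(description), expertise)
             for expertise, vp, score in ic_agents]
    return max(votes)[1]  # Expertise(argmax vp_ai * SC_ai(W))

# Toy score functions standing in for the TS/hyperonym rule sets.
ic_agents = [
    ("concert",    1.0, lambda w: 2.0 if "band" in w else 0.1),
    ("theater",    1.0, lambda w: 2.0 if "play" in w else 0.1),
    ("exhibition", 1.0, lambda w: 2.0 if "gallery" in w else 0.1),
]
# Steps 1-2: an IE agent extracted this description and requests classification.
print(classify("rock band live at the riverside stage", ic_agents))
```

In the real system the scores come from the rule sets and the voting powers are adjusted by the mediator over time, but the decision rule is the same weighted argmax.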
Fig. 1. The architecture of the STRS system and its add-on
4
Experiments
The first experiment consisted in testing the classification accuracy of the IC agents. Three service categories were employed: concerts, exhibitions, and theater plays, with one IC agent per category. Three different rule sets were built using the information of a balanced corpus of 600 service descriptions (70% training, 30% test) and information provided by experts. The voting power of each agent was fixed to vp_{a_i} = 1 and remained static during the whole process. The proposed method was compared with a baseline that only used the knowledge provided by Term Strength. The results can be found in Fig. 2.b. It can be observed that the proposed method achieves a lower classification error than the Term Strength method. This improvement is obtained thanks to the combined use of Term Strength rules (statistical knowledge) and hyperonym rules (expert knowledge).
The second experiment aimed to show how the trusted mediator can govern the voting process carried out to classify event descriptions. The three agents used in the first experiment were also used here (music agent, theater agent, exhibition agent). Additionally, three malicious (badly designed) agents representing the music, theater, and exhibition categories were also used. These agents generate high scores with high probability regardless of the true category of the service description, so they usually introduce error into the classification service. The mediator updated the voting power of each agent every 10 service calls. It applied a decay on the voting power vp_{a_i} based on the behavior of the agent over the past 10 service calls. The decay formula can be formalized as follows:

vp_{a_i}^{t+1} = vp_{a_i}^{t} - \frac{FP_{a_i}}{|N_{other}|} + \frac{TP_{a_i}}{|N|} \qquad (3)
Integrating IE Agents into a Tourism Recommender System
199
Fig. 2. a) Evolution of the agents' voting power for the second experiment (voting power vs. number of service calls, for MusicAgent, TheaterAgent, ExhibitionAgent and the three badly designed agents). b) Results for the first experiment (classification error): Proposed method — 11.79% training, 11.11% test; Term Strength — 17.65% training, 16.67% test.

where vp_{a_i}^{t+1} is the new voting power, vp_{a_i}^{t} is the voting power of agent a_i at the last check, FP_{a_i} is the number of times the system decision was given by agent a_i and the correct service category was not the one a_i represents, TP_{a_i} is the number of times the system decision was given by agent a_i and the correct service category was the one a_i represents, |N| is the total number of service calls (10 in this case), and |N_{other}| is the total number of service calls whose associated service category is not the one agent a_i represents.
The experiment was run for 100 random service calls and its results can be observed in Fig. 2.a, which shows the evolution of the agents' voting power. As the number of service calls increases, the voting power of the malicious agents is nullified, whereas the voting power of the other agents remains almost intact. Consequently, the mediator was capable of regulating the voting process in order to provide a better classification service.
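The mediator's decay update can be sketched as follows. The printed formula is garbled in this copy, so the sign convention here is an assumption chosen to match the described behavior (false positives decay an agent's power, true positives restore it), and the clamp at zero is likewise an assumption suggested by the "nullified" voting powers in Fig. 2.a:

```python
# Sketch of the voting-power decay update (Eq. 3). Sign convention and the
# clamp at 0 are assumptions inferred from the experiment's description, not
# a verified transcription of the published formula.
def update_voting_power(vp, tp, fp, n_total, n_other):
    """vp: current voting power; tp/fp: true/false positive counts in the
    window; n_total = |N| service calls in the window; n_other = |N_other|
    calls whose true category is another agent's expertise."""
    return max(0.0, vp - fp / n_other + tp / n_total)

# A consistently wrong (malicious) agent loses its power over a few windows:
vp = 1.0
for _ in range(5):
    vp = update_voting_power(vp, tp=0, fp=3, n_total=10, n_other=8)
print(round(vp, 3))
```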
5
Conclusions
In this work, an add-on for the Social-Net Tourism Recommender System (STRS) has been presented. The recommender system is based on multi-agent technology and uses social networks to make its recommendations. The add-on makes it possible to keep and retrieve information about third-party services and about the providers of the system. Information agents that employ natural language processing techniques are used to extract and classify the Web information that the add-on requires for the STRS. Additionally, mediated voting processes are employed to classify the extracted information into different service categories. Experiments have been carried out that show the classification accuracy of the extracted information and how badly designed agents can be neutralized by means of mediated voting processes.
Acknowledgments. This work is supported by the TIN2009-13839-C03-01, TIN2008-04446 and PROMETEO/2008/051 projects of the Spanish government, CONSOLIDER-INGENIO 2010 under grant CSD2007-00022, and FPU grant AP2008-00600 awarded to V. Sánchez-Anguix.
References
1. Ricci, F., Werthner, H.: Case-based querying for travel planning recommendation. Information Technology and Tourism 4(3-4), 215–226 (2002)
2. Loh, S., Lorenzi, F., Saldana, R., Litchnow, D.: A tourism recommender system based on collaboration and text analysis. Information Technology and Tourism 6, 157–165
3. Fesenmaier, D., Ricci, F., Schaumlechner, E., Wober, K., Zanella, C.: Dietorecs: Travel advisory for multiple decision styles. In: Proc. of ENTER 2003, pp. 232–242 (2003)
4. Herlocker, J., Konstan, J.A.: Content-independent task-focused recommendation. IEEE Internet Comput. 5, 40–47 (2001)
5. Rudstrom, A., Fagerberg, P.: Socially enhanced travel booking: a case study. Information Technology and Tourism 6(3)
6. Sebastia, L., Garcia, I., Onaindia, E., Guzman, C.: E-tourism: A tourist recommendation and planning adaptation. Int. J. Artif. Intell. Tools 18(5), 717–738 (2009)
7. Lopez, J.S., Bustos, F.A., Julian, V., Rebollo, M.: Developing a multiagent recommender system: A case study in tourism industry. International Transactions on Systems Science and Applications 4, 206–212 (2008)
8. Flesca, S., Manco, G., Masciari, E., Rende, E., Tagarelli, A.: Web wrapper induction: a brief survey. AI Commun. 17(2), 57–61 (2004)
9. Kushmerick, N., Thomas, B.: Adaptive information extraction: Core technologies for information agents. In: Klusch, M., Bergamaschi, S., Edwards, P., Petta, P. (eds.) Intelligent Information Agents. LNCS (LNAI), vol. 2586, pp. 79–103. Springer, Heidelberg (2003)
10. Wilbur, W.J., Sirotkin, K.: The automatic identification of stop words. J. Inf. Sci. 18(1), 45–55 (1992)
11. Yang, Y.: Noise reduction in a statistical approach to text categorization. In: Proc. of SIGIR 1995, pp. 256–263 (1995)
12. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11) (1995)
Adaptive Hybrid Immune Detector Maturation Algorithm
Jungan Chen, Wenxin Chen, and Feng Liang
Electronic Information Department, Zhejiang Wanli University
No. 8 South Qian Hu Road, Ningbo, Zhejiang, 315100, China
[email protected], [email protected], [email protected]
Abstract. In this work, a novel Adaptive Hybrid Immune Detector Maturation Algorithm is proposed for anomaly detection. T-detector Maturation Algorithm and Dynamic Negative Selection Algorithm are combined with a new state transformation model. Experiment results show that the proposed algorithm solves the population-adapt problem and can generate detectors with higher affinity. Keywords: artificial immune system, anomaly detection, adapt problem.
1 Introduction
Nowadays, Artificial Immune Systems (AIS) are used to construct algorithms based on negative selection, immune network models, or clonal selection [1][2][3]. They are applied in many areas, such as anomaly detection, classification, learning, and control [4][5][6]. The Negative Selection Algorithm (NSA) was first proposed to generate detectors for anomaly detection [1]. In NSA, the match rule is one of the most important components; it is used to decide whether two strings match. Many match rules have been proposed [7][8][9]. But no matter what kind of match rule is used, the match threshold (r) is constant and cannot adapt to changes in the self data; this is called the self-adapt problem. Inspired by the T-cell maturation process, a match range model was proposed to solve the self-adapt problem, and the T-detector Maturation Algorithm (TMA) was put forward [10][11][12]. Besides the self-adapt problem, NSA cannot generate dynamic detectors that vary with the nonselves; this is called the nonself-adapt problem. Inspired by the affinity maturation process, the Dynamic Negative Selection Algorithm Based on Affinity Maturation (DNSA-AM) was proposed to solve the nonself-adapt problem [13]. As NSA is used to delete detectors that detect any self, DNSA-AM still has the self-adapt problem. The Dynamic Negative Selection Algorithm Based on the Match Range Model (DNSA-MRM) was proposed to solve this problem [14]. But DNSA-MRM must set the size of the detector population (PSize) as a parameter, which cannot adapt to changes in the nonselves; this is called the population-adapt problem. In this work, a novel algorithm called the Adaptive Hybrid Immune Detector Maturation Algorithm (AHIDMA) is proposed to solve the population-adapt problem.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 201–208, 2010. © Springer-Verlag Berlin Heidelberg 2010
202
J. Chen, W. Chen, and F. Liang
As reference [15] mentioned, 'Generalized lymphocytes will not be fast enough for detecting specific pathogens... The immune system incorporates mechanisms that enable lymphocytes to learn the structures of specific foreign proteins; essentially, the immune system evolves and reproduces lymphocytes that have high affinities for specific pathogens.' The mechanism mentioned is affinity maturation, which enables the immune system to detect antigens faster than generalized lymphocytes can. AHIDMA combines TMA with affinity maturation (DNSA). The detectors generated by TMA are like the generalized lymphocytes, and the detectors generated by affinity maturation are the specialized lymphocytes, so there is a balance between generality and specialty. Furthermore, a new state transformation model of antigens and detectors is proposed to control the population size. Based on this state transformation model, the population-adapt problem is easily solved. In a word, all three adapt problems are solved by the proposed algorithm.
2 Algorithm
2.1 The State Transformation Model
In AIS, the normal set is defined as the selves and the anomaly set as the nonselves. An antigen is a suspect network activity, and a detector is used to detect anomalies (nonselves). U = {0,1}^n, where n is the length of the binary string that is the gene expression of an antigen or detector. selves ∪ nonselves = U and selves ∩ nonselves = ∅. In a match rule, the affinity is the distance between an antigen and a detector. Given two binary strings AgBin = g_1 g_2 … g_n and AbBin = b_1 b_2 … b_n, the Hamming distance between AgBin and AbBin is:
Fig. 1. Antigen and detector's transformation model. Antigen states: new → suspect → self/nonself; detector states: new → highest → maturation → die. Antigen transitions: (1) the antigen cannot be detected by existing maturation detectors; (2) the antigen has not been detected over a specified number of generations; (3,4) the antigen is detected by a new or maturation detector. Detector transitions: (1) the detector has a higher affinity with an antigen; (2) the detector detected a nonself antigen; (3,4) the detector has low affinity with all antigens and cannot detect any antigen.
Adaptive Hybrid Immune Detector Maturation Algorithm
203
d(AgBin, AbBin) = \sum_{i=1}^{n} g_i \oplus b_i \qquad (1)
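Eq. (1) can be transcribed directly; this sketch assumes antigens and detectors are encoded as equal-length Python strings of '0'/'1' characters:

```python
# Hamming distance of Eq. (1): count the positions where the two equal-length
# bit strings differ (g_i XOR b_i summed over i).
def hamming(ag_bin, ab_bin):
    return sum(g != b for g, b in zip(ag_bin, ab_bin))

print(hamming("10110", "11100"))
```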
In this model, the set of antigens is defined as AGS and the set of detectors as DCTS. Each antigen Ag ∈ AGS and each detector dct ∈ DCTS is represented by its binary string (AgBin and AbBin, respectively) together with its state and the bookkeeping fields used below (undetectedCount for antigens; selfmin and selfmax for detectors).
2.2 Implementation of the Model
In one detector, selfmax and selfmin are calculated by setMatchRange(dct, selves), with k ∈ [1, |selves|] and self_k ∈ selves. In equation 2, [selfmin, selfmax] is defined as the self area; everything else is the nonself area. Suppose there is a binary string x ∈ U and a detector dct ∈ DCTS. When d(x, dct.AbBin) ∉ [dct.selfmin, dct.selfmax], x is detected as an anomaly. This is called the Range Match Rule (RMR) [11].

setMatchRange: \quad selfmin = \min\{d(self_k, dct.AbBin)\}, \quad selfmax = \max\{d(self_k, dct.AbBin)\} \qquad (2)
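Eq. (2) and the Range Match Rule can be sketched together; representing selves and detectors as plain bit strings is an assumption for illustration:

```python
# setMatchRange (Eq. 2): the detector's self area is the interval spanned by
# its Hamming distances to all self strings. The Range Match Rule then flags
# x as anomalous when d(x, dct) falls outside that interval.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def set_match_range(ab_bin, selves):
    dists = [hamming(s, ab_bin) for s in selves]
    return min(dists), max(dists)  # (selfmin, selfmax)

def rmr_detects(x, ab_bin, selfmin, selfmax):
    return not (selfmin <= hamming(x, ab_bin) <= selfmax)

selves = ["0000", "0001", "0011"]
lo, hi = set_match_range("0000", selves)    # distances 0, 1, 2 -> (0, 2)
print(rmr_detects("1111", "0000", lo, hi))  # distance 4 lies outside [0, 2]
```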
M = |AGS|, \quad N = |DCTS|, \quad i \in [1, M], \quad j \in [1, N] \qquad (3)
In equation 3, M and N are the sizes of the antigen and detector sets. The value i is the index of an antigen in AGS and j is the index of a detector in DCTS. The Hamming distance between antigen i and detector j is calculated as follows:
Ag_i.d_j = dct_j.d_i = d_{ij} = d(Ag_i.AgBin, dct_j.AbBin) \qquad (4)
harmmax is the maximum distance of one antigen to all detectors, or of one detector to all antigens. The harmmax of Ag and dct is calculated by equations 5 and 6. In equation 5, the value x is the index of the detector that has the highest distance to antigen i.
Ag_i.harmmax = d_{i,x} = \max(d_{i*}), \quad d_{i*} = \{d_{i1}, d_{i2}, \ldots, d_{iN}\} \qquad (5)

dct_j.harmmax = \max(d_{*j}), \quad d_{*j} = \{d_{1j}, d_{2j}, \ldots, d_{Mj}\} \qquad (6)

harmBitNum_{ij} = \max(d_{ij} - dct_j.selfmax,\; dct_j.selfmin - d_{ij}) \qquad (7)

harmBitNum_{i*} = \{harmBitNum_{i1}, harmBitNum_{i2}, \ldots, harmBitNum_{iN}\} \qquad (8)

Ag_i.harmbitnummax = harmBitNum_{i,y} = \max(harmBitNum_{i*}) \qquad (9)
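Equations (7)–(9) can be sketched as follows; field names are flattened into plain arguments, which is an illustration choice rather than the paper's data layout:

```python
# harmBitNum (Eq. 7): how far a distance d_ij falls outside detector j's self
# area. A positive value means the antigen lies in the detector's detect
# range; larger values mean a larger margin.
def harm_bit_num(d_ij, selfmin, selfmax):
    return max(d_ij - selfmax, selfmin - d_ij)

def harm_bit_num_max(distances, ranges):
    """Eqs. (8)-(9): distances d_i1..d_iN for one antigen, one (selfmin,
    selfmax) pair per detector. Returns (y, harmbitnummax) where y is the
    index of the detector with the highest harmBitNum."""
    scores = [harm_bit_num(d, lo, hi) for d, (lo, hi) in zip(distances, ranges)]
    y = max(range(len(scores)), key=scores.__getitem__)
    return y, scores[y]

print(harm_bit_num_max([3, 6, 1], [(0, 2), (1, 4), (0, 5)]))
```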
In this work, the parameter harmBitNum is proposed in equation 7. If harmBitNum is at least 1, the distance d is bigger than selfmax or smaller than selfmin, i.e., d ∉ [dct.selfmin, dct.selfmax], and x is detected as an anomaly according to the Range Match Rule. The bigger harmBitNum is, the bigger the detect range of dct. harmbitnummax is calculated by equations 7~9; the value y is the index of the detector that has the highest harmBitNum with antigen i.

\text{if } dct_x.d_i \in [dct_x.selfmin, dct_x.selfmax]: \quad Ag_i.state = \text{'suspect'},\; Ag_i.undetectedCount = Ag_i.undetectedCount + 1,\; dct_x.state = \text{'highest'} \qquad (10)

\text{if } Ag_i.undetectedCount > maxundetectedCount: \quad Ag_i.state = \text{'self'}
According to equation 5, detector x has the highest distance to antigen i. In equation 10, if antigen i cannot be detected by detector x, antigen i is taken as a suspect antigen, which means that antigen i must be detected again over many generations until it is taken as a self antigen or a nonself antigen. If antigen i has not been detected after maxundetectedCount generations, it is changed to a 'self' antigen. Furthermore, if detector x cannot detect antigen i, detector x is changed to a 'highest' detector.
\text{if } dct_y.d_i \notin [dct_y.selfmin, dct_y.selfmax]: \quad Ag_i.state = \text{'nonself'},\; dct_y.state = \text{'maturation'} \qquad (11)
According to equation 9, detector y has the highest harmbitnum with antigen i. In equation 11, if detector y detects antigen i, antigen i is changed to a nonself antigen and detector y is changed to a maturated detector. Because a detector with a bigger harmBitNum has a larger detect range, this mechanism is used to ensure the balance of generality and specialty.
2.3 The Detection Process
The proposed algorithm combines TMA with affinity maturation, so it has two detect processes. Some variables are defined in equations 12~15.
Adaptive Hybrid Immune Detector Maturation Algorithm
205
\forall Ag_{new} \in AGS_{new} \subseteq AGS, \quad Ag_{new}.state = \text{'new'} \qquad (12)

\forall Ag_{suspect} \in AGS_{suspect} \subseteq AGS, \quad Ag_{suspect}.state = \text{'suspect'} \qquad (13)

\forall dct_{highest} \in DCTS_{highest} \subseteq DCTS, \quad dct_{highest}.state = \text{'highest'} \qquad (14)

\forall dct_{maturation} \in DCTS_{maturation} \subseteq DCTS, \quad dct_{maturation}.state = \text{'maturation'} \qquad (15)
1. AGS_new is first detected by DCTS_maturation; this is called TMADetect. After this process, each antigen in AGS_new becomes either a suspect or a nonself antigen. If an antigen cannot be detected by the detectors, highest detectors will be generated.
2. AGS_suspect is detected by DCTS_highest; this is called AMDetect. In this process, new detectors are generated from the highest detectors through affinity maturation. New detectors are randomly generated, their number being |AGS_suspect|. Each detector in DCTS_highest reproduces child detectors: the higher the affinity of a detector, the higher the number of clones generated [3]. The total number of new detectors generated is the value of harmmax. There is no crossover operator, only a mutation operator, according to the hypermutation principle: the higher the affinity, the smaller the mutation rate [3]. The mutation rate is (length of dct.AbBin − dct.harmmax)/2.
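The clone-and-mutate step of AMDetect can be sketched as follows. Interpreting harmmax as the clone count and "(length − harmmax)/2" as a number of bits to flip are assumptions, since the paper does not state the units:

```python
import random

# Sketch of affinity-maturation cloning: clone counts grow with affinity
# (harmmax) while the mutation strength shrinks with it, per the
# hypermutation principle. Both interpretations are assumptions.
def mutate(ab_bin, n_flips, rng):
    bits = list(ab_bin)
    for i in rng.sample(range(len(bits)), n_flips):
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

def clone_and_mutate(ab_bin, harmmax, rng):
    n_clones = harmmax                       # higher affinity -> more clones
    n_flips = (len(ab_bin) - harmmax) // 2   # higher affinity -> fewer flips
    return [mutate(ab_bin, n_flips, rng) for _ in range(n_clones)]

rng = random.Random(0)
clones = clone_and_mutate("10110100", harmmax=6, rng=rng)
print(len(clones), all(len(c) == 8 for c in clones))
```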
3 Experiments
The objective of the experiments is to: (1) verify the detection results and whether the size of the detector population is adaptive; (2) investigate the effect of affinity maturation and of self-antigen detection. Experiments are carried out using the famous benchmark Fisher's Iris data and the Wisconsin Diagnostic Breast Cancer (WDBC) data [5], listed in Table 1. A minimal entropy discretization algorithm is used to discretize these data sets [16]. The algorithm is run ten times, with max generation maxg = 1000. To verify the adaptive character, nonself data are changed every 50 generations, with maxundetectedCount = maxg. The Iris data set has 4 attributes and 150 examples in three classes, 'Setosa', 'Versicolour', and 'Virginica', each with 50 examples. One of the three types of iris is considered normal data; the other two are considered anomalies and are injected into the algorithm in turn and repeatedly. The WDBC data set has 30 attributes and 569 examples in two classes: 'benign' (357 examples) and 'malignant' (212 examples). 'Malignant' is defined as the self set and 'benign' as the nonself set; 30 nonself data are injected every 50 generations, repeatedly. To investigate the effect of self-antigen detection, the Iris data are used with 'Versicolour' as normal data and the others as anomalies: 50 nonself data are injected every 50 generations and 5 self data are injected in the first generation, with maxundetectedCount = 100. The detection results are shown in Table 1: no nonself antigen goes undetected, and the 5 self antigens are detected successfully.
J. Chen, W. Chen, and F. Liang

Table 1. Data set used in experiment and results
| Data set | Self data  | Size of selves | Self antigens | Size of nonself antigens | maxundetectedCount | Size of valid detectors | Antigens not detected | Self antigens detected |
|----------|------------|----------------|---------------|--------------------------|--------------------|-------------------------|-----------------------|------------------------|
| Iris     | Setosa     | 50             | 0             | 100                      | 1000               | 6.9                     | 0                     | 0                      |
| Iris     | Virginica  | 50             | 0             | 100                      | 1000               | 18.3                    | 0                     | 0                      |
| Iris     | Versicolor | 50             | 0             | 100                      | 1000               | 11.2                    | 0                     | 0                      |
| WDBC     | Malignant  | 212            | 0             | 357                      | 1000               | 52.4                    | 0                     | 0                      |
| Iris     | Versicolor | 50             | 5             | 100                      | 100                | 11.5                    | 0                     | 5                      |
3.1 Population-Adapt

In Fig. 1, the first subfigure shows the number of antigens changing every 50 generations. No matter how the antigens change, few antigens remain undetected (second subfigure), thanks to the adaptive population of AHIDMA shown in the fourth subfigure.
Fig. 1. Results of the algorithm using WDBC data
3.2 Self-Antigen Detection and the Effect of Affinity Maturation

In this experiment, 50 nonself data are injected every 50 generations and 5 self data are injected in the first generation. In Fig. 2, the first subfigure presents the variation curve of the number of undetected antigens. Stimulated by the undetected antigens, daughter detectors with higher 'harmmax' are generated, and the value of 'harmmax' in the third subfigure fluctuates because of affinity maturation, as does the number of detectors in the fourth subfigure. In the second subfigure, the five self antigens are detected after the 100th generation because the parameter maxundetectCount is set to 100.
Fig. 2. Results of self-antigen detection
Furthermore, the number of detectors produced by affinity maturation or TMA is in direct proportion to the number of antigens. The first subfigure shows that there are more antigens between the 0th and 100th generations than in other generations, so there are more detectors between the 50th and 100th generations than in other generations in the fourth subfigure. After the 100th generation, TMADetect can detect all the antigens and AMDetect is not activated, so detectors in the 'highest' state die and the number of detectors decreases.
4 Conclusion

In this work, a new state transformation model is proposed. Based on it, a new algorithm called the Adaptive Hybrid Immune Detector Maturation Algorithm (AHIDMA) is proposed. It combines TMA with AM and solves the adaptation problem, that is, the self-adaptation, nonself-adaptation and population-adaptation problems. In the algorithm, a random detector generation mechanism implements the NSA/TMA part, and affinity maturation implements the DNSA/AM part. The TMADetect process detects antigens seen before, and the AMDetect process detects antigens not seen before. Affinity maturation is the key component for producing new detectors.

Acknowledgments. This work is supported by Ningbo Nature Science Foundation 200701A6301043 and the Scientific Research Fund of Zhejiang Provincial Education Department 20070731. We also thank the providers of the KDD Cup 1999 data set [http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html].
References

1. Forrest, S., Perelson, A.S., Allen, L., Cherukuri, R.: Self-nonself Discrimination in a Computer. In: Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy (1994)
2. de Castro, L.N., Von Zuben, F.J.: aiNet: An Artificial Immune Network for Data Analysis. In: Abbass, H.A., Sarker, R.A., Newton, C.S. (eds.) Data Mining: A Heuristic Approach, ch. XII, pp. 231–259. Idea Group Publishing, USA (2001)
3. de Castro, L.N., Von Zuben, F.J.: Learning and Optimization Using the Clonal Selection Principle. IEEE Transactions on Evolutionary Computation, Special Issue on Artificial Immune Systems (2002)
4. Chen, T.-C., Chen, C.-Y.: Satellite-Derived Land-Cover Classification Using Immune Based Mining Approach. In: 5th WSEAS International Conference on Applied Computer Science (2006)
5. Kim, J.W.: Integrating Artificial Immune Algorithms for Intrusion Detection. PhD Thesis, Department of Computer Science, University College London (2002)
6. Huang, T.L., Lee, K.T., Chang, C.H., Hwang, T.Y.: Two-Level Sliding Mode Controller Using Artificial Immune Algorithm. WSEAS Transactions on Power Systems (2006)
7. Hofmeyr, S.A.: An Immunological Model of Distributed Detection and its Application to Computer Security. PhD Dissertation, University of New Mexico (1999)
8. Gonzalez, F.: A Study of Artificial Immune Systems Applied to Anomaly Detection. PhD Dissertation, The University of Memphis (May 2003)
9. Ji, Z., Dasgupta, D.: Revisiting Negative Selection Algorithms. Evolutionary Computation (2007)
10. Yang, D., Chen, J.: The T-Detectors Maturation Algorithm Based on Genetic Algorithm. LNCS. Springer, Heidelberg (2004)
11. Yang, D., Chen, J.: The T-detectors Maturation Algorithm Based on Match Range Model. In: Proceedings of the 2005 ACM Symposium on Applied Computing (2005)
12. Chen, J.: T-detectors Maturation Algorithm with Min-Match Range Model. In: The 3rd IEEE International Conference on Intelligent Systems (2006)
13. Wenjian, L.: Research on Artificial Immune Model and Algorithms Applied to Intrusion Detection. PhD Dissertation, University of Science and Technology of China (2003)
14. Chen, J.: Dynamic Negative Selection Algorithm Based on Match Range Model. LNCS. Springer, Heidelberg (2005)
15. http://www.cs.unm.edu/~immsec/html-imm/affmat.html
16. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features, http://robotics.stanford.edu/~ronnyk/disc.ps
Interactive Visualization Applets for Modular Exponentiation Using Addition Chains Hatem M. Bahig and Yasser Kotb Computer Science Division, Department of Mathematics, Faculty of Science, Ain Shams University, Cairo, Egypt {hmbahig,kotb}@asunet.shams.edu.eg
Abstract. Online visualization systems have come to be heavily used in education, particularly for online learning. Most e-learning systems, including interactive learning systems, are designed to make it easier to understand the key ideas of particular problems or, more generally, overall course materials. This paper presents a novel interactive visualization system for one of the most important operations in public-key cryptosystems: modular exponentiation using addition chains. An addition chain for a natural number e is a sequence 1 = a0 < a1 < . . . < ar = e of numbers such that for each 0 < i ≤ r, ai = aj + ak for some 0 ≤ k ≤ j < i. Finding an addition chain with minimal length is an NP-hard problem. The proposed system visualizes how to generate addition chains with minimal length using the depth-first branch and bound technique, and how to compute modular exponentiation using addition chains.

Keywords: addition chain, branch and bound algorithm, public-key cryptosystem, visualization.
1 Introduction
The modular exponentiation (computing me mod n for given m, e, and n) is one of the most important operations in many public-key cryptosystems. For example, in RSA [17], the encryption of a message m is me mod n, where n and e are the public key. In general, modular exponentiation is computed using a chain of modular multiplications. There are two strategies to improve the throughput of implementations of these cryptosystems. The first is optimizing the multiplication [11]. The second is reducing the number of required modular multiplications. Computing the exponentiation me with a minimal number of multiplications, given that the only operation allowed is multiplying two already-computed powers, corresponds to finding a sequence of increasing natural numbers approaching the exponent e such that the sequence starts with 1 (representing the element m), ends with e (representing me), and every other element in the sequence is the sum of two preceding elements (not necessarily distinct). Such a sequence is called an addition chain. Finding minimal length addition chains is an NP-hard problem [10]. Therefore, there is a need to understand this problem to improve the performance of

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 209–216, 2010. © Springer-Verlag Berlin Heidelberg 2010
those cryptosystems. There are two directions for finding addition chains. The first is to find short (not necessarily minimal length) addition chains; the other is to find minimal length addition chains. Artificial Intelligence (AI) can play a role in both directions. In the first, many AI techniques have been introduced to find addition chains of short length; for example, genetic algorithms [4,14], ant colony algorithms [15], swarm algorithms [13] and the artificial immune system paradigm [5] have been applied to generate short addition chains. In the second, search tree techniques in AI for exact solutions have been applied to generate minimal length addition chains; see [2,7] for examples. A depth-first branch and bound algorithm is the best known technique for generating minimal length addition chains [1,19], so we concentrate on it. E-learning can be defined as technology-based learning in which learning material is delivered electronically to remote learners via a computer network. E-learning (or Internet-based learning) can be seen as a professional level of education, but with the advantages of lower time and cost. Other advantages of e-learning include a larger learning population, mitigation of the shortage of qualified training staff, lower campus maintenance costs, up-to-date information, and accessibility. In a typical e-learning environment the lecturers, students and information are in different geographical locations and are connected via the Internet. Traditional web-based courses are usually static hypertext pages without student adaptability. However, since the late nineties, several research teams have been implementing different kinds of adaptive and intelligent systems for Web-based education [3]. There are few software tools to help students understand cryptographic protocols and operations (for example [6,8,9]), and there is no educational system for the modular exponentiation operation.
This motivates us to visualize modular exponentiation using addition chains. The paper is organized as follows. Section 2 gives an overview of addition chains. In Section 3, we present the use of the depth-first branch and bound algorithm to generate addition chains with minimal length. In Section 4, we present a visualization of addition chains. In Section 5, we give a brief discussion with classroom methodological studies. Finally, Section 6 presents the conclusion and future work.
2 Overview of Addition Chains
In this section we mention some basic definitions, notations, and facts about addition chains. An addition chain [12] of length r for a natural number e is a strictly increasing sequence of (r + 1) natural numbers 1 = a0, a1, . . . , ar = e such that for each i ≥ 1, ai = aj + ak for some k ≤ j < i. The integer r is called the length of the addition chain for e. The minimal length of an addition chain for e is denoted by ℓ(e). For example, the sequences 1, 2, 4, 5, 10, 15 and 1, 2, 4, 8, 12, 14, 15 are two addition chains for 15, with lengths 5 and 6 respectively. The computation of m15 using the first chain is m, m2, m4, m5, m10, m15, while using the second chain it is m, m2, m4, m8, m12, m14, m15.
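The evaluation of m^e mod n along a given addition chain can be reproduced programmatically. The following sketch is ours, not the paper's: each power m^ai is obtained by multiplying two already-computed powers m^aj and m^ak with ai = aj + ak.

```python
def mod_exp_with_chain(m, chain, n):
    """Compute m**e mod n, where chain is an addition chain ending in e.
    powers maps each chain element a_i to m**a_i mod n."""
    powers = {1: m % n}
    for i, a in enumerate(chain[1:], start=1):
        # find j, k with a = chain[j] + k and both powers already known
        for j in range(i - 1, -1, -1):
            k = a - chain[j]
            if k in powers:
                powers[a] = (powers[chain[j]] * powers[k]) % n
                break
    return powers[chain[-1]]

# m**15 mod 13 via the length-5 chain 1, 2, 4, 5, 10, 15 (five multiplications)
assert mod_exp_with_chain(2, [1, 2, 4, 5, 10, 15], 13) == pow(2, 15, 13)
```

Both chains from the text give the same result; the shorter chain simply uses one fewer modular multiplication.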
Let λ(e) = ⌊log2 e⌋ and let ν(e) be the number of 1's in the binary representation of e. The ith step ai = aj + ak (0 ≤ k ≤ j < i) is called star if j = i − 1, small if λ(ai) = λ(ai−1), and big if λ(ai) = λ(ai−1) + 1. The length of an addition chain can be expressed as r = λ(e) + S(a0, a1, . . . , ar = e), where S(a0, a1, . . . , ai) denotes the number of small steps in the chain up to ai. The lower bound of ℓ(e) is

ℓ(e) ≥ log2 e + log2 ν(e) − 2.13.    (1)
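These quantities are straightforward to compute; the sketch below (ours) takes the ceiling of the bound in Eq. (1) to obtain the integer starting depth, since ℓ(e) is an integer.

```python
import math

def lam(e):
    """λ(e) = floor(log2 e), i.e. one less than the bit length of e."""
    return e.bit_length() - 1

def nu(e):
    """ν(e) = number of 1s in the binary representation of e."""
    return bin(e).count("1")

def lower_bound(e):
    """Integer lower bound on ℓ(e): ceil(log2 e + log2 ν(e) - 2.13)."""
    return math.ceil(math.log2(e) + math.log2(nu(e)) - 2.13)

# e = 15: λ(15) = 3, ν(15) = 4, lower_bound(15) = 4 (ℓ(15) is in fact 5)
```

For e = 15 the bound gives 4, one below the true minimal length 5, which is why the search algorithm in Section 3 must be prepared to deepen its search.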
A set of (lb + 1) natural numbers {bi}, i = 0, . . . , lb, is called a bounding sequence of length lb for e if bi ≤ ai for each addition chain a0, a1, . . . , alb = e of length lb for e. The purpose of bounding sequences is to cut off branches in the search tree that cannot lead to a shortest chain. Thurber [19] proposed three bounding sequences:

bi = ⌈e/2^(lb−i)⌉, i = 0, . . . , lb.    (2)

bi = ⌈e/(3 · 2^(lb−i−2))⌉ for 0 ≤ i ≤ lb − t − 2; bi = ⌈e/2^(lb−i)⌉ for lb − t − 1 ≤ i ≤ lb.    (3)

bi = ⌈e/(2^t · (2^(lb−t−(i+1)) + 1))⌉ for 0 ≤ i ≤ lb − t − 2; bi = ⌈e/2^(lb−i)⌉ for lb − t − 1 ≤ i ≤ lb.    (4)

A bounding sequence is called vertical (VBS) if it is used with the condition ai < bi, and slant (SBS) if it is used with the condition ai+1 + ai < bi+2. Computing the VBS and SBS for a given e is as follows [19]:

– if ν(e) = 1 and e = (2^j + 1)k, j > 0, then use Eq. (4) as VBS and Eq. (3) as SBS.
– if ν(e) = 1 and e = 5k, then use Eq. (3) as both VBS and SBS.
– otherwise, use Eq. (2) as both VBS and SBS.
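A small sketch of Eq. (2) follows (ours; the ceiling reading of the divisions is an assumption):

```python
import math

def bounding_sequence_eq2(e, lb):
    """Eq. (2), read with ceilings: b_i = ceil(e / 2**(lb - i))."""
    return [math.ceil(e / 2 ** (lb - i)) for i in range(lb + 1)]

# For e = 15 and lb = 5 this gives [1, 1, 2, 4, 8, 15]; the length-5
# chain 1, 2, 4, 5, 10, 15 satisfies a_i >= b_i at every position.
bs = bounding_sequence_eq2(15, 5)
assert all(b <= a for b, a in zip(bs, [1, 2, 4, 5, 10, 15]))
```

Any chain element falling below the corresponding b_i can be discarded immediately, which is exactly how the vertical condition prunes the search tree.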
3 Generating Minimal Length Addition Chains
Since finding the minimal length addition chains is an NP-hard problem, different artificial intelligence strategies for exact solutions have been proposed to generate addition chains with minimal length [2,7]. For example, the A∗ algorithm [2,16] expands the node in the search tree whose evaluation function (f(ai) + h(ai), where f(ai) is the lower bound of ai and h(ai) is the length of the shortest path from the goal node e to the current node ai) gives the lowest value among all nodes not yet expanded. The algorithm terminates when the goal node is found and all unexpanded nodes in the search tree have an evaluation higher than or equal to that of the goal node. However, this algorithm requires a lot of memory; in particular, the size of the search tree grows very fast with large e. A search algorithm is efficient when the search tree can be pruned rigorously. Therefore, we use a depth-first search algorithm with the branch-and-bound technique. The next algorithm [1,19] traverses the search tree using a cut and branch technique. The depth of the tree starts with the lower bound of addition chains
Eq. (1). At each step in the search tree, the possible children ai+1 of ai, their types (star or nonstar) and their levels i + 1 are pushed on the stack. The set of possible children ai+1 of ai is {ai + ak ≤ e; k ≤ i} ∪ {ai < aj + ak ≤ e; j, k ≤ i − 1}. The children of ai constitute a stack segment. To cut some elements, and thus some branches, in the search tree that cannot lead to a minimal length chain, we use the VBS and SBS. These have two advantages: speeding up the generation of minimal length chains and decreasing the maximum length of the stack.

Algorithm. Generating minimal length addition chains.
Input: e > 2.
Output: minimal length addition chains for e.
Begin
  lb ← ⌈log2 e + log2 ν(e) − 2.13⌉
  a0 ← 1; a1 ← 2
  loop
    Determine vertical and slant bounding sequences VBS and SBS for e
    i ← 1
    loop find-chain
      if (i < lb) then
        Determine whether to retain ai
        if ai is retained then
          Push on the stack the possibilities for ai+1
          i ← i + 1
          Let ai be the element on the top of the stack
          if ai = e then
            Chain is found; then take the next element off of the stack
            that is not in the stack segment of ai
          end if
        else
          Take the next element off of the stack
          Let ai be the element on the top of the stack
        end if
      else
        Take the next element off of the stack that is not in the stack
        segment of ai
        Let ai be the element on the top of the stack
      end if
    end loop find-chain
    if no chains found then
      lb ← lb + 1
    else
      exit
    end if
  end loop
End.
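The search loop above can be condensed into a recursive sketch. This is our own simplification: it performs the same iterative deepening from the lower bound of Eq. (1), but prunes only with a simple doubling cut rather than the explicit stack and the VBS/SBS bounding sequences of the full algorithm.

```python
import math

def min_addition_chains(e):
    """Return all minimal-length addition chains for e > 2 via
    iterative-deepening depth-first search."""
    lb = max(1, math.ceil(math.log2(e) + math.log2(bin(e).count("1")) - 2.13))
    while True:
        found = []

        def dfs(chain):
            a = chain[-1]
            if a == e:
                found.append(list(chain))
                return
            steps_left = lb - (len(chain) - 1)
            if a << steps_left < e:   # even doubling every step cannot reach e
                return
            # children: sums of two chain elements exceeding the last one
            children = {x + y for i, x in enumerate(chain)
                        for y in chain[i:] if a < x + y <= e}
            for c in sorted(children, reverse=True):
                chain.append(c)
                dfs(chain)
                chain.pop()

        dfs([1])
        if found:
            return found
        lb += 1   # no chain of length lb exists; deepen and retry

# min_addition_chains(15) yields the length-5 chains, e.g. [1, 2, 4, 5, 10, 15]
```

Because the depth limit starts at a valid lower bound and is increased only when a full search fails, the first depth at which any chain is found is exactly ℓ(e), so every returned chain is minimal.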
4 Visualizing Addition Chains
This section presents our interactive visualization system for modular exponentiation using addition chains. The system contains two applets. The first and the main one is to visualize generation of addition chains with minimal length. The second one is to visualize how to compute modular exponentiation given an addition chain. Fig. 1 shows the main applet. It contains the following.
1. The Input Text Field: accepts the number e for which all possible addition chains will be generated. After the user inputs the number e and clicks the "Enter" button, the Input Text Field is disabled.

2. Trace Mode Option: there are two modes in our web application, "Step-by-step" and "Step-over". The Step-by-step mode enables the user to generate the addition chains in a step-by-step manner, interacting with the system at each generation step, while the Step-over mode shows the generation process without interaction. The user can switch between both modes while the system is running by checking/unchecking the associated check box.
Fig. 1. Visualization Addition Chain System Snapshot
3. Pause/Continue Button: allows the user to pause or continue the generation process. The button title toggles automatically.

4. Stack Panel: shows the current state of the stack. Each cell in the stack holds (1) the element ai, (2) its index i, which indicates the level of the search tree, and (3) its type: star '*' or nonstar '!*'. The next children of each element in the search tree are pushed on the stack.

5. Path Panel: shows the current path of the search tree, i.e., the current addition chain.

6. Step Type Panel: shows the type of each element (star or nonstar) in the current addition chain (Path Panel). The Path and Step Type Panels run concurrently.

7. Next Child Button: generates the possible children of the last element (called the current element/node) in the current addition chain. The current element is displayed in the button title.

8. Children Panel: displays all possible children of the current element and their step types (star or nonstar). If there is no child, the message "No Child" appears.

9. Push/Pop Buttons: the Push button inserts the last generated children into the stack. In the case of "No Child", the system skips the Push step and goes directly to the Pop step. The Pop button pops the top of the stack. Neither button needs to be clicked if the user chooses the "Step-over" mode.

10. Addition Chain Tree Panel: shows the addition chain tree, generated simultaneously with the previous steps. The tree root starts with the value "1" and a child of "1" is "2". Starting from "2", all possible children are generated. By clicking on each child in the tree, the unexpanded children expand.

11. Lower Bound Information: presents lb, the lower bound of ℓ(e), and the number of small steps S(a0, a1, . . . , ar).

12. Vertical Bounding Sequence: presents the vertical bounding sequence VBS for the number in the Input Text Field.
13. Slant Bounding Sequence: shows the corresponding slant bounding sequence SBS for the number in the Input Text Field.

14. List of Addition Chains Panel: displays all addition chains for the number in the Input Text Field. This list is generated one by one as the system works.

15. Reset Button: cleans all data in the applet.
5 Classroom Methodological Studies
There are five main types of empirical methodological studies [18]: controlled experiments, observational studies, questionnaires and surveys, ethnographic field techniques, and usability studies. In the current work, we concentrate on the effectiveness of visualizations, i.e., how well one performs with the visualization algorithm compared to others who do not use the visualization. In fact, we see the
domain of visualization studies as extending beyond effectiveness. A number of classroom studies have been conducted in order to evaluate and analyze the usage of the system, using both qualitative and quantitative methods. Even though classroom studies are more difficult to control, they can give more externally valid results. Thus, they can be used to describe the practices that take place in classrooms and to analyze how tool usage affects them. As qualitative work, studies were conducted in graduate and postgraduate cryptography courses at our department, and data were collected by observations, video recording, and interviews, as well as by questionnaires. The results were presented as categories of visualization utilizations, problems encountered when using the visualizations, types of visualizations used, etc. The content of these categories has guided the further development of the system by giving a better idea of the kinds of situations in which the system can be used and how it is expected to work. The quantitative studies indicated that the system helped especially the mediocre students to learn addition chain principles and to write a program for generating minimal length addition chains.
6 Conclusions and Future Work
We have presented a highly interactive visualization system for finding minimal length addition chains using the depth-first branch-and-bound search technique. The system provides (1) controls for stepping through the process, (2) full insight into what is happening in the generation process, and (3) animation. In future work we aim to carry out short-term and longitudinal methodological studies as well as mixed-methods studies. We will also include a comparison between different methods for generating a minimal length addition chain, and of the performance of each bounding sequence.
Acknowledgements We are grateful to M. Fathy for valuable comments.
References

1. Bahig, H.: Improved generation of minimal addition chains. Computing 78, 161–172 (2006)
2. Bleichenbacher, D.: Efficiency and Security of Cryptosystems Based on Number Theory, ch. 4. Doctoral Thesis, Swiss Federal Institute of Technology Zurich, Zurich (1996)
3. Brusilovsky, P.: Adaptive and intelligent technologies for web-based education. Künstliche Intelligenz, Special Issue on Intelligent Tutoring Systems and Teleteaching 4 (1999)
4. Cruz-Cortés, N., Rodríguez-Henríquez, F., Juárez-Morales, R., Coello Coello, C.: Finding Optimal Addition Chains Using a Genetic Algorithm Approach. In: Hao, Y., Liu, J., Wang, Y.-P., Cheung, Y.-m., Yin, H., Jiao, L., Ma, J., Jiao, Y.-C. (eds.) CIS 2005. LNCS (LNAI), vol. 3801, pp. 208–215. Springer, Heidelberg (2005)
5. Cruz-Cortés, N., Rodríguez-Henríquez, F., Juárez-Morales, R., Coello Coello, C.: An Artificial Immune System Heuristic for Generating Short Addition Chains. IEEE Trans. Evolutionary Computation 12(1), 1–24 (2008)
6. Cattaneo, G., De Santis, A., Ferraro Petrillo, U.: Visualization of cryptographic protocols with GRACE. Journal of Visual Languages and Computing 19, 258–290 (2008)
7. Chin, Y., Tsai, Y.: Algorithms for finding the shortest addition chain. In: Proceedings National Computer Symposium, Kaoshiung, Taiwan, pp. 1398–1414 (1985)
8. Cryptography demos, http://nsfsecurity.pr.erau.edu/crypto/index.html
9. Cryptool, http://www.cryptool.org/
10. Downey, P., Leong, B., Sethi, R.: Computing sequences with addition chains. SIAM J. Computing 10(3), 638–646 (1981)
11. Gordon, D.M.: A survey of fast exponentiation methods. J. Algorithms 27, 129–146 (1998)
12. Knuth, D.E.: The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 3rd edn., pp. 461–485. Addison-Wesley, Reading (1997)
13. Alejandro, L., Cruz-Cortés, N., Moreno-Armendáriz, M., Orantes-Jiménez, S.: Finding Minimal Addition Chains with a Particle Swarm Optimization Algorithm. LNCS, vol. 5845, pp. 680–691. Springer, Heidelberg (2009)
14. Nedjah, N., Mourelle, L.: Minimal Addition-Subtraction Chains Using Genetic Algorithms. In: Yakhno, T. (ed.) ADVIS 2002. LNCS, vol. 2457, pp. 303–313. Springer, Heidelberg (2002)
15. Nedjah, N., Mourelle, L.: Towards minimal addition chains using ant colony optimization. Journal of Mathematical Modelling and Algorithms 5(4), 525–543 (2003)
16. Nilsson, N.J.: Principles of Artificial Intelligence, 2nd edn. Springer, Heidelberg (1982)
17. Rivest, R., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21(2), 120–126 (1978)
18. Stasko, J., Hundhausen, C.: Algorithm Visualization. In: Fincher, S., Petre, M. (eds.) Computer Science Education Research, pp. 199–228 (2005)
19. Thurber, E.: Efficient generation of minimal length addition chains. SIAM J. Computing 28, 1247–1263 (1999)
Multimedia Elements in a Hybrid Multi-Agent System for the Analysis of Web Usability

E. Mosqueira-Rey, B. Baldonedo del Río, D. Alonso-Ríos, E. Rodríguez-Poch, and D. Prado-Gesto

University of A Coruña, Laboratorio de I+D en Inteligencia Artificial, A Coruña, Spain http://www.dc.fi.udc.es/lidia/
Abstract. The widespread use of the World Wide Web by persons from different age groups, diverse cultures, and with different computer skills has put more emphasis on the process of ensuring that web sites and applications are usable. At the same time, plain text web pages have been replaced by ones in which multimedia information (images, video and audio) is an important part. In this context we have developed a multi-agent system for analysing web usability that performs both static and dynamic analysis of web sites. The static analysis can examine certain multimedia issues related to usability (mostly associated with images), and in this paper we also comment on the issues related to video and audio technologies.
1 Introduction
At present, web sites and web applications are used on a daily basis by persons from different age groups, from diverse cultures, and with different computer skills. In these web sites, multimedia elements are becoming very popular. We can see a transition from old-style web pages (based on HTML code and images) to pages with audio and video clips inserted and with capabilities that resemble those of desktop applications, using Rich Internet Application (RIA) technologies. At the same time, technological progress has brought about a significant change of mentality in users, compelling us to become more demanding and to get what we want with the least effort. As a consequence, we are quick to become impatient with user-unfriendly applications. In this paper we present a multi-agent system based on evolutionary learning for the usability analysis of web sites. Our first approach is to analyse the static HTML code and to simulate the navigation performed by standard web users who encounter typical usability problems. In the second part of the paper we analyse how multimedia elements affect the usability of web pages and describe our plans to inspect usability issues in those elements.
This research has been partly funded by the Xunta of Galicia Regional Government of Spain under Project 08SIN010CT.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 217–224, 2010. c Springer-Verlag Berlin Heidelberg 2010
2 A Multi-Agent System for the Usability Analysis of Web Sites
In this section we briefly describe the multi-agent system developed to analyse the usability of web sites [1]. It uses two different approaches. On the one hand, we have a static strategy in which the content of a web page (basically HTML code) is analysed in order to find well-known issues that can lead to usability problems. On the other hand, we follow a dynamic strategy that consists of a simulation of the browsing process. This simulation is based on modelling users who try to reach one particular URL in the web site guided by a set of goals (e.g., desired information or intended actions) that are represented by key phrases. In this case we are testing the clarity of the structure of the web site and how easily users find the information that is relevant to them. The system is structured in two different types of agents, as shown in Figure 1: the HTML Analyser Agent and User Agents. Each user agent has as its goal to arrive at a destination URL from an initial URL, and possesses a set of rules of potential use in achieving this goal. When a user agent arrives at a new page, it requests information from the HTML Analyser Agent on the available links (link text, text surrounding the link, target URL). The user agent then checks this information against its rules. The HTML Analyser Agent stores the information on the links for future requests. User agents use a reinforcement learning mechanism that rewards or penalises agent actions. For example, a positive reinforcement would reward being able to use the initial key phrases, and a negative reinforcement would penalise being obliged to return to the previous page. In order to speed up the learning process, user agents are considered individuals of an evolutionary process and the typical operators of crossover and mutation are applied. The objective is to ensure that the best rules are passed on to the next generation and become widespread among the different agents.
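The paper does not specify the exact update rule used by the reinforcement mechanism; a minimal additive reward/penalty scheme over rule weights (purely illustrative, with hypothetical rule names) might look like:

```python
def reinforce(rule_weights, rule, reward):
    """Additively reward (positive) or penalise (negative) a navigation
    rule. A purely illustrative sketch; the actual update used by the
    user agents is not specified at this level of detail."""
    rule_weights[rule] = rule_weights.get(rule, 0.0) + reward
    return rule_weights

weights = {}
reinforce(weights, "follow-link-matching-key-phrase", +1.0)  # key phrase used
reinforce(weights, "follow-link-matching-key-phrase", -0.5)  # forced to go back
```

Rules whose accumulated weight stays high would then be favoured during crossover, so that successful navigation strategies spread through the agent population.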
Fig. 1. Structure of the multi-agent system
2.1 HTML Analyser Agent
The HTML Analyser Agent [2] examines the HTML code of a single page, a group of pages, or an entire web site, with the aim of detecting usability problems. Even though HTML code is relatively transparent and consists of a reduced set of elements, it can be complex to infer usability issues from it, and all kinds of ambiguities may appear. The HTML Analyser Agent thus uses heuristics to decide whether a usability problem actually exists, and reports every problem found, indicating its estimated criticality. As examples of the usability aspects analysed we can cite:

– Web page usability issues, such as the estimated page size, programming problems (browser-specific tags, deprecated tags, the lack of text encoding, etc.), flexibility problems (fixed font sizes, fixed-width elements), problematic elements in certain contexts (cookies, CSS formatting, Javascript code, animations, etc.) and the presence of search engines.

– Image-specific usability issues: described in more detail in Section 3.

– Form-specific usability issues: number of elements, the existence of elements that are subject to some type of validation, accessibility features such as easily clickable controls, etc.

– Table-specific usability issues: size, whether tables are used for presenting data or for establishing the page layout, the existence of recommended elements such as headers, captions, and summaries.

– Link-specific usability issues: non-standard representations, broken links, badly constructed links, inappropriate link texts, the use of anchors, the existence of links to non-HTML files, etc.

The last issue, that is, the existence of links to non-HTML files, is important in the context of this paper, since these files are often used to store multimedia information (e.g. video or audio files).
While using non-HTML files is not necessarily bad, they can have a negative impact on usability in many ways: lack of consistency, switching between different ways of navigating, proprietary file formats, additional software that needs to be installed on the client side, etc. These issues are commented on in more detail in the following sections of this paper.

2.2 User Agents
User agents model the dynamic browsing process of human users. Each agent has as its goal to arrive at a destination URL from another one, and possesses a set of key phrases of potential use in achieving this goal. The motivation for this word-based approach is that the Web is primarily a linguistic medium that involves browsing through text pages and examining text content. Our multi-agent system contains a population of non-identical user agents that model different types of human users and obtain different results. User agents are implemented using an evolutionary agent architecture composed of rule-based reactive agents.
The motivation behind the fallible and non-deterministic behaviour of the agents is to determine whether the text content and the links of the web site help users in performing tasks and finding the information they seek. That is, if an agent fails, it is probably because of usability problems in the web site. In a product with a good level of usability, the computerised task implementation will be as close as possible to the mental model of the user. In our case, that means that the structure of the web site, the text content, and the link labels should be as intuitive as possible.

2.3 Multimedia Elements in Web Usability Analysis
The initial reason for including multimedia elements in web sites was to make them more attractive, to enhance the user experience, and to convey information more efficiently. Ideally, this would make the Web more usable. However, these multimedia elements can easily hamper usability if they are not used judiciously. Proprietary formats or external plugins, for instance, can cause compatibility problems that may even prevent the user from navigating the web site. The continuing availability of new multimedia formats and functionalities means that these problems have become increasingly complex and more and more common. Our HTML Agent helps to discover these usability problems by (1) detecting the use of multimedia elements in a web site, (2) automatically analysing their basic characteristics, and (3) generating reports on the issues found. A detailed explanation of the inherent usability problems of each multimedia type appears below: Section 3 describes the problems with images and Section 4 examines the issues with video and audio files. At the relevant points, we include descriptions of how the HTML Analyser Agent addresses those issues.
3 Images
Images are probably the oldest type of multimedia file used on web pages. Their basic usability problems are fairly well known and are related to their dimensions, formatting problems, accessibility issues, and the size of the file. In order to detect these problems, our HTML Analyser Agent automatically examines the characteristics of the image file and inspects the HTML code of the page that contains it. More specifically, the following issues are analysed and reported on:

– Image dimensions: A basic usability heuristic is that images should not be so big that they do not fit on an average screen. Regardless of the actual dimensions of the image, HTML also offers attributes to explicitly specify the height and the width in which it will be displayed. Moreover, these attributes let the browser know how much space must be reserved in order to display the image before it is actually loaded. Failing to indicate this can cause formatting problems, because the elements of the page can be continuously rearranged as the images are retrieved and displayed.
Multimedia Elements in a Hybrid Multi-Agent System
– Accessibility features: HTML also offers attributes for declaring a long description of the image and an alternative text for users who are not able to see images properly [3]. Omitting these attributes is a typical accessibility failure. Our Agent also warns the user if the alternative text is too long.
– File size: This is an inherent property of the file that has an impact on download time. It should be kept in mind that not all users have broadband access, and, even in broadband situations, response times can be slow [4].
– The use of image maps: These are pictures in which different areas contain links to different URLs. They should be used only when necessary.

Figure 2 shows an example of a report generated by the HTML Analyser Agent on the usability problems of a particular image. As can be seen, the HTML code for the image lacks both the "alternative text" and "long description" attributes, and neither the height nor the width is expressly declared.
Fig. 2. Usability issues found for a specific image
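The image checks behind reports like the one in Fig. 2 could be sketched roughly as follows; this is a hypothetical simplification, and the 100-character alt-text threshold is our own assumption, not the agent's actual limit:

```python
from html.parser import HTMLParser

class ImageChecker(HTMLParser):
    """Illustrative sketch of per-image attribute checks (not the real agent)."""

    MAX_ALT_LEN = 100  # assumed threshold for "alternative text too long"

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if "alt" not in attrs:
            self.issues.append("missing alternative text (alt)")
        elif len(attrs["alt"] or "") > self.MAX_ALT_LEN:
            self.issues.append("alternative text too long")
        if "longdesc" not in attrs:
            self.issues.append("missing long description (longdesc)")
        if "width" not in attrs or "height" not in attrs:
            self.issues.append("width/height not expressly declared")
```

Run against the image of Fig. 2, which declares neither `alt`, `longdesc`, nor explicit dimensions, this sketch would report the same three categories of problems.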
4 Video and Audio
When a multimedia file is placed on a web page, the HTML Analyser Agent can extract useful data from its MIME type, including the kind of software needed to handle the file (commercial software, shareware, or freeware) and whether or not its format is proprietary. However, some other significant information, such as the video format (e.g., H.264, Theora, VP6 or VP8), cannot be obtained from MIME types, since they mostly refer to container formats: for instance, an AVI file may contain audio/visual data compressed with several algorithms, including Motion JPEG, RealVideo, or MPEG-4 Video.
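The kind of information that can (and cannot) be read off a MIME type might be modelled like this; the lookup table below is an illustrative assumption, not the agent's actual data:

```python
# Hypothetical lookup table: what a MIME type alone can tell us.
# Entries and labels are illustrative assumptions.
MIME_INFO = {
    "video/x-msvideo": {"container": "AVI", "proprietary": True},
    "video/quicktime": {"container": "QuickTime", "proprietary": True},
    "video/ogg": {"container": "Ogg", "proprietary": False},
    "application/x-shockwave-flash": {"container": "Flash", "proprietary": True},
}

def describe_media(mime_type):
    """Return container-level information recoverable from a MIME type.

    Note: the codec inside the container (H.264, Theora, VP8, ...)
    is NOT recoverable from the MIME type, as discussed above, so it
    is deliberately absent from the result.
    """
    info = MIME_INFO.get(mime_type)
    if info is None:
        return {"container": "unknown", "proprietary": None}
    return info
```

The deliberate absence of a codec field in the result mirrors the limitation discussed above: `video/x-msvideo` reveals an AVI container, but says nothing about the compression algorithm inside it.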
One important aspect that directly affects web usability is how multimedia content is displayed, that is, how usable web video players are. In the next two subsections we introduce the two competing platforms used to embed video on web pages: Flash and HTML 5.

4.1 Flash
Since Adobe Flash Player version 4 appeared in 1999, Flash Video has quickly established itself as the de facto format of choice for embedded video on the web, and Flash has become one of the most popular platforms for developing Rich Internet Applications. Flash animations were easy to create and allowed web developers to expand their design possibilities. Furthermore, after installing a specific plugin, any content shown in Flash was viewable on most web browsers and operating systems. This resulted in Adobe Flash Player being part of 99% of Internet-enabled desktops in 2009. However, this situation might be changing now that HTML 5 has been released and popular new products, such as the Apple iPhone and iPad, do not support Flash. Despite its popularity, the Flash platform is known to lead to web designs with many usability issues, as shown in [5]. Flash development tools allow or, in many cases, force users not to follow web-design standards, and the resulting Flash objects usually increase the size of web pages and therefore their load time. Moreover, in spite of being indexable by Google, Flash-based web pages are still hard to explore for many other search engines, since their indexing mechanisms are too specific. Finally, the plugin needed to visualise Flash content does not always seem to work well on some operating systems, such as Mac OS X or Linux.

4.2 HTML 5
On March 4, 2010, the Web Hypertext Application Technology Working Group (WHATWG) published the latest working draft for the fifth revision of HTML. As far as our context is concerned, its main characteristic is the inclusion of tags to embed multimedia content, namely the video and audio elements for video and audio data, respectively. Every browser that supports HTML 5 will provide a media player that displays the content specified inside the aforementioned tags. This player comes with a default configuration that satisfies established usability standards. In this manner, web developers do not need to implement and configure an external player, which might be neither usable nor accessible enough. This makes the design of web sites with multimedia content more user-friendly. As HTML is a markup language with elements and attributes, the media elements in web pages become easily parseable, and hence indexable by a search engine. This feature gives us more information, which can be used by our HTML Analyser Agent to provide advice about the use of multimedia content.
The main drawback is an issue external to the implementation of the new version of HTML: it arises from the problem of choosing a standard format for encoding the videos included in web sites. At first, the WHATWG suggested the following encoding formats: Theora video and Vorbis audio encapsulated in Ogg containers. However, the decision to require this specific format led to opposition from Google, Apple Inc. and Nokia, which cited uncertainty about potential patents and a lack of hardware support, and opted instead for other codecs. Finally, the matter was settled in a way that suits everyone: media types encoded with any codec are allowed. This means that video publishers must encode all their media files in every format supported by each browser, and hence write more markup. Until this "format war" comes to an end, the WHATWG recommends a workaround based on the use of source tags as shown below.
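The source-tag workaround takes roughly the following form; this is a representative sketch, and the file names and MIME types are illustrative:

```html
<!-- Offer the same video in several encodings; the browser plays
     the first source whose type it can decode. -->
<video controls width="640" height="360">
  <source src="clip.mp4" type="video/mp4">  <!-- e.g. H.264 -->
  <source src="clip.ogv" type="video/ogg">  <!-- e.g. Theora/Vorbis -->
  <p>Fallback text for browsers without HTML 5 video support.</p>
</video>
```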
5 Discussion
As we can see, a key point in the prevalence of one of these platforms among the Internet community is video format standardisation, whose future appears to be uncertain. On the one hand, significant improvements in Theora, such as hardware support, could lead it to win the "format war" and become the standard video codec. On the other hand, the lack of standardisation around HTML 5 may reinforce the dominance of Flash Video, since its plugin is widespread and its features are constantly being improved. A new alternative has arisen recently: Google purchased On2 Technologies in February 2010 and, as a result, now owns all the patents behind VP8, a new high-performance video codec that could be included in the HTML 5 specification if released under a royalty-free licence. We also have to take into account the usability issues regarding web accessibility for people with disabilities. With HTML 5, the introduction of the multimedia tags should boost the use of accessibility tools in web sites (such as screen readers, subtitles, and audio transcription), generating a growing interest in the web community in including specific tags to define accessibility content. On the other hand, Flash elements can be adequate for users with disabilities if designed with care [6] and, ideally, with alternative access to the information in the form of a standard HTML version of the content [7]. Regardless of which technology will be the most widespread in the coming years, the features that HTML 5 offers are much more useful for improving our HTML
Analyser Agent. The ease of parsing markup languages makes it possible to quickly obtain information on the attributes of the multimedia elements, for instance the display size, the video/audio format, the codec used, the MIME type of the file, etc. With such enhancements we will be able to increase, both in number and in quality, the usability and accessibility reports that web developers could use to make more user-friendly interfaces. One interesting feature that may be included in our system is interest analysis of on-line multimedia content. Aspects like the short duration of video clips, the high dynamism of scenes, and the lack of distracting elements in the main frame are considered to improve the web-user experience [8]. To make a simple analysis of the file header and obtain interesting data (such as audio/video bitrate, media duration, file size, video dimensions, audio sample rate, etc.), some tools (e.g., ffmpeg) could be used to cover the above-mentioned aspects. However, the amount and type of information stored in a file are specified by the container format and, at this point, a multimedia content description standard (MPEG-21, MPEG-7, etc.) could be useful. There are some other characteristics that can be automatically analysed by applying Multimedia Information Retrieval (MIR) techniques, which allow us to extract relevant data from video and audio. Speech recognition, video OCR and image similarity matching are examples of MIR algorithms that can be useful for the aforementioned analysis. More about MIR can be found in [9].
References

1. Mosqueira-Rey, E., Alonso-Ríos, D., Vázquez-García, A., Baldonedo del Río, B., Moret-Bonillo, V.: A Multi-Agent System Based on Evolutionary Learning for the Usability Analysis of Websites. In: Nguyen, N.T., Jain, L.C. (eds.) Intelligent Agents in the Evolution of Web and Applications. SCI, vol. 167, pp. 11–34. Springer, Heidelberg (2009)
2. Alonso-Ríos, D., Luis-Vázquez, I., Mosqueira-Rey, E., Moret-Bonillo, V.: An HTML analyzer for the study of web usability. In: IEEE Int. Conf. on Systems, Man, and Cybernetics, San Antonio, Texas, USA, pp. 1261–1266 (2009)
3. Nielsen, J.: Designing Web Usability. New Riders, Berkeley (2000)
4. Nielsen, J., Loranger, H.: Prioritizing Web Usability. New Riders, Berkeley (2006)
5. Nielsen, J.: Flash: 99% Bad. Jakob Nielsen's Alertbox (October 2000), http://www.useit.com/alertbox/20001029.html (retrieved 2010-02-25)
6. Nielsen, J.: Making Flash Usable for Users With Disabilities. Jakob Nielsen's Alertbox (October 2002), http://www.useit.com/alertbox/20021014.html (retrieved 2010-02-25)
7. Hudson, R.: Flash and accessibility. Web Usability - Accessibility and Usability Services (November 2003), http://www.usability.com.au/resources/flash.cfm (retrieved 2010-02-28)
8. Nielsen, J.: Talking-Head Video Is Boring Online. Jakob Nielsen's Alertbox (December 2005), http://www.useit.com/alertbox/video.html (retrieved 2010-02-25)
9. Lew, M., Sebe, N., Djeraba, C., Jain, R.: Content-Based Multimedia Information Retrieval: State of the Art and Challenges. ACM Trans. on Multimedia Computing, Communications, and Applications 2, 1–19 (2006)
An Approach for an AVC to SVC Transcoder with Temporal Scalability

Rosario Garrido-Cantos, José Luis Martínez, Pedro Cuenca, and Antonio Garrido

Albacete Research Institute of Informatics, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
{charo,joseluismm,pcuenca,antonio}@dsi.uclm.es
Abstract. The scalable extension (SVC) of H.264/AVC uses a notion of layers within the encoded bitstream to provide temporal, spatial and quality scalability, separately or combined. This scalability allows adaptation to scenarios with different devices and heterogeneous networks. The SVC design requires scalability to be provided at the encoder side by exploiting inter-layer dependencies during encoding. This implies that existing H.264/AVC content cannot benefit from the scalability tools in SVC, due to the lack of intrinsic scalability in the bitstream at encoding time. Since a lot of technical and financial effort is currently being spent on the migration from MPEG-2 equipment to H.264/AVC, it is unlikely that a new migration to SVC will occur in the short term. Because broadcasters and content distributors want to have scalable bitstreams at their disposal, efficient techniques for migrating single-layer content to a scalable format are desirable. In this paper, an approach for temporal scalability transcoding from H.264/AVC to SVC is discussed. This approach is applied to the upper layers of SVC, where coding complexity is higher, and it is capable of reducing this coding complexity by around 55.75% while maintaining the coding efficiency.

Keywords: Scalable Video Coding (SVC), H.264/AVC, Transcoding, Temporal Scalability.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 225–232, 2010. © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

Recent advances in video coding technology and standardization, along with the rapid development and improvement of mobile and handheld devices, network infrastructures, storage capacity, and computing power, are enabling an increasing number of video applications. Application areas today range from multimedia messaging, video telephony, and video conferencing over mobile TV, wireless and wired Internet video streaming, and standard and high-definition TV broadcasting, to DVD, Blu-ray Disc, and HD DVD optical storage media. For these applications, adaptable video transmission and storage systems must be employed. Video coding standards such as H.264/AVC [1] provide mechanisms for coding video that optimize compression efficiency and satisfy the needs of these emerging multimedia applications. These multimedia applications are typically characterized by a wide range of connection qualities and receiving devices with different decoding capacities (display,
processing power, memory capabilities, etc.). Moreover, the networks used to deliver video content are heterogeneous. Therefore, the compressed video bit stream has to be adapted to the network connections and to the different characteristics of devices, in order to ensure a continuous, high-quality image. Video adaptation has become an essential technology for providing multimedia content to user devices in an appropriate way. Scalable Video Coding is a highly attractive solution to the problems posed by the characteristics of modern video transmission systems. The main idea of scalable video coding is to encode the video as one base layer and a few enhancement layers, so that lower bit rates, spatial resolutions and temporal resolutions can be obtained by simply truncating certain layers from the original bitstream, adapting it to the communication channel bandwidth and user device capabilities. Recently, joint efforts of MPEG and VCEG have led to the standardization of a new state-of-the-art scalable video codec. This scalable extension of H.264/AVC [1], denoted SVC (Scalable Video Coding), makes it possible to encode scalable video bitstreams containing several quality, spatial, and temporal layers. By parsing and extraction, lower layers can easily be obtained, hence providing different types of scalability in a flexible manner. SVC supports temporal, spatial and quality (SNR) scalability, which allows adaptation to application requirements such as the display and processing capabilities of target devices and varying transmission conditions. Temporal scalability in SVC is provided by using hierarchical prediction structures. Spatial scalability is achieved by encoding each spatial resolution into one layer. Additionally, inter-layer prediction mechanisms are applied to remove redundancy between layers. Quality-SNR scalability is intended to give different levels of detail and fidelity relative to the original video.
It can be seen as a case of spatial scalability where the base and enhancement layers have identical picture sizes but different qualities. It is up to the encoding process to decide which additional details are added to which parts of the video images. Different kinds of quality levels are distinguished: Coarse-Grain Scalability (CGS) and Medium-Grain Scalability (MGS). A third one, Fine-Grain Scalability (FGS), was removed from the SVC amendment finalized in July 2007. Despite these scalability tools, most of the video content today is still created in a single-layer format (H.264/AVC video streams). The lack of scalable streams results in the necessity of developing alternative techniques to enable video adaptation. In this paper, video transcoding [2] is proposed for enabling efficient adaptation of H.264/AVC to H.264/SVC video streams. Its efficiency is obtained by reusing as much information as possible from the original bitstream, such as mode decisions and motion information. The ultimate goal is to perform the required adaptation process faster than the straightforward concatenation of a decoder and an encoder. In particular, this paper describes an approach for transcoding from a single-layer H.264/AVC bitstream without temporal scalability (typical IBBP GOP pattern) to an SVC bitstream with temporal scalability based on hierarchical prediction structures (with B-pictures). The remainder of this paper is organized as follows. In Sect. 2, the state of the art in H.264/AVC to SVC transcoding is discussed. Sect. 3 describes the temporal scalability technique in SVC. In Sect. 4, our approach is presented and, finally, in Sect. 5 conclusions are drawn.
2 Related Work

Since it is beneficial for broadcasters and content distributors to have scalable bitstreams at their disposal, efficient techniques for migrating H.264/AVC content to the H.264/SVC format are desirable. Due to its computational efficiency, transcoding can be used to introduce scalability in compressed, single-layer bitstreams. In this way, re-encoding can be avoided when migrating legacy content to a scalable format. A number of techniques have been proposed in the past for introducing scalability in compressed bitstreams. The majority of the proposals are related to quality-SNR scalability, although there are a few related to the other two types of scalability (spatial and temporal). Regarding quality-SNR scalability, in [3] a technique was studied for transcoding from hierarchically encoded H.264/AVC to FGS streams. Although it was the first work on this type of transcoding, it is not of great relevance, since this technique for providing quality-SNR scalability was removed from subsequent versions of the standard due to its high computational complexity. In [4], different architectures for transcoding from a single-layer H.264/AVC bitstream to SNR-scalable SVC streams with CGS layers were proposed, depending on the macroblock type. Moreover, the normative bitstream rewriting process implemented in the SVC standard to convert SVC to H.264/AVC bitstreams is used to reduce the computational complexity of the proposed architectures. For spatial scalability, a proposal was presented in [5]. The authors presented an algorithm for converting a single-layer H.264/AVC bitstream to a multi-layer, spatially scalable SVC video bitstream containing layers of video with different spatial resolutions. Using a full-decode full-encode algorithm as a starting point, some modifications are made to reuse the information available after decoding an H.264/AVC bitstream in the motion estimation and refinement processes of the encoder.
The scalability is achieved by an information downscaling algorithm which uses the top enhancement layer (this layer has the same resolution as the original video output) to produce the different spatial layers of the output SVC bitstream. In the temporal scalability framework, the most relevant work is [6]. A transcoding method from an H.264/AVC P-picture-based bitstream to an SVC bitstream with temporal scalability was presented. In this approach, the H.264/AVC bitstream is transcoded to two layers of P-pictures (one with reference pictures and another with non-reference ones). Then, this bitstream is transformed into an SVC bitstream by syntax adaptation.
3 Temporal Scalability in H.264/SVC

A bit stream provides temporal scalability when the set of corresponding access units can be partitioned into a temporal base layer and one or more temporal enhancement layers with the following property. Let the temporal layers be identified by a temporal layer identifier T, which starts from 0 for the base layer and is increased by 1 from one temporal layer to the next. Then, for each natural number k, the bit stream that is obtained by removing all access units of all temporal layers with a temporal layer identifier T greater than k forms another valid bit stream for the given decoder.
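The extraction property above translates directly into code; this is a sketch in which the access-unit representation (a list of (temporal_id, payload) pairs) is our own illustrative assumption:

```python
def extract_temporal_substream(access_units, k):
    """Keep only access units whose temporal layer identifier T is <= k.

    access_units: ordered list of (temporal_id, payload) pairs.
    By the property stated above, the result is itself a valid
    bit stream for the given decoder.
    """
    return [(t, au) for t, au in access_units if t <= k]
```

For example, dropping everything above T = 1 from a four-layer stream keeps the key pictures (T = 0) and the first enhancement layer, halving the frame rate at each further step down.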
For hybrid video codecs, temporal scalability can generally be enabled by restricting motion-compensated prediction to reference pictures with a temporal layer identifier that is less than or equal to the temporal layer identifier of the picture to be predicted. In H.264/AVC, and by extension in SVC, any picture can be marked as a reference picture and used for motion-compensated prediction of following pictures. This feature allows the coding of picture sequences with arbitrary temporal dependencies. Hence, for supporting temporal scalability with a reasonable number of temporal layers, no changes to the design of H.264/AVC were required. The only related change in SVC refers to the signalling of temporal layers. To achieve temporal scalability, SVC links its reference and predicted frames using hierarchical prediction structures [7], which define the temporal layering of the final structure. With hierarchical prediction structures, key pictures (typically I or P frames) are coded at regular intervals using only previous key pictures as references. The pictures between two key pictures are hierarchically predicted and, together with the succeeding key picture, are known as a Group of Pictures (GOP). The sequence of key pictures represents the lowest temporal layer (the temporal base layer), which can be augmented with the non-key pictures, which are divided into enhancement layers.
Fig. 1. Hierarchical B prediction structure with four temporal layers (TL)
There are different structures for enabling temporal scalability, but the typical GOP structure is based on hierarchical B pictures, which is also the one used in the JSVM reference encoder software [8]. The number of temporal layers is thus equal to 1 + log2(GOP size). One of these structures, with a dyadic structure, a GOP of 8 (I7BP pattern) and therefore four temporal layers, is illustrated in Fig. 1.
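For a dyadic hierarchical-B GOP, the layer count formula and the temporal layer of each picture follow from its offset within the GOP; the sketch below is consistent with the structure of Fig. 1, though the function names and the offset convention are our own:

```python
import math

def num_temporal_layers(gop_size):
    """1 + log2(GOP size), as stated above (gop_size a power of two)."""
    return 1 + int(math.log2(gop_size))

def temporal_layer(offset, gop_size):
    """Temporal layer of the picture at position offset (1..gop_size)
    inside a dyadic hierarchical-B GOP; offset == gop_size is the
    succeeding key picture (layer 0)."""
    max_tl = int(math.log2(gop_size))
    trailing_zeros = (offset & -offset).bit_length() - 1
    return max_tl - trailing_zeros
```

For a GOP of 8 this yields four layers, with offsets 1..8 mapped to layers 3, 2, 3, 1, 3, 2, 3, 0, matching the hierarchical-B pattern of Fig. 1.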
4 Proposed H.264/AVC to SVC Video Transcoder

The most time-consuming tasks carried out in the H.264/AVC and SVC encoders are Motion Estimation (ME) and the procedure known as Macroblock (MB) coded mode
decision. Both techniques perform the inter prediction, and they are the most suitable modules to be accelerated. The idea behind the proposed transcoder consists of reusing most of the operations that can be gathered from the H.264/AVC decoding algorithm (as part of the transcoder) to accelerate the SVC encoding algorithm (also included in the transcoder). In this framework, on the one hand, the proposed transcoder tackles the ME reduction by reusing the Motion Vectors (MVs) used in H.264/AVC in order to define smaller search areas in SVC (this approach is depicted in Section 4.1). On the other hand, the MB partitions previously developed by H.264/AVC can be used as candidate MB partitions for SVC. Moreover, the residual information (the residual frame) can also be used to refine these preliminary MB partitions. Along these lines, Machine Learning (ML) can be applied to convert these observations into rules that can be implemented in the proposed transcoder in place of the more complex original procedure. This technique of applying ML has previously been used in an MPEG-2 to H.264/AVC video transcoder, showing that ML is an appropriate solution in the framework of transcoding [9]. Experimental results show that the proposed approach reduces the MB mode selection complexity by as much as 95% while maintaining the coding efficiency.

4.1 Dynamic Motion Window for Motion Estimation

The idea of ME consists of eliminating temporal redundancy by determining the movement of the scene. Because SVC is an extension of H.264/AVC, the ME carried out in SVC will be highly correlated with the ME previously performed in H.264/AVC. Therefore, it seems obvious that performing the complete ME again is a waste of time. Although it might seem that a simpler approach would be to reuse the incoming MVs from H.264/AVC directly in SVC, the fact is that the H.264/AVC MVs are correlated with those generated in SVC, but are not the same.
This is because the AVC pattern follows, in general, the IBBP structure, whereas SVC uses hierarchical B pictures (see Figure 1). This mismatch between GOP sizes and formats leads to different MVs in the two motion estimations. Therefore, this paper first proposes a Dynamic Motion Window (DMW) technique that uses the incoming MVs from H.264/AVC to determine a small area in which to find the real MVs calculated in SVC (depicted in Figure 2). In the proposed transcoder based on DMW, the motion vector search range for every SVC MB is adaptively determined: it is recalculated for every macroblock (or sub-macroblock partition) that can occur in the MB coded mode decision, and reduced depending on the length and the orientation of the incoming H.264/AVC MVs. It is used to determine a dynamic search range area around the orientation of the H.264/AVC MVs. The DMW approach is depicted in Figure 2. The new search range is determined by the area enclosed by the circumference, centred at the (0,0) point for each mode or sub-mode, with the length of the incoming vector as the radius of the circumference (see Figure 2). This technique of applying a Dynamic Motion Window for Motion Estimation has been used to reduce the motion estimation complexity in the upper temporal layers by as much as 55.75% while maintaining the coding efficiency.
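The DMW restriction amounts to a simple inclusion test on the candidate search points; the following is an illustrative sketch of the geometry described above, not the JSVM implementation:

```python
def dmw_candidates(mv, d_max):
    """Search points allowed by the Dynamic Motion Window.

    mv: incoming H.264/AVC motion vector (mv_x, mv_y).
    d_max: maximum search range of the exhaustive search.
    A candidate (x, y) is kept iff x^2 + y^2 <= mv_x^2 + mv_y^2,
    i.e. it lies inside the circle of radius |mv| centred at (0, 0).
    """
    mv_x, mv_y = mv
    r2 = mv_x ** 2 + mv_y ** 2
    return [(x, y)
            for x in range(-d_max, d_max + 1)
            for y in range(-d_max, d_max + 1)
            if x * x + y * y <= r2]
```

A short incoming MV thus prunes the exhaustive (2·d_max + 1)² grid drastically; in the extreme case of a zero vector, only the (0, 0) candidate survives.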
[Figure 2 contrasts the two search strategies for a block of size M × N. The exhaustive search, with maximum search range d_max, covers a search area of (M + 2d_max) × (N + 2d_max), i.e. (2d_max + 1)² search points. The DMW search instead limits the search range by the circumference centred at (0,0) whose radius r is given by the incoming H.264/AVC motion vector (MVx, MVy): by the Pythagorean theorem, r² = MVx² + MVy², so the search points are the inner points of the circumference, i.e. the candidate coordinates (x, y) satisfying x² + y² ≤ MVx² + MVy².]
Fig. 2. Proposed dynamic motion estimation
4.2 Implementation Results

In this section, results from the implementation of the proposal described in Sect. 4.1 are shown. Test sequences with varying characteristics were used, namely Foreman, Bus, Football, Mobile, Soccer and Hall, in CIF and QCIF resolutions.

Table 1. Encoding time for each temporal layer (TL) with different GOP sizes using CIF
Encoding time (%) of every temporal layer – CIF (30 Hz)

           GOP = 8                      GOP = 16
Sequence   TL0   TL1   TL2   TL3       TL0   TL1   TL2   TL3   TL4
Foreman    4.72  13.59 27.29 54.40     1.52  6.63  13.11 26.34 52.40
Bus        4.73  13.67 27.29 54.31     1.75  6.47  13.29 26.22 52.27
Football   4.70  13.62 27.41 54.27     1.55  6.58  13.13 26.34 52.40
Mobile     4.72  13.60 27.23 54.45     1.49  6.61  13.09 26.32 52.50
Soccer     4.71  13.59 27.25 54.45     1.54  6.55  13.11 26.35 52.45
Hall       4.68  13.57 27.26 54.49     1.57  6.56  13.15 26.39 52.33
Average    4.71  13.61 27.28 54.40     1.57  6.57  13.15 26.33 52.38
These sequences were encoded using the H.264/AVC Joint Model reference software, version 16.2 [10], with an IBBPBBP pattern and a fixed QP chosen as a trade-off between quality and bitrate. Then, for the reference results, the encoded bitstreams are decoded and re-encoded using the JSVM software, version 9.19.3 [8], with hierarchical GOP structures and different values of QP (28, 32, 36, 40). For the results of the proposal, the bitstreams encoded in H.264/AVC are transcoded using the technique described in Section 4.1.
Table 2. Encoding time for each temporal layer (TL) with different GOP sizes using QCIF
Encoding time (%) of every temporal layer – QCIF (15 Hz)

           GOP = 4                GOP = 8
Sequence   TL0   TL1   TL2       TL0   TL1   TL2   TL3
Foreman    11.98 29.45 58.57     4.71  13.73 27.32 54.24
Bus        11.71 29.26 59.03     4.84  13.58 27.06 54.52
Football   11.72 29.40 58.87     4.67  13.73 27.26 54.34
Mobile     11.94 29.36 58.70     4.72  13.65 27.37 54.26
Soccer     11.74 29.42 58.84     4.99  13.44 27.13 54.44
Hall       11.69 29.51 58.80     5.08  13.52 27.10 54.30
Average    11.80 29.40 58.80     4.84  13.61 27.21 54.34
The technique explained previously is applied to the upper temporal layers, because the encoder spends more time encoding the higher enhancement layers (around 75% in the last two), as shown in Table 1 and Table 2.

Table 3. RD performance of the approach using CIF (30 Hz)

RD performance of the AVC/SVC transcoder, GOP = 16 - CIF (30 Hz)

Sequence   ∆PSNR (dB)   ∆Bitrate (%)   ∆Time (%)
Foreman    -0.1027      3.04           -50.34
Bus        -0.1774      11.37          -45.56
Football   -0.0997      1.63           -25.21
Mobile     -0.1653      5.96           -84.78
Soccer     -0.2774      8.78           -39.36
Hall       -0.0337      2.95           -86.49
Average    -0.1427      5.62           -55.29

Table 4. RD performance of the approach using QCIF (15 Hz)

RD performance of the AVC/SVC transcoder, GOP = 8 - QCIF (15 Hz)

Sequence   ∆PSNR (dB)   ∆Bitrate (%)   ∆Time (%)
Foreman    -0.1109      4.44           -51.95
Bus        -0.1845      12.51          -46.21
Football   -0.1095      1.82           -26.12
Mobile     -0.1727      6.45           -85.31
Soccer     -0.2873      9.26           -40.46
Hall       -0.0877      3.12           -87.30
Average    -0.1588      6.27           -56.23
Tables 3 and 4 show ∆PSNR, ∆Bitrate and ∆Time for the sequences under study, averaged over the QPs, when our approach is applied, compared to the more complex reference transcoder. The values obtained with the proposed transcoder are very close to the results obtained when applying the reference transcoder: the average PSNR loss with respect to the reference is 0.15 dB, with an average bitrate increase of around 6%, while achieving around a 55.75% reduction in computational complexity.
R. Garrido-Cantos et al.
5 Conclusions

This work presents an approach for H.264/AVC to SVC transcoding with temporal scalability. Starting from the higher layers and reusing information available after decoding the H.264/AVC bitstream, motion estimation can be accelerated with a dynamic motion window. Experimental results show that this approach is capable of reducing the coding complexity by around 55.75% while maintaining the coding efficiency.

Acknowledgments. This work was supported by the Spanish MEC and MICINN, as well as European Commission FEDER funds, under Grants CSD2006-00046, TIN2009-14475-C04 and TIN2009-05737-E, and it was also partly supported by JCCM funds under grants PEII09-0037-2328 and PII2I09-0045-9916.
A GPU-Based DVC to H.264/AVC Transcoder Alberto Corrales-García1, Rafael Rodríguez-Sánchez1, José Luis Martínez1, Gerardo Fernández-Escribano1, José M. Claver2, and José Luis Sánchez1 1
Instituto de Investigación en Informática de Albacete (I3A) Universidad de Castilla-La Mancha 02071 Albacete, Spain {albertocorrales,rrsanchez,joseluismm,gerardo, jsanchez}@dsi.uclm.es 2 Departamento de Informática Universidad de Valencia 46100 Burjassot, Valencia, Spain [email protected]
Abstract. Mobile to mobile video conferencing is one of the services that the newest mobile network operators can offer to users. With the emergence of the distributed video coding paradigm, which moves the majority of the complexity from the encoder to the decoder, this service can be provided by introducing a transcoder. This device converts from the distributed video coding paradigm to a traditional video codec such as H.264/AVC, which has simpler decoders and more complex encoders, allowing the user devices to execute only the low-complexity algorithms. In order to deal with this highly complex video transcoding, this paper introduces a graphics processing unit based transcoder located at the base station. The use of graphics accelerators in this framework has not been proposed before in the literature and offers a new field to explore, with promising results. The proposed transcoder offers a time reduction of the whole process of over 79% with negligible rate distortion penalty.

Keywords: Distributed Video Coding, H.264/AVC, Graphic Processing Units, Transcoding, Heterogeneous Computing.
1 Introduction

Multimedia communications between mobile devices are becoming an important area of interest in telecommunications because of the advances in mobile networks (such as 4G). Nowadays, one of the most requested mobile services is video conferencing, where the transmitter and receiver devices may not have the necessary computing power or resources, or may have complexity constraints that prevent them from performing complex video algorithms (both coding and decoding). On the one hand, traditional video codecs such as H.264 Advanced Video Coding (AVC) [1] typically have highly complex encoders and less complex decoders. On the other hand, Distributed Video Coding (DVC) [2] has received great interest from the multimedia research community because it offers low-complexity encoders and more complex decoders. In other words, the DVC framework offers a reversal of the asymmetry in terms of complexity compared to traditional codecs such as H.264/AVC. This mobile to mobile scenario is depicted in Figure 1.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 233–240, 2010. © Springer-Verlag Berlin Heidelberg 2010
Fig. 1. Video communication system using a DVC to H.264/AVC video transcoder
In order to achieve this low-complexity communication between both paradigms, a DVC to H.264/AVC video transcoder that converts the bitstream needs to be included in the network. Basically, both sending and receiving devices shift their complexity to the base station, resulting in less complex user devices. In return, the transcoder has to handle two complex processes: DVC decoding and H.264/AVC encoding. It is worth mentioning that this transcoder device does not have any computational restriction and is designed to be a high-performance processing unit with many resources. Recently, accelerator and multi-core processor devices such as Graphics Processing Units (GPUs), Cell Broadband Engines (Cell BEs), and Field-Programmable Gate Arrays (FPGAs) have come into use in high-performance computing. These small devices consist of tens or hundreds of homogeneous processing cores designed and organized with the goal of achieving higher performance. These new hardware opportunities therefore open a new door in the field of multimedia processing and computing; in particular, in the framework of DVC to H.264/AVC transcoders, which join two of the most time-consuming processes (the H.264/AVC encoding and DVC decoding algorithms). At this point, this paper proposes a DVC to H.264/AVC GPU-based video transcoder in which the H.264/AVC encoding algorithm (the second half of the proposed transcoder) is accelerated by means of parallel processing. The Motion Vectors (MVs) generated in the DVC side information process (the DVC motion estimation) are reused as MV predictors in the H.264/AVC encoding stage, and then the H.264/AVC motion estimation is executed in parallel on a GPU. In other words, the center point of the H.264/AVC search area is adjusted based on the incoming DVC MVs.
The proposed transcoder is a straightforward step because parallel processing has not been used before in the literature in the framework of DVC-based transcoders; all the previous DVC-based transcoders (to H.263 [3] and H.264/AVC [4]) are based on sequential execution. This paper is organized as follows: Section 2 presents the basics of DVC, H.264/AVC and GPUs. Section 3 presents the proposed GPU-based video transcoder, which is evaluated in Section 4. Finally, the conclusions are presented in Section 5.
2 Technical Background

2.1 Distributed Video Coding

DVC provides a new video coding paradigm in which the architecture is characterized by encoders that are less complex than decoders. On the encoder side, frames are labeled as Key Frames (K) and Wyner-Ziv Frames (WZ). K frames are encoded as Intra frames are in traditional codecs. However, WZ frames only store a few parity bits, and temporal correlation is not exploited. For this reason, the encoding procedure is much faster than in traditional encoders. On the decoder side, the DVC decoder receives the K frames first. From each pair of adjacent K frames, an estimation of the intermediate WZ frame, called Side Information (SI), is generated. Figure 2 shows the first step of the SI generation for a MacroBlock (MB) using two frames at positions k and k+n in the sequence; the SI then represents an approximation of the frame at position k+n/2. Every MB in the K frame k+n is matched with an MB in the K frame k. This matching is done by checking all the positions within the defined search area and choosing the one with the lowest residual. The displacement is quantified by a MV, and half of this MV represents the displacement of the interpolated MB. Afterwards, a channel decoding algorithm refines this SI by using the parity information, as specified in [2][5].
Fig. 2. First step of SI generation process
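For illustration, this MB matching step can be sketched in Python (a simplified sketch assuming an exhaustive SAD search over integer positions and hypothetical NumPy frame arrays; the actual DVC codec implementation differs in its details):

```python
import numpy as np

def si_motion_vector(frame_k, frame_kn, mb_pos, mb_size=16, search=8):
    """Match the MB of K frame k+n at mb_pos against K frame k (full SAD search).

    Returns the best MV and its half, i.e. the displacement used for the
    interpolated MB of the SI frame at position k+n/2.
    """
    y, x = mb_pos
    mb = frame_kn[y:y + mb_size, x:x + mb_size].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            # Skip candidates falling outside the reference frame.
            if yy < 0 or xx < 0 or yy + mb_size > frame_k.shape[0] or xx + mb_size > frame_k.shape[1]:
                continue
            cand = frame_k[yy:yy + mb_size, xx:xx + mb_size].astype(np.int64)
            sad = int(np.abs(mb - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    half_mv = (best_mv[0] // 2, best_mv[1] // 2)
    return best_mv, half_mv
```

A block that moved by (dy, dx) between the two K frames is thus interpolated with half that displacement in the SI frame.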
2.2 Overview of H.264/AVC

H.264/AVC [1] is the most recent predictive video compression standard and outperforms previous video codecs. The H.264/AVC standard builds on those previous coding standards to achieve a compression gain of about 50%, largely at the cost of an increase in the computational complexity of the encoder. These compression gains are mainly due to variable and smaller block-size motion compensation, improved entropy coding, multiple reference frames, a smaller block transform and a deblocking filter, among others. The inter prediction in H.264/AVC supports motion compensation block sizes ranging from 16x16 to 4x4, with many options available between them. The Motion Estimation (ME) process is then carried out for each partition and sub-partition; this is known as the tree structured motion compensation algorithm. The ME process is therefore carried out many times per MB and, for this reason, consumes most of the time of the encoding algorithm. Moreover, the MVs of neighboring partitions are often highly correlated, and so each motion vector is
predicted from vectors of nearby, previously coded partitions. The predicted MV is computed as the component-wise median of the candidate MVs, taken from the left MB, the above MB, and the above-right MB relative to the current MB.

2.3 Graphics Processing Units

In the past few years, new heterogeneous architectures have come into use in high-performance computing [6]. An example of such an architecture is the GPU. GPUs are small accelerator devices with hundreds of cores organized in several Single Instruction Multiple Data (SIMD) blocks, designed with the goal of achieving high performance in graphics applications. GPUs are characterized by a high level of parallelism and are usually used as a coprocessor to assist the Central Processing Unit (CPU) in processing massive amounts of data. It should be taken into account that current GPUs can offer 10x higher main memory bandwidth and use data parallelism to achieve up to 10x more floating-point throughput than CPUs. Although GPUs can be used for general-purpose computing, they come primarily from multimedia and gaming applications. To facilitate the programming of these devices, GPU manufacturers provide diverse tools, function libraries, languages and extensions for the most commonly used high-level programming languages. For example, NVIDIA provides a powerful GPU architecture called Compute Unified Device Architecture (CUDA) [7]. This architecture allows a great number of threads to run the same code (kernel) simultaneously, taking advantage of the high computation capacity and main memory bandwidth.
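A minimal sketch of the MV predictor described in Section 2.2 (component-wise median of the left, above and above-right MVs; the partition-dependent special cases of the standard are omitted):

```python
def predicted_mv(mv_left, mv_above, mv_above_right):
    """Component-wise median of the three neighbouring MVs."""
    def median3(a, b, c):
        # The median of three values is their sum minus the min and the max.
        return a + b + c - min(a, b, c) - max(a, b, c)
    return (median3(mv_left[0], mv_above[0], mv_above_right[0]),
            median3(mv_left[1], mv_above[1], mv_above_right[1]))
```

It is exactly this neighbour dependency that complicates a parallel ME, as discussed later in Section 3.2.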
3 Proposed DVC to H.264/AVC GPU-Based Video Transcoder

This work proposes a DVC to H.264/AVC transcoding architecture from each DVC GOP to an H.264/AVC I11P GOP (baseline profile). This procedure is done efficiently by reusing the MVs calculated during the DVC decoding phase in order to determine the MV predictors for the H.264/AVC ME task. In consequence, the time spent in the overall transcoding process is largely reduced.

3.1 Allocation of Motion Vectors

In the DVC decoding stage, MVs are calculated during the SI generation process, as explained in Section 2.1. These MVs offer an approximation of the displacement between frames. For different DVC GOP lengths the decoding process changes, but MVs are selected in a similar way. For example, Figure 3 shows the mapping of MVs from a DVC GOP of length 4 to an H.264/AVC I11P GOP. In step 1, DVC decodes frame WZ2 using the K0 and K4 frames as references. As a result, the MVs V0-4 are available, but these MVs are not considered because references at a large distance do not provide good accuracy. In the second step, frame WZ1 is decoded using frames WZ0 and WZ'2 as references, and likewise frame WZ3 is decoded using frames WZ2 and WZ4 as references. The MVs generated in the second step (V0-2 and V2-4) provide better accuracy, so they will be used by H.264/AVC as predictors. However, as they are calculated for a distance of 2 and P frames have references
Fig. 3. Allocation of MVs from DVC GOP 4 to H.264/AVC I11P GOP
with distance 1, they are split into two halves. Notice that every DVC GOP has the last DVC decoding step in common, and the MVs used as predictors for H.264/AVC are selected in this step. Consequently, the MV extraction process is generic for every DVC GOP length.

3.2 H.264/AVC GPU-Based Execution

The improved H.264/AVC encoding algorithm, as part of the whole transcoding process, is presented in the following lines. The idea behind this approach is motivated by the fact that the ME is carried out many times in the H.264/AVC encoding algorithm. As explained in Section 2.2, the H.264/AVC encoding algorithm supports many MB partitions and calls the ME algorithm for each of them. As the number of partitions increases, the time consumption also increases. The GPU philosophy therefore fits well in this framework, because GPUs are SIMD computing devices; in fact, the ME algorithm is carried out over multiple data, namely the MB positions to be checked. Therefore, the H.264/AVC inter prediction (which includes the ME process) is executed on the GPU by using CUDA. For this purpose, the proposed algorithm is divided into three steps; all of them are executed sequentially, but each one is internally exploited in a highly parallel way on the GPU. The goal of the first kernel is to compute the Sum of Absolute Differences (SAD) between the current MB (split into sixteen 4x4 partitions) and all MB positions in the reference frame inside the search range. The second kernel then uses the previous 4x4 block SAD calculations to obtain the SAD costs for the different sub-partitions. Finally, the last kernel reduces the SAD costs to one SAD cost for each of the 41 MB partitions of each MB. More detail about the algorithm can be found in [8]. In a nutshell, the main challenge of this approach is to efficiently support the tree structured motion compensation algorithm of the H.264/AVC encoding algorithm.
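The MV allocation of Section 3.1 (splitting a distance-2 DVC MV into two distance-1 halves for consecutive P frames) can be sketched as follows; assigning the odd remainder to the second half is our assumption, not a detail stated in the paper:

```python
def split_mv(mv):
    """Split a distance-2 MV (dy, dx) into two distance-1 halves."""
    dy, dx = mv
    first = (dy // 2, dx // 2)               # floor division, as in integer MV halving
    second = (dy - first[0], dx - first[1])  # remainder goes to the second half
    return first, second
```

The two halves sum back to the original MV, so the overall displacement across the two P frames is preserved.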
In order to achieve that, the H.264/AVC encoding algorithm uses the SAD calculation, performed in parallel on the GPU, to determine the best MB partition. The main problem of performing the ME in parallel is that the MV predictors of neighboring MBs are not accessible for the current MB. As it has been
explained in Section 2.2, the search area for each MB is determined based on the MVs of its neighbors, but this information is not accessible because those MVs are being calculated at the same time as the current MB. In the present approach, the MVs generated in the DVC decoding algorithm, calculated as Section 3.1 explains, are used to determine the predicted search area. Figure 4 depicts this approach.
Fig. 4. Predicted search area based on the incoming DVC motion vectors
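The adjustment of the search-area centre based on the incoming DVC MV can be sketched as below; the clamping of the window to the frame borders is our assumption, not a detail given in the paper:

```python
def search_center(mb_pos, dvc_mv, frame_h, frame_w, mb_size=16, search=16):
    """Centre of the H.264/AVC search area, displaced by the DVC predictor MV."""
    cy = mb_pos[0] + dvc_mv[0]
    cx = mb_pos[1] + dvc_mv[1]
    # Clamp so that the whole (2*search + mb_size) window stays inside the frame.
    cy = max(search, min(cy, frame_h - mb_size - search))
    cx = max(search, min(cx, frame_w - mb_size - search))
    return cy, cx
```

Every thread can compute its own centre independently, which is what makes the ME amenable to the SIMD execution model.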
4 Performance Evaluation

In order to evaluate the performance of the proposed transcoder, four QCIF sequences were encoded at 30 fps by a DVC codec based on VISNET-II [9]. The QCIF format was selected because it is the most suitable format for mobile-to-mobile video communications, due to the reduced size of mobile displays and the low network bandwidth requirements. In the DVC encoding stage, 300 frames were encoded for each sequence using a QP matrix fixed to 7 [5] and GOP lengths of 2, 4 and 8. During the DVC decoding stage, MVs are passed to the H.264/AVC encoder as predictors. In the second stage of the transcoder, the H.264/AVC encoder converts each DVC GOP into an H.264/AVC I11P GOP using QPs = 28, 32, 36 and 40, as specified in Bjøntegaard and Sullivan's common test conditions [10]. In the simulations, the H.264/AVC JM reference software, version 15.1 [11], was used. The baseline profile with the default configuration was applied. In addition, RD-Optimization was turned off to make encoding suitable for real-time, low-complexity mobile devices. To evaluate the performance, the percentage of ME Time Reduction (%TR) reports the average time reduction of the ME reported by H.264/AVC over the four QP points under study. Table 1 shows the RD results for the proposed transcoder. As can be observed, the transcoder complexity is greatly reduced, reaching a TR of about 79% on average without significant RD penalties. This small RD drop is a consequence of the parallel execution, which cannot use the sequential standard process to calculate the predictors and instead uses an approximation provided by the DVC MVs. Moreover, similar results are observed for different GOPs due to the generic MV extraction procedure employed by the proposal. The last column of Table 1 shows the frames per second (fps) rate achieved for the whole encoding process, which reaches real-time encoding.
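The %TR metric used here is simply the relative time saving averaged over the four QP points; a minimal sketch:

```python
def time_reduction(t_reference, t_proposed):
    """Percentage time reduction of the proposed ME w.r.t. the reference."""
    return 100.0 * (t_reference - t_proposed) / t_reference

def mean_tr(reference_times, proposed_times):
    """%TR averaged over the QP points under study (here QP = 28, 32, 36, 40)."""
    trs = [time_reduction(r, p) for r, p in zip(reference_times, proposed_times)]
    return sum(trs) / len(trs)
```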
The present approach is a straightforward step in the framework of GPU-based transcoders and, thus, although the RD results are similar to those presented in [8] (without using
MVs), this approach provides a more accurate displacement of the search area. This could be extended in future work by using the incoming MVs to reduce the search area, reaching a better time reduction without a large increase in the RD penalty. In addition, Figure 5 displays the RD results from a graphical point of view. As shown, all QP points are very close and the proposal presents similar behavior for the different GOPs.

Table 1. Performance of the proposed transcoder for 30 fps QCIF sequences

RD performance of the WZ/H.264/AVC video transcoder – 30 fps
Sequence      GOP   ΔPSNR (dB)   ΔBitrate (%)   TR (%)   fps
Foreman        2    -0.191        4.60          80.53    27.12
               4    -0.217        4.80          80.54    26.95
               8    -0.161        4.12          80.88    26.83
Hall           2    -0.055        1.21          72.97    28.96
               4    -0.042        0.92          73.09    28.99
               8    -0.036        0.82          73.18    29.05
Coastguard     2    -0.118        2.95          82.47    27.11
               4    -0.102        2.33          81.90    27.07
               8    -0.105        2.37          81.71    27.13
Soccer         2    -0.216        5.11          81.31    26.57
               4    -0.213        5.26          81.97    26.84
               8    -0.201        5.08          82.11    26.65
Mean                -0.138        3.297         79.39    27.44
Fig. 5. PSNR/bitrate results for sequences with GOP = 2. Reference symbols: ●Foreman ♦Hall ▲CoastGuard ■ Soccer.
5 Conclusions

This paper presents a GPU-based video transcoder to efficiently support mobile to mobile communications. The incoming DVC MVs are used as candidates to define the predicted search area, and then the ME algorithm is executed in parallel on the
GPU. The presented transcoder shows that parallel computing in general, and GPUs in particular, are another efficient way to accelerate video coding algorithms. The improved transcoder depicted in this paper achieves a time reduction of 79% on average with negligible rate distortion penalty. Ongoing work aims to improve the DVC decoding part of the transcoder by also using parallel processing.

Acknowledgments. This work was supported by the Spanish MEC and MICINN, as well as European Commission FEDER funds, under Grants CSD2006-00046, TIN2009-14475-C04 and TIN2009-05737-E. It was also partly supported by The Council of Science and Technology of Castilla-La Mancha under Grants PEII09-0037-2328, PII2I09-0045-9916 and PCC08-0078-9856. The work presented was developed by using the VISNET2-WZ-IST software developed in the framework of the VISNET II project.
References

1. ITU-T and ISO/IEC JTC 1: Advanced Video Coding for Generic Audiovisual Services. ITU-T Rec. H.264/AVC and ISO/IEC 14496-10 Version 8 (2007)
2. Girod, B., Aaron, A., Rane, S., Monedero, D.R.: Distributed Video Coding. In: Proc. of IEEE Special Issue on Advances in Video Coding and Delivery, vol. 93(1), pp. 1–12 (2005)
3. Peixoto, E., Queiroz, R.L., Mukherjee, D.: A Wyner-Ziv Video Transcoder. IEEE Trans. Circuits and Systems for Video Technology (to appear, 2010)
4. Martínez, J.L., Kalva, H., Fernández-Escribano, G., Fernando, W.A.C., Cuenca, P.: Wyner-Ziv to H.264 video transcoder. In: 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, pp. 2941–2944 (2009)
5. Ascenso, J., Brites, C., Pereira, F.: Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding. In: 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, Smolenice, Slovak Republic (2005)
6. Feng, W.-c., Manocha, D.: High-performance computing using accelerators. Parallel Computing 33(10-11), 645–647 (2007)
7. NVIDIA: NVIDIA CUDA Compute Unified Device Architecture – Programming Guide, Version 2.2 (February 2009)
8. Rodriguez, R., Martínez, J.L., Fernández-Escribano, G., Claver, J.M., Sánchez, J.L.: Accelerating H.264 Inter Prediction in a GPU by using CUDA. In: Proceedings of IEEE International Conference on Consumer Electronics, Las Vegas, NV, USA (2010)
9. VISNET II project, http://www.visnet-noe.org/ (last visited March 2010)
10. Sullivan, G., Bjøntegaard, G.: Recommended Simulation Common Conditions for H.26L Coding Efficiency Experiments on Low-Resolution Progressive-Scan Source Material. ITU-T VCEG, Doc. VCEG-N81 (2001)
11. Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG: Reference Software to Committee Draft, JVT-F100 JM15.1 (2009)
Hybrid Color Space Transformation to Visualize Color Constancy Ramón Moreno, José Manuel López-Guede, and Alicia d’Anjou Computational Intelligence Group Universidad del País Vasco, UPV/EHU http://www.ehu.es/cwintco
Abstract. Color constancy and chromatic edge detection are fundamental problems in artificial vision. In this paper we present a way to provide a visualization of color constancy that works well even in dark scenes, where both humans and computer vision algorithms have difficulties due to noise. The method is a hybrid and nonlinear transform of the RGB image based on assigning the chromatic angle as the luminosity value in the HSV space. This chromatic angle is defined on the basis of the dichromatic reflection model, and thus has a physical model supporting it.

Keywords: Color Constancy, Chromatic Edge, Color Segmentation, Illumination Transform.
1 Introduction
Color constancy (CC) is a fundamental problem in artificial vision [4,10,15]; it has been the subject of neuropsychological research [1] and can be very influential in color clustering processes [2,11,7,3]. It is the ability of the human observer to identify the same surface color in spite of changes of environmental light, shadows and diverse degrees of noise. A related problem is that of Chromatic Edge (CE) detection, meaning the ability to detect the location of surface and scene color transitions corresponding to object boundaries. In the artificial vision framework, works ensuring CC or trying to perform CE detection must assume some color space; often they must estimate the illumination source chromaticity [6,14] and proceed by separating the diffuse and specular image components [9,12,16]. Usually, CC is associated with the diffuse component of the image. Measurements on human subjects lead to the conclusion that retinal processing is not enough to extract chromatic features and chromaticity-based structural image information. Some works demonstrate that CC analysis is done in the visual cortex, in areas V4 and V4A [1]. Assuming the analogy with the human
This work has been supported by the Ministerio de Ciencia e Innovación of Spain, TIN2009-05736-E/TIN.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 241–247, 2010. © Springer-Verlag Berlin Heidelberg 2010
vision biology, artificial vision systems need non-trivial processing to ensure CC results when processing real images. Dark scenes are critical for CC because dark image regions are usually very noisy; that is, the signal-to-noise ratio is very low due to the low magnitude of the visual signal. In these regions, the ubiquitous thermodynamic noise has an amplified effect that distorts region and edge detection under CC conditions. Our approach obtains remarkably good results in these critical regions. In this paper we present a hybrid and nonlinear transformation of the RGB image based on assigning the chromatic angle of the pixel (computed in the RGB space) as the luminosity value in the HSV space. The image is preprocessed to remove the specular component [13]. The chromatic angle is defined on the basis of the Dichromatic Reflection Model (DRM), and thus has a physical interpretation supporting it. In the HSV color space the intensity is represented by the V value, and changing it does not change the pixel's chromatic information. Thus, to visualize CC we assign a constant intensity to the pixels having common chromatic features, by assigning the chromatic angle as the V value in the HSV space. The paper has the following structure: Section 2 is a brief overview of the dichromatic reflection model (DRM). Section 3 presents our approach. Section 4 shows and explains the experimental results. Section 5 gives the conclusions and directions for further work.
2 Dichromatic Reflection Model (DRM) in the RGB Space
The Dichromatic Reflection Model (DRM) was introduced by Shafer [8]. It explains the perceived color intensity I ∈ R³ of each pixel in the image as the addition of two components: a diffuse component D ∈ R³ and a specular component S ∈ R³. The diffuse component refers to the chromatic properties of the observed surface, while the specular component refers to the illumination color. Surface reflections are pixels with a high specular component. The mathematical expression of the model, when there is only one surface color in the scene, is as follows:

    I(x) = md(x) D + ms(x) S,    (1)

where md and ms are weighting values for the diffuse and specular components, taking values in [0, 1]. In Figure 1, the striped region represents a convex region of the plane Πdc in RGB that contains all the possible colors expressed by the DRM equation (1). For a scene with several surface colors, the DRM equation must assume that the diffuse component may vary spatially, while the specular component is constant across the image domain:

    I(x) = md(x) D(x) + ms(x) S.
Fig. 1. Typical distribution of pixels in the RGB space according to the Dichromatic Reflection Model
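As a quick numerical illustration of the single-colour DRM of Eq. (1), pixels synthesized from one diffuse colour D and one illuminant S (both hypothetical values below) all lie in the plane spanned by D and S:

```python
import numpy as np

D = np.array([0.8, 0.2, 0.1])  # hypothetical diffuse (surface) colour
S = np.array([1.0, 1.0, 1.0])  # hypothetical specular (illuminant) colour

rng = np.random.default_rng(1)
md = rng.random(50)            # diffuse weights md(x) in [0, 1]
ms = 0.3 * rng.random(50)      # specular weights ms(x)

# I(x) = md(x) D + ms(x) S for 50 pixels, one per row.
pixels = np.outer(md, D) + np.outer(ms, S)

# All pixels lie in span{D, S}: the 50x3 pixel matrix has rank 2.
rank = np.linalg.matrix_rank(pixels)
```

This is exactly the planar distribution sketched in Figure 1.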
That the specular component is space invariant in both cases means that the illumination is constant over the whole scene. Finally, assuming several illumination colors, we have the most general DRM:

    I(x) = md(x) D(x) + ms(x) S(x),

where both the surface and illumination chromaticities are spatially variant. In the HSV color space, chromaticity is identified with the pair (H, S), and the V variable represents the luminosity or light intensity. Plotting in the RGB space a collection of color points that have constant (H, S) components and variable intensity, we have observed that chromaticity in the RGB space is geometrically characterized by a straight line crossing the RGB space's origin, determined by the φ and θ angles of the polar coordinates of the points on this chromaticity line. The plot of the pixels in a chromatically uniform image region appears as a straight line in the RGB space; we denote this diffuse line Ld. If the image has surface reflection bright spots, the plot of the pixels in these highly specular regions appears as another line Ls intersecting Ld. For diffuse pixels (those with a small specular weight ms(x)), the zenith φ and azimuthal θ angles are almost constant, while they change for specular pixels and change dramatically among diffuse pixels belonging to different color regions. Therefore, the angle between the vectors representing two neighboring pixels I(xp) and I(xq), denoted ∠(Ip, Iq), reflects the chromatic variation between them. For two pixels in the same chromatic region this angle must be ∠(Ip, Iq) = 0, because they are collinear in RGB space. The angle between Ip and Iq is calculated with the equation:

    ∠(Ip, Iq) = arccos( I(xp)ᵀ I(xq) / (‖I(xp)‖ ‖I(xq)‖) ).    (2)
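Eq. (2) translates directly into code; a minimal sketch (the clipping guards against rounding taking the cosine slightly outside [-1, 1]):

```python
import numpy as np

def chromatic_angle(ip, iq):
    """Angle between two RGB pixel vectors; 0 for pixels on the same chromatic line."""
    ip, iq = np.asarray(ip, dtype=float), np.asarray(iq, dtype=float)
    cosang = ip @ iq / (np.linalg.norm(ip) * np.linalg.norm(iq))
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))
```

Collinear pixels (same chromaticity, different intensity) give an angle of 0, while pixels from different color regions give a clearly larger angle.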
3 An Approach for Regular Region Intensity
The basic idea of our approach is to assign a constant luminosity to the pixels inside a homogeneous chromatic region. To do that, we must combine manipulations over
the two color space representations of the pixels, HSV and RGB. The process is highly nonlinear and is composed of the following steps:

1. Isolate the diffuse component by removing the specular component (ms = 0): we are interested only in the diffuse component because it represents the true surface color. We use the method presented in [12] to perform the diffuse and specular component separation.
2. Transform the diffuse RGB image into the HSV color space.
3. Compute for each pixel in the image the chromatic angle, i.e. the angle between the chromaticity line of the pixel and the gray diagonal line of the RGB space, which goes from the black origin to the pure white corner.
4. Assign the normalized chromatic angle as the new luminosity value in the HSV pixel representation.

In a homogeneous chromatic region, all pixels fall on the same diffuse line Ld: (r, g, b) = O + sσ, ∀s ∈ R⁺, where O = [0, 0, 0] and σ = [σr, σg, σb] is the region chromaticity. The chromatic reference is the pure white line Lpw, defined as Lpw: (r, g, b) = O + su, ∀s ∈ R⁺, where O = [0, 0, 0] and u = [1, 1, 1]. Therefore, if all pixels in a region belong to the same chromatic line, the angle between each pixel and the line Lpw must be the same, and the result of this angular measurement is constant for the whole region. Our strategy is to normalize this measure over its domain of definition (the RGB cube) and assume it as the constant luminosity value V. This method is expressed by the equation:

    V_new(x) = ∠(I(x), u) / arccos(ϑ),    (3)
where the denominator arccos(ϑ) is the normalization constant corresponding to the maximum angle between the extreme chromatic lines of the RGB space (the red, green and blue axes) and the pure white line. Algorithm 1 shows a Matlab/Scilab implementation of the method, where ϑ takes the value 1/√3 and arccos(ϑ) = 0.9553166.

Algorithm 1. Regular Region Intensity

function IR = SF3(I)
  Idiff = imDiffuse(I);                   // extract the diffuse component
  new_intensity = angle(Idiff, [1 1 1]);  // matrix of normalized chromatic angles
  Ihsv = rgb2hsv(Idiff);
  Ihsv(:,:,3) = new_intensity;            // assign the normalized angles as image intensity
  IR = hsv2rgb(Ihsv);
endfunction
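The per-pixel core of the method (steps 3 and 4, i.e. Eq. (3)) can be sketched with NumPy as below; this sketch assumes a float RGB image with no pure-black pixels and omits the diffuse/specular separation and the HSV round trip of Algorithm 1:

```python
import numpy as np

# Angle between an RGB axis and the pure-white line u = (1, 1, 1): arccos(1/sqrt(3)).
MAX_ANGLE = np.arccos(1.0 / np.sqrt(3.0))  # ~0.9553166

def new_intensity(rgb_image):
    """Normalized chromatic angle of each pixel w.r.t. the pure-white line (Eq. 3)."""
    img = np.asarray(rgb_image, dtype=float)
    dots = img.sum(axis=-1) / np.sqrt(3.0)  # I(x) . u / ||u||
    norms = np.linalg.norm(img, axis=-1)    # ||I(x)||
    cosang = np.clip(dots / norms, -1.0, 1.0)
    return np.arccos(cosang) / MAX_ANGLE    # normalized to [0, 1]
```

Achromatic (grey) pixels map to 0 and fully saturated primaries map to 1, so every pixel in a homogeneous chromatic region receives the same V value.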
4 Experimental Results
We present the results of three computational experiments: the first one uses a synthetic image and the remaining two use natural images. Figure 2 displays the first experimental results. Figure 2a is the original image. Figure 2b
Hybrid Color Space Transformation to Visualize Color Constancy
Fig. 2. Synthetic image results: (a) original image, (b) diffuse component of the image, (c) our method on image (a), (d) our method on image (b)
is the diffuse image obtained by applying the method in [13]. Figure 2c is the result of applying our proposed method to image 2a. Figure 2d displays the result of applying our method to image 2b. It can be appreciated that our method is able to identify the main chromatic regions even without component separation (figure 2c), with some artifacts due to the bright reflections. After removing these reflections, the method achieves a very clean identification of the chromatic regions. For the next experiments we use natural images that have previously been used by other researchers. Figures 3 and 4 show the experimental results. In both
Fig. 3. Natural image results: (a) original image, (b) diffuse component of the image, (c) our method on image (a), (d) our method on image (b)
R. Moreno, J.M. López-Guede, and A. d’Anjou
Fig. 4. Natural images: (a) original image, (b) diffuse component of the image, (c) our method on image (a), (d) our method on image (b)
cases subfigure (a) contains the original image, subfigure (b) shows the diffuse image, subfigure (c) displays the result of applying our proposed method to the original image (a), and subfigure (d) shows the result of applying our method to the diffuse image (b). In both experiments we can see a similar effect of applying specular correction. The images in (c), obtained without component separation, show better chromatic preservation, although with some degradation in the regions corresponding to the specular highlights. The images obtained after diffuse component identification [13] are less sensitive to specular effects; however, they show some chromatic region over-segmentation. It is important to note that no clustering process has been performed to obtain these images.
5
Conclusions and Further Works
In this work we present a color transformation that enables good visualization of color constancy in the image, changing only the image luminosity and preserving its chromaticity. The result is a new image with strong contrast between chromatically homogeneous regions, and good visualization of these regions as uniform regions in the image. The method performs very well in dark regions, which are critical for most CC methods and for image segmentation based on color clustering. The method could be the basis for such a process, applying the clustering to the chromaticity angle. We have found that specular correction of the image improves the results on highly specular regions of the image; however, our approach also performs well
on images that have not been preprocessed. Future work will address color edge detection and color image segmentation based on this approach. Hierarchical approaches may be useful [5].
References
1. Barbur, J.L., Spang, K.: Colour constancy and conscious perception of changes of illuminant. Neuropsychologia 46, 853–863 (2008); PMID: 18206187
2. Cheng, H.D., Jiang, X.H., Sun, Y., Wang, J.: Color image segmentation: advances and prospects. Pattern Recognition 34(12), 2259–2281 (2001)
3. Garcia-Sebastian, M., Gonzalez, A.I., Grana, M.: An adaptive field rule for non-parametric MRI intensity inhomogeneity estimation algorithm. Neurocomputing 72(16-18), 3556–3569 (2009); Financial Engineering; Computational and Ambient Intelligence (IWANN 2007)
4. Gijsenij, A., Gevers, T., van de Weijer, J.: Generalized gamut mapping using image derivative structures for color constancy. International Journal of Computer Vision 86(2), 127–139 (2010)
5. Graña, M., Torrealdea, F.J.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
6. Choi, Y.-J., Yoon, K.-J., Kweon, I.S.: Illuminant chromaticity estimation using dichromatic slope and dichromatic line space. In: Korea-Japan Joint Workshop on Frontiers of Computer Vision, FCV, pp. 219–224 (2005)
7. Lezoray, O., Charrier, C.: Color image segmentation using morphological clustering and fusion with automatic scale selection. Pattern Recognition Letters 30(4), 397–406 (2009)
8. Shafer, S.A.: Using color to separate reflection components. Color Research and Applications 10, 43–51 (1984)
9. Shen, H.-L., Zhang, H.-G., Shao, S.-J., Xin, J.H.: Chromaticity-based separation of reflection components in a single image. Pattern Recognition 41, 2461–2469 (2008)
10. Skaff, S., Arbel, T., Clark, J.J.: A sequential bayesian approach to color constancy using non-uniform filters. Computer Vision and Image Understanding 113(9), 993–1004 (2009)
11. Tan, R.T., Nishino, K., Ikeuchi, K.: Color constancy through inverse-intensity chromaticity space. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 21(3), 321–334 (2004)
12. Tan, R.T., Nishino, K., Ikeuchi, K.: Separating reflection components based on chromaticity and noise analysis. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1373–1379 (2004)
13. Tan, R.T., Ikeuchi, K.: Reflection components decomposition of textured surfaces using linear basis functions. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, June 20-25, vol. 1, pp. 125–131 (2005)
14. Tan, R.T., Nishino, K., Ikeuchi, K.: Illumination chromaticity estimation using inverse-intensity chromaticity space. In: Proceedings of 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 18-20, vol. 1, pp. I-673–I-680 (2003)
15. Yoon, K.-J., Choi, Y.J., Kweon, I.-S.: Dichromatic-based color constancy using dichromatic slope and dichromatic line space. In: IEEE International Conference on Image Processing, ICIP 2005, September 11-14, vol. 3, pp. III–960–3 (2005)
16. Yoon, K.-J., Choi, Y., Kweon, I.S.: Fast separation of reflection components using a specularity-invariant image representation. In: IEEE International Conference on Image Processing, October 8-11, pp. 973–976 (2006)
A Novel Hybrid Approach to Improve Performance of Frequency Division Duplex Systems with Linear Precoding

Paula M. Castro, José A. García-Naya, Daniel Iglesia, and Adriana Dapena
Department of Electronics and Systems, University of A Coruña, Spain
{pcastro,jagarcia,dani,adriana}@udc.es
Abstract. Linear precoding is an attractive technique to combat interference in multiple-input multiple-output systems because it reduces costs and power consumption in the receiver equipment. Most of the frequency division duplex systems with linear precoding acquire the channel state information at the receiver by using supervised algorithms. Such algorithms make use of pilot symbols periodically sent by the transmitter. In a later step, the channel state information is sent to the transmitter side through a limited feedback channel. In order to reduce the overhead inherent to the periodical transmission of training data, we propose to acquire the channel state information by combining supervised and unsupervised algorithms, leading to a hybrid and more efficient approach. Simulation results show that the performance achieved with the proposed scheme is clearly better than that with standard algorithms. Keywords: Linear Precoding, MIMO Systems.
1
Introduction
The increasing demand for multimedia content has led to the continuous development of new techniques that try to improve the throughput of digital communication systems. For instance, current transmission standards for Multiple-Input Multiple-Output (MIMO) systems include so-called precoders in order to guarantee that the link throughput is maximized [1, 2]. Precoding algorithms for MIMO are classified into linear and nonlinear types. In the sequel, we consider Linear Precoding (LP) approaches because they achieve reasonable throughput with a complexity lower than that required by nonlinear precoding approaches. In order to implement precoding schemes, the base station must know the Channel State Information (CSI). However, in most Frequency Division Duplex (FDD) systems the transmitter (TX) cannot obtain the CSI from the received signals (even under the assumption of perfect calibration) because the channels are not reciprocal. The CSI is thus estimated at the receiver (RX) side and transmitted back through a limited feedback channel.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 248–255, 2010. Springer-Verlag Berlin Heidelberg 2010
Usually,
A Novel Hybrid Channel Estimation Approach with Linear Precoding
Fig. 1. MIMO System with linear transmit filter (linear precoding)
current standards perform the channel estimation by using supervised algorithms that make use of pilot symbols. Such pilot symbols do not convey information, and therefore both the system throughput and the spectral efficiency are penalized. In this paper, we propose to combine two important paradigms of Neural Networks: supervised and unsupervised learning. The kind of learning to be used is decided by a simple criterion that determines the time instant at which the channel has suffered a significant variation. At that moment, a supervised algorithm is employed to re-estimate the channel making use of pilot symbols. The rest of the time, when the channel variation is not significant enough, the unsupervised algorithm known as Infomax [3] is utilized.
2
System Model
We consider a MIMO system with Nt transmit antennas and Nr receive antennas. The precoder generates the transmit signal x from all data symbols u = [u1, . . . , uNr] corresponding to the different receive antennas 1, . . . , Nr. We denote the equivalent low-pass channel impulse response between the j-th transmit antenna and the i-th receive antenna as hi,j(τ, t). For flat fading channels, the channel matrix H(t) is given by

H(t) = [ h_{1,1}(t)   · · ·  h_{1,Nt}(t)
            ...        ...      ...
         h_{Nr,1}(t)  · · ·  h_{Nr,Nt}(t) ],
and the received signal is

y_j(t) = Σ_{i=1}^{Nt} h_{j,i}(t) x_i(t) + η_j(t),   i.e.,   y(t) = H(t)x(t) + η(t),    (1)
where ηj(t) is the additive noise, x(t) = [x1(t), . . . , xNt(t)]^T ∈ C^{Nt}, y(t) = [y1(t), . . . , yNr(t)]^T ∈ C^{Nr}, and η(t) = [η1(t), . . . , ηNr(t)]^T ∈ C^{Nr}. In general, if f[n] = f(nTs + Δ) denotes the samples of f(t) taken every Ts seconds, with Δ being the sampling delay and Ts the symbol time, then sampling y(t) every Ts seconds yields the discrete-time signal y[n] = y(nTs + Δ) given by

y[n] = H[q]x[n] + η[n],    (2)
where n = 0, 1, 2, . . . corresponds to samples spaced Ts seconds, and q denotes the time slot. The channel remains stationary during a block of NB symbols.
P.M. Castro et al.
Note that this discrete-time model is equivalent to the continuous-time model in (1) only if Inter-Symbol Interference (ISI) is avoided (i.e., if the Nyquist criterion is satisfied). In that case, we are able to reconstruct the original continuous-time signal from the samples. This channel model is known as a time-varying flat block-fading channel and will be assumed in the sequel. For brevity, we omit the slot index q from now on. At the TX side, a way to carry out the pre-equalization (or precoding) step consists in including a transmit filter matrix F ∈ C^{Nt×Nr} and an RX filter matrix G = gI ∈ C^{Nr×Nr}, leading to Nr scalar data streams. Figure 1 shows the resulting communications system, in which the data symbols u[n] are passed through the transmit filter F to form the transmit signal x[n] = F u[n] ∈ C^{Nt}. Note that the constraint on the transmit energy must be fulfilled. Therefore, the received signal is given by y[n] = HF u[n] + η[n] ∈ C^{Nr}, where H ∈ C^{Nr×Nt} and η[n] ∈ C^{Nr} is the Additive White Gaussian Noise (AWGN). After multiplying by the receive gain g, we get the estimated symbols

û[n] = gHF u[n] + gη[n] ∈ C^{Nr}.    (3)
Clearly, the restriction that all the receivers apply the same scalar weight g is not necessary for decentralized receivers; replacing G by a diagonal matrix suffices (e.g., [4]). However, usually no closed form can be obtained for the precoder if G is diagonal. Fortunately, F can be found in closed form for G = gI. Thus, we use G = gI in the following. Although Wiener filtering for precoding has only been considered by a few authors [5] in comparison with other criteria for precoding, it is a very powerful transmit optimization that minimizes the Mean Square Error (MSE) under a transmit energy constraint [6, 7, 8, 2], i.e.,

{F_WF, g_WF} = argmin_{F,g} E[ ‖u[n] − û[n]‖²₂ ],   s.t.: tr(F C_u F^H) ≤ E_tx,    (4)
where C_u = E[u[n]u^H[n]]. It has been demonstrated in [5] that (4) leads to a unique solution if we restrict g to be a positive real. Then, the solution for the Wiener filter is given by

F_WF = g_WF^{−1} (H^H H + ξI)^{−1} H^H,   g_WF = √( tr((H^H H + ξI)^{−2} H^H C_u H) / E_tx ),   ξ = tr(C_η) / E_tx.    (5)
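As a quick numerical sanity check of (5) (a sketch assuming numpy; the antenna counts, noise covariance, and energy budget are made-up values), the transmit Wiener filter satisfies the energy constraint of (4) with equality:

```python
import numpy as np

rng = np.random.default_rng(0)
Nr, Nt = 2, 4                                # hypothetical antenna counts
H = (rng.standard_normal((Nr, Nt))
     + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
Cu = np.eye(Nr)                              # unit-power data symbols
C_eta = 0.1 * np.eye(Nr)                     # assumed noise covariance
Etx = 1.0                                    # transmit energy budget

xi = np.trace(C_eta).real / Etx              # regularization term of (5)
Minv = np.linalg.inv(H.conj().T @ H + xi * np.eye(Nt))
g_wf = np.sqrt(np.trace(Minv @ Minv @ H.conj().T @ Cu @ H).real / Etx)
F_wf = (1.0 / g_wf) * Minv @ H.conj().T

# The transmit energy constraint of (4) is met with equality:
print(np.trace(F_wf @ Cu @ F_wf.conj().T).real)   # 1.0 up to rounding
```

The equality follows from the cyclic property of the trace: tr(F C_u F^H) = g_WF^{−2} tr((H^H H + ξI)^{−2} H^H C_u H) = E_tx.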
3
Adaptive Algorithms
The model explained in Section 2 states that the observations are linear and instantaneous mixtures of the transmitted signals x[n] of (2). For the case of the
linear precoder described in the previous section, this equation can be rewritten as

y[n] = HF u[n] + η[n].    (6)

This means that the observations y[n] are instantaneous mixtures of the data symbols u[n], where the mixing matrix is given by HF. In the sequel, we will denote this mixing matrix as A, so the observations y[n] can be obtained as

y[n] = Ad[n] + η[n].    (7)
Depending on our target, A may represent the channel matrix (see (2)) or the whole coding-channel matrix HF (see (6)). In the first case, d[n] represents the coded signal x[n] = F u[n] and, in the second case, the user data signal u[n]. We assume that the mixing matrix is unknown but full rank. Without any loss of generality, we can suppose that the source data have a normalized power equal to one, since possible differences in power may be included in the mixing matrix A. In order to recover the source data, we use a linear system whose output is a combination of the observations, expressed as

z[n] = W^H[n] y[n],  W ∈ C^{Nr×Nr}.    (8)
By combining (7) and (8), the output z[n] can be rewritten as a linear combination of the desired signal,

z[n] = Γ[n] d[n],    (9)

where Γ[n] = W^H[n] A represents the overall mixing/separating system. Sources are optimally recovered when the matrix W[n] is selected such that each output extracts a different single source. This occurs when the matrix Γ[n] has the form

Γ[n] = D[n] P[n],    (10)
where D[n] is a diagonal invertible matrix and P[n] is a permutation matrix. In this paper, we consider two types of Neural Network paradigms: supervised and unsupervised approaches.

Supervised Approach. A way to estimate the channel matrix H consists in minimizing the Mean Square Error (MSE) between the outputs W^H[n]y[n] and the coded signals x[n]. In particular, by considering only one sample, we obtain the Least Mean Squares (LMS) algorithm,

W[n + 1] = W[n] − μ y[n] (W^H[n] y[n] − d[n])^H.

This algorithm is also called the delta rule of Widrow-Hoff [9] in the context of Artificial Neural Networks. It is easy to prove that the stationary points of this rule are

W[n] = C_y^{−1} C_yd,    (11)
where C_y = E[y[n] y^H[n]] is the autocorrelation of the observations and C_yd = E[y[n] d^H[n]] is the cross-correlation between the observations and the desired signals. In practice, the desired signal is considered known only during a finite number of instants (pilot symbols), and the expectations are estimated by averaging samples.

Unsupervised Approach. The inclusion of pilot symbols reduces the system throughput (or equivalently, the spectral efficiency of the system) and wastes transmission energy, because pilot sequences do not convey user data. This limitation can be avoided by using Blind Source Separation (BSS) algorithms, which simultaneously estimate the mixing matrix A and the realizations of the source vector u[n] from the corresponding realizations of the observations y[n]. One of the best known BSS algorithms was proposed by Bell and Sejnowski in [3]. Given an activation function h(·), the idea is to obtain the weight coefficients of a Neural Network, W[n], that maximize the mutual information between the outputs h(z[n]) = h(W^H[n] y[n]) and its inputs y[n], which is given by

J_MI(W[n]) = ln(det(W^H[n])) + Σ_{i=1}^{NB} E[ln(h′_i(z_i[n]))],    (12)
where h_i is the i-th element of the vector h(z[n]) and ′ denotes the first derivative. The resulting algorithm, named Infomax, has the following form:
W[n + 1] = W[n] + μ W[n] W^H[n] ( y[n] g^H(z[n]) − W^{−H}[n] )
         = W[n] + μ W[n] ( z[n] g^H(z[n]) − I ).    (13)

The expression in (13) admits an interesting interpretation when the non-linear function g(z) = z*(1 − |z|²) is utilized. In this case, Castedo and Macchi [12] have shown that the Bell and Sejnowski rule is equivalent to the Constant Modulus Algorithm (CMA) proposed by Godard in [13].
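The two update rules can be sketched as one-step functions (assuming numpy; the mixing matrix, step size, and test values are made up). As a check, the noiseless solution W = A^{−H} is a fixed point of the supervised rule, and a scalar Infomax step can be verified by hand:

```python
import numpy as np

def lms_update(W, y, d, mu):
    """One step of the supervised delta rule (Widrow-Hoff):
    W <- W - mu * y * (W^H y - d)^H."""
    e = W.conj().T @ y - d
    return W - mu * np.outer(y, e.conj())

def infomax_update(W, y, mu):
    """One step of the Infomax rule (13) with g(z) = z* (1 - |z|^2)."""
    z = W.conj().T @ y
    g = np.conj(z) * (1.0 - np.abs(z) ** 2)
    return W + mu * W @ (np.outer(z, g.conj()) - np.eye(len(z)))

# Supervised check: at the noiseless solution W = A^{-H}, the error is zero,
# so W is a fixed point of the LMS rule.
A = np.array([[1.0, 0.5], [0.2j, 1.0]])   # made-up 2x2 mixing matrix
W = np.linalg.inv(A.conj().T)             # W = A^{-H}
d = np.array([1.0 + 0j, -1.0 + 0j])
y = A @ d
assert np.allclose(lms_update(W, y, d, mu=0.1), W)

# Unsupervised check, by hand on a scalar channel: z = 0.5, g(z) = 0.375,
# so W <- W + 0.1 * (0.5 * 0.375 - 1) * W = 0.91875.
Wn = infomax_update(np.array([[1.0 + 0j]]), np.array([0.5 + 0j]), mu=0.1)
assert abs(Wn[0, 0] - 0.91875) < 1e-12
```

In operation, the supervised rule runs only over pilot symbols, while the unsupervised rule adapts on payload data.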
4
Hybrid Approach
One of the advantages of adaptive unsupervised algorithms is their ability to track slow variations of the channel. In contrast, supervised solutions provide a fast channel estimation for slow or fast variations, at the cost of using pilot symbols. In this section, we combine these two paradigms in order to obtain a performance similar to that offered by supervised approaches, but using a lower number of pilot symbols. We denote by Wu[n] and Ws[n] the matrices for the unsupervised and the supervised modules, respectively. We start with an initial estimation of the channel matrix obtained using the Widrow-Hoff solution (11). This estimation is used at the TX in order to obtain the optimum coding matrix F, and at the RX with the goal of initializing the unsupervised algorithm to Wu[n] = (F H)^{−H}.
While the channel does not suffer a significant variation, the matrix Wu[n] is adapted (unsupervised mode) and the data symbols u[n] are recovered using z[n] = Wu^H[n] y[n]. However, when a significant variation is detected, the RX sends an alarm to the TX through the feedback channel. Next, a pilot sequence is transmitted. Then, at the RX, a supervised algorithm estimates the channel from the pilot symbols (channel estimation update). In particular, we make use of the Widrow-Hoff solution (11), considering that u[n] are the coded signals at the output of the linear precoder. This solution provides the channel matrix estimation, which is sent to the TX in order to adapt the coding matrix. The RX also computes the coding matrix F and the reference matrix ĤF, and initializes the unsupervised algorithm as Wu[n] = (ĤF)^{−H}.

The question now is how to determine when the channel has suffered a significant change. An interesting consequence of using a linear precoder is that the permutation indeterminacy (see (10)) associated with unsupervised algorithms is avoided thanks to the initialization Wu[n] = (F H)^{−H}. This means that the sources are recovered in the same order as they were transmitted. (10) implies that the optimum separation matrix produces a diagonal matrix Γ[n], and therefore the mismatch of Γ[n] with respect to a diagonal matrix allows us to measure the variations of the channel. Although the channel matrix is unknown, we can use the estimation ĤF computed by the supervised approach as a reference. This means that in each iteration we can compute Γ[n] = Wu^H[n] ĤF. Consequently, the difference with respect to a diagonal matrix can be obtained using the following error criterion:

Error(n) = Σ_{i=1}^{Nt} Σ_{j=1, j≠i}^{Nt} ( |γ_ij[n]|² + |γ_ji[n]|² ) / |γ_ii[n]|²,    (14)
where γ_ii[n] denotes the i-th element of the diagonal of Γ[n]. The decision rule consists in comparing this error with some threshold t, i.e.,

Error(n) > t → use the supervised approach,
Error(n) ≤ t → use the unsupervised approach.    (15)
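The detection criterion (14)-(15) can be sketched as follows (assuming numpy; the Γ matrices are made up, and t = 0.7 is the threshold used in the experiments of Section 5):

```python
import numpy as np

def channel_change_error(Gamma):
    """Mismatch of Gamma[n] w.r.t. a diagonal matrix, eq. (14)."""
    n = Gamma.shape[0]
    err = 0.0
    for i in range(n):
        for j in range(n):
            if j != i:
                err += (abs(Gamma[i, j]) ** 2
                        + abs(Gamma[j, i]) ** 2) / abs(Gamma[i, i]) ** 2
    return err

t = 0.7                                       # threshold of the decision rule (15)
Gamma_ok = np.eye(2)                          # separation still valid
Gamma_bad = np.array([[1.0, 0.8],
                      [0.9, 1.0]])            # large off-diagonal leakage
assert channel_change_error(Gamma_ok) <= t    # keep the unsupervised mode
assert channel_change_error(Gamma_bad) > t    # request pilots (supervised update)
```

The error is zero for a perfectly diagonal Γ[n] and grows with the energy leaking into the off-diagonal entries.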
5
Experimental Results
We evaluate the performance of the proposed combined scheme by simulations. We transmit 8 000 pixels of the image cameraman (in TIF format with 256 gray levels) using QPSK over a 4×4 MIMO system. The channel matrix is updated every 2 000 symbols using the following model: H = (1 − α)H + αH_new, where H_new is a 4×4 matrix randomly generated according to a Gaussian distribution. The SNR has been fixed to 20 dB. We compare the performance of three different schemes (see Figure 2): the Widrow-Hoff solution (11) with 200 pilot symbols transmitted every 2 000 symbols (supervised approach); the Infomax algorithm (13) with the non-linear
Fig. 2. Performance results (see Section 5): BER (top) and number of channel estimation updates (bottom) versus the channel updating parameter α, for a channel that remains constant during a random number of symbols between 2000 and 3000 (left) and during a fixed number of symbols, 2000 (right); curves are shown for the supervised, unsupervised, and hybrid approaches
function g(z) = z*(1 − |z|²) and μ = 0.001 (unsupervised approach); and the hybrid approach with a threshold t = 0.7. The left-hand side of Figure 2 shows the results when the channel remains constant during a random number of symbols between 2 000 and 3 000. The right-hand side of Figure 2 plots the results when the channel remains constant during NB = 2 000 symbols. The top of Figure 2 shows the Bit Error Ratio (BER) obtained for all approaches. The bottom shows the number of times the mixing matrix has been estimated, and updated, using the supervised approach. Comparing the curves in Figure 2, we observe that the BER offered by the hybrid system is invariant to the number of symbols during which the channel remains constant.
6
Conclusion
In order to reduce the overhead due to the transmission of pilot symbols we have proposed to combine supervised and unsupervised algorithms. The algorithm selection was done by using a simple decision rule to determine a significant variation in the channel. This information was sent to the TX using a limited
feedback channel. The experimental results showed that the hybrid approach is an attractive solution because it provides an adequate BER with a reduced number of pilot symbols.
Acknowledgment
This work has been supported by Xunta de Galicia, Ministerio de Ciencia e Innovación of Spain, and FEDER funds under the grants 09TIC008105PR, TEC2007-68020-C04-01, CSD2008-00010, and TIN2009-05736-E/TIN.
References
[1] Fischer, R.F.H.: Precoding and Signal Shaping for Digital Transmission. John Wiley & Sons, Chichester (2002)
[2] Joham, M.: Optimization of Linear and Nonlinear Transmit Signal Processing. PhD dissertation, Munich University of Technology (2004)
[3] Bell, A., Sejnowski, T.: An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation 7(6), 1129–1159 (1995)
[4] Hunger, R., Joham, M., Utschick, W.: Extension of Linear and Nonlinear Transmit Filters for Decentralized Receivers. In: European Wireless 2005, vol. 1, pp. 40–46 (2005)
[5] Joham, M., Kusume, M., Gzara, M.H., Utschick, W., Nossek, J.A.: Transmit Wiener Filter for the Downlink of TDD DS-CDMA Systems. In: Proc. ISSSTA, vol. 1, pp. 9–13 (2002)
[6] Choi, R.L., Murch, R.D.: New Transmit Schemes and Simplified Receiver for MIMO Wireless Communication Systems. IEEE Transactions on Wireless Communications 2(6), 1217–1230 (2003)
[7] Karimi, H.R., Sandell, M., Salz, J.: Comparison between Transmitter and Receiver Array Processing to Achieve Interference Nulling and Diversity. In: Proc. PIMRC 1999, vol. 3, pp. 997–1001 (1999)
[8] Nossek, J.A., Joham, M., Utschick, W.: Transmit Processing in MIMO Wireless Systems. In: Proc. 6th IEEE Circuits and Systems Symposium on Emerging Technologies: Frontiers of Mobile and Wireless Communication, Shanghai, pp. 1–18 (2004)
[9] Haykin, S.: Neural Networks. A Comprehensive Foundation. Macmillan College Publishing Company, New York (1994)
[10] Amari, S.-I.: Gradient Learning in Structured Parameter Spaces: Adaptive Blind Separation of Signal Sources. In: Proc. WCNN 1996, pp. 951–956 (1996)
[11] Mejuto, C., Castedo, L.: A Neural Network Approach to Blind Source Separation. In: Proc. Neural Networks for Signal Processing VII, pp. 486–595 (1997)
[12] Castedo, L., Macchi, O.: Maximizing the Information Transfer for Adaptive Unsupervised Source Separation. In: Proc. SPAWC 1997, pp. 65–68 (1997)
[13] Godard, D.N.: Self-Recovering Equalization and Carrier Tracking in Two-Dimensional Data Communication Systems. IEEE Transactions on Communications (1980)
Low Bit-Rate Video Coding with 3D Lower Trees (3D-LTW)

Otoniel López, Miguel Martínez-Rach, Pablo Piñol, Manuel P. Malumbres, and José Oliver
Miguel Hernández University, Avda. Universidad s/n, 03202 Elche, Spain
Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
{otoniel,mmrach,pablop,mels}@umh.es, [email protected]
Abstract. The 3D-DWT is a mathematical tool of increasing importance in applications that require efficient processing of volumetric information. However, the huge memory requirement of the algorithms that compute it is one of the main drawbacks in practical implementations. In this paper, we introduce a fast frame-based 3D-DWT video encoder with low memory usage, based on lower trees. In this scheme, there is no need to divide the input video sequence into groups of pictures (GOPs), and it can be applied in a continuous manner, so that no boundary effects between GOPs appear. Keywords: 3D-DWT, wavelet-based video coding.
1
Introduction
In recent years, the three-dimensional wavelet transform (3D-DWT) has focused the attention of the research community, most of all in areas such as video watermarking and 3D coding (e.g., compression of volumetric data [1] or multispectral images [2], 3D model coding [3], and especially, video coding). In video compression, some early proposals were based on merely applying the wavelet transform along the time axis after computing the 2D-DWT for each frame [4]. Then, an adapted version of an image encoder can be used, taking into account the new dimension. For instance, the two-dimensional (2D) embedded zero-tree (IEZW) method has been extended to 3D IEZW for video coding by Chen and Pearlman [5], and showed promise of an effective and computationally simple video coding system without motion compensation, obtaining excellent numerical and visual results. A 3D zero-tree coding through modified EZW has also been used with good results in compression of volumetric images [6]. In [4], instead of the typical quad-trees of image coding, a tree with eight descendants per coefficient is used to extend the SPIHT image encoder to 3D video coding.
Thanks to the Spanish Ministry of Education and Science for funding under grant DPI2007-66796-C03-03.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 256–263, 2010. c Springer-Verlag Berlin Heidelberg 2010
Low Bit-Rate Video Coding with 3D Lower Trees (3D-LTW)
257
A more efficient strategy for video coding with time filtering is Motion Compensated Temporal Filtering (MCTF) [7]. In MCTF, in order to compensate object (or pixel) misalignment between frames, and hence avoid the significant amount of energy that appears in high-frequency subbands, a motion compensation algorithm is introduced to align all the objects (or pixels) in the frames before they are temporally filtered. In all these applications, the first problem that arises is the extremely high memory consumption of the 3D wavelet transform if the regular algorithm is used, since a group of frames must be kept in memory before applying temporal filtering; and in the case of video coding, we know that the greater the temporal decorrelation, the greater the number of frames needed in memory. So, the GOP size should be small in order to prevent high memory usage. This leads to another problem, since dividing the video sequence into small GOPs hinders the temporal decorrelation and may produce visual artifacts at the GOP boundaries. Even though several proposals have been made to avoid the aforementioned problems, most of them are not general (valid for any wavelet transform) and/or complete (the wavelet coefficients are not the same as those from the usual dyadic wavelet transform). In addition, software implementation is not always easy. In this paper, we propose a video encoder based on a frame-by-frame 3D-DWT scheme which does not require GOP division, significantly reduces the memory usage, and performs the 3D-DWT much faster than traditional algorithms.
2
3D-DWT with Low Memory Usage
In this section we propose an extension of the classical line-based approach [8], which computes the 2D-DWT with reduced memory consumption, to the three-dimensional wavelet transform. In the new approach, frames are continuously input, with no need to divide the video sequence into GOPs. Moreover, the algorithm yields slices of wavelet subbands (which we call subband frames) as soon as it has enough frames to compute them. This approach works as follows. The algorithm starts requesting LLL frames from the last level (GetLLLframe(nlevel) in Fig. 1). As seen in Fig. 2, the nlevel buffer must be filled with subband frames from level nlevel−1 before it can generate frames. In order to get them, this function recursively calls itself until level 0 is reached. At this point, it no longer needs to call itself, since it can return a frame from the video sequence, which can be directly read from the input/output system. The first time the recursive function is called at each level, its buffer (buffer_level) is empty. Then, its upper half (from N to 2N) is recursively filled with frames from the previous level. Recall that once a frame is received, it must be transformed using a 2D-DWT before being stored. Once the upper half is full, the lower half is filled by using symmetric extension. On the other hand, if the buffer is not empty, it simply has to be updated. In order to update it, it is shifted one position, so that the frame contained in the first position is discarded and a new frame can be introduced in the last position (2N) by means of a recursive call. This operation is repeated twice.
258
O. L´ opez et al.
However, if there are no more frames in the previous level, this recursive call will return End Of Frame (EOF). That points out that we are about to finish the computation at this level, but we still need to continue filling the buffer. We fill it by using symmetric extension again. Once the buffer is filled or updated, both high-pass and low-pass filter banks are applied to the frames in the buffer. As a result of the convolution, we get a frame of every wavelet subband at this level (HHLlevel , HLHlevel , HHHlevel , HLLlevel , LHLlevel , LLHlevel and LHHlevel ), and an LLL frame. The highfrequency coefficients are compressed and this function returns the LLL frame (see Fig. 2). For more details about frame-by-frame 3D-DWT, and a formal description of the algorithm, the reader is referred to [9]. function LowMemUsage3D FWT(nlevel) set F ramesReadlevel = 0 ∀level ∈ nlevel set bufferlevel = empty ∀level ∈ nlevel repeat LLL = GetLLLframe(nlevel) if (LLL != EOF) ProcessLowFreqSubFrame(LLL) until LLL = EOF end of fuction
Fig. 1. Performing the 3D-FWT by calling the GetLLLFrame recursive function
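The recursion of Figs. 1 and 2 can be mirrored by a much-simplified Python sketch (an illustration only: scalar "frames" and a 2-tap Haar temporal filter stand in for real frames, the 2D-DWT, and symmetric extension), showing how each level pulls low-pass frames from the level below while keeping only a small per-level buffer:

```python
def haar_lowpass(stream):
    """Consume 'frames' from `stream` two at a time (a 2-frame buffer);
    yield temporal low-pass frames. The high-pass frame would be
    compressed at this point in a real encoder."""
    while True:
        a = next(stream, None)
        b = next(stream, None)
        if a is None or b is None:
            return                            # EOF (no symmetric extension here)
        low, high = (a + b) / 2, (a - b) / 2  # Haar analysis along time
        yield low

def lll_frames(frames, nlevel):
    """Chain one streaming stage per decomposition level, mimicking the
    recursive GetLLLFrame calls of Fig. 2."""
    stream = iter(frames)
    for _ in range(nlevel):
        stream = haar_lowpass(stream)
    return list(stream)

print(lll_frames([1, 3, 5, 7], 1))  # [2.0, 6.0]
print(lll_frames([1, 3, 5, 7], 2))  # [4.0]
```

Because each stage only buffers two inputs at a time, the memory usage grows with the number of levels rather than with the sequence length, which is the key property of the frame-by-frame scheme.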
3
3D LTW
This section introduces the extension of the LTW still image encoder [10] to 3D video coding. Our main concern is to keep the same simplicity of the 2D LTW, while still giving high performance and low memory requirements. However, some changes must be made to the LTW algorithm so that it can be incorporated into this efficient wavelet transform. The main changes are:

– Global knowledge of the video frame is no longer available, and therefore an estimation of the highest coefficient that may appear has to be made, depending mainly on the type of wavelet normalization and the pixel resolution of the source video (in bpp). Finally, to ensure the correctness of the encoder, an escape code is used for values outside the predicted range.
– Since coefficients from different subband levels are interleaved (due to the computation order of the proposed wavelet transform), instead of a single bitstream we generate a different bitstream for every subband level. These bitstreams can be held in memory or saved in secondary storage, and are employed to form the final ordered bitstream.
– Now, the root of a tree has eight descendants, instead of the four descendants of the 2D-LTW.

Fig. 3 shows our overall system. The 3D-DWT module releases subband frames at different decomposition levels. At each level the subband frames are stored
Low Bit-Rate Video Coding with 3D Lower Trees (3D-LTW)
259
function GetLLLFrame(level)
  1) First base case: no more frames to read at this level
  if FramesRead_level = MaxFrames_level
    return EOF
  2) Second base case: the current level belongs to the space domain and not to the wavelet domain
  else if level = 0
    return InputFrame()
  else
    3) Recursive case
    3.1) Recursively fill or update the buffer for this level
    if buffer_level is empty
      for i = N .. 2N
        buffer_level(i) = 2DFWT(GetLLLframe(level - 1))
      FullSymmetricExtension(buffer_level)
    else
      repeat twice
        Shift(buffer_level)
        frame = GetLLLframe(level - 1)
        if frame = EOF
          buffer_level(2N) = SymmetricExt(buffer_level)
        else
          buffer_level(2N) = 2DFWT(frame)
    3.2) Calculate the WT for the time direction from the frames in the buffer, then process the resulting high frequency subband frames
    {LLL, LLH, LHL, LHH} = Z-axis_FWT_LowPass(buffer_level)
    {HLL, HLH, HHL, HHH} = Z-axis_FWT_HighPass(buffer_level)
    ProcessSubFrames({LLH, LHL, LHH, HLL, HLH, HHL, HHH})
    set FramesRead_level = FramesRead_level + 1
    return LLL
end of function
Fig. 2. GetLLLFrame Recursive function
in a dedicated encoder buffer. There are two subband frames for each subband type. When this buffer is full, the 3D-DWT encoder processes all subbands and maintains the significance map for building the trees. An important difference between this version and the LTW presented previously is that the new adapted encoder must process coefficients in only one pass, and therefore symbols must be computed and output at once. However, in this case, it is not an important drawback because the order of the wavelet coefficients is later arranged for the decoder with an independent bitstream per decomposition level. The 3D-LTW algorithm is formally described in Fig. 4. Let us see it in some detail. The encoder has to determine whether each 2x2 block of coefficients of both subband frames stored in the encoding buffer is part of a lower-tree. If the eight coefficients in these blocks are lower than the quantization threshold 2^rplanes, and their descendant offspring are also insignificant, they are part of a lower-tree and do not need to be encoded. In order to know if their offspring are significant, we need to hold a binary significance map for every encoder buffer (S^L in the figure), because the encoder buffer is overwritten by the wavelet transform once it is encoded, and hence the significance of their ascendant coefficients is not automatically held. Obviously, this significance map was not needed in the original LTW because the whole image was available to the encoder. The size of each significance map is one eighth of the size of the encoder buffer it represents, since the significance is held jointly for both 2x2 blocks. The significance of
O. López et al.
N level buffers LLLnlevel Bits
Buffer size (Width/2nlevel-1)x(Height/2nlevel-1)
FraameͲbased 3DDWTT
HLL2
HLH2
LHL2
LHH2
HHL2
HHH2 Buffers Length
LLH2
Buffer size (Width/4)x(Height/4) HLL1 buffer
HLH1 buffer
LHL1 buffer
LHH1 buffer
HHL1 buffer
HHH1 buffer
LLH1 buffer
.. .
S2 Significance map
2nd level bitstream
S1 Significance map
B ff Buffers Length
Buffer size (Width/2)x(Height/2) Video Frames (level=0) (INPUT)
TreeͲbased Subband Encoder
.. .
1st level bitstream
Final Bitstream (OUTPUT)
Fig. 3. Overview of the proposed tree-based encoder with efficient use of memory
both 2x2 blocks can be held with a single bit. Therefore, the memory required for these significance maps is almost negligible compared with the rest of the buffers. As in the original LTW encoder, when there is a significant coefficient in either 2x2 block or in its descendant coefficients, we need to encode each coefficient separately. Recall that in this case, if a coefficient and all its descendants are insignificant, we use the LOWER symbol to encode the entire tree, but if it is insignificant and the significance map of its eight direct descendant coefficients shows that it has a significant descendant, the coefficient is encoded as ISOLATED_LOWER. Finally, when a coefficient is significant, it is encoded with a numeric symbol along with its significant bits and sign. At the last level (N), the tree cannot be propagated upward, and for this reason, we always encode all the coefficients at this level. Moreover, we can keep the compressed bitstream in memory, which allows us to invert the order of the bitstream for the inverse procedure.
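The significance test described above can be sketched as follows. This is a simplified illustration, not the authors' code: the helper name and the toy layout of two buffered subband frames are assumptions, and tree propagation across levels is omitted:

```python
import numpy as np

def block_significance(sub_a, sub_b, rplanes):
    """One significance bit per block pair: a 2x2 block taken from each of
    the two buffered subband frames (eight coefficients in total) is
    insignificant when every |coefficient| is below the quantization
    threshold 2**rplanes; such blocks can be absorbed into a lower-tree."""
    threshold = 2 ** rplanes
    h, w = sub_a.shape
    sig_map = np.zeros((h // 2, w // 2), dtype=bool)
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            block = np.concatenate([sub_a[y:y + 2, x:x + 2].ravel(),
                                    sub_b[y:y + 2, x:x + 2].ravel()])
            sig_map[y // 2, x // 2] = np.any(np.abs(block) >= threshold)
    return sig_map

a = np.array([[1, 1, 9, 1],
              [1, 1, 1, 1]])
b = np.zeros_like(a)
print(block_significance(a, b, rplanes=3))  # [[False  True]]
```

With rplanes = 3 the threshold is 8, so only the block containing the coefficient 9 is marked significant; one boolean per 2x2x2 block is why the map costs an eighth of the buffer it summarizes.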
4 Results
In this section we analyze the behavior of the proposed encoder (3D-LTW). We compare the 3D-LTW encoder with the fast M-LTW Intra video encoder [11], 3D-SPIHT [12] and H.264 (JM16.1 version), in terms of R/D performance, coding and decoding delay, and memory requirements. All the evaluated encoders have been tested on an Intel Pentium M Dual Core 3.0 GHz with 1 GByte of RAM. In the frame-by-frame 3D wavelet transform, each buffer must be able to keep 2N + 1 (the filter length) low frequency frames at every level, and each buffer at a level i needs a quarter of the coefficients compared with the previous level (i − 1). Therefore, for a frame size of (w × h) and an nlevel time decomposition, the number of coefficients required by this algorithm is:
function SubbandCode(level, Buffer, S^(level-1), S^level)
  scan Buffer in 2x2 blocks (B_x,y) in horizontal raster order
  for each block B_x,y = {c_2x,2y, c_2x+1,2y, c_2x,2y+1, c_2x+1,2y+1}
    if level != N and c_i,j < 2^rplanes and S^(level-1)_i,j is insignificant, for all c_i,j in B_x,y
      set S^level_x,y = insignificant
    else
      set S^level_x,y = significant
      for each c_i,j in B_x,y
        if c_i,j < 2^rplanes
          if S^(level-1)_i,j is insignificant
            arithmetic_output LOWER
          else
            arithmetic_output ISOLATED_LOWER
        else
          nbits_i,j = log2(|c_i,j|)
          if S^(level-1)_i,j is insignificant
            arithmetic_output nbits^LOWER_i,j
          else
            arithmetic_output nbits_i,j
          output bit_(nbits_i,j - 1)(|c_i,j|) .. bit_(rplane+1)(|c_i,j|)
          output sign(c_i,j)
        endif
      endif
end of function

Note: bit_n(C) is a function that returns the nth bit of C
Fig. 4. Lower tree wavelet coding with reduced memory usage

  Σ_{n=0}^{∞} (2N + 1) × (w × h) / 4^n = (2N + 1) × (w × h) × 4/3    (1)
which is asymptotically (as nlevel approaches infinity) independent of the number of frames to be encoded, and less than the regular case, which needs (w × h × G) coefficients, G being the number of frames in a GOP. In Table 1, the memory requirements of the encoders under test are shown. Obviously, the M-LTW encoder only uses the memory needed to store one frame. The 3D-LTW encoder (using the Daubechies 9/7F filter for both spatial and temporal filtering) uses up to 3.4 times less memory than 3D-SPIHT for CIF sequence size and up to 9 times less memory than H.264 for QCIF sequence size. Regarding R/D, in Fig. 5 we can see the R/D behavior of all evaluated encoders. As shown, H.264 is the one that obtains the best results, mainly due to the motion estimation/motion compensation (ME/MC) stage included in this encoder, contrary to 3D-SPIHT and 3D-LTW, which do not include any ME/MC stage. It is interesting to see the improvement of 3D-SPIHT and 3D-LTW when

Table 1. Memory requirements for evaluated encoders (KB) (results obtained with the Windows XP task manager, peak memory usage index)

Codec/Format   H.264   3D-SPIHT   3D-LTW   M-LTW
QCIF           35824   10152      4008     1104
CIF            86272   34504      10644    1540
Fig. 5. PSNR (dB) for all evaluated encoders for (a) Container sequence in QCIF format and (b) Foreman sequence in CIF format
Fig. 6. Execution time comparison of the encoding process
compared to an INTRA video encoder. As mentioned, no ME stage is included in 3D-SPIHT and 3D-LTW, so this improvement is accomplished by exploiting only the temporal redundancy among video frames. The R/D behavior of 3D-SPIHT and 3D-LTW is similar for sequences with moderate-to-high motion activity, with 3D-LTW slightly better than 3D-SPIHT (up to 0.5 dB), but for sequences with low movement, 3D-SPIHT outperforms 3D-LTW, mainly due to the further dyadic decompositions applied to the temporal high frequencies. Regarding coding delay, in Fig. 6 we can see that the 3D-LTW encoder is the fastest one, being up to 10 times faster than 3D-SPIHT for QCIF size sequences, 3.5 times faster than the M-LTW INTRA video encoder and up to 3800 times faster than H.264. The decoding process is also faster in 3D-LTW than in the other encoders.
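As a side check, the buffer occupation given by Eq. (1) can be evaluated numerically. This sketch uses hypothetical names, and N = 4 is an illustrative choice of half filter length; it shows how the total approaches the (2N + 1) × (w × h) × 4/3 bound as the number of temporal levels grows, and stays well below the w × h × G cost of a GOP-based transform:

```python
def buffer_coefficients(N, w, h, nlevel):
    """Sum of buffer sizes for Eq. (1): (2N+1) low-pass frames per level,
    each level a quarter the size of the previous one (geometric series)."""
    return sum((2 * N + 1) * (w * h) / 4 ** n for n in range(nlevel))

w, h, N = 352, 288, 4                      # CIF frame, N = 4 assumed
bound = (2 * N + 1) * w * h * 4 / 3        # the nlevel -> infinity limit
print(buffer_coefficients(N, w, h, nlevel=6) < bound)             # True
print(round(buffer_coefficients(N, w, h, nlevel=20) / bound, 6))  # 1.0
print(buffer_coefficients(N, w, h, nlevel=6) < w * h * 64)        # True: below a 64-frame GOP
```

The series converges quickly: six levels already reach the asymptote to within a fraction of a percent, which is why memory usage is essentially independent of the sequence length.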
5 Conclusions
In this paper a fast 3D-DWT encoder with very low memory requirements has been presented. The new encoder reduces the memory requirements compared with 3D-SPIHT (3.5 times less memory) and H.264 (up to 10 times less memory). The new 3D-DWT encoder is very fast (up to 10 times faster than 3D-SPIHT) and it has better R/D behavior than the INTRA video coder M-LTW (up to 11 dB). In order to improve the coding efficiency, an ME/MC stage could be added. In
this manner, the objects/pixels of the input video sequence would be aligned, and so fewer high frequencies would appear in the higher frequency subbands, improving the compression performance.

Acknowledgments. Thanks to the Spanish Ministry of Science and Innovation for funding under grant TIN2009-05737-E.
References
1. Schelkens, P., Munteanu, A., Barbariend, J., Galca, M., Giro-Nieto, X., Cornelis, J.: Wavelet coding of volumetric medical datasets. IEEE Transactions on Medical Imaging 22(3), 441–458 (2003)
2. Dragotti, P., Poggi, G.: Compression of multispectral images by three-dimensional SPIHT algorithm. IEEE Transactions on Geoscience and Remote Sensing 38(1), 416–428 (2000)
3. Aviles, M., Moran, F., Garcia, N.: Progressive lower trees of wavelet coefficients: Efficient spatial and SNR scalable coding of 3D models. In: Ho, Y.-S., Kim, H.-J. (eds.) PCM 2005. LNCS, vol. 3767, pp. 61–72. Springer, Heidelberg (2005)
4. Kim, B., Xiong, Z., Pearlman, W.: Low bit-rate scalable video coding with 3D set partitioning in hierarchical trees (3D SPIHT). IEEE Transactions on Circuits and Systems for Video Technology 10, 1374–1387 (2000)
5. Chen, Y., Pearlman, W.A.: Three-dimensional subband coding of video using the zero-tree method. In: Visual Communications and Image Processing, Proc. SPIE, March 1996, vol. 2727, pp. 1302–1309 (1996)
6. Luo, J., Wang, X., Chen, C., Parker, K.: Volumetric medical image compression with three-dimensional wavelet transform and octave zerotree coding. In: Visual Communications and Image Processing, Proc. SPIE, March 1996, vol. 2727, pp. 579–590 (1996)
7. Secker, A., Taubman, D.: Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting. In: IEEE International Conference on Image Processing, October 2001, pp. 1029–1032 (2001)
8. Chrysafis, C., Ortega, A.: Line-based, reduced memory, wavelet image compression. IEEE Transactions on Image Processing 9(3), 378–389 (2000)
9. Oliver, J., Lopez, O., Martinez-Rach, M., Malumbres, M.: A general frame-by-frame wavelet transform algorithm for a three-dimensional analysis with reduced memory usage. In: IEEE International Conference on Image Processing, October 2007, pp. 469–472 (2007)
10. Oliver, J., Malumbres, M.P.: Low-complexity multiresolution image compression using wavelet lower trees. IEEE Transactions on Circuits and Systems for Video Technology 16(11), 1437–1444 (2006)
11. Lopez, O., Martinez-Rach, M., Piñol, P., Malumbres, M., Oliver, J.: M-LTW: A fast and efficient intra video codec. Signal Processing: Image Communication 23, 637–648 (2008)
12. Kim, B.J., Xiong, Z., Pearlman, W.: Very low bit-rate embedded video coding with 3D set partitioning in hierarchical trees (3D SPIHT) (1997)
Color Video Segmentation by Dissimilarity Based on Edges

Lucía Ramos¹, Jorge Novo¹, José Rouco¹, Antonio Mosquera², and Manuel G. Penedo¹

¹ VARPA Group, Department of Computer Science, University of A Coruña, Spain
[email protected], {jnovo,jrouco,mgpenedo}@udc.es
² Artificial Vision Group, Department of Electronics and Computer Science, University of Santiago de Compostela, Spain
[email protected]
Abstract. In this work new approaches are proposed for the extension to color space of different shot change detection methods. These techniques are those that use edge-based dissimilarity, in particular in the space and frequency domains. They were previously defined to deal with grayscale videos, so the methods are redesigned to provide the best possible results on color videos. Moreover, some improvements are introduced to obtain better results, such as the use of an adaptive threshold instead of the fixed one previously used. Experiments are presented to show the better behaviour of the newly developed approaches. Keywords: Video segmentation, detection of scene changes, cuts, fades, dissolves.
1 Introduction and Previous Work
Multimedia information has been growing in many areas of application over the years. Thus, there is an increased demand for new technologies and tools for the organization, indexing, search and retrieval of data to satisfy user needs. Temporal video segmentation is the preliminary step to obtain the visual and semantic information that describes scenes for proper indexing and searching. The problem of shot change detection in video sequences has been widely studied in the multimedia analysis literature. All the existing techniques are based on the fact that frames within the same scene preserve a certain degree of similarity, whereas frames around the limits of a scene show an important change in visual content. Thus, shot changes are detected when the distance between successive frames is higher than a given threshold. There are methods that use the values of the image for the dissimilarity between frames. Some of them use local characteristics of the image, as in the approach proposed by Nagasaka and Tanaka [1], which consists of comparing the dissimilarity between consecutive frames by computing the difference of intensities. Others compare two images according to global features, such as the image histogram dissimilarity proposed in the work of Kasturi et al. [2]. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 264–271, 2010. © Springer-Verlag Berlin Heidelberg 2010
Previous methods are based on trivial features of images. There are other techniques, focused on edges, which use more complex characteristics to obtain similarity measures between frames. Zabih et al. [3] proposed obtaining the similarity between frames by analyzing the edges of consecutive images. There are other approaches in this category, such as the work of Ardebilian et al. [4], which is based on comparing characteristic points, or the approach proposed by Porter et al. [5], who use the correlation between frames. This work focuses on the dissimilarity measure based on edges, considering one method in the space domain and another in the frequency domain. For each of them, besides the original version using grayscale frames, an extension to color is performed. The color versions overcome some limitations of the original versions, such as the problems caused by changes in illumination. The selected methods employ a fixed threshold for detection. However, video data are highly dependent on content, so it is very difficult to establish a universal threshold for all sequences. In this paper, an adaptive threshold according to the sequence information is proposed. This paper is organized as follows. Section 2 details the steps for shot change detection and the methods selected. Section 3 explains the contributions added to the original versions. Section 4 shows the experiments done and the results obtained. Finally, section 5 expounds the conclusions reached.
2 Temporal Video Segmentation Methods
The purpose of these methods is to detect the scene changes that a video sequence presents, in order to divide it into a set of manageable and meaningful segments. The transitions between two scenes can be sudden or gradual. In the first case, there is a total change of visual content from one frame to the next, so the detection is simple. In the second, the changes are gradual, occurring slowly over successive frames, so the detection is more complex. The following types are identified:
Cut: Strong change between two consecutive frames where the last frame of a scene is directly followed by the first frame of the next one.
Dissolve: Gradual transition from one scene to another in which both frames are superimposed, so the last frame of the previous scene fades out as the first of the new scene fades in.
Fade: Special case of dissolve where a monochrome frame replaces the last frame of the previous scene or the first frame of the next one.
The first step in shot change detection is to obtain the successive frames composing the video sequence, on which the operations necessary to analyze the features of interest are performed. To determine whether changes are happening between consecutive frames, it is necessary to establish a distance metric called "dissimilarity". This metric allows us to decide whether the variation in visual content is enough to consider the existence of a scene change between these frames. The methods studied here define the dissimilarity based on edges. After
obtaining the distances between consecutive frames, these measures are compared with a threshold to determine the existence of scene changes.

2.1 Dissimilarity on Space Domain
This approach, proposed by Zabih et al. [3], is based on the idea that when a scene change happens, new edges appear far from the positions of the previous edges and old edges disappear far from the emerging ones. As shown in the diagram of Figure 1, this method takes as input two binary images E and E′ obtained by applying the Canny edge detector [6] to two consecutive frames. Then, the images Ē and Ē′ are created, where each edge pixel from E and E′ is dilated by a radius r. This dilation tolerates small displacements between consecutive frames without interfering with the detection. An entering pixel is defined as an edge pixel of E′ that appears far from the edge pixels of E. In the same way, an exiting pixel is defined as an edge pixel of E that disappears far from the edge pixels of E′. Thus, shot changes can be detected from the number of entering and exiting pixels.
Fig. 1. Steps in the calculation of the dissimilarity on space domain
Considering only the pixels belonging to edges of these images, ρ_in and ρ_out are defined by:

  ρ_in  = 1 − ( Σ_{x,y} Ē[x,y] E′[x,y] ) / ( Σ_{x,y} E′[x,y] )
  ρ_out = 1 − ( Σ_{x,y} E[x,y] Ē′[x,y] ) / ( Σ_{x,y} E[x,y] )
as the fraction of entering and exiting pixels, respectively. Shot transitions can be detected by looking for peaks in ρ, the maximum of these two values.
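A minimal sketch of these fractions in pure NumPy may clarify the mechanics. The helper names are illustrative, and the dilation is implemented with padded shifts rather than the full Canny pipeline:

```python
import numpy as np

def dilate(edges, r=1):
    """Binary dilation by a (2r+1)x(2r+1) square, done with padded shifts
    so that only NumPy is needed."""
    padded = np.pad(edges, r)
    out = np.zeros_like(edges)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + edges.shape[0], dx:dx + edges.shape[1]]
    return out

def edge_dissimilarity(E, E2, r=1):
    """rho_in / rho_out as above: the fraction of edge pixels of E' (E)
    that fall outside the dilated edges of E (E')."""
    Ed, E2d = dilate(E, r), dilate(E2, r)
    rho_in = 1.0 - (Ed & E2).sum() / max(E2.sum(), 1)
    rho_out = 1.0 - (E & E2d).sum() / max(E.sum(), 1)
    return max(rho_in, rho_out)

E = np.zeros((8, 8), dtype=bool); E[2, 2:6] = True    # an edge segment
E2 = np.zeros((8, 8), dtype=bool); E2[3, 2:6] = True  # shifted by one pixel
print(edge_dissimilarity(E, E2))   # 0.0: within the dilation radius
E3 = np.zeros((8, 8), dtype=bool); E3[6, 2:6] = True  # far away
print(edge_dissimilarity(E, E3))   # 1.0: every pixel entered/exited
```

The one-pixel shift scores 0 because the dilation absorbs small camera motion, while the distant edge scores 1: exactly the behaviour that makes ρ peak at shot boundaries.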
2.2 Dissimilarity on Frequency Domain
Correlation is a measure of correspondence between two images. However, computing it in the spatial domain is too expensive. For that reason, this method computes the correlation in the frequency domain. In the method proposed by Porter et al. [5], the first step is to calculate the Fourier transform of two frames to get their representation in the frequency domain. Then, a high-pass filter is applied to each image to accentuate the
contributions from higher spatial frequencies, because the edges and other strong changes in an image are related to the high frequency components of its Fourier transform. After applying the high-pass filter, the normalized correlation is calculated by the following equation:

  ρ(ξ) = FT⁻¹{ FT x₁(ω) · FT x₂*(ω) } / √( ∫ |FT x₁(ω)|² dω · ∫ |FT x₂(ω)|² dω )    (1)

where ξ and ω are the spatial and spatial-frequency coordinate vectors, respectively, FT x_i(ω) denotes the Fourier transform of a frame x_i(ξ), * is the complex conjugate and FT⁻¹ denotes the inverse Fourier transform. The dissimilarity measure for two consecutive frames is obtained as 1 − d, where d represents the maximum value among the correlation coefficients, that is, the best match between the two images. Figure 2 shows an example of the dissimilarity measures obtained between consecutive frames of a video sequence, where the peaks represent scene changes.
Fig. 2. Dissimilarity measure between consecutive frames
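The normalized correlation of Eq. (1) can be sketched with NumPy's FFT as follows. The high-pass filtering step is omitted, the energy integrals are evaluated via Parseval's theorem, and all names are illustrative:

```python
import numpy as np

def fft_dissimilarity(x1, x2):
    """1 - max normalized correlation: the correlation surface over all
    circular shifts is obtained with one inverse FFT, and normalized by
    the energies of the two frames."""
    F1, F2 = np.fft.fft2(x1), np.fft.fft2(x2)
    corr = np.fft.ifft2(F1 * np.conj(F2)).real
    # Parseval: sum|F|^2 = size * sum|x|^2, so this is sqrt(E1 * E2)
    norm = np.sqrt((np.abs(F1) ** 2).sum() * (np.abs(F2) ** 2).sum()) / x1.size
    return 1.0 - corr.max() / norm

rng = np.random.default_rng(0)
frame = rng.standard_normal((32, 32))
print(fft_dissimilarity(frame, frame) < 1e-9)                         # True
print(fft_dissimilarity(frame, rng.standard_normal((32, 32))) > 0.5)  # True
```

Identical frames give a dissimilarity of essentially zero (the peak sits at zero shift), while unrelated frames stay close to one, which is the behaviour that produces the peaks of Figure 2.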
3 Improved Extension to Color-Space
The original versions of these methods calculate the dissimilarity in terms of intensity variations obtained from grayscale images. However, these methods have many problems working with scenes that are too dark or too bright. For that reason, a color extension is developed for each method that identifies variations resulting from differences in color, not only in intensity.

3.1 Color Dissimilarity on Space Domain
The color version of this method consists of obtaining the color edges in the space domain. As a first approximation, the use of the RGB model was considered. The main idea is to get the edges separately for each channel and then obtain a global estimation from all of them. The problem is that this estimation is not equivalent to getting the edges directly in the combined space of the three components. As a solution, the ratios of Gevers [7] are used to capture all possible
268
L. Ramos et al.
color differences. This work adapts the original version of the method to detect edges according to color differences by applying the thresholding stage of the Canny edge detector to these color ratios. This allows the calculation of perceptual color differences independently of the luminosity, providing a solution to the problems with lighting changes found in the grayscale version. These color ratios are defined by the relation of the RGB components between two neighboring pixels, as in the following equation:

  m(C₁^k₁, C₁^k₂, C₂^k₁, C₂^k₂) = (C₁^k₁ · C₂^k₂) / (C₁^k₂ · C₂^k₁)    (2)
where C₁, C₂ ∈ {R, G, B}, and k₁ and k₂ are the image coordinates of two neighboring pixels. These ratios can be considered as the correspondence between these pixels in the image domain. To adapt this method to color ratios, the finite differences between pixels along a particular direction, calculated between the red-green, red-blue and green-blue channel pairs, are used to obtain the gradient of the Canny operator. Thus, the edge detector is more robust against changes in shading and lighting between consecutive frames of the same scene.
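A small sketch of how such ratios can act as an illumination-invariant gradient. The helper is hypothetical (not the paper's implementation), and the log is taken so that a ratio m and its inverse 1/m score equally:

```python
import numpy as np

def color_ratio_gradient(img):
    """For horizontally neighbouring pixels k1, k2 the ratio of Eq. (2),
    m = (C1(k1)*C2(k2)) / (C1(k2)*C2(k1)), is computed for the channel
    pairs R-G, R-B and G-B; a ratio far from 1 signals a chromatic edge,
    regardless of any illumination factor shared by all channels."""
    eps = 1e-6                      # avoid division by zero on black pixels
    c = img.astype(float) + eps
    grads = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        m = (c[:, :-1, i] * c[:, 1:, j]) / (c[:, 1:, i] * c[:, :-1, j])
        grads.append(np.abs(np.log(m)))
    return np.maximum.reduce(grads)

flat = np.full((2, 3, 3), 100.0)
# pure lighting change: every column scaled by a scalar factor
shaded = flat * np.array([1.0, 0.5, 0.25])[None, :, None]
print(color_ratio_gradient(shaded).max() < 1e-9)   # True: shading cancels out
```

Because a shared illumination factor multiplies every channel of a pixel, it cancels in the ratio, so only genuine chromatic edges survive: the property that fixes the dark/bright-scene failures of the grayscale version.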
3.2 Color Dissimilarity on Frequency Domain
Just as in the previous method, the idea would be to obtain the correlation in the frequency domain for each channel and then merge them all together. However, this is not equivalent to measuring the correlation directly in the combined space of the three dimensions. For that reason, once again, the direct use of the RGB model was discarded for this approximation. As the authors proposed, it is better to obtain the correlation in a frequency space. The Fourier transform works with complex numbers, so the adaptation of this method is proposed by representing the hue-saturation space, a chromatic subspace of HSV, as a complex number defined by the following expression:

  b(x, y) = S(x, y) · e^{iH(x,y)}    (3)
where the saturation is interpreted as the magnitude and the hue as the phase of the complex number. The interpretation of hue and saturation as polar coordinates allows the direct use of this color space in the Fourier transform, making the adaptation of this method easier. After that, the remaining steps of the method are common to the grayscale version.
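This packing can be sketched in a few lines (hypothetical helper; H is assumed to be expressed in radians):

```python
import numpy as np

def hs_complex(hsv):
    """Eq. (3): pack hue (radians) and saturation into one complex plane,
    b = S * exp(i*H), so the chromatic content can be fed directly to the
    Fourier-domain correlation; the value channel V is ignored."""
    H, S = hsv[..., 0], hsv[..., 1]
    return S * np.exp(1j * H)

hsv = np.zeros((1, 2, 3))
hsv[0, 0] = [0.0, 1.0, 0.5]        # fully saturated, hue 0
hsv[0, 1] = [np.pi, 0.5, 0.9]      # opposite hue, half saturation
b = hs_complex(hsv)
print(b[0, 0])                     # (1+0j)
print(np.allclose(b[0, 1], -0.5))  # True
```

Opposite hues land on opposite sides of the complex plane and desaturated (near-gray) pixels shrink toward the origin, so illumination changes, which mostly affect V, barely move the representation.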
3.3 Improved Adaptive Threshold
Shot changes are detected when the distance between successive frames is higher than a given threshold. It is very difficult to establish a single threshold valid for all sequences because video data are quite variable. As Figure 3(a) shows, setting the threshold too high means that not all shot changes are detected, while with low values for the threshold, frames not corresponding to scene changes are
included, increasing the rate of false positives. To obtain a better behaviour, it would be more suitable to use a threshold calculated from the video characteristics, considering the variability between frames of different scenes. The selected methods use a fixed threshold, which implies some limitations. In this work, the use of an adaptive threshold is proposed, considering that shot changes correspond to outliers of the distribution of similarity measures. This is performed following the equation proposed by Kobla et al. [8]:

  T_l = μ + α·σ    (4)

where α is a constant, and μ and σ are the mean and standard deviation of the interframe differences. These values are computed dynamically over the successive frames since the last scene change detected, so the detection is based on the variability of the scene being processed at each time. This contribution manages to increase the rate of true positives without increasing false detections, as can be seen in Figure 3(b).
Fig. 3. Application examples of fixed and adaptive threshold on a video sequence. (a) Fixed threshold (b) Adaptive threshold.
4 Results
In this section the results obtained by applying the methods to a video dataset are presented. This dataset must be heterogeneous in terms of types of transitions and must contain some common cases of segmentation error, such as fast-moving sequences, significant lighting changes or gradual changes of different lengths. The selected video dataset was taken from TRECVID [9], which provides videos designed specifically to study this kind of method. The dataset contains a total of 1338 scene changes, of which 1078 are cuts, 40 are fades, 211 are dissolves and 9 are other types. The evaluation of the methodologies is based on quality criteria, taking the number of scene changes detected, undetected and falsely detected. The metrics used are recall, which measures the ability to detect all the scene changes, and precision, which measures the ability to detect only scene changes. The methods present several parameters that need to be tuned. For that reason, a tuning process was performed on a subset of the database
Table 1. Evaluation of the results obtained with the methods
Method       Version       Threshold  Recall (Cuts)  Recall (Dissolves)  Recall (Fades)  Recall (Total)  Precision
Edges        Grayscale     Fixed      52.75%         14.73%              71.50%          51.49%          41.91%
Edges        Grayscale     Adaptive   58.21%         28.04%              77.50%          59.01%          52.11%
Edges        Color ratios  Fixed      71.07%         21.64%              93.54%          66.99%          57.29%
Edges        Color ratios  Adaptive   78.43%         41.18%              100.00%         70.38%          68.50%
Correlation  Grayscale     Fixed      74.22%         15.33%              86.91%          63.24%          65.75%
Correlation  Grayscale     Adaptive   78.90%         30.02%              50.00%          64.17%          66.10%
Correlation  HSV           Fixed      82.80%         42.81%              94.29%          76.22%          78.70%
Correlation  HSV           Adaptive   92.41%         50.00%              66.98%          82.60%          81.90%
Fig. 4. ROC curve comparison between basic methods and improved ones. (a) Space domain (b) Frequency domain.
to obtain the parameter values that gave the best possible results for a compromise between recall and precision. With this tuned parameter set, the experiments were extended to the entire database to test the behaviour of the methodologies. Table 1 shows the results obtained for each method in its grayscale and color versions, using fixed and adaptive thresholds. The grayscale versions presented some problems in scenes that are too dark or with excessive changes in illumination. Our extensions using color ratios and the HSV space, which are insensitive to changes in lighting, solved this problem and improved performance, especially in the case of fades, as seen in the results. Moreover, the use of an adaptive threshold according to the video data, thanks to a dynamic reconfiguration considering the variability between consecutive frames, improves the results by increasing the rate of true positives as well as reducing false detections. The worse results obtained with the frequency domain method with adaptive threshold in the case of fades could be due to low-energy edges, which cause the dissimilarity to evolve slowly. That situation implies that the threshold cannot adapt in time, so some scene changes are lost. It could be solved by applying a thresholding after the high-pass filter to eliminate low-energy edges.
This would be equivalent to the hysteresis process of the edge detector in the space domain method. The graphs of Figure 4 show the ROC curves for the most basic versions of each of the methods, i.e., grayscale with fixed threshold, together with the improved color versions with adaptive thresholding. As can be clearly seen, the best results of recall versus precision were obtained with the improved versions.
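For reference, the recall and precision figures of Table 1 follow the usual definitions, which can be sketched as below. The helper is hypothetical, and the use of a frame tolerance for matching detections to ground truth is an assumption:

```python
def recall_precision(detected, ground_truth, tolerance=0):
    """Recall: fraction of true scene changes found.  Precision: fraction
    of detections that are true changes.  A detection matches a ground
    truth change if it falls within `tolerance` frames of it."""
    matched = {g for g in ground_truth
               if any(abs(d - g) <= tolerance for d in detected)}
    true_pos = sum(1 for d in detected
                   if any(abs(d - g) <= tolerance for g in ground_truth))
    recall = len(matched) / len(ground_truth) if ground_truth else 1.0
    precision = true_pos / len(detected) if detected else 1.0
    return recall, precision

gt = [120, 450, 800, 1000]           # ground-truth scene change frames
det = [120, 451, 640]                # two hits, one false positive, two misses
print(recall_precision(det, gt, tolerance=1))  # (0.5, 0.6666666666666666)
```

These two quantities trade off against each other as the threshold moves, which is exactly what the ROC curves of Figure 4 plot.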
5 Conclusions
In this paper two representative methods of edge-based detection were studied, considering one technique in the space domain and another in the frequency domain. For each of them, an extension to color space was made and a variation of the thresholding stage was added. Previous studies working with grayscale frames only considered variations in intensity for the dissimilarity between frames, which entails some limitations in certain cases related to luminosity. In this work, a color extension was developed for these methods, considering the color representation best adapted to the characteristics of each one. Moreover, an adaptive threshold was proposed which, unlike the fixed threshold suggested in the literature, is calculated dynamically depending on the video data. All these approaches have been tested on a specifically selected video database. In the experimentation stage it was found that these color extensions solved some problems found in the original versions and obtained better results. Furthermore, the adaptive threshold improves the detection results in most cases, increasing the success ratio and reducing false detections.
References
1. Nagasaka, A., Tanaka, Y.: Automatic video indexing and full-video search for object appearances. In: Visual Database Systems, IFIP Working Conference, October 1991, pp. 113–127 (1991)
2. Kasturi, R., Strayer, S.H., Gargi, U.: An evaluation of color histogram based methods in video indexing. In: International Workshop on Image Databases and Multimedia Search, Amsterdam, The Netherlands, August 1996, vol. 9 (1996)
3. Zabih, R., Miller, J., Mai, K.: A feature-based algorithm for detecting and classifying production effects. Multimedia Systems 7(2), 119–128 (1999)
4. Ardebilian, M., Tu, X., Chen, L.: Robust 3d clue-based video segmentation for video indexing. J. of Visual Communication and Image Representation 11(1), 58–79 (2000)
5. Porter, S.V., Mirmehdi, M., Thomas, B.T.: Detection and classification of shot transitions (2000)
6. Canny, J.: A computational approach to edge detection. IEEE Trans. on Pattern Analysis and Machine Intelligence 8(6), 679–698 (1986)
7. Gevers, T., Smeulders, A.: Color-based object recognition. Pattern Recognition 32, 453–464 (1999)
8. Kobla, V., DeMenthon, D., Doermann, D.: Special effect edit detection using VideoTrails: a comparison with existing techniques. In: SPIE Conference on Storage and Retrieval for Image and Video Databases VII (1999)
9. TREC video retrieval evaluation, http://www-nlpir.nist.gov/projects/trecvid
Label Dependent Evolutionary Feature Weighting for Remote Sensing Data

Daniel Mateos-García, Jorge García-Gutiérrez, and José C. Riquelme-Santos

Department of Computer Science, Avda. Reina Mercedes S/N, 41012 Seville (Spain) {mateosg,jgarcia,riquelme}@lsi.us.es http://www.lsi.us.es
Abstract. Nearest neighbour (NN) is a very common classifier used to develop important remote sensing products like land use and land cover (LULC) maps. Evolutionary computation has often been used to obtain feature weightings in order to improve the results of the NN. In this paper, a new algorithm based on evolutionary computation, called Label Dependent Feature Weighting (LDFW), is proposed. The LDFW method transforms the feature space by assigning different weights to every feature depending on each class. This multilevel feature weighting algorithm is tested on remote sensing data from a fusion of sensors (LIDAR and orthophotography). The results show an improvement over the NN and resemble the results obtained with a neural network, which is the best classifier for the study area. Keywords: remote sensing, feature weighting, evolutionary computation, label dependence.
1 Introduction
Remote sensing is a very important discipline for many tasks like resource management, environmental monitoring, disaster response, etc. Machine learning techniques have long been used to improve remote sensing performance and applicability. In addition, the use of active sensors like LIDAR (Light Detection and Ranging) has recently spread to improve the classical remote sensing products [1], which were mainly based on images. This fact involves an increase in data complexity and makes machine learning even more important in order to extract meaningful information from remote sensing data. Remote sensing knowledge can be gathered in several products, among which land use and land cover (LULC) maps are one of the most important. This product is based on a classification of the terrain by means of its own morphologic or functional characteristics, and it is a main tool to develop policies to manage the natural environment. An automatic pixel classification, which is generally supervised, is usually the first step to extract LULC maps from remote sensing data. Several techniques from machine learning have been used to develop LULC maps with satisfactory results, e.g., k-NN [2], Naive Bayes [3], SVM [4], etc. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 272–279, 2010. © Springer-Verlag Berlin Heidelberg 2010
Although the validity of machine learning has been widely proved in the remote sensing context, more research is needed in order to fulfill the standard requirements of many remote sensing products, especially for LULC map development [5]. In this way, some researchers [6] have started to exploit optimization techniques (genetic algorithms) in their approaches, showing that a weighted execution improves the results. In addition, machine learning often applies evolutionary computation to search for optimal weightings on both structural and functional aspects in order to improve the predictive models. From the standpoint of unsupervised learning, some works focus on the determination of weights for clustering algorithms. Generally, the considered model is the k-means algorithm with traditional evolutionary techniques [7], with differences in the fitness function, which can be distance-based or even based on information given by a combination of different algorithms. Additionally, there are three main areas of weighting application in supervised machine learning: support vector machine optimization, artificial neural networks (training and topology) and feature weighting. Thus, SVM kernel [8] or artificial neural network [9] parameters can be optimized by means of genetic algorithms or genetic programming with good results. In this context, evolutionary algorithms are usually employed to find a set of weights for the feature space, allowing greater accuracy in the classification process [10]. A common individual encoding is a set of real values that represent the weights of each feature. The fitness is defined by the classification process itself. Therefore, the search process can be viewed as a global task in which the optimal weights are considered with respect to the features, regardless of the label to which each instance belongs.
In this work, a novel proposal for applying evolutionary algorithms to search for optimal weights for each feature depending on the label is shown. Existing methods in the literature usually work in a global way, i.e., the same weight is applied to a feature regardless of the instance's class. In contrast, this work shows that the importance of each feature can depend on the class to predict. Thus, for LULC map development, the features provided by orthophotos may have more leverage to distinguish vegetation textures, while the features provided by LIDAR can better discriminate structures like buildings and roads, since they include height measures. To the best of our knowledge, this multiple weighting level has not been exploited enough, and it can improve the results when a classical classifier is applied to remote sensing data. In this way, a new evolutionary method based on distances and a double weighting level is described with three main objectives:
– Improve the general quality of a well-known machine learning technique like the k-NN classifier when it is applied to remote sensing data.
– Obtain new information about which features are more important to classify each class by means of the study of the resulting weights per label.
– Provide a new tool to develop high-accuracy LULC maps from a fusion of sensors (LIDAR and imagery).
The rest of the paper is organized as follows. Section 2 describes the general process to select the feature weighting, highlighting the most interesting features of the applied evolutionary algorithm. The results achieved are shown in Section 3. Finally, Section 4 shows a summary of the conclusions and the future lines of work.
2 Method

2.1 Data Description
The data for this study belong to a geographical area in the north of Galicia (Spain) and were obtained from the fusion of LIDAR and orthophotography information. LIDAR is an active sensor technology that measures properties of light (usually laser) to register distant targets. After a LIDAR flight, a point cloud database is available in which, for every point, it is possible to find: spatial position (i.e., x, y and z coordinates), intensity of return, number of the return in a sequence (if a pulse caused multiple impacts), etc. These features and the RGB values in an orthophoto are used in this work to obtain the statistics on which the instances for the model are based. A Digital Elevation Model (DEM) is needed to correct the height of objects. In this case, a DEM was extracted from the LIDAR data to make the correction. The orthophoto is used to extract features from the visible spectrum band. It was taken from the same area with similar weather conditions at the time of the LIDAR flight acquisition. From the original data set, 500 instances were classified manually to build the training set. Every instance of the training set has a total of 61 basic statistics (average, variance, minimum, maximum, standard deviation, etc.) from five different bands of the LIDAR and the image data: height, intensity, red band, green band and blue band. There are 5 different classes, one for each land type: road, farming land, middle vegetation, high vegetation and buildings.

2.2 Preprocess
Before the generation of the model, a preprocessing step has to be carried out. Three different filters are executed. First, every missing attribute value is replaced with the corresponding average value. Then, the data are standardized. Finally, a Correlation Feature Selection (CFS) method is applied in order to reduce the search space. With the 18 selected features, the next phase is the execution of the evolutionary algorithm, which is characterized in the next subsections.

2.3 Initial Population
The goal of the proposed evolutionary algorithm is to find an optimal set of weights in order to apply a linear transformation to the feature space depending on each label and to improve the overall classification process. Thus, after the
evolutionary execution (see Fig. 1), a weight is obtained for each label and feature, which is used to complete the classification process in two steps: 1. The weights are applied to the training instances according to their labels. 2. Given a test instance, the label of the transformed nearest neighbour is chosen to classify the test instance. To implement this idea, the population representation is as follows: an individual is a matrix which represents the weights per label for every feature. Hence, there is a row for each label, with as many columns as features. In the initial population, every value of each weight matrix is randomly chosen.

2.4 Fitness Function
As previously said, the training data consist of a matrix P with n rows (each one representing a pixel) and f columns (one per feature). A class label is assigned to each point of P by means of the label function. For simplicity, we assume that the label is an integer between 1 and b. Thus, a point pi is a row of P and a vector of R^f such that label(pi) = l ∈ {1..b}. A transformation is given by a matrix of weights W = (wij), with b rows (number of different labels) and f columns (number of features). Thus, pi is transformed through W into p′i, so that each feature is "weighted" with a value depending on the class to which the point belongs, as follows:

    ∀j = 1..f :   p′ij = w_label(pi),j · pij        (1)
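The transformation of Eq. 1 and the resulting nearest-neighbour decision can be sketched as follows. This is a simplified reading (illustrative names, not the authors' code): each training point is scaled by the weight row of its own label, and a test point is tried under every candidate label's weight row, keeping the label whose transformed nearest neighbour is closest.

```python
import numpy as np

def transform(P, labels, W):
    # Eq. 1: scale each feature of p_i by the weight row of its own label
    return W[labels] * P

def classify(x, P, labels, W):
    # Try every candidate label l: weight x with row l of W and measure
    # the distance to the nearest transformed training point; keep the
    # label that yields the shortest such distance.
    P_t = transform(P, labels, W)
    best_label, best_dist = 0, np.inf
    for l in range(W.shape[0]):
        d = np.linalg.norm(P_t - W[l] * x, axis=1).min()
        if d < best_dist:
            best_label, best_dist = l, d
    return best_label
```

With a global weight vector (identical rows of W) this collapses to an ordinary weighted 1-NN; the per-label rows are what make the weighting label dependent.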
As seen in Fig. 1, the training set P is divided into n bags (line 3), so that the weights of the individual being evaluated are applied to n − 1 bags (line 5), and the remaining one is used as the initial test (lines 6 et seq.). The transformation of each label is applied to each pixel of the test bag (lines 6–8) and then the nearest pixel from P′ is calculated (line 9). Once the point has been tested, it becomes part of P′, reinforcing the training (line 10). The label that makes the process return the shortest distance is chosen (line 12). If this label does not match the test point's label, the fitness is increased (lines 13–14). Therefore, to calculate the fitness function, the input parameters are a matrix P, the label function and a matrix W. The output is a measure of the classification error rate, which is the objective function to be minimized.

2.5 Crossover and Mutation
The crossover operation for two individuals is applied to every corresponding row (the i-th row of an individual is crossed with the i-th row of the other), since they share the same label. The roulette-wheel method is selected to obtain the individuals to cross. Besides, two techniques have been selected for the generation of the new individuals: the uniform crossover and the BLX-α crossover [11]. The uniform crossover consists of picking each gene from one of the two parents at random. The BLX-α crossover is described as follows: if g1 and g2 are
W is the b × f matrix of weights (wij), with one row per label and one column per feature.

 1: fitness = 0
 2: for i = 1 to m do
 3:   divide P into n bags: B1, ..., Bn
 4:   for all bags Bk do
 5:     according to Equation 1, apply the W transformation to every point of the remaining n − 1 bags, obtaining the set of points P′
 6:     for all points pi in Bk do
 7:       for all labels l ∈ {1..b} do
 8:         construct the transformed point p_i^l so that p_ij^l = w_lj · p_ij
 9:         calculate d_l = minimum distance from p_i^l to the points of P′
10:         apply the W transformation to pi according to its label, and add it to P′
11:       end for
12:       calculate the minimum of the distances d_l; let h ∈ {1..b} be the label of the point of P′ that yields that minimum
13:       if the label of pi ≠ h then
14:         fitness = fitness + 1
15:       end if
16:     end for
17:   end for
18: end for

Fig. 1. Fitness function
the i-th genes from each parent, the new gene is a real number randomly selected in the interval [Gmin − I·α, Gmax + I·α], where α is a positive real number, Gmax = max(g1, g2), Gmin = min(g1, g2) and I = Gmax − Gmin. The mutation operator has been defined to increase or decrease the value of a weight with a probability p. The increase or decrease is a random value Δ = r/10^z, where r ∈ R, 0 ≤ r ≤ 1, and z ∈ Z, 0 ≤ z ≤ n.
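The two crossover operators and the mutation can be sketched as follows (helper names are illustrative; α, p and z_max are the caller's choices):

```python
import random

def blx_alpha(g1, g2, alpha=0.5):
    # BLX-alpha: child gene drawn uniformly from the parents' interval
    # extended by alpha times its width I on both sides.
    g_min, g_max = min(g1, g2), max(g1, g2)
    i = g_max - g_min
    return random.uniform(g_min - alpha * i, g_max + alpha * i)

def uniform_crossover(row1, row2):
    # Uniform crossover: each gene is picked from one parent at random.
    return [a if random.random() < 0.5 else b for a, b in zip(row1, row2)]

def mutate_weight(w, z_max, p):
    # With probability p, add or subtract delta = r / 10**z,
    # with r uniform in [0, 1] and z a random integer in {0, ..., z_max}.
    if random.random() < p:
        delta = random.random() / 10 ** random.randint(0, z_max)
        w += random.choice((-1.0, 1.0)) * delta
    return w
```

Because crossover is applied row by row, both operators above act on genes of the same label, matching the row-wise crossover described in the text.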
3 Results
To assess the quality of our approach, a comparison among several classifiers is carried out. The classifiers Naive Bayes, Support Vector Machines (obtained
Table 1. Averaged error rate for each studied algorithm

    Algorithm                 Error rate
    Naive Bayes               0.15
    SMO                       0.14
    Nearest Neighbour         0.13
    Neural Network            0.10
    Nearest Neighbour LDFW    0.10
by Sequential Minimal Optimization), Artificial Neural Networks (Multilayer Perceptron) and Nearest Neighbour (NN), with and without LDFW, are chosen to compare their performance. Every model is built using the WEKA software [12]. For the experiments, the LDFW evolutionary algorithm is set up with the following parameters: a population of 20 individuals, 100 generations, 10% elitism and a 20% mutation probability. To establish a fair comparison among the performances of the different algorithms, a stratified n-fold cross-validation method is used. Concretely, three 10-fold cross-validations with different random seeds are executed and the results for each fold are registered. Tab. 1 shows the overall error rate for each algorithm. In order to evaluate the statistical significance of the measured differences in algorithm ranks, we use a method for comparing classifiers across multiple data sets. In this case, there is only one data set, since remote sensing data are costly to obtain. Thus, the set of measures consists of the partial results of the previous 10-fold cross-validations (30 measures for each classifier), and the Friedman test is selected to analyze them. The Friedman test is a nonparametric statistical test which evaluates the differences among more than two related sample means. The null hypothesis is that every classifier performs the same, regardless of the differences among the registered results. The statistic used is:

    χ²_F = (12n / (k(k+1))) · ( Σ_j r̄_j² − k(k+1)²/4 )        (2)
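The statistic of Eq. 2 can be computed directly from the per-fold error rates; a minimal sketch (illustrative, not the authors' code; ties between ranks are ignored for simplicity, so a full statistical package remains preferable in practice):

```python
import numpy as np

def friedman_statistic(errors):
    # errors: (n folds) x (k classifiers) matrix of error rates.
    n, k = errors.shape
    # Rank the classifiers on each fold (1 = lowest error); ties are
    # ignored here (packages use average ranks for tied values).
    ranks = np.argsort(np.argsort(errors, axis=1), axis=1) + 1.0
    mean_ranks = ranks.mean(axis=0)
    # Eq. 2: chi^2_F = 12n/(k(k+1)) * (sum_j r_j^2 - k(k+1)^2/4)
    return 12.0 * n / (k * (k + 1)) * (np.sum(mean_ranks ** 2) - k * (k + 1) ** 2 / 4.0)
```

A large value of the statistic leads to rejecting the null hypothesis that all classifiers perform the same.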
The Friedman test checks whether the average ranks are significantly different from the mean rank r̄ = 2.5 expected under the null hypothesis. Using a statistical package (MATLAB), the p-value for the Friedman test resulted in 7.0351E−7, so the null hypothesis is rejected and the measured average ranks are significantly different (at α = 0.05). With this in mind, the results show that the performance of the Nearest Neighbour with LDFW is very similar to that of the neural network, which is the best classifier for the study area. However, as will be seen later, the
Table 2. Most important features according to their weights for the study zone

    Class               Features
    Road                MINSNDVI   PEC      IMAX       HCV
    Farming Land        IMEAN      IGMEAN   HMAX       HCV
    Middle Vegetation   HSTD       IGKURT   MINSNDVI   IGVAR
    High Vegetation     IMAX       IRVAR    IGVAR      IGMEAN
    Buildings           IGKURT     PCT32    EMP        MINSNDVI

H*: height statistic; I*: intensity statistic; IG*: green band intensity statistic; IR*: red band intensity statistic; *SNDVI: Simulated Normalized Difference Vegetation Index statistic; PEC: penetration coefficient; PCT32: percentage of third or later returns over second returns.
LDFW technique provides descriptive information about the most important features per class. Neural networks also supply an approximation of feature importance, but in a much less explicit manner. The rest of the classifiers show a lower accuracy. If the LDFW-NN is compared with the classical NN, it results in a 3% improvement. In Tab. 2, the importance of the feature weighting according to the label can be seen. After the application of the LDFW, every class has its own set of features that best determines its label. This information provides a very important feature selection tool and allows us to establish a more accurate class separation; e.g., vegetation classes are principally determined by the orthophoto, especially by the features that correspond to the green band (IG features), whilst roads are better characterized by LIDAR features.
4 Conclusions
In this paper, a new algorithm based on evolutionary computation, called Label Dependent Feature Weighting (LDFW), was proposed. The LDFW method transforms the feature space by assigning different weights to every feature depending on each class. This multilevel feature weighting algorithm was tested on remote sensing data from a fusion of sensors (LIDAR and orthophotography) in order to improve a NN, which is a widely used classifier in the context of LULC map development. The results showed a 3% improvement over the NN and resemble the results obtained with a neural network, which was the best classifier for the study area. Additionally, the LDFW was able to provide qualitative and quantitative information about the importance of each feature in order to distinguish among the different classes. In future work, the use of other measures like entropy in lieu of distance will be a very interesting way to improve the results and should be taken into account. In addition, different transformation functions on the attributes, which, at the
moment, are limited to linear kernels, should be explored. Finally, the definition of this algorithm as an independent preprocessing method is a primary objective, so that more complex classifiers like ensembles could be tested.
References
1. Erdody, T., Moskal, L.: Fusion of LIDAR and imagery for estimating forest canopy fuels. Remote Sensing of Environment (to appear, 2010)
2. Atkinson, P.M.: Spatially weighted supervised classification for remote sensing. International Journal of Applied Earth Observation and Geoinformation 5(4), 277–291 (2004)
3. Bork, E.W., Su, J.G.: Integrating lidar data and multispectral imagery for enhanced classification of rangeland vegetation: A meta analysis. Remote Sensing of Environment 111(1), 11–24 (2007)
4. Mazzoni, D., Garay, M.J., Davies, R., Nelson, D.: An operational MISR pixel classifier using support vector machines. Remote Sensing of Environment 107(1-2), 149–158 (2007)
5. Shao, G., Wu, J.: On the accuracy of landscape pattern analysis using remote sensing data. Landscape Ecology (23), 505–511 (2008)
6. Tomppo, E.O., Gagliano, C., Natale, F.D., Katila, M., McRoberts, R.E.: Predicting categorical forest variables using an improved k-nearest neighbour estimator and Landsat imagery. Remote Sensing of Environment (113), 500–517 (2009)
7. Krishna, K., Narasimha Murty, M.: Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 29(3), 433–439 (2002)
8. Howley, T., Madden, M.G.: The genetic kernel support vector machine: Description and evaluation. Artificial Intelligence Review 24, 379–395 (2005)
9. Hervás-Martínez, C., Martínez-Estudillo, F., Carbonero-Ruz, M.: Multilogistic regression by means of evolutionary product-unit neural networks. Neural Networks 21(7), 951–961 (2008)
10. Komosinski, M., Krawiec, K.: Evolutionary weighting of image features for diagnosing of CNS tumors. Artificial Intelligence in Medicine 19(1), 25–38 (2000)
11. Eshelman, L.J., Schaffer, J.D.: Real-coded genetic algorithms and interval-schemata. In: Whitley, D.L. (ed.) Foundation of Genetic Algorithms 2, pp. 187–202. Morgan Kaufmann, San Mateo (1993)
12. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations 11(1) (2009)
Evolutionary q-Gaussian Radial Basis Functions for Binary-Classification

F. Fernández-Navarro1, C. Hervás-Martínez1, P.A. Gutiérrez1, M. Cruz-Ramírez1, and M. Carbonero-Ruz2

1 Department of Computer Science and Numerical Analysis, University of Cordoba, Rabanales Campus, Albert Einstein building, 3rd floor, 14071, Córdoba, Spain
2 Department of Management and Quantitative Methods, ETEA, Escritor Castilla Aguayo 4, 14005, Cordoba, Spain
Abstract. This paper proposes a Radial Basis Function Neural Network (RBFNN) which reproduces different Radial Basis Functions (RBFs) by means of a real parameter q, named q-Gaussian RBFNN. The architecture, weights and node topology are learnt through a Hybrid Algorithm (HA) with the iRprop+ algorithm as the local improvement procedure. In order to test its overall performance, an experimental study with eleven datasets taken from the UCI repository is presented. The RBFNN with the q-Gaussian is compared to RBFNNs with Gaussian, Cauchy and Inverse Multiquadratic RBFs.
1 Introduction
Different types of neural networks are being used for classification purposes [1], including, among others: Multilayer Perceptron Neural Networks (MLPNNs), where the transfer functions are Sigmoidal Unit Basis Functions; Radial Basis Function Neural Networks (RBFNNs) with kernel functions, where the transfer functions are usually Gaussian [2]; Product Unit Neural Networks (PUNNs) [3] with multiplicative units; or Neural Networks where the hidden layer is composed of a mixture of basis functions [4]. We focus on RBFNNs, which have been successfully employed in different pattern recognition problems in recent years [5]. Several common types of functions are used as transfer functions, for example, the standard Gaussian (SRBF), φ(z) = e^(−z); the Multiquadratic (MRBF), φ(z) = (1 + z)^(1/2); the Inverse Multiquadratic (IMRBF), φ(z) = (1 + z)^(−1/2); and the Cauchy (CRBF), φ(z) = (1 + z)^(−1). In the output layer, the activations of the hidden units are combined in order to produce a classification of the input pattern. In this study, we investigate the q-Gaussian RBFNN, which can reproduce different RBFs by changing a real parameter q. A Hybrid Algorithm (HA) is employed to select the parameters of the Radial Basis Functions (RBFs): the number of hidden nodes and the centers, widths and value of the parameter q of each q-Gaussian RBFNN of the population. This paper is organized as follows: a brief analysis of some works related with the proposed models is given in Section 2; Section 3 describes the base classifier E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 280–287, 2010. © Springer-Verlag Berlin Heidelberg 2010
applied to binary-classification problems; Section 4 presents a methodology to optimize the RBF parameters based on Hybrid Algorithms; Section 5 explains the experiments carried out; and finally, Section 6 summarizes the conclusions of our work.
2 Related Works
A RBFNN is a three-layer feed-forward Neural Network. Let the numbers of nodes of the input layer, the hidden layer and the output layer be p, m and 1, respectively. For any sample x = [x1, x2, ..., xp], the output of the RBFNN is f(x). The model of a RBFNN can be described with the following equation:

    f(x) = β0 + Σ_{i=1}^{m} βi · φi(di(x))        (1)

where φi(di(x)) is a non-linear mapping from the input layer to the hidden layer, β = (β1, β2, ..., βm) is the vector of connection weights between the hidden layer and the output layer, and β0 is the bias. The function di(x) is defined as:

    di(x) = ‖x − ci‖² / θi²        (2)

where θi is the scalar parameter that defines the width of the i-th radial unit, ‖·‖ represents the Euclidean norm and ci = [ci1, ci2, ..., cip] is the center of the i-th RBF. The standard RBF (SRBF) is the Gaussian function, given by:

    φi(di(x)) = e^(−di(x))        (3)

The radial basis function φi(di(x)) can take different forms, including the Cauchy RBF (CRBF), defined by:

    φi(di(x)) = 1 / (1 + di(x))        (4)

and the Inverse Multiquadratic RBF (IMRBF), given by:

    φi(di(x)) = 1 / (1 + di(x))^(1/2)        (5)

Fig. 1 illustrates the influence of the choice of the RBF on the hidden unit activation. One can observe that the Gaussian function presents a higher activation close to the radial unit center than the other two RBFs. In this paper, we propose the use of the q-Gaussian function as RBF. The q-Gaussian can be defined as:

    φi(di(x)) = (1 − (1 − q)·di(x))^(1/(1−q))   if (1 − (1 − q)·di(x)) ≥ 0
    φi(di(x)) = 0                               otherwise        (6)
[Figure 1 omitted: two panels plotting radial unit activation against distance. Panel (a) compares the SRBF, CRBF and IMRBF; panel (b) shows the q-Gaussian for q ∈ {0.25, 0.75, 1.00, 1.50, 2.00, 3.00, 4.00}.]

Fig. 1. Radial unit activation in one-dimensional space with c = 0 and θ = 1 for different RBFs: (a) SRBF, CRBF and IMRBF and (b) q-Gaussian with different values of q
The q-Gaussian can reproduce different RBFs for different values of the real parameter q. As an example, when the q parameter is equal to 2, the q-Gaussian is the CRBF; for q = 3, the activation of a radial unit with an IMRBF for 2di(x) turns out to be equal to the activation of a radial unit with a q-Gaussian RBF for di(x); and finally, when the value of q converges to 1, the q-Gaussian converges to the Gaussian function (SRBF). Fig. 1b presents the radial unit activation of the q-Gaussian RBF for different values of q. As can be seen in Fig. 1b, a small change in the value of q represents a smooth modification of the shape of the RBF.
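These limiting cases of Eq. 6 can be checked numerically with a small sketch (the function name is illustrative):

```python
import numpy as np

def q_gaussian(d, q):
    # Eq. 6: (1 - (1 - q) d)^(1/(1 - q)) where the base is non-negative,
    # 0 otherwise; the q -> 1 limit is exp(-d), the standard Gaussian.
    if abs(q - 1.0) < 1e-12:
        return np.exp(-d)
    base = 1.0 - (1.0 - q) * d
    # np.abs keeps the dead branch of np.where numerically well defined
    return np.where(base >= 0.0, np.abs(base) ** (1.0 / (1.0 - q)), 0.0)
```

At q = 2 the expression collapses to (1 + d)^(−1), the CRBF, and for q < 1 the activation has compact support (it is exactly zero once the base becomes negative).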
3 q-Gaussian RBF for Classification
To apply evolutionary neural network techniques, we consider RBFNNs with softmax outputs and the standard structure: an input layer with a node for every input variable; a hidden layer with several RBFs; and an output layer with one node. There are no connections between the nodes of a layer, and none between the input and output layers either. The activation function of the i-th node in the hidden layer (φi(di(x))) is given by Eq. 6 and the activation function of the output node (f(x)) is defined in Eq. 1. The transfer function of all output nodes is the identity function. In this work, the outputs of the neurons are interpreted from the point of view of probability through the use of the softmax activation function:

    g(x) = exp f(x) / (1 + exp f(x))        (7)
where g(x) is the probability that a pattern x belongs to class 1. The probability that a pattern x belongs to class 2 is 1 − g(x). The error surface associated with the model is very convoluted. Thus, the parameters of the RBFNNs are estimated by means of a HA (detailed in Section
 1: Hybrid Algorithm:
 2: Generate a random population of size N
 3: repeat
 4:   Calculate the fitness of every individual in the population
 5:   Rank the individuals with respect to their fitness
 6:   The best individual is copied into the new population
 7:   The best 10% of population individuals are replicated and they substitute the worst 10% of individuals
 8:   Apply parametric mutation to the best pm% of individuals
 9:   Apply structural mutation to the remaining (100 − pm)% of individuals
10: until the stopping criterion is fulfilled
11: Apply iRprop+ to the best solution obtained by the EA in the last generation.

Fig. 2. Hybrid Algorithm (HA) framework
4). The HA was developed to optimize the error function given by the negative log-likelihood for N observations, which is defined for a classifier g as:

    l(g) = (1/N) Σ_{n=1}^{N} [ −yn · f(xn) + log(1 + exp f(xn)) ]        (8)
where yn is the class label of pattern xn.
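Equations 7 and 8 can be sketched together; note that the negative log-likelihood below is simply the binary cross-entropy of the logistic output written in terms of f(x), under the assumption yn ∈ {0, 1} (function names are illustrative):

```python
import numpy as np

def g(f_x):
    # Eq. 7: logistic (two-class softmax) output, P(class 1 | x)
    return np.exp(f_x) / (1.0 + np.exp(f_x))

def neg_log_likelihood(f_vals, y):
    # Eq. 8: l(g) = (1/N) sum_n [ -y_n f(x_n) + log(1 + exp f(x_n)) ]
    f_vals, y = np.asarray(f_vals, float), np.asarray(y, float)
    return float(np.mean(-y * f_vals + np.log1p(np.exp(f_vals))))
```

For a single pattern with y = 1, the summand equals −log g(x), which confirms that Eq. 8 is term-by-term the negative log of the probability assigned by Eq. 7.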
4 Hybrid Algorithm
The basic framework of the HA is the following: the search begins with an initial population of RBFNNs and, in each iteration, the population is updated using a population-update algorithm which evolves both its structure and weights. The population is subject to operations of replication and mutation. Figure 2 describes the procedure to select the parameters of the radial units. The main characteristics of the algorithm are the following: 1. Representation of the Individuals. The algorithm evolves architectures and connection weights simultaneously, each individual being a fully specified RBFNN. The neural networks are represented using an object-oriented approach and the algorithm deals directly with the RBFNN phenotype. 2. Error and Fitness Functions. We consider l(g) (see Eq. 8) as the error function of an individual g of the population. The fitness measure needed for evaluating the individuals is a strictly decreasing transformation of the error function l(g), given by A(g) = 1/(1 + l(g)), where 0 < A(g) ≤ 1. 3. Initialization of the Population. The initial population is generated trying to obtain RBFNNs with the maximum possible fitness. First, 5,000 random RBFNNs are generated. The centers of the radial units are first defined by the k-means algorithm for different values of k, where k ∈ [Mmin, Mmax], with Mmin and Mmax being the minimum and maximum numbers of hidden nodes allowed for any RBFNN model in the HA. The widths of the RBFNNs
are initialized to the geometric mean of the distances to the two nearest neighbours, and the q parameter to values near 1, since when q → 1 the q-Gaussian reduces to the standard Gaussian RBF. A random value in the [−I, I] interval is assigned to the weights between the hidden layer and the output layer. The obtained individuals are evaluated using the fitness function, and the initial population is finally obtained by selecting the best 500 RBFNNs. 4. Structural Mutation. Structural mutation implies a modification in the structure of the RBFNNs and allows the exploration of different regions of the search space, helping to keep the diversity of the population. There are four different structural mutations: hidden node addition, hidden node deletion, connection addition and connection deletion. These four mutations are applied sequentially to each network, each one with a specific probability. If the structural mutator adds a new node to the RBFNN, the q parameter is assigned a γ value, where γ ∈ [0.75, 1.25], since when q → 1 the q-Gaussian reduces to the SRBF. 5. Parametric Mutation. Different weight mutations are applied: – Centre, Radii and q Mutation. These parameters are modified in the following way: • Centre creep. The value of each centre is modified by adding Gaussian noise, cji(t + 1) = cji(t) + ξ(t), where ξ(t) ∈ N(cji, ri) and N(cji, ri) represents a one-dimensional normally distributed random variable with mean cji and variance the radius of the RBF hidden node. • Radius creep. The value of each radius is modified by adding another Gaussian noise, ri(t + 1) = ri(t) + ξ(t), where ξ(t) ∈ N(ri, d) and N(ri, d) represents a one-dimensional normally distributed random variable with mean ri and variance the width of the range of each dimension (d). • Mutation of the q parameter.
The q parameter is updated by adding an ε value, where ε ∈ [−0.25, 0.25], since the q-Gaussian RBFNN is very sensitive to variations in q (as can be seen in Fig. 1b). – Output-to-Hidden Node Connection Mutations [3]. These connections are modified by adding another Gaussian noise, w(t + 1) = w(t) + ξ(t), where ξ(t) ∈ N(0, T(g)) and N(0, T(g)) represents a one-dimensional normally distributed random variable with mean 0 and variance the network temperature (T(g) = 1 − A(g)). 6. iRprop+ Local Optimizer. The local optimization algorithm used in our paper is the iRprop+ [6] optimization method. The iRprop+ is believed to be a fast and robust learning algorithm. This algorithm applies a backtracking strategy (i.e., it decides whether or not to take a step back along a weight direction by means of a heuristic). In the proposed methodology, we run the EA and then apply the local optimization algorithm to the best solution obtained by the EA in the last generation.
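The width initialization described in step 3 above (geometric mean of the distances to the two nearest centres) can be sketched as follows; the helper name is illustrative:

```python
import numpy as np

def initial_width(centre, centres):
    # Geometric mean of the distances from `centre` to its two nearest
    # other centres, used here to seed an RBF width.
    dists = np.sort(np.linalg.norm(np.asarray(centres) - np.asarray(centre), axis=1))
    two_nearest = dists[dists > 0][:2]  # drop the zero self-distance
    return float(np.sqrt(two_nearest[0] * two_nearest[1]))
```

This gives each radial unit a scale comparable to the local spacing of the k-means centres, so no unit starts out either covering the whole input space or collapsed onto a single point.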
Evolutionary q-Gaussian Radial Basis Functions for Binary-Classification
Table 1. Characteristics of the eleven datasets used for the experiments: number of instances (Size), number of Real (R), Binary (B) and Nominal (N) input variables, total number of inputs (#In.), number of classes (#Out.), per-class distribution of the instances (Distribution), minimum and maximum number of hidden nodes used for each dataset ([Mmin, Mmax]) and the number of generations (#Gen.)

Dataset     Size   R   B   N  #In. #Out. Distribution    [Mmin, Mmax]  #Gen.
Labor         57   8   3   5   29    2   (30, 27)        [2, 5]          20
Promoters    106   -   -  57  114    2   (53, 53)        [2, 5]         100
Hepatitis    155   6  13   -   19    2   (32, 123)       [2, 5]          20
Sonar        208  60   -   -   60    2   (98, 110)       [2, 5]          40
Heart        270  13   -   -   13    2   (150, 120)      [2, 5]         100
BreastC      286   4   3   2   15    2   (201, 85)       [2, 5]          40
Heart-C      302   6   3   4   26    2   (164, 138)      [2, 5]         100
Liver        345   6   -   -    6    2   (145, 200)      [2, 5]          40
Vote         435   -  16   -   16    2   (267, 168)      [2, 5]          20
Card         690   6   4   5   51    2   (307, 308)      [4, 7]          40
German      1000   6   3  11   61    2   (700, 300)      [1, 3]         200

All nominal variables are transformed to binary variables. BreastC: Breast-Cancer; Heart-C: Heart-disease (Cleveland).
5 Experiments

5.1 Experimental Design
The proposed methodologies are applied to eleven datasets taken from the UCI repository [7], to test their overall performance when compared to other radial basis functions (SRBF, CRBF and the Inverse Multiquadratic RBF (IMRBF)). The selected datasets present different numbers of instances and features (see Table 1). The experimental design was conducted using a 10-fold cross validation, with 10 repetitions per fold. The performance of each method has been evaluated using the correct classification rate in the generalization set (CG). All the parameters used in the evolutionary algorithm, except the maximum and minimum number of RBFs in the hidden layer and the number of generations, have the same values in all problems analyzed below (see Table 1). We have done a simple linear rescaling of the input variables into the interval [−2, 2], X_i* being the transformed variables. The connections between the hidden and output layer are initialized in the [−5, 5] interval (i.e. [−I, I] = [−5, 5]). The size of the population is N = 500. For the structural mutation, the number of nodes that can be added or removed is within the [1, 2] interval, and the number of connections to add or delete in the hidden and the output layer during structural mutations is within the [1, 7] interval.
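The linear rescaling of the inputs into [−2, 2] mentioned above can be sketched as follows (function and variable names are ours, not from the paper):

```python
import numpy as np

def rescale(X, lo=-2.0, hi=2.0):
    """Linearly rescale each input variable (column of X) into [lo, hi]."""
    xmin = X.min(axis=0)
    xmax = X.max(axis=0)
    return lo + (hi - lo) * (X - xmin) / (xmax - xmin)

X = np.array([[0.0, 10.0],
              [5.0, 20.0],
              [10.0, 30.0]])
Xs = rescale(X)
# each column now spans exactly [-2, 2]
```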
F. Fernández-Navarro et al.
Table 2. Comparison of the proposed basis function to other basis functions: Mean and Standard Deviation (SD) of the accuracy results (CG(%)) from 100 executions, mean accuracy (CG(%)), mean ranking (R), p-Value and α for the Hommel post-hoc non-parametric tests in CG with α = 0.1 (q-Gaussian is the control method)

Dataset     SRBF           CRBF           IMRBF          q-Gaussian
Labor       91.33±12.09    95.00±11.24    91.66±8.78     93.33±11.65
Promoters   75.54±13.56    80.18±6.66     81.09±8.69     84.00±6.15
Hepatitis   86.33±8.09     83.16±7.15     85.12±7.52     85.30±7.54
Sonar       78.38±9.03     74.09±10.20    76.02±11.16    76.04±13.56
Heart       81.85±8.97     83.70±8.76     84.81±8.45     84.07±7.20
BreastC     72.04±6.39     71.35±8.00     73.10±6.39     73.06±6.77
Heart-C     85.44±3.83     85.45±5.59     85.77±3.05     85.79±5.20
Liver       68.41±5.15     65.23±8.23     65.52±6.31     71.30±6.50
Vote        96.32±3.97     95.39±3.59     94.94±2.36     96.08±3.45
Card        86.08±3.14     86.52±3.55     85.94±3.80     87.87±0.37
German      74.80±3.82     74.90±3.17     74.40±2.50     75.25±2.98
CG(%)       81.50          81.36          81.67          82.91
R           2.72           2.99           2.72           1.54
p-Value     0.03           0.00           0.03           -
αHommel     0.10           0.03           0.05           -

The best result is in bold face and the second best result in italics.

5.2 Comparison to Other Radial Basis Functions
In Table 2, the mean and the standard deviation of the correct classification rate in the generalization set (CG) are shown for each dataset over a total of 100 executions. From a purely descriptive point of view, the q-Gaussian model obtained the best results for five datasets, the SRBF achieved the best performance for three datasets, and the CRBF and IMRBF methods yielded the best performance for one and two datasets, respectively. To determine the statistical significance of the rank differences observed for each method in the different datasets, we have carried out a non-parametric Friedman test [8] with the ranking of CG of the best models as the test variable (since a previous evaluation of the CG values results in rejecting the normality and the equality-of-variances hypotheses). The test shows that the effect of the method used for classification is statistically significant at a significance level of 10%, as the confidence interval is C0 = (0, F0.10 = 2.89) and the statistic obtained for CG is F* = 8.34 ∉ C0. Consequently, we reject the null hypothesis stating that all algorithms perform equally in mean ranking. Based on this rejection, the Hommel post-hoc test is used to compare all classifiers to each other. The Hommel test was applied with the best performing model (q-Gaussian) as the control method. The results of the Hommel tests for α = 0.10 can be seen in Table 2, with the corresponding p and αHommel values.
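The Friedman ranking computation can be reproduced from the mean CG values of Table 2 (a sketch; the values are transcribed from the table as we read it, and the statistic computed is the Friedman chi-square):

```python
import numpy as np

# Mean C_G (%) per dataset (rows) for SRBF, CRBF, IMRBF, q-Gaussian (Table 2)
scores = np.array([
    [91.33, 95.00, 91.66, 93.33],   # Labor
    [75.54, 80.18, 81.09, 84.00],   # Promoters
    [86.33, 83.16, 85.12, 85.30],   # Hepatitis
    [78.38, 74.09, 76.02, 76.04],   # Sonar
    [81.85, 83.70, 84.81, 84.07],   # Heart
    [72.04, 71.35, 73.10, 73.06],   # BreastC
    [85.44, 85.45, 85.77, 85.79],   # Heart-C
    [68.41, 65.23, 65.52, 71.30],   # Liver
    [96.32, 95.39, 94.94, 96.08],   # Vote
    [86.08, 86.52, 85.94, 87.87],   # Card
    [74.80, 74.90, 74.40, 75.25],   # German
])
N, k = scores.shape
# rank methods within each dataset, rank 1 = highest accuracy (no ties here)
ranks = k - scores.argsort(axis=1).argsort(axis=1)
mean_ranks = ranks.mean(axis=0)   # q-Gaussian comes out near the 1.54 in Table 2
chi2 = 12 * N / (k * (k + 1)) * ((mean_ranks - (k + 1) / 2) ** 2).sum()
# chi2 comes out near 8.35, consistent with the 8.34 quoted in the text
```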
From the results of these tests, it can be concluded that the q-Gaussian model obtains a significantly better mean ranking of CG when compared to the remaining RBFs, which justifies the proposal.
6 Conclusions
In this paper, we have proposed a new approach to determine optimized parameters for the q-Gaussian RBF applied to binary classification problems. These models have been designed with a hybrid algorithm (HA) constructed specifically to take into account the characteristics of this kernel model. The evaluation of the model and the algorithm on the eleven datasets considered showed that the q-Gaussian RBF obtained a higher accuracy than the remaining RBFs. Finally, some suggestions for future research are the following: to study other radial basis functions and to adapt the algorithm to deal with multi-class problems.
Acknowledgement. This work has been partially subsidized by the TIN 2008-06681-C06-03 project of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT), FEDER funds and the P08-TIC-3745 project of the "Junta de Andalucía" (Spain). The research of Francisco Fernández-Navarro has been funded by the "Junta de Andalucía" Predoctoral Program, grant reference P08-TIC-3745.
References
1. Lippmann, R.: Pattern classification using neural networks. IEEE Communications Magazine 27, 47–64 (1989)
2. Bishop, C.M.: Neural networks for pattern recognition. Oxford University Press, Oxford (1996)
3. Martínez-Estudillo, F.J., Hervás-Martínez, C., Gutiérrez, P.A., Martínez-Estudillo, A.C.: Evolutionary product-unit neural networks classifiers. Neurocomputing 72(12), 548–561 (2008)
4. Gutiérrez, P.A., Hervás-Martínez, C., Carbonero, M., Fernández, J.C.: Combined Projection and Kernel Basis Functions for Classification in Evolutionary Neural Networks. Neurocomputing 72(13-15), 2731–2742 (2009)
5. Freeman, J.A.S., Saad, D.: Learning and generalization in radial basis function networks. Neural Computation 7(5), 1000–1020 (1995)
6. Igel, C., Hüsken, M.: Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing 50(6), 105–123 (2003)
7. Asuncion, A., Newman, D.: UCI machine learning repository (2007)
8. Friedman, M.: A comparison of alternative tests of significance for the problem of m rankings. Annals of Mathematical Statistics 11(1), 86–92 (1940)
Evolutionary Learning Using a Sensitivity-Accuracy Approach for Classification

Javier Sánchez-Monedero 1, C. Hervás-Martínez 1, F.J. Martínez-Estudillo 2, Mariano Carbonero Ruz 2, M.C. Ramírez Moreno 2, and M. Cruz-Ramírez 1

1 Department of Computer Science and Numerical Analysis, University of Córdoba, Spain
2 Department of Management and Quantitative Methods, ETEA, Spain
Abstract. Accuracy alone is insufficient to evaluate the performance of a classifier especially when the number of classes increases. This paper proposes an approach to deal with multi-class problems based on Accuracy (C) and Sensitivity (S). We use the differential evolution algorithm and the ELM-algorithm (Extreme Learning Machine) to obtain multi-classifiers with a high classification rate level in the global dataset with an acceptable level of accuracy for each class. This methodology is applied to solve four benchmark classification problems and obtains promising results.
1 Introduction
To evaluate a classifier, the machine learning community has traditionally used the correct classification rate or accuracy to measure its default performance. In the same way, accuracy has been frequently used as the fitness function in evolutionary algorithms when solving classification problems. However, the pitfalls of using accuracy have been pointed out by several authors [1]. Actually, it is enough to simply realize that accuracy cannot capture all the different behavioural aspects found in two different classifiers. Assuming that all misclassifications are equally costly and that there is no penalty for a correct classification, we start from the premise that a good classifier should combine a high classification rate level in the testing set with an acceptable level for each class. Concretely, we consider traditionally used accuracy (C) and the minimum of the sensitivities of all classes (S), that is, the lowest percentage of examples correctly predicted as belonging to each class with respect to the total number of examples in the corresponding class. Recently, in [2], Huang et al. proposed an original algorithm called extreme learning machine (ELM) which randomly chooses hidden nodes and analytically determines (by using Moore-Penrose generalized inverse) the output weights of the network. The algorithm tends to provide good testing performance at
Corresponding author at: E-mail address: [email protected] Phone: +34-957218349. This work has been partially subsidized by TIN 2008-06681-C06-03 (MICYT), FEDER funds and the P08-TIC-3745 project of the “Junta de Andaluc´ıa” (Spain).
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 288–295, 2010. c Springer-Verlag Berlin Heidelberg 2010
Evolutionary Learning Using Sensitivity-Accuracy Approach
an extremely fast learning speed. However, ELM may need a higher number of hidden nodes due to the random determination of the input weights and hidden biases. In [3], a hybrid algorithm called Evolutionary ELM (E-ELM) was proposed, which uses the differential evolution algorithm [4]. The experimental results obtained show that this approach reduces the number of hidden nodes and obtains more compact networks. In this paper, the simultaneous optimization of accuracy and sensitivity is carried out by means of the E-ELM algorithm. The key point of the algorithm is the fitness function considered, a convex linear combination of accuracy and sensitivity, which tries to achieve a good balance between the classification rate level in the global dataset and an acceptable level for each class. The base classifier considered is the standard multilayer perceptron (MLP) neural network. The paper is structured as follows. First, we present our approach based on the sensitivity versus accuracy pair (S, C). The third section contains the evolutionary approach. Finally, the paper concludes with an analysis of the results obtained in four benchmark classification problems.
2 Accuracy and Sensitivity
We consider a classification problem with Q classes and N training or testing patterns, with g as a classifier obtaining a Q × Q contingency or confusion matrix M(g) = (n_{ij}), with \sum_{i,j=1}^{Q} n_{ij} = N, where n_{ij} represents the number of times the patterns are predicted by classifier g to be in class j when they really belong to class i. Let us denote the number of patterns associated with class i by f_i = \sum_{j=1}^{Q} n_{ij}, i = 1, ..., Q. We start by defining two scalar measures that take the elements of the confusion matrix into consideration from different points of view. Let S_i = n_{ii}/f_i be the proportion of patterns correctly predicted to be in class i with respect to the total number of patterns in class i (sensitivity for class i). Therefore, the sensitivity for class i estimates the probability of correctly predicting a class i example. From the above quantities we define the sensitivity S of the classifier as the minimum value of the sensitivities for each class, S = min{S_i; i = 1, ..., Q}. We define the Correct Classification Rate or Accuracy, C = (1/N) \sum_{j=1}^{Q} n_{jj}, which is the rate of all the correct predictions. Specifically, we consider the two-dimensional measure (S, C) associated with classifier g. The measure tries to evaluate two features of a classifier: global performance and the performance in each class. We represent S on the horizontal axis and C on the vertical axis. One point in (S, C) space dominates another if it is above and to the right, i.e. it has more accuracy and greater sensitivity. It is straightforward to prove the following relationship between C and S (see [5]). Let us consider a Q-class classification problem, and let C and S be respectively the accuracy and sensitivity associated with a classifier g; then S ≤ C ≤ 1 − (1 − S)p*, where p* = f_Q/N is the minimum of the estimated prior probabilities.
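The (S, C) pair and the bound above can be sketched in a few lines (illustrative code with a made-up confusion matrix, not from the paper):

```python
import numpy as np

def accuracy_and_min_sensitivity(conf):
    """Compute (S, C) from a Q x Q confusion matrix.

    conf[i, j] = number of class-i patterns predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    N = conf.sum()
    C = np.trace(conf) / N                        # accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)  # sensitivity of each class
    S = per_class.min()                           # minimum sensitivity
    return S, C

# Example: 3-class confusion matrix
M = np.array([[48, 2, 0],
              [5, 40, 5],
              [0, 4, 16]])
S, C = accuracy_and_min_sensitivity(M)
p_star = M.sum(axis=1).min() / M.sum()   # minimum estimated prior probability
assert S <= C <= 1 - (1 - S) * p_star    # the bound quoted in the text
```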
J. Sánchez-Monedero et al.
Therefore, each classifier will be represented as a point outside the shaded region in Fig. 2 (Fig. 2 is built from experimental data, see Section 4.2). Several points in (S, C) space are important to note. The lower left point (0, 0) represents the worst classifier and the optimum classifier is located at the (1, 1) point. Furthermore, the points on the vertical axis correspond to classifiers that are not able to predict any point in a concrete class correctly. Note that it is possible to find among them classifiers with a high level of C, particularly in problems with small p∗ [6]. Our objective is to build an evolutionary algorithm that tries to move the classifier population towards the optimum classifier located in the (1, 1) point in the (S, C) space. We think an evolutionary algorithm could be an adequate scheme allowing us to improve the quality of the classifiers, measured in terms of C and S, directing the solutions towards the (1, 1) point.
3 The Proposed Method

3.1 Differential Evolution and Extreme Learning Machine
Let us consider the training set given by N samples D = {(x_j, y_j) : x_j ∈ R^K, y_j ∈ R^Q, j = 1, 2, ..., N}, where x_j is a K × 1 input vector and y_j is a Q × 1 target vector. Let us consider the MLP with M nodes in the hidden layer given by f = (f_1, f_2, ..., f_Q):

f_l(x, θ_l) = β_0^l + \sum_{j=1}^{M} β_j^l σ_j(x, w_j),  l = 1, 2, ..., Q

where θ = (θ_1, ..., θ_Q)^T is the transpose matrix containing all the neural net weights, θ_l = (β_0^l, β_1^l, ..., β_M^l, w_1, ..., w_M) is the vector of weights of the l-th output node, w_j = (w_{1j}, ..., w_{Kj}) is the vector of weights of the connections between the input layer and the j-th hidden node, Q is the number of classes in the problem, M is the number of sigmoidal units in the hidden layer, x is the input pattern and σ_j(x, w_j) the sigmoidal function. Suppose we are training an MLP with M nodes in the hidden layer to learn the N samples of set D. The linear system f(x_j) = y_j, j = 1, 2, ..., N, can be written in a more compact format as Hβ = Y, where H is the hidden layer output matrix of the network. The ELM algorithm randomly selects the hidden-node weights w_j = (w_{1j}, ..., w_{Kj}) and biases, and analytically determines the output weights β_0^l, β_1^l, ..., β_M^l by finding the least-square solution to the given linear system. The minimum norm least-square (LS) solution to the linear system is β̂ = H†Y, where H† is the Moore-Penrose (MP) generalized inverse of matrix H. The minimum norm LS solution is unique and has the smallest norm among all the LS solutions. The Evolutionary Extreme Learning Machine (E-ELM) [3] improves the original ELM by using a Differential Evolution (DE) algorithm. Differential Evolution was proposed by Storn and Price [4] and is known as one of the most efficient evolutionary algorithms.
Require: P (Training Patterns), T (Training Tags)
1: Create a random initial population θ = [w1, ..., wk, b1, ..., bk] of size N
2: for each individual do
3:   β̂ = ELM_output(w, P, T) {Calculate output weights}
4:   φλ = Fitness(w, β̂) {Evaluate individual}
5: end for
6: Select best individual of initial population
7: while stop condition is not met do
8:   Mutate random individuals and apply crossover
9:   for each individual in the new population do
10:    β̂ = ELM_output(w, P, T) {Calculate output weights}
11:    φλ = Fitness(w, β̂) {Evaluate model}
12:    Select new individuals for replacing individuals in old population
13:  end for
14:  Select the best model in the generation
15: end while
16: function β̂ = ELM_output(w, P, T)
      Calculate the hidden layer output matrix H
      Calculate the output weights β̂ = H†Y
17: function φλ = Fitness(w, β̂, λ, P, T)
      Build training confusion matrix M
      Calculate C and S from M
      Get classifier fitness with (1)

Fig. 1. E-ELM-CS algorithm pseudocode
3.2 The E-ELM-CS Algorithm
As mentioned in Section 2, our approach tries to build classifiers with C and S simultaneously optimized. These objectives are not always cooperative, a fact that would justify the use of a multi-objective approach for the evolutionary algorithm [6]. To maximize the objectives C and S we use instead a linear combination of them. This option is a good method when there are two objectives and when the first Pareto front has a very small number of models, in some cases only one (see the results of the MPANN methodology on the Balance and Newthyroid datasets in Table 2). In addition, its computational cost is noticeably lower. A weighted linear combination proves to be very efficient in practice for certain types of problems, for example in combinatorial multi-objective optimization; applications of this technique include schedule evaluation in a resource scheduler or the design of multiplierless IIR filters. We consider the fitness function defined by φλ = (1 − λ)C + λS, where λ is a user parameter in [0, 1]. This function evaluates the performance of a classifier as a weighted combination of Accuracy and Sensitivity. Our proposed method is implemented using the Evolutionary ELM (E-ELM) [3]. E-ELM for classification problems only considers the misclassification rate of the classifier. We have extended E-ELM to consider both C and S (E-ELM-CS, Evolutionary ELM considering C and S). Since E-ELM considers
Table 1. Datasets used for the experiments

Dataset      Size  #Input  #Classes  Distribution     p*
BreastC       286    15       2      (201, 85)       0.2957
BreastCW      699     9       2      (458, 241)      0.3428
Balance       625     4       3      (288, 49, 288)  0.0641
Newthyroid    215     5       3      (150, 35, 30)   0.1296
an error measure as the fitness, which should be minimized, we reformulate our fitness function as:

φλ = 1 − ((1 − λ)C + λS)    (1)

The E-ELM-CS algorithm pseudocode is shown in Fig. 1. Mutation, crossover and selection operations work as described in [3].
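Reading equation (1) as the error form φλ = 1 − ((1 − λ)C + λS), to be minimized (our reading of the garbled printed equation), a fitness sketch is:

```python
import numpy as np

def phi(conf, lam):
    """Fitness of Eq. (1) as we read it: 1 - ((1 - lam) * C + lam * S),
    lower is better; conf is the training confusion matrix."""
    conf = np.asarray(conf, dtype=float)
    C = np.trace(conf) / conf.sum()              # accuracy
    S = (np.diag(conf) / conf.sum(axis=1)).min() # minimum sensitivity
    return 1.0 - ((1.0 - lam) * C + lam * S)

M = np.array([[90, 10],
              [20, 80]])
# lam = 0 scores pure accuracy error; lam = 1 scores worst-class error
print(phi(M, 0.0))  # ≈ 1 - C = 0.15
print(phi(M, 1.0))  # ≈ 1 - S = 0.20
```

Intermediate λ values trade the global error against the worst-class error, which is the balance the text aims for.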
4 Experiments
We consider four datasets with different features taken from the UCI repository (see Table 1). The experimental design was conducted using a stratified holdout procedure with 30 runs, where approximately 75% of the patterns were randomly selected for the training set and the remaining 25% for the test set.

4.1 Comparison Procedure
E-ELM-CS is compared to two popular classification algorithms using ANNs:
1. MPANN (Memetic Pareto Artificial Neural Networks) [7]. MPANN is a Multi-objective Evolutionary Algorithm based on Differential Evolution [8] with two objectives: one is to minimize the mean squared error (MSE) and the other is to minimize ANN complexity (the number of hidden units). We have implemented a Java version using the pseudocode shown in [7] and the framework for evolutionary computation JCLEC (1). The methodology is named MPANN-MSE when the chosen extreme of the Pareto front provided by the algorithm has the better MSE, and MPANN-HN when the chosen extreme has the better complexity value.
2. TRAINDIFFEVOL (Differential evolution training algorithm for Neural Networks) [8]. TRAINDIFFEVOL is an algorithm to train feed-forward multilayer perceptron neural networks based on Differential Evolution [4]. This algorithm uses the MSE and the mean squared weights and biases for training the networks. To obtain the sensitivity for each class, a modification of the source code provided by the author (2) has been implemented.
(1) http://jclec.sourceforge.net/
(2) http://www.it.lut.fi/project/nngenetic/
Table 2. Statistical results for E-ELM-CS, E-ELM, TRAINDIFFEVOL, MPANN-MSE and MPANN-HN

Dataset     Algorithm          C(%) Mean±SD   S(%) Mean±SD
BreastC     E-ELM-CS(λ=0.4)    68.97±3.19     33.97±6.82
            E-ELM              68.36±1.98     23.33±6.42
            TDIF               68.92±2.89     26.35±11.71
            MPANN-MSE          66.53±3.07     28.73±14.23
            MPANN-HN           66.53±3.07     28.41±14.34
            Means ranking of C: μELMCS ≥ μTDIF ≥ μELM > μMPANHN ≥ μMPAN
            Means ranking of S: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μTDIF > μELM; μELMCS > μMPANHN, (◦) (T-test)
BreastCW    E-ELM-CS(λ=0.4)    96.32±0.86     93.87±2.28
            E-ELM              95.68±1.19     92.61±3.21
            TDIF               93.98±1.75     86.22±4.69
            MPANN-MSE          96.04±1.08     92.75±3.40
            MPANN-HN           96.27±1.00     93.30±3.36
            Means ranking of C: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μELM > μTDIF
            Means ranking of S: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μELM > μTDIF
Balance     E-ELM-CS(λ=0.7)    91.48±1.50     86.74±10.01
            E-ELM              90.56±1.38     14.00±17.73
            TDIF               87.12±2.56      2.00±6.10
            MPANN-MSE          92.94±1.81     60.00±14.14
            MPANN-HN           92.94±1.81     60.00±14.14
            Means ranking of C: μMPANHN ≥ μELMCS, (*) (M-W test)
            Means ranking of S: μELMCS > μMPANHN, (*) (M-W test)
Newthyroid  E-ELM-CS(λ=0.9)    96.23±2.31     80.85±11.88
            E-ELM              94.26±2.35     75.77±10.16
            TDIF               91.11±4.77     59.47±22.74
            MPANN-MSE          94.87±3.82     72.11±22.29
            MPANN-HN           94.87±3.82     72.11±22.29
            Means ranking of C: μELMCS ≥ μMPANHN, (M-W test)
            Means ranking of S: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μELM > μTDIF; μELMCS > μMPANHN, (◦) (T-test)

(*) (◦) The average difference is significant with p-values = 0.05 or 0.10, respectively.
From a statistical point of view, these comparisons are possible because we use the same partitions of the datasets; otherwise, it would be difficult to justify the fairness of the comparison procedure. Regarding the settings of each algorithm compared to E-ELM-CS, we have used the parameter values advised by the authors in their respective studies.

4.2 Experimental Results
In Table 2 we present the mean and the standard deviation (SD) of C and S over 30 runs, obtained for the best models in each run over the generalization set of each dataset. In E-ELM-CS, λ is a user parameter; it has been obtained for each dataset as the best result of a preliminary experimental design with λ ∈ {0.0, 0.1, ..., 1.0}. If we analyze the results for C in the generalization set, we can observe that the E-ELM-CS methodology obtains results that are, in mean, better than or similar to those of the second best methodology (MPANN-HN in three datasets, TRAINDIFFEVOL in two datasets or E-ELM in one dataset). On the other hand, the results in mean of S show that the E-ELM-CS methodology obtains a performance that is better than that of the second best methodology (MPANN-HN in four datasets and TRAINDIFFEVOL or E-ELM in one dataset respectively). In order to determine the best methodology for training MLP neural networks (in the sense of its influence on C and S in the test dataset), an ANalysis Of the VAriance of one factor (ANOVA I) statistical method or the non-parametric
E-ELM TRAINDIFFEVOL
MPAN-MSE MPAN-HN Best classifier
1 (1-p*=0.94)
C (Accuracy)
0.8
0.6
0.4
0.2
0 Worst 0 classifier
0.2
0.4
0.6
0.8
1
S (Sensitivity)
Fig. 2. Comparison of E-ELM-CS, E-ELM, TDIF, MPAN-MSE and MPAN-HN methods for Balance database
Kruskal-Wallis (KW) tests were chosen, depending on the satisfaction of the normality hypothesis of the C and S values. The factor "methodology" analyzes the effect on C (or S) of each methodology applied, with levels i = 1...5: E-ELM-CS (ELMCS), E-ELM (ELM), TRAINDIFFEVOL (TDIF), MPANN-MSE (MPAN) and MPANN-HN (MPANHN). The results of the ANOVA or KW analysis for C and S show that, for the four datasets, the effect of the methodologies is statistically significant at a level of 5%. Because there is a significant difference in mean for C and S using Snedecor's F or the KW test, for the former, under the normality hypothesis, a post hoc multiple comparison test of the mean C and S obtained with the different levels of the factor is performed. We perform a Tukey test [20] under normality, and a pair-wise T-test or a pair-wise Mann-Whitney test in the other cases. Table 2 shows the results obtained: columns 5 and 6 present the post hoc Tukey test and the T-test or Mann-Whitney (M-W) tests. The mean difference is significant with p-values = 0.05 (*) or 0.10 (◦). Here μA ≥ μB means that methodology A yields better results than methodology B, but the difference is not significant, while μA > μB means that methodology A yields better results than methodology B with significant differences; the binary relation ≥ is not transitive. Observe that there is a relationship between the imbalance degree of the dataset and the results obtained by the E-ELM-CS algorithm. It is worthwhile to point out that, for imbalanced datasets, E-ELM-CS gets the best performance results and the highest differences in S when comparing the algorithms (see the Balance and Newthyroid results in Table 2). On the other hand, even for two-class problems we can observe the same behaviour (compare the S results of the BreastCW and BreastC datasets). Finally, our approach improves Sensitivity levels with respect to the original Evolutionary Extreme Learning Machine (E-ELM), while maintaining Accuracy at the same level.
Fig. 2 depicts the sensitivity-accuracy results of the four methodologies for the Balance dataset in the (S, C) space. A visual inspection of the figure allows us to easily observe the difference in the performance of E-ELM-CS with respect to E-ELM, TDIF and MPANN.
5 Conclusions
This work proposes a new approach to deal with multi-class classification problems. Assuming that a good classifier should combine a high classification rate level in the global dataset with an acceptable level for each class, we consider traditionally used Accuracy C and the minimum of the sensitivities of all classes, S. The differential evolution algorithm and the fast ELM algorithm are used to optimize both measures in a multi-objective optimization approach, by using a fitness function built as a convex linear combination of S and C. The procedure obtains multi-classifiers with a high classification rate level in the global dataset with a good level of accuracy for each class. In our opinion, the (S, C) approach reveals an interesting point of view for dealing with multi-class classification problems since it improves sensitivity levels with respect to the Evolutionary Extreme Learning Machine, while maintaining accuracy at similar levels.
References
1. Provost, F., Fawcett, T.: Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 43–48 (1997)
2. Huang, G.B., Chen, L., Siew, C.K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Transactions on Neural Networks 17(4), 879–892 (2006)
3. Zhu, Q.Y., Qin, A., Suganthan, P., Huang, G.B.: Evolutionary extreme learning machine. Pattern Recognition 38(10), 1759–1763 (2005)
4. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. of Global Opt. 11(4), 341–359 (1997)
5. Martínez-Estudillo, F., Gutiérrez, P., Hervás-Martínez, C., Fernández, J.: Evolutionary learning by a sensitivity-accuracy approach for multi-class problems. In: Proceedings of the 2008 IEEE Congress on Evolutionary Computation (CEC 2008), Hong Kong, China (2008)
6. Fernández, J., Martínez-Estudillo, F., Hervás, C., Gutiérrez, P.: Sensitivity versus accuracy in multi-class problems using memetic pareto evolutionary neural networks. IEEE Transactions on Neural Networks (accepted, 2010)
7. Abbass, H.A.: Speeding up backpropagation using multiobjective evolutionary algorithms. Neural Computation 15(11), 2705–2726 (2003)
8. Ilonen, J., Kamarainen, J.K., Lampinen, J.: Differential evolution training algorithm for feed-forward neural networks. Neural Process. Lett. 17(1), 93–105 (2003)
An Hybrid System for Continuous Learning Aldo Franco Dragoni, Germano Vallesi, Paola Baldassarri, and Mauro Mazzieri Department of Ingegneria Informatica, Gestionale e dell’Automazione (DIIGA), Università Politecnica delle Marche, Via Brecce Bianche, 60131 Ancona, Italy {a.f.dragoni,g.vallesi,p.baldassarri,m.mazzieri}@univpm.it
Abstract. We propose a Multiple Neural Networks system for dynamic environments, where one or more neural nets could no longer be able to properly operate, due to partial changes in some of the characteristics of the individuals. We assume that each expert network has a reliability factor that can be dynamically re-evaluated on the ground of the global recognition operated by the overall group. Since the net’s degree of reliability is defined as the probability that the net is giving the desired output, in case of conflicts between the outputs of the various nets the re-evaluation of their degrees of reliability can be simply performed on the basis of the Bayes Rule. The new vector of reliability will be used for making the final choice, by applying two algorithms, the Inclusion based and the Weighted one over all the maximally consistent subsets of the global outcome. Keywords: Multiple Neural Networks, Hybrid System, Bayesian Conditioning.
1 Introduction

Several research works indicate that some complex recognition problems cannot be effectively solved by a single neural network, but rather by "Multiple Neural Networks" systems [1]. The idea is to decompose a large problem into a number of subproblems and then to combine the subsolutions into the global one. Normally modules are domain specific and have specialized computational architectures to recognize certain subsets of the overall task [2]. Each module is typically independent and does not influence or become influenced by the others. Being simpler architectures, modules can respond to a given input faster [2]. The combination of expert neural networks can be competitive, cooperative or totally decoupled, but it is particularly critical when there are incompatibilities between them. In this case it is necessary to use mechanisms to deal with contradictions. In this work we apply a multiple neural network system to the problem of face recognition and we propose a model for detecting and solving the contradictions that may arise in the global outcome. Each neural network is trained to recognize a significant region of the face and is assigned an arbitrary a-priori degree of reliability (which may depend on the region of the face that must be recognized). This reliability factor can be dynamically re-evaluated on the basis of the Bayesian Rule after contradictions arise. The conflicts depend on the fact that there may be no global agreement about the recognized subject, perhaps because s/he changed some features of her/his face. The new vector of reliability obtained through the Bayes Rule will be

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 296–303, 2010. © Springer-Verlag Berlin Heidelberg 2010
An Hybrid System for Continuous Learning
used for making the final choice, by applying the “Inclusion based” algorithm [3] or another “Weighted” algorithm over all the maximally consistent subsets of the global output. Networks that do not agree with this choice are required to retrain themselves automatically on the basis of the recognized subject. In this way, the system should be able to follow the changes of the faces of the subjects, while continuing to recognize them even after many years thanks to this continuous process of self training.
2 Theoretical Background

In this section we introduce some theoretical background from the field of Belief Revision. Belief Revision occurs when a new piece of information, inconsistent with the present belief set, is added in order to produce a new consistent belief system [4].
[Figure: a Knowledge Base holding α (from source V) and α→¬β (from source T) receives β from source U; the Nogoods and Goods are extracted, and Bayesian Conditioning turns the a-priori reliabilities of U, V and T (0.9, 0.8 and 0.7) into a-posteriori ones, which feed the selection algorithms.]

Fig. 1. A "Belief Revision" mechanism
In Figure 1 we see a Knowledge Base (KB) with two pieces of information: α, which comes from source V, and the rule "if α, then not β", which comes from source T. Unfortunately, another piece of information, β, arrives from source U, causing a conflict in the KB. To solve it we find all the "maximally consistent subsets", called Goods, inside the inconsistent KB, and we choose one of them as the most believable one. In our case there are three Goods: {α, β}; {β, α→¬β}; {α, α→¬β}. Each source of information is associated with an a-priori "degree of reliability", which is intended as the a-priori probability that the source provides correct information. In case of conflicts the "degree of reliability" of the involved sources should decrease after "Bayesian Conditioning", which is obtained as follows. Let S = {s1, ..., sn} be the set of the sources; each source si is associated with an a-priori reliability R(si). Let φ be an element of 2^S. If the sources are independent, the probability that only the sources belonging to the subset φ ⊆ S are reliable is:
R(φ) = Π_{si∈φ} R(si) · Π_{si∉φ} (1 − R(si)) .   (1)

This combined reliability can be calculated for any φ, providing that:

Σ_{φ∈2^S} R(φ) = 1 .   (2)
A.F. Dragoni et al.
Of course, if the sources belonging to a certain φ give incompatible information, then R(φ) must be zero. Nogoods are defined to be the "minimally inconsistent subsets", so that Goods and Nogoods are dual notions, and finding all the Goods also means finding all the Nogoods. Now, what we have to do is:
• summing up into R_Contradictory the a-priori reliability of all the Nogoods and their supersets
• setting to zero the reliabilities of all the Nogoods and their supersets
• dividing the reliability of all the other sets of sources by 1 − R_Contradictory.
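As a sketch, the three steps above can be run directly on the Fig. 1 example; the a-priori reliabilities (0.9, 0.8, 0.7) and the single Nogood {U, V, T} come from the figure, and the independence assumption behind Eq. (1) is taken for granted:

```python
from itertools import combinations

# A-priori reliabilities of the three sources of Fig. 1 (assumed independent)
R = {"U": 0.9, "V": 0.8, "T": 0.7}
# U: β, V: α and T: α→¬β are jointly inconsistent: the only Nogood is {U,V,T}
nogoods = [{"U", "V", "T"}]

def subset_prob(phi):
    """Probability that exactly the sources in phi are reliable (Eq. 1)."""
    p = 1.0
    for s in R:
        p *= R[s] if s in phi else 1.0 - R[s]
    return p

powerset = [set(c) for k in range(len(R) + 1) for c in combinations(R, k)]
contradictory = lambda phi: any(ng <= phi for ng in nogoods)

# Step 1: collect the mass of the Nogoods and their supersets; steps 2-3:
# zero them out and renormalise so that Eq. (2) still holds.
r_contr = sum(subset_prob(phi) for phi in powerset if contradictory(phi))
NR = {s: sum(subset_prob(phi) for phi in powerset
             if s in phi and not contradictory(phi)) / (1.0 - r_contr)
      for s in R}

print({s: round(v, 7) for s, v in NR.items()})
# NR(U) ≈ 0.7983871, NR(V) ≈ 0.5967742, NR(T) ≈ 0.3951613
```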
The last step assures that the constraint (2) is still satisfied; this is known as "Bayesian Conditioning". The revised reliability NR(si) of a source si is defined to be the sum of the reliabilities of the elements of 2^S that contain si. If a source has been involved in some contradictions, then NR(si) ≤ R(si), otherwise NR(si) = R(si). For instance, the application of this Bayesian conditioning to the case of Figure 1 produces the following new reliabilities: NR(U) = 0.7983869, NR(V) = 0.5967742 and, finally, NR(T) = 0.3951612.

2.1 Selection Algorithms
These new "degrees of reliability" will be used for choosing the most credible Good as the one suggested by "the most reliable sources". There are three algorithms to perform this task:

1) Inclusion based (IB). This algorithm works as follows:
a) select all the Goods which contain information provided by the most reliable source;
b) if the selection returns just one Good, STOP: that is the most credible Good;
c) else, if there is more than one Good, pop the most reliable source from the list and go to step (a);
d) if there are no more Goods in the selection, the ones that were selected at the previous iteration are returned as the most credible ones, with the same degree of credibility.

2) Inclusion based weighted (IBW) is a variation of Inclusion based: each Good is associated with a weight derived from the sum of the Euclidean distances between the neurons of the networks (i.e., the inverse of the credibility of the recognition operated by each net). If IB selects more than one Good, then IBW selects as winner the Good with the lowest weight.

3) Weighted algorithm (WA) combines the a-posteriori reliability of each network with the order of the answers provided. Each answer has a weight 1/n, where n ∈ [1, N] represents its position among the N responses. Every Good is given a weight obtained by joining together the reliability of each network that supports it with the weight of the answer given by the network itself:

W_{Good_j} = Σ_{i=1}^{M_j} (1/n_i) · Rel_i .   (3)

where:
W_{Good_j}: weight of Good_j
Rel_i: reliability of network i (i ∈ Good_j)
n_i: position, in the list of answers provided by network i, of the answer supported by Good_j
M_j: number of networks that compose Good_j

If there is more than one Good with the same reliability, then the winner is the Good with the highest weight.
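A minimal sketch of the IB loop, applied to the Goods of the Fig. 1 example with the a-posteriori reliabilities computed in Section 2 (each Good is written as the set of sources supporting it):

```python
# A-posteriori reliabilities of the Fig. 1 sources, and the three Goods
# {α,β}, {β,α→¬β}, {α,α→¬β} expressed by their supporting sources
reliability = {"U": 0.7983871, "V": 0.5967742, "T": 0.3951613}
goods = [{"U", "V"}, {"U", "T"}, {"V", "T"}]

def inclusion_based(goods, reliability):
    """Steps (a)-(d) of the Inclusion based algorithm."""
    candidates = list(goods)
    for source in sorted(reliability, key=reliability.get, reverse=True):
        narrowed = [g for g in candidates if source in g]
        if not narrowed:          # step (d): keep the previous selection
            return candidates
        if len(narrowed) == 1:    # step (b): unique most credible Good
            return narrowed
        candidates = narrowed     # step (c): pop the source and iterate
    return candidates

print(inclusion_based(goods, reliability))
# the winner is {U, V}, i.e. the Good {α, β}
```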
3 Face Recognition System: an Example

Many methods of face recognition have been proposed during the past 30 years [5, 6]. These methods are broadly classified into Holistic methods, Local methods and Hybrid methods [7]. In the Holistic methods each face image is represented as a single high-dimensional vector by concatenating the grey values of all the pixels in the face; Local methods use local facial features; and, finally, Hybrid methods use both local and holistic features to recognize a face. We focus our attention on the Local methods, which provide the flexibility to recognize a face based on its parts. Each face is represented by a set of four rectangular masks covering its main facial features, i.e., eyes, nose, mouth and hair [8]. We apply the Belief Revision method to the problem of recognizing faces by means of a "Multiple Neural Networks" system. There are four neural nets, each specialized to perform a specific task: eyes recognition (E), nose recognition (N), mouth recognition (M) and, finally, hair recognition (H). Their outputs are the recognized subjects, and conflicts are simple disagreements regarding the recognized subject. The group should be able to recognize the face even if partial changes have occurred. Neural networks are able to upgrade themselves in the presence of changes in the input pattern. As an example, let's suppose that during the testing phase the system has to recognize the faces of four persons: Andrea (A), Franco (F), Lucia (L) and Paolo (P). According to the value of the weights of each trained network, each net will provide as output a list of names of subjects, ordered from the most probable to the least probable one. For the purposes of this example, we take into account only the first two outputs (i.e., we limit the uncertainty to the two most probable names).
Let's suppose that, after the testing phase, the outputs of the networks are as follows: E gives as output "A or F", N gives "A or P", M gives "L or P" and, finally, H gives "L or A". So the 4 networks do not globally agree, since the intersection of the four outputs is empty. The problem is to establish the most credible individual corresponding to this contradictory global output. To solve this problem we adopt the Belief Revision method. First of all we need to give an a-priori degree of reliability to each network. Then we have to find the Goods and Nogoods. In our example the Goods (the largest subsets of {E,N,M,H} which agree on the choice of at least one subject) are: {E,N,H}, corresponding to Andrea; {N,M}, corresponding to Paolo; and, finally, {M,H}, corresponding to Lucia. Besides, we identify two Nogoods (the smallest subsets of {E,N,M,H} which have no subject in common): {N,M,H} and {E,M}. Now we have to choose the most credible Good, i.e., the one "provided by the most reliable
networks". However, the reliabilities of the networks have changed, due to the fact that they fell into conflict. Starting from an undifferentiated a-priori reliability factor of 0.9, and applying the method described in the previous section, we get the following new vector of reliability: NR(E)=0.7684, NR(N)=0.8375, NR(M)=0.1459 and NR(H)=0.8375. The networks N and H have the (same) highest reliability, and by applying the "Inclusion based" algorithm it turns out that the most credible Good is {E,N,H}, which corresponds to Andrea. So Andrea is the result of the collective image processing.
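The numbers of this example can be reproduced with the same conditioning machinery (a sketch; the max-min rule used below to pick the winner is a shortcut standing in for the full IB loop, which gives the same result here):

```python
from itertools import combinations

# Four networks with undifferentiated a-priori reliability 0.9, and the
# Nogoods derived from their outputs (E: A/F, N: A/P, M: L/P, H: L/A)
R = dict.fromkeys("ENMH", 0.9)
nogoods = [{"N", "M", "H"}, {"E", "M"}]
goods = {"Andrea": {"E", "N", "H"}, "Paolo": {"N", "M"}, "Lucia": {"M", "H"}}

def prob(phi):
    """Probability that exactly the networks in phi are reliable."""
    p = 1.0
    for s in R:
        p *= R[s] if s in phi else 1.0 - R[s]
    return p

subsets = [set(c) for k in range(len(R) + 1) for c in combinations(R, k)]
bad = lambda phi: any(ng <= phi for ng in nogoods)
norm = 1.0 - sum(prob(phi) for phi in subsets if bad(phi))

# A-posteriori reliability of each network after Bayesian conditioning
NR = {s: sum(prob(phi) for phi in subsets if s in phi and not bad(phi)) / norm
      for s in R}

# {E,N,H} is the only Good containing both top-ranked networks N and H
winner = max(goods, key=lambda name: min(NR[s] for s in goods[name]))
print({s: round(v, 4) for s, v in NR.items()}, "->", winner)
# NR values agree with the text's 0.7684, 0.8375, 0.1459, 0.8375 up to
# rounding, and the winner is Andrea
```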
Fig. 2. Schematic representation of the Face Recognition System (FRS)
Figure 2 shows a schematic representation of this Face Recognition System (FRS), which is able to recognize the most probable individual even in the presence of serious conflicts among the outputs of the various nets.
4 A Never-Ending Learning Phase

Back to the example of Section 3, let's suppose that the network M is not able to recognize Andrea from his mouth. There can be two reasons for the fault of M: either the task of recognizing any mouth is objectively harder, or Andrea could have recently changed the shape of his mouth (perhaps because of the growth of a goatee or moustache). The second case is interesting because it shows how our FRS could be useful for coping with dynamic changes in the features of the subjects. In such a dynamic environment, where the input pattern partially changes, some neural networks could no longer be able to recognize it. However, if the changes are minimal, we expect that most of the networks will still correctly recognize the face. So, we force each faulting network to re-train itself on the basis of the recognition made by the overall group. On the basis of the a-posteriori reliability and of the Goods, our idea is to automatically re-train the networks that did not agree with the others. The networks that do not support the most credible Good are forced to re-train themselves in order to "correctly" (according to the opinion of the group) recognize the changed face. Each iteration of the cycle applies Bayesian conditioning to the a-priori "degrees of reliability", producing an a-posteriori vector of reliability. To take into account the history of the responses that came from each network, we maintain an "average vector of reliability" over those produced at each recognition, always starting from the a-priori degrees of reliability. This average vector is given as input to the two algorithms, IBW and WA, instead of the a-posteriori vector of reliability produced in the current recognition.
In other words, the difference with respect to the BR mechanism described in section 2 is that we do not give an a-posteriori vector of reliability to the two algorithms (IBW and WA), but the average vector of reliability calculated since the FRS started to work with that set of subjects to recognize.
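Such a running average can be kept incrementally; the class name and the example vectors below are illustrative, not from the paper:

```python
# Hypothetical running average of the a-posteriori reliability vectors
class ReliabilityHistory:
    def __init__(self, names):
        self.names = names
        self.count = 0
        self.avg = {n: 0.0 for n in names}

    def update(self, posterior):
        """Incorporate the a-posteriori vector of the current recognition."""
        self.count += 1
        for n in self.names:
            # incremental mean: avg += (x - avg) / count
            self.avg[n] += (posterior[n] - self.avg[n]) / self.count
        return self.avg

h = ReliabilityHistory(["E", "N", "M", "H"])
h.update({"E": 0.7684, "N": 0.8375, "M": 0.1459, "H": 0.8375})
avg = h.update({"E": 0.9, "N": 0.9, "M": 0.9, "H": 0.9})
print(avg["M"])  # mean of 0.1459 and 0.9
```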
An Hybrid System for Continuous Learning
301
With this feedback, our FRS performs a continuous learning phase adapting itself to partial continuous changes of the individuals in the population to be recognized.
Fig. 3. Schematic representation of the re-learning of the system when the input is partially changed
Figure 3 shows the behaviour of the system when the testing image partially changes. Now the subject has a moustache and goatee, so the OM network (specialized to recognize the mouth) is no longer able to correctly indicate the tested subject. Since all the others still recognize Andrea, OM will be retrained with the mouth of Andrea as its new input pattern.
5 Partial Experimental Results

This section shows only partial results: those obtained without the feedback discussed in the previous section. We compare two groups of neural networks: the first consisting of four networks and the second of five (the additional network is obtained by separating the eyes into two separate networks). All the networks are Learning Vector Quantization networks, LVQ 2.1 [9], a variation of Kohonen's LVQ [10], each one specialized to respond to an individual template of the face. The learning rate used is:

α(t) = η e^(−βt) .   (4)

where α(t) decreases monotonically with the number of iterations t (η = 0.25 and β = 0.001, values obtained after a series of tests to optimize the networks). The training set is composed of 20 subjects (taken from the FERET database [11]); for each one, 4 pictures were taken, for a total of 80. Networks were trained, during the learning phase, with three different numbers of epochs: 3000, 4000 and 5000. To find Goods and Nogoods from the network responses we use two methods:
1) Static method: the cardinality of the response provided by each net is fixed a priori. We choose values from 1 to 5, 1 meaning the most probable individual and 5 the five most probable subjects.
2) Dynamic method: the cardinality of the response provided by each net changes dynamically according to the minimum number of "desired" Goods to be searched among. In other words, we set the number of desired Goods and reduce the cardinality of the response (from 5 down to 1) until we eventually reach that number (of course, if all the nets agree on their first name there will be only one Good).
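The decay of Eq. (4) with the authors' values can be checked in a couple of lines:

```python
import math

eta, beta = 0.25, 0.001  # values found by the authors' tuning tests

def alpha(t):
    """Learning rate of Eq. (4): decreases monotonically with iteration t."""
    return eta * math.exp(-beta * t)

print(alpha(0), alpha(3000), alpha(5000))
# starts at 0.25 and decays exponentially over the training epochs
```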
In the next step we applied Bayesian conditioning [12] to the Nogoods obtained with the two previous techniques, obtaining an a-posteriori vector of reliability. These new "degrees of reliability" are used for choosing the most credible Good (i.e., the name of the subject). We use two algorithms to perform this task: 1) Inclusion based weighted (IBW); 2) Weighted algorithm (WA).
Fig. 4. Rate of correct recognition obtained by two algorithms: IBW and WA. Calculated both with 4 and 5 neural networks at different epochs of training (3000, 4000 and 5000)
To test our work, we took 488 different images of the 20 subjects. Figure 4 reports the rate of correct recognition for this test. It shows that WA is better than IBW and that the best solution for WA is achieved with five neural networks and 5000 epochs in both methods (Static and Dynamic). Figure 5 shows the average values of correct recognition of WA with 5000 epochs obtained by the two methods. These results show that the combination of the Dynamic method with WA and five neural networks gives the best solution, reaching a 78.20% correct recognition rate.
Fig. 5. Average rate of correct recognition with four and five neural networks for Weighted Algorithm and 5000 epochs in both cases Static and Dynamic
6 Conclusion and Future Work

Our hybrid method integrates multiple neural networks with a symbolic approach to Belief Revision to deal with pattern recognition problems that:
1) require the cooperation of multiple neural networks specialized on different topics;
2) involve individuals that dynamically change some of their features, so that some nets occasionally fail.

We tested this hybrid method with a face recognition problem, training each net on a specific region of the face: eyes, nose, mouth, and hair. Every output unit is associated with one of the persons to be recognized. Each net gives the same number of outputs. We consider a constrained environment in which the image of the face is always frontal, with lighting conditions, scaling and rotation of the face being the same. We arranged the test so that the changes of the faces are partial; for example, the mouth and hair do not change simultaneously. The system assigns a reliability factor to each neural network, which is recalculated on the basis of the conflicts that occur among them. The new "degrees of reliability" are obtained through Bayesian Conditioning and can be used to select the most likely subject. The networks that do not agree with the choice made by the overall group are forced to re-train themselves on the basis of the global output. So, the overall system is engaged in a never-ending loop of testing and re-training that makes it able to cope with dynamic partial changes in the features of the subjects.
References

1. Shields, M.W., Casey, M.C.: A theoretical framework for multiple neural network systems. Neurocomputing 71, 1462–1476 (2008)
2. Li, Y., Zhang, D.: Modular neural networks and their applications in biometrics. Trends in Neural Computation 35, 337–365 (2007)
3. Benferhat, S., Cayrol, C., Dubois, D., Lang, J., Prade, H.: Inconsistency management and prioritized syntax-based entailment. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI 1993), pp. 640–645 (1993)
4. Gärdenfors, P.: Belief Revision. Cambridge Tracts in Theoretical Computer Science, vol. 29 (December 2003)
5. Tolba, A.S., El-Baz, A.H., El-Harby, A.A.: Face recognition: a literature review. International Journal of Signal Processing 2, 88–103 (2006)
6. Zhao, W., Chellappa, R., Phillips, P.J., Rosenfeld, A.: Face recognition: a literature survey. ACM Computing Surveys 35(4), 399–458 (2003)
7. Tan, X., Chen, S., Zhou, Z.H., Zhang, F.: Face recognition from a single image per person: a survey. Pattern Recognition 39, 1725–1745 (2006)
8. Brunelli, R., Poggio, T.: Face recognition: features versus templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(10), 1042–1052 (1993)
9. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J., Torkkola, K.: LVQ_PAK: The Learning Vector Quantization Program Package (1995)
10. Kohonen, T.: Learning vector quantization. In: Self-Organizing Maps, 3rd edn. Springer Series in Information Sciences. Springer, Heidelberg (1995)
11. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.: The FERET Database and Evaluation Procedure for Face-Recognition Algorithms. Image and Vision Computing 16(5), 295–306 (1998)
12. Dragoni, A.F.: Belief revision: from theory to practice. The Knowledge Engineering Review, 147–179 (2001)
Support Vector Regression Algorithms in the Forecasting of Daily Maximums of Tropospheric Ozone Concentration in Madrid

E.G. Ortiz-García, S. Salcedo-Sanz, A.M. Pérez-Bellido, J. Gascón-Moreno, and A. Portilla-Figueras
Universidad de Alcalá, Madrid, Spain
Abstract. In this paper we present the application of a support vector regression algorithm to a real problem of maximum daily tropospheric ozone forecast. The support vector regression approach proposed is hybridized with a heuristic for the optimal selection of hyper-parameters. The prediction of maximum daily ozone is carried out in all the stations of the air quality monitoring network of Madrid. In the paper we analyze how the ozone prediction depends on meteorological variables such as solar radiation and temperature, and we also perform a comparison against the results obtained using a multi-layer perceptron neural network in the same prediction problem.
1 Introduction
Ozone (O3) is one of the most relevant air pollutants in urban areas of all medium and large cities of the world [1,2]. It is well known that ozone is a secondary pollutant, since it is not directly emitted into the air. On the contrary, tropospheric ozone is produced when the primary pollutants, mainly nitrogen oxides (NOx) and Volatile Organic Compounds (VOC), interact under the action of sunlight. In addition, O3 is recognized as one of the key pollutants degrading the air quality in urban areas, and it is responsible for increases in mortality rates during episodes of high concentration, mainly in summer. The study of O3 concentrations, and especially of O3 maxima, is therefore of major interest. Several works on O3 modeling and forecasting can be found in the literature [3,4]; many of them tackle the problem of modeling or forecasting the complete concentration of O3 in a column, or the distribution of the pollutant in a study area. There are also specific works on ground O3 forecasting from air quality stations in different cities of the world [5,6]. Recently the computation paradigm of Support Vector Machines (SVMs) has gained importance in forecasting problems related to the environment [7]. Specifically, Support Vector Regression algorithms (SVMrs) – SVMs specifically developed for regression problems – are appealing algorithms for a large variety of regression problems [8], since they take into account not only the approximation error to the data, but also the generalization of the model, i.e., its capability to maintain a good performance when new data are evaluated

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 304–311, 2010. © Springer-Verlag Berlin Heidelberg 2010
by it. Several previous works have applied the SVM or SVMr methodology to O3 forecasting or related problems. In [7] the SVM approach is applied to the forecasting of O3 at ground level in Hong Kong. The authors propose an interesting modification of the standard SVM for classification problems in order to be able to tackle regression problems with it. In [6] an online forecasting system for pollutants based on SVMs is presented. The experimental test of the approach is also carried out in Hong Kong and surrounding areas. In [9] the prediction of the retention time of VOC at ground level is carried out with a SVM. The performance of the SVM algorithm is compared with that of a heuristic algorithm for the same purpose. In [10] the performance of a SVM algorithm is tested in the forecasting of different atmospheric pollutants, including O3, and in [11] the SVM is mixed with wavelets for improving the performance of the SVM approach in a problem of meteorological pollutants forecasting. In this paper we present the application of a SVMr algorithm to the forecasting of O3 daily maximums from data of the Madrid air quality network. We use a SVMr algorithm which incorporates a mechanism based on bounds to better estimate the corresponding hyper-parameters of the SVMr machine [12]. We study the possibility of using meteorological variables to improve the forecasting of the O3 concentrations and include a comparison with the results obtained by a multi-layer perceptron. The structure of the rest of the paper is the following: the next section presents the description of the ε-SVMr approach and the criterion to choose the SVMr hyper-parameters. Next we describe the Madrid air quality network. Section 4 presents the experimental part of the paper, where we provide the main results obtained with the SVMr. Section 5 closes the paper giving some final conclusions.
2 SVMr Formulation and Parameters Search Space Reductions
Although there are several versions of SVMr, in this case we use the classic model presented in [13], i.e., the ε-SVMr. This method for regression consists of, given a set of training vectors S = {(x_i, y_i), i = 1, ..., l}, training a model of the form y(x) = f(x) + b = w^T φ(x) + b. Basically, the training of this model is carried out by solving the following optimization problem, which comes from the dual formulation of a quadratic problem (see [13] for details):

max ( − (1/2) Σ_{i,j=1}^{l} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) − ε Σ_{i=1}^{l} (α_i + α_i*) + Σ_{i=1}^{l} y_i (α_i − α_i*) )   (1)

subject to

Σ_{i=1}^{l} (α_i − α_i*) = 0   (2)
α_i, α_i* ∈ [0, C]   (3)
The final form of the regression function f(x) depends on the variables α_i, α_i*, as follows:

f(x) = Σ_{i=1}^{l} (α_i − α_i*) k(x_i, x)   (4)
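As an illustration of the model form, Eq. (4) can be evaluated directly once the dual variables are known; in the sketch below the coefficients (α_i − α_i*) are random stand-ins satisfying the equality constraint (2), not the output of an actual QP solver:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training inputs and dual coefficients (alpha_i - alpha_i*)
X = rng.normal(size=(10, 3))
C = 1.0
coef = rng.uniform(-0.5, 0.5, size=10)
coef -= coef.mean()        # equality constraint (2): the coefficients sum to 0
b = 0.5                    # bias term of the model y(x) = f(x) + b

def k(xi, x, gamma=0.5):
    """Gaussian kernel; gamma controls the width of the kernel function."""
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def predict(x):
    """Regression function of Eq. (4), plus the bias b."""
    return sum(coef[i] * k(X[i], x) for i in range(len(X))) + b

print(predict(np.zeros(3)))
```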
In this way it is possible to obtain a SVMr model by means of the training of a quadratic problem for given hyper-parameters C and ε, and for the kernel parameter γ, which controls the width of the Gaussian function that we select as kernel function. To obtain the optimal hyper-parameters we use the grid search algorithm with search space reduction described in [12], because it allows us to obtain very good results with a small training time. In that paper a novel methodology which obtains a good balance between training time and accuracy is introduced. This methodology is based on a classical grid search, which divides the search space into a uniform distribution of points over the whole space. Then, it evaluates the validation accuracy of the model trained by using the hyper-parameters at each point, and finally the model with the smallest validation error is chosen as the final one. The most important characteristic of this novel algorithm is the addition of hyper-parameter search space reductions. These reductions enclose the search space in a smaller subspace where the grid search is carried out. The search space reductions proposed in [12] are described by the following equations:

C ≤ (y_i^max − b − ε) / (1 − (1/(l−1)) Σ_{j=1, j≠i}^{l} K(x_j, x_i))   (5)

γ ≤ − log_e(0.001) / ((1/l) Σ_{i=1}^{l} min_{j, j≠i} d(x_j, x_i))^2   (6)

ε < σ_y   (7)
Equation (5) describes the relationship between the regularization hyper-parameter C and the rest of the hyper-parameters. The relationship of parameter C with parameter γ is especially important because it generates the most important reduction in the search space. The other bounds (Equations (6) and (7)) are related to the characteristic of minimum influence between support vectors and to the close relationship between the hyper-parameter ε and the variance of the noise in the data. After applying these reductions, a grid search algorithm is used in the experimental part of the paper to find the hyper-parameters of the SVMr in the O3 prediction problem.
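The bounds (5)-(7) can be evaluated on a toy dataset before running the grid search; the data below, the stand-in bias b and the chosen ε are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for the daily ozone maxima
X = rng.normal(size=(40, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
l = len(y)

# Eq. (7): epsilon must stay below the standard deviation of the targets
eps_bound = y.std()
eps = 0.1 * eps_bound          # a chosen epsilon respecting the bound

# Eq. (6): gamma bound from the mean minimum distance between samples
d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
np.fill_diagonal(d, np.inf)
mean_min_dist = d.min(axis=1).mean()
gamma_bound = -np.log(0.001) / mean_min_dist ** 2

# Eq. (5): bound on C, evaluated at the sample with the largest target,
# with the mean target as a rough stand-in for the bias b
i = int(np.argmax(y))
b = y.mean()
k_sum = sum(np.exp(-gamma_bound * ((X[j] - X[i]) ** 2).sum())
            for j in range(l) if j != i)
C_bound = (y[i] - b - eps) / (1 - k_sum / (l - 1))

print(f"eps < {eps_bound:.3f}, gamma <= {gamma_bound:.3f}, C <= {C_bound:.3f}")
```

The grid search is then confined to the box delimited by these three bounds, which is what shrinks the search space in [12].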
3 The Air Pollution Monitoring Network of Madrid
The air pollution monitoring network of Madrid is the largest in Spain, and one of the largest in Europe. It is currently formed by 27 measuring (fixed) stations spread out in the city (see Fig. 1). At the beginning the network was formed by 16
Fig. 1. Location of the measuring stations of the air quality monitoring network of Madrid
measuring stations, connected by the telephone network to a data control center depending on the department of air quality of Madrid City Council. In 1989 the network was completely renewed, and new "intelligent" stations were acquired. At this point the systematic measurement of NOx and O3 started. The monitoring network in its current form was finished in 2001, when the last 2 stations were added to the network and several other stations were moved from their original locations for technical reasons. The network can be considered a routine network [14], because it has been designed to develop long-term studies of contaminant concentrations. In fact, the available database is formed by hourly measures of several contaminants for the years 2002 to 2007 in all measuring stations. On the other hand, the network provides meteorological variables in this period of time by means of 6 additional meteorological stations, which are able to measure solar radiation, temperature and other variables.
4 Experiments and Results

4.1 General Description of the Experiments
To carry out a daily prediction of maximum concentration, it is necessary to apply a maximum function to the hourly measures provided by the network, obtaining 365 ozone concentrations per year. To obtain multiple training and test sets for the experiments, and to be able to compare the accuracy of the models by using statistical tests, we divide each year into five subsets. Each experiment is formed by two consecutive subsets, using the first one as training set and the second one as test set. In this way, we keep the temporal relationship between data, except in the case of the last subset, which is tested against the first one. With all these subsets we obtain a total of 30 experiments, which allows us to perform statistical tests to compare them. These statistical tests consist of a t-test with
α = 0.05 significance level, after a Kolmogorov-Smirnov normality test (positive in all of the experiments carried out). On the other hand, to reduce the training time of the experiments we use only 5 out of the 27 available measuring stations. We choose the measuring stations which present the highest maximums of ozone concentration over the six studied years. This is because we expect this characteristic to remain in future years, making those stations more important to forecast than others.

4.2 Analysis of Dependency with Solar Radiation and Temperature
Now we compare the accuracy of the models trained by using four previous ozone measures (from the four previous days) and some meteorological variables, in our case solar radiation and temperature. Note that alternative analyses not included in this paper have shown that other meteorological variables, such as wind direction, wind speed, etc., are not statistically related to daily maximum ozone prediction. The mean and standard deviation of the accuracy for the 30 experiments are shown in Table 1, and the statistical tests carried out are displayed in Table 2. These results show that solar radiation improves the results statistically in 2 out of the 5 stations (stations 10 and 24), and yields a good number of winner experiments
Table 1. Mean and standard deviation of the SVMr accuracy for the 30 experiments, considering ozone measures from four previous days without meteorological variables, and also including solar radiation, temperature or both
Station | None (Mean, Std) | Solar radiation (Mean, Std) | Temperature (Mean, Std) | Both (Mean, Std)
5       | 17.56  4.80      | 17.53  4.59                 | 17.82  4.19             | 17.68  4.22
9       | 15.69  4.06      | 15.61  4.18                 | 15.78  4.17             | 15.83  4.13
10      | 17.38  4.91      | 17.13  4.83                 | 17.39  4.50             | 17.13  4.49
14      | 16.84  3.72      | 16.53  3.11                 | 17.01  4.11             | 16.87  3.88
24      | 17.29  4.01      | 17.00  3.79                 | 17.23  3.66             | 17.04  3.74
Table 2. Statistical tests for the 30 experiments considering ozone measures from four previous days. We compare the case of not including meteorological variables with the case of including solar radiation, temperature or both. W-L-T stands for Win-Lost-Tie results in the comparison.

Station | Solar radiation (P-value, W-L-T) | Temperature (P-value, W-L-T) | Both (P-value, W-L-T)
5       | 0.80*  15-15-0                   | 0.26*  15-15-0               | 0.69*  19-11-0
9       | 0.65*  16-14-0                   | 0.62*  15-15-0               | 0.53*  16-14-0
10      | 0.04*  21-9-0                    | 0.97*  17-13-0               | 0.09*  19-11-0
14      | 0.22*  17-13-0                   | 0.56*  15-15-0               | 0.92*  17-13-0
24      | 0.02*  18-12-0                   | 0.71*  14-16-0               | 0.06*  18-12-0
* t-test α = 0.05.
in station 14. However, by using temperature or both of them, it is only possible to obtain results similar to the standard case. Therefore, following these results, the optimal selection of features in this case includes solar radiation, apart from the ozone samples of the four previous days. A graphic example of the prediction obtained with the model trained using these features, together with the real ozone concentration values, is shown in Fig. 2. In this figure we show the prediction for the six years studied, 2002 to 2007. Note that the values correspond to the prediction of each input vector in the test sets used in the statistical tests. It is possible to see how the trends of the prediction and of the real data are very similar.

[Six panels, one per year from 2002 to 2007; each plots O3(max) over the year, in the range 0–150.]
Fig. 2. Comparison of the forecast and measured maximum daily ozone concentrations in different years (Station 9)
4.3 Comparison between SVMr Model and MLP
Finally, we compare the performance of the presented SVMr model with a neural network based on a multi-layer perceptron (MLP). To train the MLP we use a variable number of neurons in the hidden layer, from 6 to 20; it is trained using a Levenberg-Marquardt algorithm, and the training is repeated 20 times. In addition, a hold-out validation process is used in order to control the generalization of the model and to choose the best MLP among all repetitions. The mean and standard deviation of the RMSE (root mean square error) of the MLP model and of the best SVMr found in the previous subsection, together with the statistical tests, are shown in Table 3. These results clearly show that the SVMr model obtains better performance than the MLP, with better mean accuracy and statistical differences in all the evaluated stations.
E.G. Ortiz-García et al.
Table 3. Mean and standard deviation of the accuracy for the 30 experiments considering ozone measures from four previous days and solar radiation, using a multi-layer perceptron or the SVMr model

          MLP            SVMr           SVMr vs MLP
Station   Mean   Std     Mean   Std     t-test   W-L-T
5         34.60  14.75   17.53  4.59    0.00*    29-1-0
9         32.90  16.12   15.61  4.18    0.00*    29-1-0
10        34.99  15.97   17.13  4.83    0.00*    28-2-0
14        31.58  14.13   16.53  3.11    0.00*    28-2-0
24        33.28  15.26   17.00  3.79    0.00*    29-1-0
* t-test α = 0.05.

5 Conclusions
In this paper we have presented a complete study of the application of support vector regression (SVMr) algorithms to daily maximum ozone prediction in the Madrid urban area. Comparison with the results of a multi-layer perceptron has shown the good performance of the approach. The results obtained are promising and show that this methodology can be useful for this type of problem. The application of SVMr to other ozone prediction problems with different time horizons, such as hourly or long-term prediction, is an interesting line of future research.
Acknowledgement. This work is partially supported by Comunidad de Madrid and Universidad de Alcalá through Project CCG08-UAH/AMB-3993. E.G. Ortiz-García is supported by the FPU Predoctoral Program (Spanish Ministry of Innovation and Science), grant reference AP2008-00248. Á.M. Pérez-Bellido is supported by an FPI fellowship from Junta de Comunidades de Castilla-La Mancha.
Neuronal Implementation of Predictive Controllers

José Manuel López-Guede*, Ekaitz Zulueta, and Borja Fernández-Gauna

Computational Intelligence Group, UPV/EHU
{jm.lopez,ekaitz.zulueta,manuel.grana,alicia.danjou}@ehu.es
www.ehu.es/ccwintco
Abstract. In spite of the multiple advantages that Model Predictive Control offers (for example, it can control systems that classical control schemes cannot), it has a main drawback: it is computationally expensive in its working phase. In this paper we address the problem of implementing predictive controllers so that their operations are performed efficiently, using a neuronal implementation. We show how we have trained these neural networks, and how we exploit their generalization property and their robustness in the presence of control and measurement disturbances.
1 Introduction

In this paper we present a neuronal implementation of Model Predictive Controllers that overcomes the drawbacks of the classical analytical implementation of these controllers. Model Predictive Control is an interesting advanced control scheme because classic and well-known control schemes such as PID controllers may fail to control some systems; in this situation, we can use advanced control systems that try to emulate the human brain, such as Predictive Control. This kind of control works using a world model, calculating predictions about the response the system will show under given stimuli, and obtaining the best way of controlling the system given the desired behavior from the current moment until a certain instant later. Predictive controller tuning is a process carried out with analytical and manual methods. This tuning process is expensive in computational terms, but it is done only once, and in this paper we do not deal with that problem. However, in spite of the great advantage of predictive control, namely that it can control systems that classic control cannot, it has a great drawback: it is very computationally expensive while it is working. In Section 2 we review the cause of this problem. A way of avoiding this drawback is to model the predictive controller using neural networks, because once these devices are trained they perform the calculations at great speed and with very small computational requirements. In this paper we propose a learning model to be used with Time Delayed Neural Networks, so that once the neural network is trained, the neuronal predictive controller is ready and responds properly, showing its generalization capabilities in environments that it has not seen in the training phase,
* This work was supported in part by the Spanish Ministerio de Educación y Ciencia under grant DPI2006-15346-C03-03.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 312–319, 2010. © Springer-Verlag Berlin Heidelberg 2010
and it shows its robustness when there are control and measurement disturbances. On the other hand, it is a powerful implementation because it is not as computationally expensive as classical analytical predictive control. In the literature there are works comparing PID and MPC controllers [1], and there are interesting approaches to the prediction capacity of neuronal models in predictive control [2-4]. The stability of these neural networks is an important issue [5], so we also analyze their robustness in this paper. Section 2 gives background information about Predictive Control and a technique called Dynamic Matrix Control, discussing its advantages and drawbacks. Section 3 describes a kind of neural networks called Time Delayed Neural Networks and their structural parameters. Section 4 shows how a neuronal implementation can be found, and finally, conclusions are given in Section 5.
2 Model Predictive Control

This section gives a brief introduction to a general technique called Model Predictive Control and to a concrete technique called Dynamic Matrix Control. It is necessary to understand the advantages of this kind of control, which make it very useful in some circumstances, as well as its drawbacks, in order to understand how a neural network based implementation can eliminate these drawbacks.

2.1 Model Predictive Control (MPC) and Dynamic Matrix Control (DMC)

Model Predictive Control (MPC) is an advanced control technique used to deal with systems that are not controllable using classic control schemes such as PID. This kind of controller works like the human brain in the sense that, instead of using the past error between the output of the system and the desired value, it controls the system by predicting the value of the output in the short term, so that the system output is as close as possible to its desired value at those moments. Predictive Control is not a single concrete technique; it is a set of techniques that share several common characteristics: there is a world model used to predict the system output from the current moment up to p samples ahead, an objective function that must be minimized, and a control law that minimizes the objective function. Predictive controllers follow these steps:

• Each sampling time, through the system model, the controller calculates the system output from now until p sampling times ahead (the prediction horizon), which depends on the future control signals that the controller will generate.
• A set of m control signals to be used along m sampling times (the control horizon) is calculated by optimizing the objective function.
• In each sampling time only the first of the set of m control signals is applied, and in the next sampling time the whole process is repeated again.
The technique called Dynamic Matrix Control (DMC) is a concrete MPC algorithm that uses:

• As subsystem model, the step response of the subsystem;
• As objective function, a measure of the difference between the reference signal and the subsystem output;
• As control law, the one shown in equation (1), where G is a matrix that contains the system dynamics, λ is a parameter related to the following capacity of the subsystem, w is the reference signal and f is the free response of the subsystem.

Δu = (G^t G + λI)^(−1) G^t (w − f) .    (1)
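As an illustration of (1): for a single control move (m = 1), G reduces to a column vector g of step-response samples and the matrix inversion collapses to a scalar division. This is our simplified sketch under that assumption, not the general DMC algorithm, and the function name is ours.

```python
def dmc_control_increment(g, w, f, lam):
    """Equation (1) with a single control move (m = 1): since G is a column
    vector g, (G^t G + lam*I)^(-1) G^t (w - f) = g.(w - f) / (g.g + lam)."""
    num = sum(gi * (wi - fi) for gi, wi, fi in zip(g, w, f))
    den = sum(gi * gi for gi in g) + lam
    return num / den

# Step-response samples of y(k+1) = 0.5*y(k) + u(k) over a p = 5 horizon.
g = [1.0, 1.5, 1.75, 1.875, 1.9375]
du = dmc_control_increment(g, w=[1.0] * 5, f=[0.0] * 5, lam=1.0)
```

With λ > 0 the denominator is always positive, which is what makes the inversion in (1) well defined even for a poorly excited step response.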
To learn more about Predictive Control in general, and about Dynamic Matrix Control in particular, see [6-9].

2.2 Model Predictive Control Advantages

From a theoretical point of view, model predictive based controllers have some advantages:
- It is an open methodology, with the possibility of new algorithms.
- They can include constraints on manipulated variables as well as on controlled variables. This is important to save energy and to keep the working point as near as possible to the optimum.
- They can deal with multivariable systems in a simpler way than other algorithms.
From a practical point of view, model predictive based controllers have the advantage that they can deal with systems that show stability problems with classical control schemes. To show this property, suppose that the model of a system is described by the following discrete transfer function:

H(z) = 1 / (z − 0.5) .    (2)
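Writing (2) as a difference equation, y(k+1) = 0.5·y(k) + u(k), a direct simulation (our own quick check, not part of the original paper) confirms that the pole at 0.5 keeps the open-loop step response bounded:

```python
def simulate(pole, u, y0=0.0):
    """Simulate y(k+1) = pole*y(k) + u(k) for the input sequence u."""
    y, out = y0, []
    for uk in u:
        y = pole * y + uk
        out.append(y)
    return out

step = simulate(0.5, [1.0] * 50)
# The step response converges to 1 / (1 - 0.5) = 2: the plant itself is stable.
```

The instability the authors refer to arises only when this stable plant is put in closed loop with a badly tuned controller, not from the plant itself.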
Although it is a stable system, because its pole is inside the unit circle, its response is unstable if we try to control it using a PID controller tuned through classic and well-known methods such as Ziegler-Nichols. However, using a properly tuned DMC predictive controller, for example with the parameter values p = 5, m = 3 and λ = 1, a correct control is obtained. To obtain this control it has been mandatory to tune the DMC controller. This phase is very expensive in computational terms, but it is carried out only once. To know more about this tuning phase, see [10].

2.3 Model Predictive Control Drawbacks
The main drawback of predictive controllers is not that the tuning phase is computationally expensive, because it is carried out only once. The main drawback is that the computational requirements of the controller are high in its working phase. Each sample time the controller must calculate the control law of equation (1), which involves several matrix operations: several multiplications, an addition and a subtraction. Performing these operations we obtain a set of m control signals, but only the first of them is used in this sample
time; the rest are ignored. The algorithm works in this way, but it is computationally inefficient.
3 Neural Implementation

Following the discussion of the computational inefficiency of analytical predictive control in the previous section, it would be convenient to have a mechanism that could implement such a controller with less computational power. An alternative is to use neural networks, and more precisely Time Delayed Neural Networks, because, like other neural networks, they are very fast, computationally inexpensive, and able to generalize their responses. We do not consider a hierarchical structure [21]. This section gives a brief introduction to a kind of neural networks called Time Delayed Neural Networks, which we have used to model the previous model predictive controller in order to eliminate the drawbacks described above.

3.1 Time Delayed Neural Networks
Time Delayed Neural Networks (TDNN) are a kind of multi-layer perceptron neural network. Their special feature is that they are dynamic neural networks: delayed versions of the input signals are fed to the input layer. Because of this, the outputs depend not only on the current values of the signals but also on their past values. This kind of neural network can be trained using the Backpropagation algorithm or the Generalized Delta Rule; in the experiments shown in this paper the Levenberg-Marquardt method has been used. To learn more about neural networks in general, see [11-13]. To learn more about Time Delayed Neural Networks, see [14-16].

3.2 Structural Parameters
As we are concerned about the computational cost of our implementation of the predictive controller, we limit the number of hidden layers to one, so we work with a time delayed neural network having the simplest structure. Once this constraint is established, the main parameters that configure the structure of the TDNN are the number of neurons in the hidden layer and the size of the time delay line, in other words, the number of delayed versions of the input signals fed to the input layer. We try to keep these parameters as small as possible to minimize the computational cost of the resulting implementation. The last main parameter to establish is the kind of function executed in each neuron, taking into account that the linear function is the least expensive from the computational point of view. Fig. 1 shows the structure of the Time Delayed Neural Network that we have used, in which we have fitted the size d of the time delay line and the size h of the hidden layer.
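The time delay line itself is just a buffer that turns a scalar signal into the vector of its d most recent past values. This sketch builds only the TDNN's input vectors, not the network; the class name is our own.

```python
from collections import deque

class TappedDelayLine:
    """Holds the current sample plus the last d delayed samples of a signal."""
    def __init__(self, d, initial=0.0):
        self.buf = deque([initial] * (d + 1), maxlen=d + 1)

    def push(self, x):
        """Insert the newest sample; the oldest one falls off the end."""
        self.buf.appendleft(x)
        return list(self.buf)          # [x(k), x(k-1), ..., x(k-d)]

tdl = TappedDelayLine(d=2)
tdl.push(1.0)                          # → [1.0, 0.0, 0.0]
vec = tdl.push(2.0)                    # → [2.0, 1.0, 0.0]
```

The returned vector is what would be presented to the input layer at each sampling time, which is why the computational cost grows with d.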
Fig. 1. Time Delayed Neural Network structure with 3 layers: an input layer with 3 inputs fed through a time delay line of size d, a hidden layer with h neurons, and an output layer with 1 output
4 Results

In this section we deal with the concrete problem of obtaining a neuronal predictive controller that can control the system described by the discrete transfer function of equation (2) using Time Delayed Neural Networks. We have carried out training experiments with multiple structures, varying the two main structural parameters, the number of hidden layer neurons h and the number of delays of the time delay line d, keeping in mind that the linear function is computationally efficient. We have used the Levenberg-Marquardt method to train each structure, and the training model consists of a target vector P = [w(k), y(k), Δu(k − 1)]′ and an output Δu(k), so as to obtain the same control as equation (1). As shown in Fig. 2, the control is correct with known references, and in Fig. 3 we can see that the neuronal controller is correct even with noisy references that had not been used in the training phase, thanks to the generalization property of neural networks. Having verified the generalization property, we now explore a further characteristic of this neural predictive controller: its robustness. To this end we have designed experiments where we apply perturbations to the control signal Δu(k) generated by the neural predictive controller and, on the other hand, the same perturbations to the measured output of the system y(k).
The perturbation applied is white noise with zero mean and variance σ² = 10⁻³. Fig. 4 shows the control executed by the neuronal controller when the reference signal is the same as that used in Fig. 3. As we can see, the performance is very close to that of the controller shown in Fig. 3. To learn more about identification and control of dynamical systems, see [17-18], and about neural identification applied to predictive control, see [19-20].
Fig. 2. Control of a system with a Time Delayed Neural Network with a time delay line of d = 7 delays in the input and h = 5 neurons in the hidden layer. The reference to follow is a signal that was used in the neural network's training phase.
Fig. 3. Control of a system with a Time Delayed Neural Network with a time delay line of d = 7 delays in the input and h = 5 neurons in the hidden layer. The reference to follow is a signal that was not used in the neural network's training phase.
Fig. 4. Control executed by the neural predictive controller with control and measurement disturbances
5 Conclusions

This paper started by showing that there are systems that cannot be controlled using classical control schemes such as PID, whereas advanced control schemes such as Predictive Control can control them, and we have shown how a system of this kind is controlled. Predictive Control has some advantages and a very clear drawback: it is computationally expensive in its tuning and working phases. To overcome this drawback in the control phase, the authors propose a neural network based implementation. We show how a concrete predictive controller can be learned with a concrete learning structure, and that its performance is very good even with unknown references and with control and measurement disturbances. It would be interesting to apply these controllers to multicomponent systems [22] and linked systems [23].
References

1. Voicu, M., Lazăr, C., Schönberger, F., Păstrăvanu, O., Ifrim, S.: Predictive Control vs. PID Control of Thermal Treatment Processes. In: Control Engineering Solution: A Practical Approach, pp. 163–174
2. McKinstry, J.L., Edelman, G.M., Krichmar, J.L.: A cerebellar model for predictive motor control tested in a brain-based device. Proceedings of the National Academy of Sciences of the United States of America 103(9), 3387–3392 (2006)
3. Aleksic, M., Luebke, T., Heckenkamp, J., Gawenda, M., et al.: Implementation of an Artificial Neural Network to Predict Shunt Necessity in Carotid Surgery. Annals of Vascular Surgery 22(5), 635–642 (2008)
4. Kang, H.: A neural network based identification-control paradigm via adaptive prediction. In: Proceedings of the 30th IEEE Conference on Decision and Control, vol. 3, pp. 2939–2941 (1991)
5. Wilson, W.H.: Stability of Learning in Classes of Recurrent and Feedforward Networks. In: Proceedings of the Sixth Australian Conference on Neural Networks (ACNN 1995), pp. 142–145 (1995)
6. Camacho, E.F., Bordons, C.: Model Predictive Control. Springer, London (2004)
7. Camacho, E.F., Bordons, C.: Model Predictive Control in the Process Industry. Springer, London (1995)
8. Maciejowski, J.M.: Predictive Control with Constraints. Prentice Hall, London (2002)
9. Sunan, H., Kok, T., Tong, L.: Applied Predictive Control. Springer, London (2002)
10. López-Guede, J.M., Zulueta, E., Graña, M., Oterino, F.: Ajuste Manual de Controladores Predictivos. In: III Jornadas de Inteligencia Computacional, pp. 472–475. Universidad del País Vasco (2009), http://www.ehu.es/ccwintco/index.php/Libros_y_monograf%C3%ADas_editadas
11. Braspenning, P.J., Thuijsman, F., Weijters, A.J.M.M.: Artificial Neural Networks. Springer, Berlin (1995)
12. Chester, M.: Neural Networks. Prentice Hall, New Jersey (1993)
13. Widrow, B., Lehr, M.A.: 30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation. Proceedings of the IEEE 78(9), 1415–1441 (1990)
14. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K.: Phoneme Recognition Using Time Delay Neural Networks. IEEE Transactions on Acoustics, Speech and Signal Processing 37, 328–339 (1989)
15. Wang, Y., Kim, S.-P., Principe, J.C.: Comparison of TDNN training algorithms in brain machine interfaces. In: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2005), vol. 4, pp. 2459–2462 (2005)
16. Taskaya-Temizel, T., Casey, M.C.: Configuration of Neural Networks for the Analysis of Seasonal Time Series. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds.) ICAPR 2005. LNCS, vol. 3686, pp. 297–304. Springer, Heidelberg (2005)
17. Narendra, K.S., Parthasarathy, K.: Identification and Control of Dynamical Systems Using Neural Networks. IEEE Trans. Neural Networks 1(1), 491–513 (1990)
18. Norgaard, M., Ravn, O., Poulsen, N.K., Hansen, L.K.: Neural Networks for Modelling and Control of Dynamic Systems. Springer, London (2003)
19. Arahal, M.R., Berenguel, M., Camacho, E.F.: Neural identification applied to predictive control of a solar plant. Control Engineering Practice 6, 333–344 (1998)
20. Huang, J.Q., Lewis, F.L., Liu, K.: A Neural Net Predictive Control for Telerobots with Time Delay. Journal of Intelligent and Robotic Systems 29, 1–25 (2000)
21. Graña, M., Torrealdea, F.J.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
22. Duro, R.J., Graña, M., de Lope, J.: On the potential contributions of hybrid intelligent approaches to multicomponent robotic system development. Information Sciences (in press, 2010)
23. Echegoyen, Z., Villaverde, I., Moreno, R., Graña, M., d'Anjou, A.: Linked multi-component mobile robots: modeling, simulation and control. Robotics and Autonomous Systems (submitted, 2010)
α-Satisfiability and α-Lock Resolution for a Lattice-Valued Logic LP(X)

Xingxing He1, Yang Xu1, Yingfang Li2, Jun Liu3, Luis Martinez4, and Da Ruan5

1 Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, Sichuan, PR China
2 Department of Mathematics, Southwest Jiaotong University, Chengdu 610031, Sichuan, PR China
3 School of Computing and Mathematics, University of Ulster, Northern Ireland, UK
4 Department of Computing, University of Jaén, E-23071 Jaén, Spain
5 Belgian Nuclear Research Centre (SCK•CEN), Mol, and Ghent University, Belgium
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. This paper focuses on some automated reasoning issues for a kind of lattice-valued logic LP(X) based on lattice-valued algebra. Firstly, some strategies extended from classical logic to LP(X) are investigated in order to verify the α-satisfiability of formulae in LP(X), with the main focus on the role played by constant formulae in LP(X) in simplifying the verification procedure at the semantic level. Then, an α-lock resolution method in LP(X) is proposed and the weak completeness of this method is proved. The work will provide support for more efficient resolution based automated reasoning in LP(X).

Keywords: lattice-valued logic; α-resolution principle; α-satisfiability; α-lock resolution method.
1 Introduction
Automated theorem proving [1,2,3,4] is an important research topic in the realm of artificial intelligence. Theorem proving is a procedure that can be used to check whether a given logical formula F (the "goal") is a logical consequence of a set of formulae N (the "theory"); the equivalent treatment is to validate the unsatisfiability of the set N ∪ {¬F}. The resolution principle in classical logic, proposed by Robinson [1], is a single rule of inference for testing the unsatisfiability of a logical formula; it proceeds by constructing refutation proofs, i.e., proofs by contradiction. A resolution algorithm in classical logic has three obvious features: (1) the formulae are basically sets of clauses, each of which is a disjunction of literals. The forms of the literals are simple because they usually contain neither constants nor implication connectives; (2) a resolution algorithm is constructed to prove the unsatisfiability of a logical formula, i.e., only one level of truth-value status is considered (false, denoted as O), called O-resolution; and (3) to judge if two
literals form an O-resolvent can be simplified into judging whether the two literals are a complementary pair. Furthermore, in order to avoid the production of excessive redundant clauses during the resolution process, some restrictions should be imposed on resolution, while preserving the completeness of the resolution principle as far as possible. In this sense, semantic resolution, lock resolution and linear resolution based on classical logic have been proposed [2,3,4]. Incomparability is a kind of uncertainty often associated with human intelligent activities, not only in the processed object itself, but also in the course of the object being dealt with. It is a kind of overall uncertainty of objects, caused by the complexity of the objects themselves, by the many factors involved, and by the inconsistency among those factors, and it occurs inevitably in the process of dealing with complex objects. In order to deal with uncertain information, especially incomparability, in intelligent computation from the logical point of view, lattice implication algebra [5,6] (LIA), lattice-valued logic [6] based on LIA, and approximate reasoning [6] (i.e., uncertainty reasoning and automated reasoning) in lattice-valued logic based on LIA were proposed and studied. On the automated reasoning side, the α-resolution principle was deeply investigated in [7,8,9,10]. Lattice-valued propositional logic LP(X) [6] based on LIA extends classical logic in many ways, such as the valuation field, the implication connective and the language. The valuation field extends the set {0, 1} to a lattice L, and the implication connective is not Kleene's but a more general operator in lattice-valued logic. Regarding the language, the symbols in LP(X) include a set of constants, unlike in classical logic. These extensions improve the capability of expression and transmission.
Moreover, a logical formula in LP(X) can be expressed at different truth-value levels, so it can describe and process information more naturally from the logical point of view. However, logical formulae that include constants make the system more complex. For example, the propositional variables in classical logic can take values in {0, 1} freely, but the valuations of logical formulae in LP(X) may not range over the whole valuation field L. Therefore, when we discuss α-resolution and some of its properties in LP(X), many conclusions from classical logic may not hold. In this paper, after a brief overview of lattice-valued logic based on LIA in Section 2, logical formulae that include constants are discussed in Section 3: logical formulae that are comparable with the resolution level α can be deleted or left out of consideration before α-resolution, which simplifies the structure of the generalized literals and makes it easier to extend the resolution methods of classical logic to LP(X). Furthermore, some rules of classical logic are extended to LP(X) to verify the α-satisfiability of formulae in LP(X) in Section 4. Finally, in order to improve the efficiency of α-resolution in lattice-valued logic, an α-lock resolution method based on LP(X) is proposed and its weak completeness is proved in Section 5. Section 6 concludes the paper.
2 Preliminaries
Among the extensive research results on LIA and the corresponding lattice-valued propositional logic LP(X) [6] and lattice-valued first-order logic LF(X) [6], we only outline the elementary concepts closely relevant to this work, for the convenience of readers. For further details about the background and properties of LIA and LP(X), we refer to the related references, e.g., [6,7,8,9].

Definition 1. [6] Let (L, ∨, ∧, 0, 1) be a bounded lattice with an order-reversing involution ′, with 1 and 0 the greatest and the smallest element of L, respectively, and let →: L × L → L be a mapping. (L, ∨, ∧, ′, →, 0, 1) is called a lattice implication algebra if the following conditions hold for any x, y, z ∈ L:

(I1) x → (y → z) = y → (x → z) (exchange property),
(I2) x → x = 1 (identity),
(I3) x → y = y′ → x′ (contraposition),
(I4) x → y = y → x = 1 implies x = y,
(I5) (x → y) → y = (y → x) → x,
(I6) (x ∨ y) → z = (x → z) ∧ (y → z),
(I7) (x ∧ y) → z = (x → z) ∨ (y → z).
Definition 2. [6] (Łukasiewicz implication algebra on finite chains) Consider the set Ln = {ai | i = 1, 2, . . . , n}. For any 1 ≤ j, k ≤ n, define

aj ∨ ak = amax{j,k} ,   aj ∧ ak = amin{j,k} ,
(aj)′ = an−j+1 ,   aj → ak = amin{n−j+k,n} ;

then (Ln, ∨, ∧, ′, →, 0, 1) is a lattice implication algebra. In the following text, we always assume that (L, ∨, ∧, ′, →, 0, 1) is an LIA, in short L.

Definition 3. [6] Let X be a set of propositional variables, and let T = L ∪ {′, →} be a type with ar(′) = 1, ar(→) = 2 and ar(a) = 0 for every a ∈ L. The propositional algebra of the lattice-valued propositional calculus on the set X of propositional variables is the free T algebra on X, denoted by LP(X).

Proposition 1. [6] LP(X) is the minimal set Y which satisfies:
(1) X ∪ L ⊆ Y.
(2) If p, q ∈ Y, then p′, p → q ∈ Y.
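Definition 2 is concrete enough to execute: encoding a_i simply as the integer i, the chain operations become index arithmetic, and axioms such as (I1)-(I3) can be checked exhaustively. The encoding and function names below are our own illustrative sketch, not the authors' notation.

```python
from itertools import product

def make_Ln(n):
    """Lukasiewicz implication algebra on the chain a_1 < ... < a_n
    (Definition 2), with a_i encoded as the integer i."""
    join = max                              # a_j v a_k = a_max{j,k}
    meet = min                              # a_j ^ a_k = a_min{j,k}
    neg = lambda j: n - j + 1               # (a_j)' = a_{n-j+1}
    imp = lambda j, k: min(n - j + k, n)    # a_j -> a_k = a_min{n-j+k, n}
    return join, meet, neg, imp

n = 5
join, meet, neg, imp = make_Ln(n)
for x, y, z in product(range(1, n + 1), repeat=3):
    assert imp(x, imp(y, z)) == imp(y, imp(x, z))     # (I1) exchange
    assert imp(x, y) == imp(neg(y), neg(x))           # (I3) contraposition
assert all(imp(x, x) == n for x in range(1, n + 1))   # (I2) identity: top is a_n
```

Exhaustive checks like these only work because Ln is finite, which is exactly what makes the finite-chain case amenable to mechanized reasoning.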
Note that L and LP(X) are algebras of the same type T, where T = L ∪ {′, →}.

Definition 4. [6] A valuation of LP(X) is a propositional algebra homomorphism γ : LP(X) → L. If γ is a valuation of LP(X), we have γ(α) = α for every α ∈ L.

Remark 1. Specially, when L = Ln, LP(X) is denoted by LnP(X).

Definition 5. [6] Let p ∈ LP(X), α ∈ L. If there exists a valuation γ of LP(X) such that γ(p) ≥ α, p is satisfiable by the truth-value level α, in short α-satisfiable; if γ(p) ≥ α for every valuation γ of LP(X), p is valid by the truth-value level α, in short α-valid. If α = I, then p is simply valid.

Definition 6. [6] Let p ∈ LP(X), α ∈ L. If γ(p) ≤ α for every valuation γ of LP(X), p is always false by the truth-value level α, in short α-false. If α = O, then p is invalid.

In the following, for convenience, F ∈ LP(X) stands for: F is a logical formula in the lattice-valued propositional system LP(X) based on an LIA.

Definition 7. [6] A lattice-valued propositional logical formula F is called an extremely simple form, in short ESF, if a lattice-valued propositional logical formula F* obtained by deleting any constant, literal or implication term appearing in F is not equivalent to F. Here, the definition of a literal is the same as that in classical logic.

Definition 8. [6] A lattice-valued propositional logical formula F is called an indecomposable extremely simple form, in short IESF, if (1) F is an ESF containing the connectives → and ′ at most; (2) for any G in the formula set, if G = F in LP(X), then G is an ESF containing the connectives → and ′ at most.

Definition 9. [6] An IESF F is called a k-IESF if there exist exactly k implication connectives occurring in F.

Definition 10. [6] All the constants, literals and IESFs are called generalized literals.

Definition 11. [6] A lattice-valued propositional logical formula G is called a generalized clause (phrase) if G is a formula of the form G = g1 ∨ ... ∨ gi ∨ ... ∨ gn (G = g1 ∧ ... ∧ gi ∧ ... ∧ gn), where the gi (i = 1, ..., n) are generalized literals. A conjunction (disjunction) of finitely many generalized clauses (phrases) is called a generalized conjunctive (disjunctive) normal form.
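Definitions 5 and 6 admit a direct brute-force reading when the truth-value field is finite. The sketch below is an illustration we add (not the paper's procedure): it takes L_n to be the n-element Łukasiewicz chain, a standard instance of a lattice implication algebra, encodes formulas as nested tuples, and decides α-satisfiability by enumerating every valuation; all names are our own.

```python
from itertools import product
from fractions import Fraction

def chain(n):
    """The n-element Lukasiewicz chain L_n = {0, 1/(n-1), ..., 1}."""
    return [Fraction(i, n - 1) for i in range(n)]

def evaluate(f, v):
    """Evaluate formula f under valuation v (dict: variable -> truth value).
    Formulas: a variable name, ('not', f), ('and', f, g), ('or', f, g),
    or ('imp', f, g) with Lukasiewicz implication min(1, 1 - x + y)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == 'not':
        return 1 - evaluate(f[1], v)
    x, y = evaluate(f[1], v), evaluate(f[2], v)
    return {'and': min, 'or': max,
            'imp': lambda a, b: min(Fraction(1), 1 - a + b)}[f[0]](x, y)

def alpha_satisfiable(f, variables, n, alpha):
    """Definition 5: is there a valuation gamma with gamma(f) >= alpha?"""
    return any(evaluate(f, dict(zip(variables, vals))) >= alpha
               for vals in product(chain(n), repeat=len(variables)))
```

For instance, in L_3 the formula p ∨ p′ is 1/2-satisfiable (indeed 1/2-valid), while p ∧ p′ never reaches 3/4; this is why resolution levels above ∨_{a∈L_n}(a ∧ a′) behave classically.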
X. He et al.
Definition 12. [6] (α-Resolution) Let α ∈ L, and let G1 and G2 be two generalized clauses of the forms

G1 = g1 ∨ ... ∨ gi ∨ ... ∨ gm,
G2 = h1 ∨ ... ∨ hj ∨ ... ∨ hn.

If gi ∧ hj ≤ α, then

G = g1 ∨ ... ∨ gi−1 ∨ gi+1 ∨ ... ∨ gm ∨ h1 ∨ ... ∨ hj−1 ∨ hj+1 ∨ ... ∨ hn

is called an α-resolvent of G1 and G2, denoted by G = Rα(G1, G2), and gi and hj form an α-resolution pair, denoted by (gi, hj) − α. Generation of an α-resolvent from two clauses, called α-resolution, is the sole rule of inference of the α-resolution principle.

Definition 13. Let S = G1 ∧ ... ∧ Gi ∧ ... ∧ Gn be a generalized conjunctive normal form in LP(X), where the Gi (i = 1, 2, ..., n) are generalized clauses. S is called a generalized clause set if the logical symbol "∧" in S is rewritten as the symbol "," so that S has the form S = {G1, G2, ..., Gn}.

Definition 14. A generalized clause G in LP(X) is called a unit generalized clause if G consists of only one generalized literal g; this generalized literal g is called a unit generalized literal.

Definition 15. Let S be a generalized clause set in LP(X). A generalized literal g is called a pure generalized literal if g′ does not occur in S.
3
α-Satisfiability of Formulae in LnP (X)
In this section, we discuss the generalized literals which include constants in LnP(X) and extend some classical satisfiability verification rules to LnP(X).

Theorem 1. (α-Valid Rule) Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), α ∈ Ln. If there exists a generalized clause Gi with Gi > α (i ∈ {1, 2, ..., n}), then S − Gi ≤ α if and only if S ≤ α.

Proof. Let S = {Gi} ∪ (S − Gi). For any valuation γ, γ(S) ≤ γ(S − Gi); hence S − Gi ≤ α implies S ≤ α. Conversely, if there exists a valuation γ0 such that γ0(S − Gi) > α, then, since Gi > α, γ0(S) = γ0(Gi) ∧ γ0(S − Gi) > α, a contradiction to S ≤ α.

Similar to Theorem 1, we can establish the following corollary.

Corollary 1. Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), gj one of the generalized literals of Gi, α ∈ Ln. Then the following conclusions hold: (1) If gj > α, then S − Gi ≤ α if and only if S ≤ α.
(2) If gj ≤ α, then there are two cases: (i) if gj is a unit generalized literal, then S ≤ α; (ii) if gj is not a unit generalized literal, then S − {gj} ≤ α if and only if S ≤ α.

Proposition 2. Let g be an IESF in LnP(X), Ln = {ai | i = 1, 2, ..., n}. The following conclusions hold: (1) Let a be a constant in LnP(X). If g includes a, the truth value of g lies in one of the following five cases: g ∈ [a, an), g ∈ (a1, a], g ∈ [a′, an), g ∈ (a1, a′] or g ∈ Ln. (2) Let a, b be constants in LnP(X). If g includes k constants (k ∈ Z⁺, k ≥ 2), the truth value of g lies in one of the following four cases: g ∈ [a, b], g ∈ (a, b], g ∈ [a, an) or g ∈ Ln.

Remark 2. From the discussion above, the logical formulae which are comparable with the resolution level α need not be considered (i.e. they are deleted or determined), so two types of generalized literals remain in LnP(X) from the semantic viewpoint: (1) propositional variables, which can take values in the whole valuation field Ln; (2) IESFs which include no constant and are incomparable with the resolution level α, which can take values greater than α or less than α. Therefore, after the pretreatment, the valuations of the remaining generalized literals in LnP(X) can be greater than α or less than α.

Theorem 2. (Unit Generalized Literal Rule) Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), α ∈ Ln and ∨_{a∈Ln}(a ∧ a′) ≤ α < I. If there exists a unit generalized literal g in S, then delete all generalized clauses which include g, obtaining a generalized clause set S1. The following conclusions hold: (1) If S1 = ∅, then S is α-satisfiable. (2) If S1 ≠ ∅, then delete the generalized literal g′ in S1, obtaining a generalized clause set S2; S ≤ α if and only if S2 ≤ α.

Corollary 2. Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), α ∈ Ln.
If S ≤ α, then the following conclusions hold: (1) delete the generalized literal g and the generalized clauses which include g′, obtaining a generalized clause set S1; then S1 ≤ α; (2) delete the generalized literal g′ and the generalized clauses which include g, obtaining a generalized clause set S2; then S2 ≤ α.

Theorem 3. (Pure Generalized Literal Rule) Let S be a generalized clause set in LnP(X), α ∈ Ln. If there exists a pure generalized literal g in S, then delete all generalized clauses which include g, obtaining a generalized clause set S1. The following conclusions hold: (1) If S1 = ∅, then S is α-satisfiable. (2) If S1 ≠ ∅, then S ≤ α if and only if S1 ≤ α.
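In classical terms the two deletion rules look as follows. The sketch below is a simplification we supply for illustration: generalized literals are treated as opaque strings whose complement is written with a leading ~, and the α side conditions of Theorems 2 and 3 are left out.

```python
def compl(g):
    """Complement of a generalized literal, written with a leading '~'."""
    return g[1:] if g.startswith('~') else '~' + g

def unit_rule(S, g):
    """Unit generalized literal rule (Theorem 2, classical-style sketch):
    given a unit clause {g} in S, delete every clause containing g, then
    delete the complement of g from the remaining clauses."""
    S1 = [C for C in S if g not in C]
    return [C - {compl(g)} for C in S1]

def pure_rule(S, g):
    """Pure generalized literal rule (Theorem 3): if the complement of g
    never occurs in S, the clauses containing g can be deleted."""
    assert all(compl(g) not in C for C in S), "g is not pure in S"
    return [C for C in S if g not in C]
```

If either rule returns the empty set, S is α-satisfiable by conclusion (1) of the corresponding theorem; otherwise the question whether S ≤ α is passed on to the reduced clause set.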
Theorem 4. (Splitting Rule) Let S be a generalized clause set in LnP(X), α ∈ Ln. Suppose S can be written in the form (A1 ∨ g) ∧ ... ∧ (Am ∨ g) ∧ (B1 ∨ g′) ∧ ... ∧ (Bn ∨ g′) ∧ R, where the Ai, Bi and R include neither g nor g′. Let S1 = A1 ∧ ... ∧ Am ∧ R and S2 = B1 ∧ ... ∧ Bn ∧ R; then S ≤ α if and only if S1 ≤ α and S2 ≤ α.
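The splitting rule has a direct set-level reading. The sketch below is our own illustrative encoding (clauses as frozensets of literal strings, with g′ written as ~g); it partitions a clause set S into the two smaller sets S1 and S2 of Theorem 4.

```python
def split(S, g):
    """Splitting rule sketch (Theorem 4): S <= alpha iff S1 <= alpha and
    S2 <= alpha, where S1/S2 drop g and its complement respectively."""
    gc = g[1:] if g.startswith('~') else '~' + g
    S1, S2, R = [], [], []
    for C in S:
        if g in C:
            S1.append(C - {g})      # clauses  A_i or g   ->  A_i
        elif gc in C:
            S2.append(C - {gc})     # clauses  B_i or g'  ->  B_i
        else:
            R.append(C)             # clauses R appear in both halves
    return S1 + R, S2 + R
```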
4
α-Lock Resolution Method in LnP (X)
In this section, we give the definitions of lock generalized clause, α-lock resolution and α-lock deduction, and discuss the soundness and weak completeness of the α-lock resolution method in LnP(X).

Definition 16. Let G be a generalized clause in LnP(X). If each occurrence of a generalized literal in G is assigned a positive integer at its lower left corner (identical generalized literals may be labeled with different positive integers), this specific generalized clause G is called a lock generalized clause, and the positive integer attached to a generalized literal is called its lock index.

Definition 17. Let G be a lock generalized clause in LnP(X). Suppose that G contains generalized literals which have the same name with different indices; then delete the generalized literals with the larger indices. This process is called amalgamation.

Definition 18. Let G1 and G2 be two lock generalized clauses in LnP(X), α ∈ Ln. G = RαL(G1, G2) is called an α-lock resolvent of G1 and G2 if it satisfies the following conditions: (1) G is the α-resolvent of G1 and G2; (2) the resolved generalized literals have the minimal indices in G1 and G2 respectively.

Definition 19. Let S be a finite generalized clause set in LnP(X), and let all generalized literals in S be assigned lock indices. An α-resolution deduction from S is called an α-lock deduction if each α-resolution in the deduction process is an α-lock resolution. An α-lock deduction from S to the α-empty clause is called an α-lock proof of S.

Theorem 5. (Soundness Theorem) Let S be a finite generalized clause set in LnP(X), with all generalized literals in S assigned lock indices. Let {D1, D2, ..., Dm} be an α-lock resolution deduction from S to a generalized clause Dm. If Dm ≤ α, then S ≤ α.

Theorem 6. (Weak Completeness Theorem) Let S be a finite generalized clause set in LnP(X), with all generalized literals in S assigned lock indices. Let α ∈ Ln and ∨_{a∈Ln}(a ∧ a′) ≤ α < I.
If S ≤ α, then there exists an α-lock deduction from S to the α-empty clause.
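The lock discipline of Definitions 16–18 can be illustrated on a classical-style simplification in which an α-resolution pair is just a complementary pair (g, ~g); the representation and function names are our own. Each literal carries its lock index, resolution is attempted only on the minimal-index literal of each clause, and repeated literal names are amalgamated by keeping the smallest index.

```python
def lock_resolve(C1, C2):
    """Sketch of alpha-lock resolution: clauses are lists of
    (lock_index, literal); the alpha-resolution pair is approximated here
    by a complementary literal pair (g, ~g). Resolution is only allowed on
    the minimal-lock-index literal of each clause; returns the resolvent
    (with amalgamation applied) or None if the rule does not fire."""
    def compl(g):
        return g[1:] if g.startswith('~') else '~' + g
    g1 = min(C1)  # (index, literal) with the smallest lock index
    g2 = min(C2)
    if compl(g1[1]) != g2[1]:
        return None  # minimal-index literals are not a resolution pair
    rest = [p for p in C1 if p != g1] + [p for p in C2 if p != g2]
    # amalgamation: keep only the smallest index per literal name
    best = {}
    for idx, lit in rest:
        if lit not in best or idx < best[lit]:
            best[lit] = idx
    return sorted((idx, lit) for lit, idx in best.items())
```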
5
Conclusion
In this paper, the generalized literals in LP(X) which include constants are discussed in detail; it is shown that the generalized literals which are comparable with the resolution level α can be deleted or disregarded in the resolution-based automated reasoning process. Finally, an α-lock resolution method based on LnP(X) is proposed and the weak completeness of this method is proved. Further research will concentrate on devising an algorithm to improve the efficiency of the α-lock resolution method in LnP(X), and on extending the α-lock resolution method to more general logic systems such as LP(X) and LF(X). The proposed work aims at handling imprecise and incomparable information, especially when the truth-value field is a lattice-ordered linguistic-valued structure, i.e., truth values are linguistic terms (e.g., possibly true, very true, more or less true) instead of numerical values in [0, 1]. Hence it can lead to applications in qualitative reasoning and qualitative linguistic-valued decision making. The proposed automated reasoning algorithm lays a foundation for linguistic-valued symbolic reasoning, logic programming, and decision making. Acknowledgments. This work is partially supported by the National Natural Science Foundation of China (Grant No. 60875034), the Specialized Research Foundation for the Doctoral Program of Higher Education of China (Grant No. 20060613007), and the research projects TIN-2009-08286 and P08-TIC-3548.
References

1. Robinson, J.A.: A machine-oriented logic based on the resolution principle. J. ACM 12, 23–41 (1965)
2. Liu, X.H.: Resolution-Based Automated Reasoning. Academic Press, Beijing (1994) (in Chinese)
3. Wos, L.: Automated Reasoning: 33 Basic Research Problems. Prentice Hall, New Jersey (1988)
4. Wang, G.J., Zhou, H.J.: Introduction to Mathematical Logic and Resolution Principle, 2nd edn. Science Press, Beijing (2006)
5. Xu, Y.: Lattice implication algebra. J. Southwest Jiaotong University 1, 20–27 (1993)
6. Xu, Y., Ruan, D., Qin, K.Y., Liu, J.: Lattice-Valued Logic: An Alternative Approach to Treat Fuzziness and Incomparability. Springer, Berlin (2003)
7. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on lattice-valued propositional logic LP(X). Information Sciences 130, 195–223 (2000)
8. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on first-order lattice-valued logic LF(X). Information Sciences 132, 221–239 (2001)
9. Ma, J., Li, W.J., Ruan, D., Xu, Y.: Filter-based resolution principle for lattice-valued propositional logic LP(X). Information Sciences 177, 1046–1062 (2007)
10. Liu, J., Ruan, D., Xu, Y., Song, Z.M.: A resolution-like strategy based on a lattice-valued logic. IEEE Transactions on Fuzzy Systems 11(4), 560–567 (2003)
On Compactness and Consistency in Finite Lattice-Valued Propositional Logic

Xiaodong Pan1, Yang Xu1, Luis Martínez2, Da Ruan3, and Jun Liu4

1 Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, Sichuan, PR China
2 Department of Computing, University of Jaén, E-23071 Jaén, Spain
3 Belgian Nuclear Research Centre (SCK•CEN), Mol, and Ghent University, Belgium
4 School of Computing and Mathematics, University of Ulster, Northern Ireland, UK
Abstract. In this paper, we investigate the semantical theory of finite lattice-valued propositional logic based on finite lattice implication algebras. Based on fuzzy set theory on a set of formulas, some propositions analogous to those in classical logic are proved, and, using the semantical consequence operation, consistency and compactness are investigated. Keywords: Lattice-valued logic; Consequence operation; Compactness; Fuzzy theory; Consistency.
1 Introduction

In recent years, the theory of fuzzy sets has been applied widely to the study of fuzzy logic (e.g. see [8, 10, 15, 16, 20]), using fuzzy sets of formulas in both semantical and syntactic inference. In [14], Pavelka incorporated internal truth values in the language, defined a semantical consequence operation as an extension of the classical case and proved many important results about its axiomatizability. According to the discussion presented in [16], Pavelka's fuzzy logic is a fuzzy logic with evaluated syntax, in which each formula is assigned a value in the syntax; as a consequence, the concept of proof in the classical setting becomes an evaluated proof in his fuzzy logic, i.e., proving a formula to be true to some degree. This is a generalization of many-valued logic in that in the former we infer new facts along with their truth values, whereas in many-valued logic one infers only those facts that are absolutely true (have truth value 1). In [17, 18], Novák extended Pavelka's approach to Łukasiewicz first-order logic. In [22], Xu extended Pavelka's approach to a relatively general lattice, and some important conclusions about uncertain reasoning and automated reasoning have been obtained (see e.g. [23-25]). In [11, 12], Ma et al. presented a filter-based resolution principle for this kind of logic and also considered its application to machine intelligence. In addition, it is well known that whether a theory is consistent or inconsistent is one of the crucial questions in fuzzy logic; in [21, 26-28], Zhou et al. investigated the consistent

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 328–334, 2010. © Springer-Verlag Berlin Heidelberg 2010
degree of theories by means of deduction theorems, standard completeness theorems and satisfiability degrees of formulas in several typical fuzzy logic systems. Closure operators, introduced by Tarski in 1930, and their use in classical logic are well known [19]; in Pavelka's fuzzy logic theory the closure operators play a significant role: the semantic and syntactic closure operators have been defined as mappings from fuzzy sets of formulas to fuzzy sets of formulas, which extends the concept of closure operator in Tarski's sense in a natural way (see [14]). Concerning closure operators, one of the first works was done by Michálek in [13] in the framework of fuzzy topological spaces. From the point of view of fuzzy set theory, Bělohlávek investigated closure operators and related structures in [1, 2, 3]. During the last decades the closure operators have also been studied in the context of fuzzy logic, taking the chain L = [0, 1] as a special case [4-7, 9]. To make fuzzy logic work better for approximate reasoning, in this position paper, based on fuzzy theory on a set of formulas, we investigate the consistency and compactness of the semantical consequence operation in finite lattice-valued propositional logic.
2 Preliminaries

In this section, for the purpose of reference, we introduce some basic definitions and results about closure operations, lattice implication algebras and lattice-valued logic, and the notation conventions we shall use throughout this paper.

Definition 1. [22] Let (L, ∨, ∧, O, I) be a bounded lattice with an order-reversing involution ′, with I and O the greatest and smallest elements of L respectively, and let →: L × L → L be a mapping. (L, ∨, ∧, ′, →, O, I) is called a lattice implication algebra if the following conditions hold for any x, y, z ∈ L:

(I1) x → (y → z) = y → (x → z),
(I2) x → x = I,
(I3) x → y = y′ → x′,
(I4) x → y = y → x = I implies x = y,
(I5) (x → y) → y = (y → x) → x,
(L1) (x ∨ y) → z = (x → z) ∧ (y → z),
(L2) (x ∧ y) → z = (x → z) ∨ (y → z).

Remark 1. In a lattice implication algebra (L, ∨, ∧, ′, →, O, I), by defining a binary operation ⊗ as follows: for any x, y ∈ L, x ⊗ y = (x → y′)′, we can show that (L, ∨, ∧, ⊗, →, O, I) is a residuated lattice, but the converse is not always true. For example, let ([0, 1], min, max, →G, 0, 1) be a Gödel structure, where x →G y is 1 for x ≤ y and y elsewhere. It is easy to prove that this is a residuated lattice, but since (0.8 →G 0.7) →G 0.7 = 0.7 →G 0.7 = 1 ≠ 0.8 = 1 →G 0.8 = (0.7 →G 0.8) →G 0.8, axiom (I5) fails, so it is not a lattice implication algebra. (For more details, please refer to [22].)

In what follows, unless otherwise stated, L always represents any given finite lattice implication algebra. The set of all natural numbers will be denoted by N, and N \ {0} by N⁺. Let M be a nonempty set; L^M denotes the set of all L-fuzzy sets on M. If the set supp(A) = {x | A(x) ≠ O} is finite, then A is called a finite fuzzy set.
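Both halves of Remark 1 can be checked mechanically. The snippet below is an illustration we add (not part of the paper): it verifies by brute force that the Łukasiewicz chain L_n satisfies (I1)–(I5), (L1) and (L2), and that the Gödel implication violates (I5), so the Gödel residuated lattice is not a lattice implication algebra.

```python
from fractions import Fraction
from itertools import product

def lukasiewicz_chain_is_LIA(n):
    """Brute-force check of (I1)-(I5), (L1), (L2) on the chain L_n with
    Lukasiewicz implication x -> y = min(1, 1 - x + y) and negation 1 - x."""
    L = [Fraction(i, n - 1) for i in range(n)]
    imp = lambda x, y: min(Fraction(1), 1 - x + y)
    neg = lambda x: 1 - x                      # order-reversing involution
    for x, y, z in product(L, repeat=3):
        assert imp(x, imp(y, z)) == imp(y, imp(x, z))          # (I1)
        assert imp(x, x) == 1                                   # (I2)
        assert imp(x, y) == imp(neg(y), neg(x))                 # (I3)
        if imp(x, y) == 1 and imp(y, x) == 1:                   # (I4)
            assert x == y
        assert imp(imp(x, y), y) == imp(imp(y, x), x)           # (I5)
        assert imp(max(x, y), z) == min(imp(x, z), imp(y, z))   # (L1)
        assert imp(min(x, y), z) == max(imp(x, z), imp(y, z))   # (L2)
    return True

def godel_breaks_I5():
    """Goedel implication (x -> y = 1 if x <= y else y) fails (I5)."""
    g = lambda x, y: Fraction(1) if x <= y else y
    lhs = g(g(Fraction(4, 5), Fraction(7, 10)), Fraction(7, 10))  # = 1
    rhs = g(g(Fraction(7, 10), Fraction(4, 5)), Fraction(4, 5))   # = 4/5
    return lhs != rhs
```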
By LP we denote the lattice-valued propositional logic based on a finite lattice implication algebra L. In LP, the formula set F is a (¬, &, →)-type free algebra generated by the set S ∪ L̄, where S is the set of propositional variables, L̄ = {ā | a ∈ L}, and each ā ∈ L̄ is a nullary operation.

Definition 2. A mapping v : F → L is called a valuation if v(¬A) = v(A)′, v(A&B) = v(A) ⊗ v(B) = (v(A) → v(B)′)′, v(A → B) = v(A) → v(B), and v(ā) = a for any a ∈ L. The set T of all valuations is called the semantics of LP.

Definition 3. [14] The mapping C_T : L^F → L^F, C_T(X) = ∧{v ∈ T | v ≥ X}, is called the L-semantic consequence operation on F.

Remark 2. Let X ∈ L^F; X is called a fuzzy theory on F. For any A ∈ F, X(A) ∈ L denotes the initial truth value of the formula A, which is given in advance, so X can also be viewed as information qua premiss.
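For a finite L and finitely many propositional variables, C_T(X) can be computed by brute force. The sketch below is our own encoding (restricted to the connectives ¬ and → over a Łukasiewicz chain): it enumerates all valuations v ≥ X and takes the infimum, following Definition 3, and returns I when no model exists, matching the convention of Definition 6 below.

```python
from fractions import Fraction
from itertools import product

IMP = lambda x, y: min(Fraction(1), 1 - x + y)  # Lukasiewicz implication

def ev(f, v):
    """Evaluate formula f: a variable name, ('not', f), or ('imp', f, g)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == 'not':
        return 1 - ev(f[1], v)
    return IMP(ev(f[1], v), ev(f[2], v))

def consequence(X, query, variables, n):
    """Pavelka-style C_T(X)(query) = inf{v(query) : v >= X}, enumerating
    all valuations of `variables` into the chain L_n. The fuzzy theory X
    is a dict {formula: lower truth bound} (implicitly O elsewhere)."""
    L = [Fraction(i, n - 1) for i in range(n)]
    models = []
    for vals in product(L, repeat=len(variables)):
        v = dict(zip(variables, vals))
        if all(ev(f, v) >= a for f, a in X.items()):
            models.append(ev(query, v))
    return min(models) if models else Fraction(1)  # no model: assign I
```

For example, from X = {p : 1/2, p → q : 1} over L_3 one computes C_T(X)(q) = 1/2: every model forces v(q) ≥ v(p) ≥ 1/2, and the bound is attained.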
3 Semantical Consequence Operation and Consistency of Fuzzy Theory

In this section, we generalize the classical set of formulae to an L-set in the level of D, discuss the properties of the L-semantic consequence operation, and study the problem of consistency. In what follows, we always assume that Γ is a subset of F and D is a subset of L satisfying (i) I ∈ D, O ∉ D; (ii) for any x, y ∈ L such that x ≤ y, x ∈ D implies y ∈ D.

Definition 4. Let Γ ⊆ F. Define D_Γ = {X ∈ L^F | for any A ∈ F: if A ∈ Γ and A ∉ L̄, then X(A) ∈ D; if A ∈ Γ and A = ā ∈ L̄, then X(A) = a; otherwise X(A) = O}.

Remark 3. In the definition above, Γ is a finite subset of F as usual. Let X ∈ D_Γ; for every A ∈ F, by X(A) we mean a truth value for A in X in the level of D, given in advance. In fact, this can be viewed as a generalization of a set of premise formulas in classical semantic deduction.

Definition 5. Let Γ ⊆ F, v ∈ T. v is called a model of Γ in the level of D, or v satisfies Γ in the level of D, if v(A) ∈ D for every A ∈ Γ with A ∉ L̄. Γ is called satisfiable in the level of D if there exists a valuation v which satisfies Γ in the level of D.

By Definitions 4 and 5, the following proposition is obvious.

Proposition 1. Γ is satisfiable in the level of D if and only if there exist X ∈ D_Γ and v ∈ T such that v ≥ X.

In Definition 3, Pavelka extended the semantical consequence operation of classical logic to an L-consequence operation. In what follows, we discuss the properties of the semantical consequence operation C_T.

Definition 6. Let X ∈ L^F. If {v ∈ T | v ≥ X} = ∅, then we assign I_F to C_T(X), i.e. C_T(X) = I_F, where I_F is the greatest element of L^F, the constant map equal to I on the whole of F. In this case, X is said to be inconsistent with regard to T; otherwise, X is said to be consistent. If every X ∈ D_Γ is consistent, then Γ is said to be consistent with regard to T in the level of D.
By Definitions 5 and 6, the following proposition is obvious.

Proposition 2. Γ is satisfiable in the level of D if and only if there exists X ∈ D_Γ such that X is consistent with regard to T.

Proposition 3. For any X ∈ L^F, if there exist A ∈ F with A ∉ L̄ such that

C_T(X)(A) ∧ C_T(X)(¬A) > ∨_{α∈L}(α ∧ α′),

then X is inconsistent with regard to T.

Proof. Assume that X is consistent; then there exists v₀ ∈ T such that v₀ ≥ X, and it follows that v₀(A) ∧ v₀(¬A) ≥ C_T(X)(A) ∧ C_T(X)(¬A) for any A ∈ F. Since there exist A ∈ F with A ∉ L̄ such that C_T(X)(A) ∧ C_T(X)(¬A) > ∨_{α∈L}(α ∧ α′), we get v₀(A) ∧ v₀(¬A) > ∨_{α∈L}(α ∧ α′). But ∨_{α∈L}(α ∧ α′) ≥ v₀(A) ∧ (v₀(A))′ = v₀(A) ∧ v₀(¬A), which is impossible. Hence X is inconsistent.

By Definition 6 and Proposition 3, it is easy to prove the following conclusion.

Corollary 1. For any X ∈ L^F, X is inconsistent with regard to T if and only if there exists A ∈ F such that C_T(X)(A) ∧ C_T(X)(¬A) = I.

In classical logic, by M ⊨ A we mean that v(A) = 1 for any v ∈ T satisfying v(M) ⊆ {1}. That is to say, A ∈ ∩{D_v | M ⊆ D_v, v ∈ T}, where D_v = {p ∈ F | v(p) = 1}. In the following, by means of the semantical consequence operation, we extend the above concept to lattice-valued logic and obtain the following conclusion:

Theorem 1. Let A ∈ F. If C_T(X)(A) ∈ D for any X ∈ D_Γ, then v(A) ∈ D for any v ∈ T satisfying v(Γ) ⊆ D.
4 The Compactness of Semantical Consequence Operation

It is well known that compactness is an important property of classical logic, which establishes a link between infinity and finiteness. In this section, we discuss the compactness of the semantical consequence operation C_T.

Proposition 4. Let B_{C_T} = {Y ∈ L^F | C_T(Z) ≤ Y for any Z ≤ Y}; then for any N ⊆ B_{C_T}, ∧N ∈ B_{C_T}; that is to say, B_{C_T} is a closure system.

Proof. Assume that N ⊆ B_{C_T} and let B = ∧N. If N = ∅, then B = ∧N = I_F, thus B = I_F ∈ B_{C_T}; otherwise, for any Y₁ ∈ L^F, if Y₁ ≤ B, then Y₁ ≤ Y for any Y ∈ N, and so C_T(Y₁) ≤ Y for any Y ∈ N. Hence C_T(Y₁) ≤ ∧N = B. Summing up, B ∈ B_{C_T}, so B_{C_T} is a closure system, ending the proof.
Corollary 2. B_{C_T} = {Y ∈ L^F | C_T(Y) = Y}; that is to say, B_{C_T} consists of all fixed points of C_T.

Corollary 3. For any Y ∈ L^F, C_T(Y) = ∧{Z | Y ≤ Z, Z ∈ B_{C_T}}.

Definition 7. The mapping C_T is said to be compact if

C_T(Y) = ∨{C_T(X) | X ∈ L^F, X ≤ Y and X is a finite fuzzy set}

for any Y ∈ L^F. C_T is said to have the property of preserving directed joins if C_T(∨_{i∈I} Y_i) = ∨_{i∈I} C_T(Y_i) for any directed family U = {Y_i | i ∈ I} of subsets of L^F, where U is said to be a directed family if for any Y_i, Y_j ∈ U there exists Y_k ∈ U such that Y_i ≤ Y_k and Y_j ≤ Y_k.

Theorem 2. C_T is compact if and only if it has the property of preserving directed joins.
Proof. (Necessity) Assume that C_T is compact and U = {Y_i | i ∈ I} is a directed family of subsets of L^F. Let Y₀ = ∨_{i∈I} Y_i. On the one hand, since C_T is a closure operation, ∨_{i∈I} C_T(Y_i) ≤ C_T(Y₀). On the other hand, C_T is compact, so it follows that

C_T(Y₀) = ∨{C_T(Z) | Z ∈ L^F, Z ≤ Y₀ and Z is a finite fuzzy set},

and then for any A ∈ F,

C_T(Y₀)(A) = ∨{C_T(Z)(A) | Z ∈ L^F, Z ≤ Y₀ and Z is a finite fuzzy set}.

As C_T(Y₀)(A), C_T(Z)(A) ∈ L and L is a finite lattice implication algebra, there exist Z₁, Z₂, ..., Z_n ≤ Y₀ (n ∈ N⁺, 1 ≤ n ≤ |L|) such that

C_T(Y₀)(A) = ∨_{i=1}^{n} C_T(Z_i)(A).

For any Z_j ≤ Y₀ = ∨_{i∈I} Y_i, since Z_j is a finite fuzzy set, there exist Y_{j1}, Y_{j2}, ..., Y_{jk} (j1, ..., jk ∈ I) such that Z_j ≤ Y_{j1} ∨ Y_{j2} ∨ ... ∨ Y_{jk}; since U is directed, it follows that there exists Y_{j0} ∈ U such that Y_{ji} ≤ Y_{j0} (i = 1, ..., k), thus Z_j ≤ Y_{j0}. Similarly, we can prove that there exists Y* ∈ U such that Y_{j0} ≤ Y* (j = 1, ..., n), hence Z_j ≤ Y* (j = 1, ..., n). Therefore,

C_T(Y₀)(A) = C_T(Z₁)(A) ∨ C_T(Z₂)(A) ∨ ... ∨ C_T(Z_n)(A) ≤ C_T(Y*)(A) ≤ ∨_{i∈I} C_T(Y_i)(A).

It shows that C_T(Y₀)(A) ≤ ∨_{i∈I} C_T(Y_i)(A). Summing up, C_T(Y₀) = ∨_{i∈I} C_T(Y_i); that is to say, C_T has the property of preserving directed joins.

(Sufficiency) Assume that C_T has the property of preserving directed joins. Due to the fact that any Y ∈ L^F is the union of its finite subsets, and the family of all finite subsets of a given Y is obviously directed, C_T is compact. This completes the proof.
5 Conclusion

The semantical consequence operation and the consistency and compactness of a lattice-valued propositional logic LP(X) are investigated in this paper, which enhances the theoretical foundation of this logic system and provides theoretical support for approximate reasoning to handle fuzziness and incomparability. Acknowledgments. The work is partially supported by the Natural Science Foundation of China (Grant no. 60875034) and the research projects TIN-2009-08286 and P08-TIC-3548.
References

1. Bělohlávek, R.: Fuzzy closure operators. Journal of Mathematical Analysis and Applications 262, 473–489 (2001)
2. Bělohlávek, R.: Fuzzy closure operators II: induced relations, representation, and examples. Soft Computing 7, 53–64 (2002)
3. Bělohlávek, R.: Fuzzy Relational Systems: Foundations and Principles. Kluwer, New York (2002)
4. Biacino, L., Gerla, G.: An extension principle for closure operators. Journal of Mathematical Analysis and Applications 198, 1–24 (1996)
5. Biacino, L., Gerla, G.: Closure operators for fuzzy subsets. In: Proc. First European Congress on Fuzzy and Intelligent Technologies, Aachen (1993)
6. Castro, J.L., Trillas, E.: Tarski's fuzzy consequences. In: Proc. Internat. Fuzzy Eng. Symp. 1991, vol. 1, pp. 70–81 (1991)
7. Castro, J.L., Trillas, E., Cubillo, S.: On consequence in approximate reasoning. J. Appl. Non-Classical Logics 4(1), 91–103 (1994)
8. Cintula, P.: From fuzzy logic to fuzzy mathematics. Ph.D. Thesis, Czech Technical University, Prague (2005)
9. Gerla, G.: Comparing fuzzy and crisp deduction systems. Fuzzy Sets and Systems 67, 317–328 (1994)
10. Hájek, P.: Metamathematics of Fuzzy Logic. Kluwer Academic Publishers, Dordrecht (1998)
11. Ma, J., Li, W., Ruan, D., Xu, Y.: Filter-based resolution principle for lattice-valued propositional logic LP(X). Information Sciences 177, 1046–1062 (2007)
12. Ma, J., Chen, S., Xu, Y.: Fuzzy logic from the viewpoint of machine intelligence. Fuzzy Sets and Systems 157, 628–634 (2006)
13. Michálek, J.: Fuzzy topologies. Kybernetika 11(5), 345–354 (1975)
14. Pavelka, J.: On fuzzy logic I: Many-valued rules of inference; II: Enriched residuated lattices and semantics of propositional calculi; III: Semantical completeness of some many-valued propositional calculi. Zeitschr. f. Math. Logik und Grundlagen d. Math. 25, 45–52 (1979)
15. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer, Boston (1999)
16. Novák, V.: Which logic is the real fuzzy logic? Fuzzy Sets and Systems 157, 635–641 (2006)
17. Novák, V.: On the syntactico-semantical completeness of first-order fuzzy logic. Part I: syntax and semantics. Kybernetika 26, 47–66 (1990)
18. Novák, V.: On the syntactico-semantical completeness of first-order fuzzy logic. Part II: main results. Kybernetika 26, 134–154 (1990)
19. Tarski, A.: Logic, Semantics and Metamathematics. Clarendon Press, Oxford (1956)
20. Turunen, E.: Mathematics behind Fuzzy Logic. Advances in Soft Computing. Physica-Verlag, Heidelberg (1999)
21. Wang, G.J., Zhang, W.X.: Consistency degrees of finite theories in Łukasiewicz propositional fuzzy logic. Fuzzy Sets and Systems 149, 275–284 (2005)
22. Xu, Y., Ruan, D., Qin, K.Y., Liu, J.: Lattice-Valued Logic: An Alternative Approach to Treat Fuzziness and Incomparability. Springer, Heidelberg (2003)
23. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on lattice-valued propositional logic LP(X). Information Sciences 130, 1–29 (2000)
24. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on first-order lattice-valued logic LF(X). Information Sciences 132, 221–239 (2001)
25. Xu, Y., Liu, J., Ruan, D., Lee, T.T.: On the consistency of rule bases based on lattice-valued first-order logic LF(X). Internat. J. Intelligent Systems 21, 399–424 (2006)
26. Zhou, X.N., Wang, G.J.: Consistency degrees of theories in some systems of propositional fuzzy logic. Fuzzy Sets and Systems 152, 321–331 (2005)
27. Zhou, H.J., Wang, G.J.: Generalized consistency degrees of theories w.r.t. formulas in several standard complete logic systems. Fuzzy Sets and Systems 157, 2058–2073 (2006)
28. Zhou, H.J., Wang, G.J.: Characterizations of maximal consistent theories in the formal deductive system L (NM-logic) and Cantor space. Fuzzy Sets and Systems 158, 2591–2604 (2007)
Lattice Independent Component Analysis for Mobile Robot Localization Ivan Villaverde, Borja Fernandez-Gauna, and Ekaitz Zulueta Computational Intelligence Group Dept. CCIA, UPV/EHU, Apdo. 649, 20080 San Sebastian, Spain www.ehu.es/ccwintco
Abstract. This paper introduces an approach to appearance based mobile robot localization using Lattice Independent Component Analysis (LICA). The Endmember Induction Heuristic Algorithm (EIHA) is used to select a set of Strong Lattice Independent (SLI) vectors, which can be assumed to be Affine Independent, and therefore candidates to be the endmembers of the data. Selected endmembers are used to compute the linear unmixing of the robot's acquired images. The resulting mixing coefficients are used as feature vectors for view recognition through classification. We show on a sample path experiment that our approach can recognize the location of the robot, and we compare the results with Independent Component Analysis (ICA).
1
Introduction
Navigation is the ability of an agent to move around its environment with a specific purpose. A necessary ability for navigation is self-localization: the capacity of the robot to ascertain, more or less accurately, "where it is" from the information provided by its sensors. This knowledge makes possible other navigation-related tasks like path planning. But this ability also requires a previously known model of the environment, a map, which can be built off-line, in a previous training step, or on-line, as the robot explores new space. Topological maps are one of the most common types of such maps. These kinds of maps do not store any metric relationship between environment elements, but mere neighborhood relationships between reference locations inside the environment. One popular approach to topological maps is the one based on appearance-based models [8]. Appearance-based models rely on view matching: the maps are formed by collections of images taken at different spots of the environment, usually along with some information on their relative position. Those images are stored in the nodes of a graph in which the links between nodes indicate an appearance or spatial neighborhood, building in this way a topological map. Localization then amounts to finding the stored image that most closely resembles the view at the robot's current location. This matching is based on global image features like color histograms [17], edge density [15] or PCA [10] or ICA [12] descriptors, instead of local features tracked from image to image. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 335–342, 2010. © Springer-Verlag Berlin Heidelberg 2010
336
I. Villaverde, B. Fernandez-Gauna, and E. Zulueta
In this paper we propose the application to appearance-based localization of the approach called Lattice Independent Component Analysis (LICA), introduced in [4]. This approach consists of two steps: first, it selects Strong Lattice Independent (SLI) vectors from the input dataset using a heuristic algorithm, the Endmember Induction Heuristic Algorithm (EIHA) [5]; second, because of the conjectured equivalence between SLI and Affine Independence, it performs the linear unmixing of the input dataset based on these endmembers, obtaining the feature vectors of each input datum. Therefore, the approach is a mixture of linear and nonlinear methods. The original work using this approach was devoted to unsupervised hyperspectral image segmentation, hence the use of the name endmember for the selected vectors. We maintain the basic assumption that the data is generated as a convex combination of a set of endmembers which are the vertices of a convex polytope covering some region of the input data. This assumption is similar to the linear mixture assumed by the Independent Component Analysis (ICA) [6] approach; however, we do not impose any probabilistic assumption on the data. If we try to establish correspondences to ICA, the endmembers correspond to the unknown sources and the mixing matrix is the one given by the abundance coefficients computed by least squares estimation. The EIHA was first proposed in [5]. In this algorithm, our approach to endmember selection from the data is based on the conjectured equivalence between Strong Lattice Independence and Affine Independence [14]. SLI needs two conditions: Lattice Independence and max/min dominance. Lattice Independence is detected based on results on fixed points for Lattice Autoassociative Memories (LAM) [14,13,16], and max/min dominance is tested using algorithms inspired by the ones described in [18].
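The LAM machinery mentioned above can be sketched in a few lines. The code below is an illustration under our own conventions, not the authors' implementation: it builds the min (erosive) lattice autoassociative memory W_XX of Ritter et al. and checks the classical perfect-recall property — every stored pattern is a fixed point under the max-plus product — which is what makes LAM fixed-point results usable for detecting lattice independence.

```python
import numpy as np

def lam_min_memory(X):
    """Min (erosive) lattice autoassociative memory for the pattern matrix X
    (each column is one pattern): W[i, j] = min over patterns of (x_i - x_j)."""
    n, k = X.shape
    W = np.full((n, n), np.inf)
    for col in range(k):
        x = X[:, col]
        W = np.minimum(W, x[:, None] - x[None, :])
    return W

def maxplus_recall(W, x):
    """Max-plus product: (W [+] x)_i = max_j (W[i, j] + x[j])."""
    return (W + x[None, :]).max(axis=1)

# Stored patterns are fixed points of the min memory under max-plus recall.
X = np.array([[0.0, 2.0],
              [1.0, 5.0],
              [4.0, 3.0]])          # two 3-dimensional patterns (columns)
W = lam_min_memory(X)
for col in range(X.shape[1]):
    assert np.allclose(maxplus_recall(W, X[:, col]), X[:, col])
```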
The LICA approach falls in the field of Lattice Computing algorithms, which have been introduced in [3] as the class of algorithms that either apply lattice operators inf and sup or use lattice theory to produce generalizations or fusions of previous approaches. In [3] an extensive and updated list of references that can be labeled Lattice Computing can be found. The outline of the paper is as follows: Section 2 gives a brief recall of ICA. Section 3 introduces the linear mixing model. Section 4 presents results of the proposed approach on a sample path. Finally, section 5 provides some conclusions.
2
Independent Component Analysis
The Independent Component Analysis (ICA) [6] assumes that the data is a linear combination of non-Gaussian, mutually independent latent variables with an unknown mixing matrix. The ICA reveals the hidden independent sources and the mixing matrix. That is, given a set of observations represented by a d-dimensional vector x, ICA assumes a generative model

x = As,   (1)
Lattice Independent Component Analysis for Mobile Robot Localization
337
where s is the M-dimensional vector of independent sources and A is the d × M unknown basis matrix. The ICA searches for the linear transformation W of the data such that the projected variables

Wx = s   (2)

are as independent as possible. It has been shown that the model is completely identifiable if the sources are statistically independent and at least M − 1 of them are non-Gaussian. If the sources are Gaussian, the ICA transformation can only be estimated up to an orthogonal transformation. Estimation of the mixing and unmixing matrices can be done by maximizing diverse objective functions, among them the non-Gaussianity of the sources and the likelihood of the sample. We have used the implementations of Mean Field ICA [7] and of Molgedey and Schuster ICA based on dynamic decorrelation [11], which are available at [1].
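The generative model of Eqs. (1)–(2) can be illustrated with a small numpy sketch. This is a toy demonstration only: with the mixing matrix A known and square, the ideal unmixing matrix is simply W = A⁻¹, whereas a real ICA algorithm must estimate W from the observations X alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two non-Gaussian (uniform) independent sources, 500 samples each.
S = rng.uniform(-1.0, 1.0, size=(2, 500))

# "Unknown" square mixing matrix A, Eq. (1): x = A s.
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
X = A @ S

# With A known and square the ideal unmixing matrix is W = A^{-1},
# so that W x = s, Eq. (2).
W = np.linalg.inv(A)
S_rec = W @ X
```

With W = A⁻¹ the sources are recovered exactly; an estimated W would recover them only up to permutation and scaling.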
3
Linear Mixing Model and the Lattice Independent Component Analysis
The linear mixing model (LMM) [9] assumes that the data follows a linear model, which can be expressed as:

x = Σ_{i=1}^{M} a_i s_i + w = Sa + w,   (3)

where x is the d-dimensional pattern vector (the images acquired by the robot's camera in our case), S is the d × M matrix whose columns are the d-dimensional vertices of the convex region covering the data, the so-called endmembers s_i, i = 1, ..., M, a is the M-dimensional abundance vector, and w is the d-dimensional additive observation noise vector. The LMM is applied when some item is assumed to be the combination of several pure items, called endmembers. In [9] the items are light spectra in the context of hyperspectral image processing; here the items are the singular images which could be used as landmarks. Abundance coefficients correspond to the fraction of the contribution of each endmember to the observed item. From this physical interpretation it follows that the linear mixing model is subject to two constraints on the abundance coefficients. First, to be physically meaningful, all abundance coefficients must be non-negative, a_i ≥ 0, i = 1, ..., M, and, second, they must be fully additive, Σ_{i=1}^{M} a_i = 1. As a side effect, there is a saturation condition a_i ≤ 1, i = 1, ..., M. From a geometrical point of view, these restrictions mean that we expect the endmembers in S to be affinely independent and the convex region defined by them to cover all the data points. The model in Eq. 3 is shared by other linear analysis approaches, such as the ICA [6], which do not view S as a set of endmembers but as regressors or independent sources. The mixing inversion process (often called unmixing) consists in the estimation of the abundance coefficients, given the endmembers S and the observation
338
I. Villaverde, B. Fernandez-Gauna, and E. Zulueta
data x. The simplest approach is the unconstrained least squared error (ULSE) estimation given by:

a = (S^T S)^{-1} S^T x.   (4)

The coefficients that result from Equation (4) do not necessarily fulfill the non-negativity and full additivity conditions. From the physical interpretation point of view, the non-negativity restriction is the more fundamental one. The heuristic algorithm EIHA described in [5] always produces convex regions that lie inside the data cloud, so that enforcing the non-negativity and full additivity restrictions would be impossible for some data points, and enforcing them for others may introduce undesired distortions of their abundance values. Moreover, our attempts to use other unmixing techniques with our data have resulted in prohibitive computational times. Since this is a critical issue in mobile robotics, we systematically use the unconstrained estimation of Equation (4) to compute the abundance coefficients, as a compromise solution. We call Lattice Independent Component Analysis (LICA) the approach grounded in the results and algorithms described in [5,4]. LICA consists of two steps:

1. Induce from the given data a set of Strongly Lattice Independent vectors. In this paper we apply the Endmember Induction Heuristic Algorithm (EIHA) [5]. These vectors are taken as a set of affinely independent vectors. The advantages of this approach are (1) that we do not impose statistical assumptions, (2) that the algorithm is one-pass and very fast because it only uses comparisons and additions, (3) that it is unsupervised and incremental, and (4) that it naturally detects the number of endmembers.
2. Apply the unconstrained least squares estimation to obtain the mixing matrix. The localization results are based on the classification of the images using the coefficients of this matrix.
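The ULSE estimate of Eq. (4) can be sketched in a few lines of numpy. The function name and the toy endmember matrix below are ours, not part of the paper; in a noiseless mixture the true abundances are recovered exactly, while for real data the estimate need not be non-negative or fully additive, as noted above.

```python
import numpy as np

def unmix_ulse(S, x):
    """Unconstrained least squares abundance estimate, Eq. (4):
    a = (S^T S)^{-1} S^T x."""
    return np.linalg.solve(S.T @ S, S.T @ x)

# Toy data: d = 4 dimensional patterns, M = 2 endmembers (columns of S).
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.2, 0.9]])
a_true = np.array([0.3, 0.7])
x = S @ a_true          # noiseless linear mixture, Eq. (3) with w = 0

a_est = unmix_ulse(S, x)
```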
Therefore, the approach is a combination of linear and lattice computing: a linear component analysis whose components have been discovered by non-linear, lattice-theory-based algorithms. Our reasoning for applying LICA to vision-based mobile robot localization is as follows: when M ≪ d, the computation of the convex coordinates can be interpreted as a dimension reduction, or feature extraction, process, as used in the experiment described in Section 4. In contrast to the approach followed in [12], in which images were divided into windows and the ICA descriptor of each one was stored, we obtain a feature vector for the full image.
4
Experimental Validation
The tested approach can be summarized as follows: we try to perform the visual recognition of designated landmark positions by a classifier built in a supervised manner. We test two ways to compute the convex coordinates used as feature vectors for the images: the LICA and ICA approaches. The proposal is based on the features extracted from the images with a two-step process.
Table 1. Classification results using LICA (α = 7) and 3-nn

#Endmembers  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
19           0.82    0.70    0.65    0.71    0.75    0.72
20           0.78    0.75    0.65    0.71    0.71    0.72
16           0.78    0.67    0.62    0.68    0.73    0.70
20           0.80    0.75    0.65    0.72    0.71    0.72
18           0.80    0.74    0.66    0.73    0.74    0.74
Average      0.80    0.72    0.65    0.71    0.73    0.72
Table 2. Classification results using LICA (α = 8) and 3-nn

#Endmembers  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
8            0.63    0.59    0.56    0.63    0.60    0.60
9            0.59    0.54    0.46    0.53    0.61    0.55
8            0.67    0.61    0.54    0.60    0.57    0.60
10           0.65    0.55    0.48    0.60    0.57    0.57
8            0.54    0.54    0.43    0.50    0.41    0.48
Average      0.62    0.57    0.49    0.57    0.55    0.56
The first step consists in the induction of the endmembers from the data sample formed by the set of images captured by a robot along its travelled path. Those induced endmembers will be, in the second step, the basis for a linear unmixing of the image data. This linear unmixing will give as a result a vector of convex coordinates which will be used as the feature vector of the images. We extract the endmembers with the Endmember Induction Heuristic Algorithm (EIHA) [5] and with two variations of the ICA approach.

4.1 Map Building and Localization
This approach requires the full image data set that "describes" the path which is going to be mapped; it has to be recorded in a training step and processed afterwards. It is an off-line mapping algorithm. This image data set is composed of a sequence of optical images taken at regular intervals all along the path followed by the robot. From this training data set several positions are selected to act as landmarks. The selection of those positions can follow any arbitrary pattern. The optical views from these positions are transformed into the convex coordinates computed using the endmembers extracted from the whole image data set of the path by the EIHA, or the sources detected by the ICA. We assume that the selected positions divide the path into segments or regions. These path regions correspond to spatial regions where the reference landmark views are expected to be smoothly recognized (smooth recognition means that small displacements do not catastrophically modify the recognition), these regions being ideally adjacent and dense. Map building includes
Table 3. Classification results using Mean Field ICA and 3-nn

#Indep. Comp.  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
5              0.32    0.31    0.31    0.30    0.24    0.30
10             0.27    0.30    0.26    0.23    0.24    0.26
15             0.36    0.33    0.32    0.34    0.32    0.33
20             0.27    0.26    0.21    0.25    0.21    0.24
25             0.69    0.62    0.54    0.65    0.53    0.61
Average        0.38    0.36    0.33    0.35    0.31    0.35
Table 4. Classification results using Molgedey and Schuster ICA and 3-nn

#Indep. Comp.  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
5              0.48    0.49    0.50    0.42    0.39    0.45
10             0.70    0.57    0.54    0.55    0.58    0.59
15             0.76    0.61    0.57    0.64    0.62    0.64
20             0.81    0.69    0.62    0.74    0.69    0.71
25             0.82    0.69    0.62    0.73    0.67    0.71
Average        0.71    0.61    0.57    0.62    0.59    0.62
the construction of the feature vector classifier using these images as the training set. Robot self-localization is thus performed by classifying newly acquired images into one of the previously defined regions, using the stored images as representatives of the regions. For an input image the process is as follows: first, we perform the linear unmixing of the image with the basis of endmembers computed from the training path images; the convex coordinates are the feature vector. Second, we classify the image feature vector with the classifier trained on the training path. For the validation, we count a success if the actual robot position falls in the region defined by the landmark classifier. This mapping approach produces a topological appearance-based map, since no metric information is stored in the map, the robot only uses relative, non-precise positioning, and the localization is based on image matching.

4.2 Experimental Results
The experiments were performed over several pre-recorded image datasets; in this paper we show the results obtained over one of those datasets as a sample test. Each dataset was recorded by manually driving a Pioneer robot six times along a predefined path along the corridors of our building, acquiring images at intervals of 5-6 cm, along with the related odometry measurements. The paths try to simulate possible paths that a robot would travel in a hypothetical navigation task in that building. As the robots were guided manually, each one of the travels follows a slightly different path.
For each recorded path, the first trip was used to train the system parameters, and the five remaining trips were used as test sequences. The task to perform is to recognize a hand-selected set of spatial landmark positions given their respective views of the world as taken by the robot's camera. The landmark positions were selected on the floor plane, choosing places of practical relevance, such as doors to other laboratories. Classes of images are identified for each of the selected landmark positions, assigning the images in the sequences to the closest landmark map position with similar orientation, according to the corresponding robot odometry readings. This image labelling is the ground truth for the ensuing processes. The task therefore becomes the classification of the newly acquired images into one of the map classes. The classification was done using a 3-NN classifier. Tables 1 and 2 show the classification results obtained over the sample path using the proposed LICA approach. The EIHA noise tolerance parameter α has been tuned to α = 7 and α = 8, respectively, to obtain a range of desired numbers of endmembers. Since the EIHA has a random start, the results of 5 runs of the algorithm are shown, with different numbers of endmembers induced. Tables 3 and 4 show the classification results obtained using the abundance coefficients computed from the independent components extracted with two ICA algorithms (Mean Field ICA and Molgedey and Schuster ICA). The tables show the results obtained with several numbers of independent components. It can be appreciated that the LICA approach clearly outperforms the Mean Field ICA in all cases, with similar or even greater dimensionality reduction, while performing slightly better than the Molgedey and Schuster ICA in some cases, with overall similar performance.
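The 3-NN classification of convex-coordinate feature vectors described above can be sketched as follows. This is a minimal numpy implementation with hypothetical toy features; the actual experiments classify 4096-dimensional image-derived feature vectors.

```python
import numpy as np

def knn3_predict(train_X, train_y, query):
    """Classify a feature vector by majority vote among its 3 nearest
    training vectors under the Euclidean distance (3-NN rule)."""
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(d)[:3]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy convex-coordinate features for two hypothetical landmark classes.
train_X = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
train_y = np.array([0, 0, 0, 1, 1, 1])

pred = knn3_predict(train_X, train_y, np.array([0.85, 0.15]))
```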
5
Summary and Conclusions
We have proposed and applied Lattice Independent Component Analysis (LICA) to appearance-based mobile robot localization. LICA is based on the application of the Lattice Computing based EIHA algorithm for the selection of the endmembers, and on the linear unmixing of the data based on these endmembers. We have discussed the similarities of our approach to the application of ICA to the same problem. In our approach the salient views acquired along the path correspond to the endmembers detected by the EIHA algorithm, and the spatial mixing coefficients correspond to the convex coordinates obtained by unmixing the recorded images on the basis of the found endmembers. The LICA approach then uses this set of vectors to compute the abundance coefficients that characterize the data relative to the endmembers. Over these coefficients we perform the robot localization, consisting in the classification of the views into the map classes. The results in Section 4 show that the convex coordinates of the data points based on the endmembers induced by the EIHA algorithm can be used as features for pattern classification. The results show that this approach improves on the Mean Field ICA approach, while on average it performs similarly to the Molgedey and Schuster ICA for the different numbers of sources tried, improving on it in some cases. Hierarchical issues [2] will be considered in future work.
References
1. ICA:DTU Toolbox, http://isp.imm.dtu.dk/toolbox/ica/index.html
2. Graña, M., Torrealdea, F.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
3. Graña, M.: A brief review of Lattice Computing. In: IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2008 (IEEE World Congress on Computational Intelligence), June 2008, pp. 1777–1781 (2008)
4. Graña, M., Savio, A.M., García-Sebastián, M., Fernandez, E.: A Lattice Computing approach for on-line fMRI analysis. Image and Vision Computing (in press, corrected proof, 2009)
5. Graña, M., Villaverde, I., Maldonado, J.O., Hernandez, C.: Two Lattice Computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72(10-12), 2111–2120 (2009)
6. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John Wiley and Sons, Chichester (2001)
7. Højen-Sørensen, P., Winther, O., Hansen, L.K.: Mean-field approaches to independent component analysis. Neural Computation 14(4), 889–918 (2002)
8. Jones, S., Andresen, C., Crowley, J.: Appearance based process for visual navigation. In: Proceedings of the 1997 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1997, September 1997, vol. 2, pp. 551–557 (1997)
9. Keshava, N., Mustard, J.: Spectral unmixing. IEEE Signal Processing Magazine 19(1), 44–57 (2002)
10. Kröse, B., Vlassis, N., Bunschoten, R.: Omnidirectional vision for appearance-based robot localization. In: Hager, G.D., Christensen, H.I., Bunke, H., Klein, R. (eds.) Dagstuhl Seminar 2000. LNCS, vol. 2238, pp. 39–50. Springer, Heidelberg (2002)
11. Molgedey, L., Schuster, H.G.: Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters 72, 3634–3637 (1994)
12. Munguia, R., Grau, A., Sanfeliu, A.: Matching images features in a wide base line with ICA descriptors. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 2, pp. 159–162 (2006)
13. Ritter, G.X., Gader, P.: Fixed points of Lattice Transforms and Lattice Associative Memories. In: Advances in Imaging and Electron Physics, vol. 144, pp. 165–242. Elsevier, Amsterdam (2006)
14. Ritter, G.X., Urcid, G., Schmalz, M.: Autonomous single-pass endmember approximation using Lattice Auto-Associative Memories. Neurocomputing 72(10-12), 2101–2110 (2009)
15. Sim, R., Dudek, G.: Learning generative models of scene features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, vol. 1, pp. 406–412 (2001)
16. Sussner, P., Valle, M.: Gray-scale Morphological Associative Memories. IEEE Transactions on Neural Networks 17(3), 559–570 (2006)
17. Ulrich, I., Nourbakhsh, I.: Appearance-based place recognition for topological localization. In: Proceedings of IEEE International Conference on Robotics and Automation, ICRA 2000, vol. 2, pp. 1023–1029 (2000)
18. Urcid, G., Valdiviezo, J.C.: Generation of lattice independent vector sets for pattern recognition applications. In: Ritter, G.X., Schmalz, M.S., Barrera, J., Astola, J.T. (eds.) Proc. of SPIE 2007, Mathematics of Data/Image Pattern Recognition, Compression, Coding and Encryption with Applications X, vol. 6700, pp. 67000C:1–12. SPIE, San Jose (2007)
An Introduction to the Kosko Subsethood FAM

Peter Sussner and Estevão Esmi

Department of Applied Mathematics, University of Campinas, Campinas, State of São Paulo, Brazil
Abstract. Inspired by the fact that in (fuzzy) mathematical morphology a (fuzzy) erosion is defined in terms of a (fuzzy) inclusion measure, we introduce a nondistributive fuzzy morphological associative memory model on the basis of the Kosko subsethood measure. Moreover, we compare the error correction capabilities of the new model and of other fuzzy and gray-scale associative memories in terms of some experimental results concerning gray-scale image reconstruction. Keywords: Fuzzy associative memory, mathematical morphology, fuzzy erosion, Kosko subsethood measure, gray-scale image reconstruction.
We have recently proposed a very general class of fuzzy associative memories (FAMs) called fuzzy morphological associative memories (FMAMs) [24] that includes many well-known FAM models. FMAMs grew out of a gray-scale associative memory model called morphological associative memory (MAM) [14,21]. In this context, the term "morphological" refers to the fact that the nodes of (fuzzy) morphological associative memories execute elementary operations of mathematical morphology (MM) as defined in a complete lattice setting, such as the extended reals or integers in the case of MAMs and the unit interval [0, 1] in the case of FMAMs [24]. Unlike Kosko's original FAM model, the KS-FAM introduced in this paper does not comply with this definition of FMAM. Instead, we found inspiration in the roots of MM, which lie in the processing and analysis of images using "structuring elements" [12]. For example, in fuzzy mathematical morphology, a fuzzy erosion of an image by a structuring element (SE) is given by the degree of inclusion of the translated SE at every pixel [13,20]. The label "morphological" may be attached to the KS-FAM because each hidden node of this two-layer model performs a type of fuzzy erosion, with Kosko's subsethood [10] playing the role of the inclusion measure. Just like other FAM models and the original MAM, the KS-FAM can be used to store and retrieve gray-scale patterns. Therefore, this paper includes experiments concerning the reconstruction of gray-scale images from corrupted image cues, comparing the KS-FAM with other gray-scale AMs such as the Hamming AM, the MAM WXX, the MAM WXX + ν, Kosko's FAM, the kernel associative memory (KAM), and the optimal linear associative memory (OLAM).
This work was supported by FAPESP under grant no. 2006/05868-5 and by CNPq under grant no. 306040/2006-9.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 343–350, 2010. © Springer-Verlag Berlin Heidelberg 2010
344
P. Sussner and E. Esmi
1 Some Mathematical Background

The KS-FAM model introduced in this paper incorporates concepts of mathematical morphology in a different way than the original MAM and FMAM models. The following background information on MM and MAMs is indispensable in order to grant some insight into these issues.

1.1 Some Basic Notions of Mathematical Morphology

Complete lattices are generally accepted as the appropriate mathematical framework of MM [8,16,17]. Let us review a few basic concepts of lattice theory and MM on complete lattices. A partially ordered set L is called a complete lattice if and only if every non-empty subset of L has an infimum and a supremum in L [1]. Examples of complete lattices are given by R±∞ = R ∪ {+∞, −∞} and R^n_±∞ = (R±∞)^n. For any Y ⊆ L, we denote the infimum of Y by ∧Y and the supremum of Y by ∨Y. An (algebraic) erosion ε is defined as a mapping from a complete lattice L to a complete lattice M that commutes with the infimum operator. Formally, we have

ε(∧Y) = ∧_{y∈Y} ε(y);   (1)
Similarly, an (algebraic) dilation δ is defined as a mapping from a complete lattice L to a complete lattice M that commutes with the supremum operator. Instead of providing more details about MM on complete lattices, let us recall the origins of MM as a set-theoretical approach to binary image processing [8,12]. Later, MM was extended to gray-scale image processing. Fuzzy mathematical morphology represents one of the approaches towards gray-scale MM [6,13,20]. Let F(X) = [0, 1]^X denote the class of fuzzy sets in X. The fuzzy erosion of an image a ∈ F(X) by a structuring element (SE) s ∈ F(X) arises via the following definition:

E_F(a, s)(x) = Inc_F(s_x, a),   (2)

where Inc_F is a fuzzy inclusion measure [13,20] that fuzzifies the crisp inclusion measure Inc : P(X) × P(X) → {0, 1} (Inc(A, S) = 1 ⇔ A ⊆ S) and s_x is the translation of the fuzzy SE s by x given by s_x(y) = s(y − x) for all y ∈ X. Thus the fuzzy erosion E_F(a, s) at point x is given by a degree of inclusion of the translated SE s_x in the fuzzy image a.

1.2 Basic Concepts of Morphological Associative Memories

Morphological associative memories (MAMs) belong to the class of morphological neural networks (MNNs) [15,24]. Despite the name "morphological neural network", the first MNN models were based on minimax algebra, a lattice algebra that originated from problems in machine scheduling and operations research [3,4]. J.L. Davidson exposed the close relationship between MM and minimax algebra by embedding classical binary and gray-scale MM into minimax algebra [5].
Let us consider the following special cases of matrix products that are defined in minimax algebra [3,4]. For A ∈ R^{m×p}_±∞ and B ∈ R^{p×n}_±∞, the matrix C = A ∨ B, also called the max product of A and B, and the matrix D = A ∧ B, also called the min product of A and B, are defined by

c_ij = ∨_{k=1}^{p} (a_ik + b_kj),   d_ij = ∧_{k=1}^{p} (a_ik + b_kj).   (3)
Let A ∈ R^{m×n}. If ε_A and δ_A are such that ε_A(x) = A ∧ x and δ_A(x) = A ∨ x for all x ∈ R^n_±∞, then ε_A represents an (algebraic) erosion and δ_A represents an (algebraic) dilation from the complete lattice R^n_±∞ into the complete lattice R^m_±∞. These operations are employed in the recall phase of MAMs. For simplicity, we restrict ourselves to patterns with entries in R ⊆ R±∞. Suppose that we want to record k vector pairs (x^1, y^1), ..., (x^k, y^k) using a morphological associative memory [14,21]. Let X denote the matrix in R^{n×k} whose column vectors are the vectors x^ξ ∈ R^n and let Y denote the matrix in R^{m×k} whose column vectors are the vectors y^ξ ∈ R^m, where ξ = 1, ..., k. There are two possible recording schemes for MAMs, resulting in weight matrices M_XY and W_XY. The first recording scheme consists in constructing an m × n matrix M_XY as follows:

M_XY = Y ∨ (−X^t).   (4)

The second, dual scheme consists in constructing an m × n matrix W_XY of the form W_XY = Y ∧ (−X^t). The recall phases of M_XY and W_XY are respectively given in terms of the erosion ε_{M_XY} and the dilation δ_{W_XY}. If Y = X then we obtain the auto-associative morphological memories (AMMs) M_XX and W_XX. Consider M_XX in the binary case where X ∈ {0, 1}^{n×k}. In addition, assume that M_XX ∈ {0, 1}^{n×n}. We previously observed that each entry of the min product M_XX ∧ x can be computed by evaluating the crisp inclusion of a certain SE s ∈ {0, 1}^n in x. Specifically, let m_i ∈ {0, 1}^{1×n} be the i-th row of M_XX and let m^t_i = (m_i)^t. If m̄^t_i = 1 − m^t_i denotes the complement of m^t_i, then we have [22]:

m_i ∧ x = { 1 if m̄^t_i ≤ x, 0 otherwise },   ∀ i = 1, ..., n.   (5)

Hence, m_i ∧ x computes the crisp inclusion of the SE m̄^t_i in x.
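A minimal numpy sketch of the max and min products of Eq. (3) and of the recording scheme of Eq. (4) follows (the function names are ours). It also illustrates the perfect-recall property of the auto-associative memory M_XX for uncorrupted stored patterns [14,21].

```python
import numpy as np

def max_product(A, B):
    """c_ij = max_k (a_ik + b_kj), the max product of Eq. (3)."""
    return (A[:, :, None] + B[None, :, :]).max(axis=1)

def min_product(A, B):
    """d_ij = min_k (a_ik + b_kj), the min product of Eq. (3)."""
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

# Record k = 3 random patterns auto-associatively, Eq. (4) with Y = X.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(5, 3))
M_XX = max_product(X, -X.T)

# Erosion-based recall M_XX min-product x retrieves every stored
# pattern exactly when the input is uncorrupted.
recalled = min_product(M_XX, X)
```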
2 Introduction to the Kosko Subsethood Fuzzy Associative Memory

The AMMs M_XX and W_XX have several desirable properties, such as unlimited absolute storage capacity and one-step convergence when used as a dynamic model with feedback [14,21]. On the downside, these AMMs exhibit a limited error correction capability and many spurious memories. In an attempt to improve the noise tolerance of the binary M_XX model, we fuzzified Equation 5 [22]. The following observation proved to be useful for this purpose: If
f : [0, 1]^n → {0, 1}^n is the hard-limiting function defined below and if S : [0, 1]^n × [0, 1]^n → [0, 1] is Kosko's subsethood measure, then the following equations hold for all i = 1, ..., n:

(M_XX ∧ x)_i = f(S(m̄^t_i, x)), where f(x) = { 0 if x < 1, 1 else },   (6)

and S(x, y) = (Σ_{i=1}^n x_i ∧ y_i) / (Σ_{i=1}^n x_i) for x ≠ 0 ∈ [0, 1]^n.   (7)

By leaving away the hard-limiter f in Equation 6, we obtain the fuzzy min product of M_XX and x, denoted by M_XX ∧̃ x (more generally, evaluating Kosko's subsethood of the complement of the i-th row of A ∈ [0, 1]^{n×k} in the j-th column of B ∈ [0, 1]^{k×m} yields c_ij, where C = A ∧̃ B). In the terminology of MM, the i-th entry of M_XX ∧̃ x corresponds to the degree to which the SE m̄^t_i is a subset of the fuzzy image x.

Example 1. The following example illustrates the action of the fuzzy min product between the matrix M_XX and binary input vectors x and y:

X = [1 0; 1 0; 1 1; 0 1],  M_XX = [0 0 0 1; 0 0 0 1; 1 1 0 1; 1 1 0 0],  x = (0, 1, 1, 1)^t,  y = (1, 0, 1, 0)^t,   (8)

M_XX ∧̃ x = (2/3, 2/3, 1, 1)^t,  M_XX ∧̃ y = (2/3, 2/3, 1, 1/2)^t.   (9)

The following scheme yields a successful approach towards a binary AMM [22]:

input x → M_XX ∧̃ x → Defuzzification → output y   (10)
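Kosko's subsethood measure and the fuzzy min product can be sketched in numpy and checked against the values given in Example 1 (the function names are ours):

```python
import numpy as np

def kosko_S(a, b):
    """Kosko's subsethood S(a, b), Eq. (7): degree to which a is a
    subset of b, for a != 0."""
    return np.minimum(a, b).sum() / a.sum()

def fuzzy_min_product(M, x):
    """i-th entry = S(complement of the i-th row of M, x), i.e. the
    fuzzy min product obtained by dropping the hard-limiter in Eq. (6)."""
    return np.array([kosko_S(1.0 - row, x) for row in M])

# Data of Example 1.
M_XX = np.array([[0, 0, 0, 1],
                 [0, 0, 0, 1],
                 [1, 1, 0, 1],
                 [1, 1, 0, 0]], dtype=float)
x = np.array([0.0, 1.0, 1.0, 1.0])
y = np.array([1.0, 0.0, 1.0, 0.0])

out_x = fuzzy_min_product(M_XX, x)   # Example 1: (2/3, 2/3, 1, 1)
out_y = fuzzy_min_product(M_XX, y)   # Example 1: (2/3, 2/3, 1, 1/2)
```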
Another approach, consisting of a two-layer binary MAM, was previously suggested to reduce the number of spurious memories in the auto- and hetero-associative cases [23]. The Kosko subsethood FAM that we introduce in this paper combines the advantages of both approaches and provides a generalization to the gray-scale case. Specifically, let {(x^1, y^1), ..., (x^k, y^k)} be the set of (fuzzy) fundamental memories, i.e. the associations of fuzzy patterns to be stored. Let X = [x^1, ..., x^k] ∈ [0, 1]^{n×k} and Y = [y^1, ..., y^k] ∈ [0, 1]^{m×k} denote the matrices whose columns are respectively the input vectors and the output vectors. The KS-FAM model requires the choice of a matrix of auxiliary patterns Z = [z^1, ..., z^k] ∈ {0, 1}^{p×k} that satisfies the following equations:

∨_{ξ=1}^{k} z^ξ = 1,  z^ξ ≰ z^γ,  and  z^ξ ∧ z^γ = 0,  ∀ γ ≠ ξ.   (11)

For all practical purposes, we may select Z to be the k × k identity matrix. Given an input x ∈ [0, 1]^n, the KS-FAM model produces an output y according to the following equations:

w = h(M_XZ ∧̃ x),  y = W_ZY ∨ w,   (12)

where h(z) = (h(z_1), ..., h(z_p))^t is given by

h(z_i) = { 1 if z_i ≥ ∨_{j=1}^{p} z_j, 0 else },  for all i = 1, ..., p.   (13)
The KS-FAM represents a two-layer neural network. After computing the fuzzy min product M_XZ ∧̃ x, the defuzzification operator h is applied, which results in a competition among the hidden neurons. The activation of the hidden nodes exhibiting the highest values leads to the activation of the corresponding patterns y^ξ. Note that the i-th hidden node calculates h(S(m̄^t_i, x)), where m̄^t_i is given by the complemented transpose of the i-th row of M_XZ. Thus, the aggregation function of the i-th hidden node evaluates a type of fuzzy inclusion of the input pattern x in m̄^t_i. This operation can be viewed as an erosion by a structuring element in the wide sense of MM, but not as an algebraic erosion, since S(·, x) does not generally commute with the infimum operator [20]. The output layer computes a dilation (in both the lattice-algebraic and the broad sense) given by the operator δ_{W_ZY}.
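The two-layer recall of Eqs. (12)–(13) can be sketched as follows, assuming Z is the k × k identity matrix. Under that assumption (a derivation of ours, not spelled out in the paper), the i-th row of M_XZ = Z ∨ (−X^t) equals 1 − (x^i)^t, so the hidden layer simply scores Kosko's subsethood of each stored input pattern in the cue, and W_ZY = Y ∧ (−Z^t) reduces to Y − 1; the function names are ours.

```python
import numpy as np

def kosko_S(a, b):
    """Kosko's subsethood degree of a in b, Eq. (7)."""
    return np.minimum(a, b).sum() / a.sum()

def ks_fam_recall(X, Y, x):
    """One KS-FAM recall step, Eqs. (12)-(13), for Z = k x k identity."""
    k = X.shape[1]
    # Hidden layer: fuzzy min product reduces to subsethood scores S(x^i, x).
    scores = np.array([kosko_S(X[:, i], x) for i in range(k)])
    # Competition h, Eq. (13): winners are the maximal scores.
    w = (scores >= scores.max()).astype(float)
    # Output dilation y = W_ZY max-product w, with W_ZY = Y - 1 for Z = I.
    return ((Y - 1.0) + w).max(axis=1)

# Two stored associations (columns of X and Y), chosen so that neither
# input pattern is a fuzzy subset of the other.
X = np.array([[0.9, 0.1],
              [0.1, 0.8],
              [0.2, 0.3]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])

y_out = ks_fam_recall(X, Y, X[:, 0])   # retrieves the associated Y[:, 0]
```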
3 Experimental Results

In this section, we compare the KS-FAM with other fuzzy and gray-scale associative memories in some simulations using gray-scale (fuzzy) images (recall that a gray-scale image can be identified with a fuzzy set). Specifically, we recorded the normalized root mean square error (NRMSE) after presentation of an imperfect image cue to the KS-FAM, the Hamming net [7,11], the MAM W_XX [14], the MAM W_XX + ν [21], Kosko's max-min FAM [10], the KAM [25], and the OLAM [9]. Figure 1 displays images of size 64 × 64 with 256 gray levels representing downsized versions of images that are contained in the database of the Computer Vision Group, University of Granada, Spain [2]. By applying the row-scan method to each of the four images, we generated fuzzy vectors x^ξ ∈ [0, 1]^{4096} of length 4096 for ξ = 1, ..., 4, which were used in the experiments. Applying the KS-FAM to each one of these vectors resulted in perfect recall.
Fig. 1. Original images used in the experiments
3.1 Variations in Brightness and Orientation

In this experiment we modified the brightness and the orientation of the original images. Specifically, we subtracted a positive constant (0.35 in the fuzzy case and 89 in the gray-scale case) from the tree image and added the same constant to the Lena image. The resulting pixel values were thresholded at the lower and upper boundaries of the fuzzy domain [0, 1] or gray-scale domain [0, 255]. In addition, we rotated the church image
348
P. Sussner and E. Esmi
by 10 degrees to the left and the cameraman image by 10 degrees to the right. Figure 2 depicts the results of this experiment for the brightened version of the Lena image. The KS-FAM succeeded in perfectly retrieving all four original images. The Hamming net was unable to deal with the variations in lighting. Apart from the KS-FAM, the best overall performance was achieved by the KAM model. Table 1 summarizes the results of this experiment in terms of the NRMSE.
Fig. 2. From left to right and from top to bottom, the first image shows a brightened version of the Lena image. The remaining images correspond to the outputs of the KS-FAM, the Hamming net, the MAM W_XX, the MAM W_XX + ν, Kosko's FAM, the KAM, and the OLAM.

Table 1. NRMSEs produced by AM models in applications to patterns exhibiting variations in brightness and orientation

            KS-FAM  Hamming net  W_XX    W_XX + ν  Kosko's FAM  KAM     OLAM
Tree        0       0.6347       0.4771  0.6032    0.4302       0.1945  0.4986
Lena        0       0.8414       0.7354  0.4615    0.8937       0.1499  0.6810
Church      0       0            1.6015  0.6168    1.1586       0.0566  0.2892
Cameraman   0       0            0.9509  0.4765    0.7300       0.0784  0.1937
3.2 Noisy Patterns

Finally, we corrupted the original images by introducing Gaussian noise of zero mean and variance 0.03. Figure 3 visualizes the outputs produced by the aforementioned associative memories in applications to the corrupted church image that can be found in the top left-hand corner. Table 2 shows the NRMSEs in 100 experiments for each pattern x^ξ, ξ = 1, ..., 4. Both the KS-FAM and the Hamming net achieved perfect recall of the four original images. The KAM and the OLAM also exhibited a very satisfactory tolerance with respect to this type of noise.

Table 2. NRMSEs produced by AM models in applications to noisy input patterns

                             KS-FAM  Hamming net  W_XX    W_XX + ν  Kosko's FAM  KAM     OLAM
Gaussian noise (σ² = 0.03)   0       0            0.9005  0.2770    0.8185       0.0137  0.0365
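The experimental protocol above (corrupt a stored fuzzy image with Gaussian noise, then score the recall with the NRMSE) can be sketched as follows. Note that the paper does not spell out its normalization convention for the NRMSE, so dividing by the norm of the original pattern below is an assumption of ours.

```python
import numpy as np

def nrmse(original, recalled):
    """Normalized root mean square error between a stored pattern and a
    recalled one; normalization by the norm of the original is one
    common convention, assumed here."""
    return np.linalg.norm(recalled - original) / np.linalg.norm(original)

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=4096)   # a fuzzy image vector, as in the experiments

# Corrupt with zero-mean Gaussian noise of variance 0.03, clipped to [0, 1].
noisy = np.clip(x + rng.normal(0.0, np.sqrt(0.03), size=x.shape), 0.0, 1.0)

perfect = nrmse(x, x)     # 0 corresponds to perfect recall
degraded = nrmse(x, noisy)
```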
An Introduction to the Kosko Subsethood FAM
349
Fig. 3. From left to right and from top to bottom, the first image shows a corrupted version of the church image containing Gaussian noise of zero mean and variance 0.03. The remaining images correspond to the outputs of the KS-FAM, the Hamming net, the MAM W_XX, the MAM W_XX + ν, Kosko's FAM, the KAM, and the OLAM.
4 Concluding Remarks

This paper presents the Kosko subsethood FAM on the basis of ideas from mathematical morphology. Preliminary experiments in gray-scale image reconstruction indicate its potential utility for applications in pattern recognition. In fact, the KS-FAM was the only model among the distributive and non-distributive associative memories we tested that achieved perfect recall in all our experiments. However, we would like to caution that the KS-FAM model tolerates neither arbitrary variations in brightness and orientation nor excessive amounts of noise.
An Increasing Hybrid Morphological-Linear Perceptron with Evolutionary Learning and Phase Correction for Financial Time Series Forecasting

Ricardo de A. Araújo¹ and Peter Sussner²

¹ Information Technology Department, [gm]2 Intelligent Systems, Brazil
² Department of Applied Mathematics, University of Campinas, Brazil
[email protected], [email protected]
Abstract. In this paper we present a model suited to the financial time series forecasting problem, called the increasing hybrid morphological-linear perceptron (IHMP). An evolutionary training algorithm based on a modified genetic algorithm (MGA) is presented to design the IHMP (learning process). The learning process includes an automatic phase correction step that is geared at eliminating the time phase distortions that typically occur in financial time series forecasting. Furthermore, we compare the proposed IHMP with other neural and statistical models using two complex nonlinear problems of financial forecasting.

Keywords: Lattice Theory, Minimax Algebra, Morphological Neural Networks, Genetic Algorithms, Financial Time Series Forecasting.
1 Introduction

In the last few years, morphological neural networks (MNNs) have been proposed for a wide range of applications [1, 2, 3, 4, 5]. MNNs are based on the framework of mathematical morphology (MM), whose algebraic foundations can be found in lattice theory [6, 7, 8]. Originally, MM was developed for the processing and analysis of images using structuring elements (SEs) [9, 10]. In contrast to traditional artificial neural network (ANN) models, the aggregation functions of MNNs perform operations of MM instead of conventional linear operations [11].

Morphological neural networks have only very recently found applications in the domain of financial time series forecasting [12, 13, 14], whereas conventional ANN models have been successfully used for nonlinear modeling of time series for at least two decades [15, 16, 17]. ANNs typically require setting a series of system parameters, some of which are not always easy to determine. In the particular case of time series forecasting, another crucial element that needs to be determined beforehand is the set of relevant time lags to represent the series [13, 14].

Given the fact that financial time series exhibit a strong linear component as well as a weaker nonlinear component, this paper proposes a hybrid model, called the increasing hybrid morphological-linear perceptron (IHMP), consisting of a convex combination of a nonlinear increasing morphological perceptron [18] (because experimental results presented in [13, 14] indicate that financial forecasting models can be assumed to be increasing) and a linear perceptron [19]. IHMP learning employs a modified genetic algorithm (MGA) [20]. Moreover, the learning process includes an automatic phase correction step that is geared at eliminating the time phase distortions that typically

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 351–358, 2010.
© Springer-Verlag Berlin Heidelberg 2010
occur in financial forecasting (“random walk dilemma”) [16, 13, 14]. Furthermore, two complex nonlinear problems of financial prediction are used to compare the proposed IHMP with other prediction models found in the literature. The paper concludes with a discussion of the IHMP model and its performance in prediction problems.
2 The Random Walk Dilemma

A naive prediction strategy is to use the last observation of a time series as the prediction of its future value (x̂_{t+1} = x_t). This kind of model is known as the random walk (RW) model [16] and is determined by the following equation:

x_t = x_{t−1} + r_t,   (1)

where x_t is the current observation, x_{t−1} is the observation immediately before x_t, and r_t is a noise term with a Gaussian distribution of zero mean and standard deviation σ (r_t ∼ N(0, σ)). This behavior is common in finance and economics and is called the random walk dilemma or random walk hypothesis [16]. Assuming that an accurate prediction model is used to build an estimate of x_t, denoted by x̂_t, the expected value E[·] of the difference between x̂_t and x_t must tend to zero:

E[x̂_t − x_t] → 0.   (2)

If the time series generating phenomenon is supposed to have a strong random walk linear component and a very weak nonlinear component (denoted by g(t)), so that x_t = x_{t−1} + g(t) + r_t, and assuming that E[r_t] = 0 and E[r_t r_k] = 0 (∀ k ≠ t), the expected value of x_t will be

E[x_t] → E[x_{t−1}] + E[g(t)] + E[r_t].   (3)

If E[g(t)] → 0, then E[x_{t−1}] + E[g(t)] + E[r_t] → E[x_{t−1}] and E[x_t] → E[x_{t−1}]. Under these conditions, escaping the random walk dilemma is a hard task [16].
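A minimal simulation illustrates the dilemma: for a pure random walk, the naive last-value forecast leaves a residual that is exactly the unpredictable noise term, so no model can beat it on average. The values of σ and the series length are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, T = 0.1, 10_000

# Random walk of Equation 1: x_t = x_{t-1} + r_t with r_t ~ N(0, sigma).
r = rng.normal(0.0, sigma, size=T)
x = np.cumsum(r)

# Naive RW forecast: the last observation predicts the next one.
errors = x[1:] - x[:-1]      # equals r[1:], the pure noise term

mean_err = errors.mean()     # tends to 0, cf. Equation 2
var_err = errors.var()       # tends to sigma**2
```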
3 Background Information on Morphological Neural Networks

In a general, yet rigorous way, a morphological neural network (MNN) is defined as the type of artificial neural network that performs an elementary operation of mathematical morphology (MM) between complete lattices at every node, possibly followed by an activation function [4, 5]. Recall that complete lattices provide an appropriate algebraic framework for MM [6, 7, 8]. This insight was acquired at later stages of the development of MM. In this paper, we adhere to the rigorous, lattice-algebraic definition of an MNN.

A partially ordered set L is called a lattice if and only if every finite, non-empty subset of L has an infimum and a supremum in L [21]. For any X ⊆ L, we denote the infimum of X by the symbol ⋀X, and we write ⋀_{j∈J} x_j instead of ⋀X if X = {x_j : j ∈ J} for an index set J. We use similar notations to denote the supremum of X. Let L and M be lattices. A mapping Ψ : L → M is called increasing if and only if the following statement is true for all x, y ∈ L:

x ≤ y ⇒ Ψ(x) ≤ Ψ(y).   (4)
If L is a lattice, a partial order on L^n can be defined by setting

(x_1, ..., x_n) ≤ (y_1, ..., y_n) ⇔ x_i ≤ y_i, i = 1, ..., n.   (5)

The resulting partially ordered set L^n is also a lattice and is called the product lattice. A lattice L is complete if every non-empty (finite or infinite) subset has an infimum and a supremum in L [21]. If L is a complete lattice then the product lattice L^n is also complete. Complete lattices are widely accepted as the appropriate theoretical framework for mathematical morphology [6, 7, 8]. A central issue in this setting is the decomposition of mappings between complete lattices in terms of elementary operations. Let ε (an algebraic erosion) and δ (an algebraic dilation) be operators from a complete lattice L to a complete lattice M. Banon and Barrera have provided several theorems on the constructive decomposition of mappings between complete lattices in terms of elementary operations of MM [18]. In particular, Banon and Barrera's constructive decomposition of increasing mappings leads to the following theorem:

Theorem 1. An increasing mapping Ψ : L → M between complete lattices L and M can be represented either as a supremum of erosions or as an infimum of dilations. Formally, there exist erosions ε^i and dilations δ^j for some index sets I and J such that

Ψ = ⋁_{i∈I} ε^i = ⋀_{j∈J} δ^j.   (6)
Banon and Barrera's decomposition theorems have (implicitly) served as the basis for the learning algorithms of several MNN models. In these models, the elementary morphological operators occurring in the decomposition are assumed to adopt a special form which requires an additional algebraic structure besides the complete lattice structure [4, 5]. In this paper, we focus on the complete lattice R_{±∞} since financial time series prediction problems can be modeled in terms of functions R^n_{±∞} → R_{±∞} (where n is the number of antecedents or time lags). Given a matrix A ∈ R^{m×p}_{±∞} and a matrix B ∈ R^{p×n}_{±∞}, the matrix C = A ∨ B, called the max-product of A and B, and the matrix D = A ∧ B, called the min-product of A and B, are defined by the following equations:

c_{ij} = ⋁_{k=1}^{p} (a_{ik} + b_{kj}),   d_{ij} = ⋀_{k=1}^{p} (a_{ik} + b_{kj}).   (7)

Consider the following operators ε_A, δ_A : R^n_{±∞} → R^m_{±∞} for A ∈ R^{n×m}:

ε_A(x) = A^T ∧ x,   (8)
δ_A(x) = A^T ∨ x,   (9)

where ·^T denotes transposition. The operators ε_A and δ_A represent respectively an (algebraic) erosion and an (algebraic) dilation from the complete lattice R^n_{±∞} to the complete lattice R^m_{±∞} [5]. In an upcoming paper, we will prove that every erosion ε : R^n_{±∞} → R^m_{±∞} is of the form ε_A and every dilation δ : R^n_{±∞} → R^m_{±∞} is of the form δ_A. This statement together with Equation 6 suggests that an increasing function Ψ : R^n → R can be approximated in terms of vectors v^i, w^j ∈ R^n and some finite index sets Ī and J̄ as follows:

Ψ ≈ ⋁_{i∈Ī} ε_{v^i}   or   Ψ ≈ ⋀_{j∈J̄} δ_{w^j}.   (10)
The hypothesis of Equation 10 provides the basis for our estimation of financial time series by means of morphological perceptrons. A further discussion is beyond the scope of the paper.
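The max- and min-products of Equation 7 and the operators of Equations 8 and 9 can be sketched in NumPy as follows; the matrices and vectors at the end are arbitrary illustration values, and the comparison they enable is the increasingness property that Theorem 1 builds on.

```python
import numpy as np

def max_product(A, B):
    # Max-product of Equation 7: c_ij = max_k (a_ik + b_kj).
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def min_product(A, B):
    # Min-product of Equation 7: d_ij = min_k (a_ik + b_kj).
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def erosion(A, x):
    # Erosion of Equation 8: eps_A(x) = A^T (min-product) x.
    return min_product(A.T, x.reshape(-1, 1)).ravel()

def dilation(A, x):
    # Dilation of Equation 9: delta_A(x) = A^T (max-product) x.
    return max_product(A.T, x.reshape(-1, 1)).ravel()

A = np.array([[0.0, 1.0], [2.0, -1.0], [0.5, 0.0]])  # A in R^{3x2}: n = 3, m = 2
x = np.array([1.0, 0.0, 2.0])
y = np.array([2.0, 1.0, 3.0])                        # x <= y componentwise

# Erosions and dilations are increasing: x <= y implies eps_A(x) <= eps_A(y).
ex, ey = erosion(A, x), erosion(A, y)
dx, dy = dilation(A, x), dilation(A, y)
```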
4 The Proposed Increasing Hybrid Morphological-Linear Perceptron

We conducted a number of experiments that led us to believe that the financial time series considered in this paper are given by increasing functions Ψ : R^n → R, where n represents the number of antecedents or time lags. The proposed increasing hybrid morphological-linear perceptron (IHMP) has a morphological module as well as a linear module whose outputs are linearly combined to yield the final output. We differentiate between the erosion-based IHMP (E-IHMP) and the dilation-based IHMP (D-IHMP). Specifically, the E-IHMP model is given by the following equations:

y = λα + (1 − λ)β, λ ∈ [0, 1],   (11)

where

β = x · b^T = x_1 b_1 + x_2 b_2 + ... + x_n b_n   (12)

and

α = ⋁_{i=1}^{k} v_i, for v = (v_1, v_2, ..., v_k) and v_i = ε_{a^i}(x) = ⋀_{j=1}^{n} (a_{ij} + x_j).   (13)

Here, n denotes the dimensionality of the input signal x and k denotes the number of operations employed in the morphological module. The i-th erosion is given by ε_{a^i}, where a^i = (a_{i1}, a_{i2}, ..., a_{in})^T ∈ R^n can be viewed as the structuring element corresponding to the erosion ε_{a^i}. The vector b comprises the coefficients of the linear component of the model. The only difference between the D-IHMP and the E-IHMP is that in the D-IHMP the following equation replaces Equation 13:

α = ⋀_{i=1}^{k} v_i, for v = (v_1, v_2, ..., v_k) and v_i = δ_{a^i}(x) = ⋁_{j=1}^{n} (a_{ij} + x_j).   (14)
Note that, for both the E-IHMP and the D-IHMP, a convex combination of α, the output of the morphological module, and β, the output of the linear module, yields the final output.

4.1 The Proposed Training Algorithm

Note that the E-IHMP and D-IHMP models require the setting of the parameters λ, b, and a^i for i = 1, ..., k. If a denotes the concatenation of the vectors a^i, i.e., a^T = ((a^1)^T, (a^2)^T, ..., (a^k)^T), then the weight vector w of either model is given by

w^T = (λ, b^T, a^T).   (15)
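The forward passes of Equations 11–14 can be sketched as follows; the input, the mixing coefficient, and the weights below are arbitrary illustration values, not trained parameters.

```python
import numpy as np

def e_ihmp_output(x, lam, b, A):
    # E-IHMP forward pass (Equations 11-13).
    # x: input in R^n; lam: mixing coefficient in [0, 1];
    # b: linear coefficients in R^n; A: k x n matrix whose i-th row is the
    # structuring element a^i of the i-th erosion.
    beta = x @ b                             # linear module, Eq. 12
    v = np.min(A + x, axis=1)                # v_i = min_j (a_ij + x_j), Eq. 13
    alpha = np.max(v)                        # supremum of erosions
    return lam * alpha + (1.0 - lam) * beta  # convex combination, Eq. 11

def d_ihmp_output(x, lam, b, A):
    # D-IHMP forward pass: Equation 14 replaces Equation 13.
    beta = x @ b
    v = np.max(A + x, axis=1)                # v_i = max_j (a_ij + x_j), Eq. 14
    alpha = np.min(v)                        # infimum of dilations
    return lam * alpha + (1.0 - lam) * beta

x = np.array([0.2, 0.5, 0.1])                # n = 3 time lags
b = np.array([0.3, 0.4, 0.3])
A = np.array([[0.1, -0.2, 0.0],
              [0.0, 0.1, -0.1]])             # k = 2 structuring elements
y = e_ihmp_output(x, 0.5, b, A)
```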
During the evolutionary training process, the weights of the IHMP are adjusted according to an error criterion until convergence or until the end of the evolutionary algorithm generations. Each individual of the population represents all IHMP weights (w^T). Let us define the following fitness function f(w) in terms of the weights:

f(w) = 1 / (1 + Σ_{m=1}^{M} e²(m)).   (16)

Here, M is the number of training data and e(m) is the instantaneous error given by

e(m) = d(m) − y(m),   (17)

where d(m) and y(m) are respectively the desired output signal and the actual output for the m-th training pattern.

The modified genetic algorithm (MGA) used to train the IHMP is based on the work of Leung et al. [20]. The MGA procedure consists of the selection of a parent pair of chromosomes followed by crossover and mutation operators (generating the offspring chromosomes, i.e., the new population) until the termination condition is reached. Then the best individual in the population is selected as the solution to the problem. In our simulations, the population comprises ten individuals.

The crossover operator is used for exchanging information between two parents (vectors p_1 and p_2) obtained in the selection process by a roulette wheel approach [20]. The recombination process to generate the offspring (vectors C_1, C_2, C_3, and C_4) is done by four crossover operators, which are defined by the following equations [20]:

C_1 = (p_1 + p_2)/2,   (18)
C_2 = w(p_1 ∨ p_2) + (1 − w)p_max,   (19)
C_3 = w(p_1 ∧ p_2) + (1 − w)p_min,   (20)
C_4 = (w(p_1 + p_2) + (1 − w)(p_max + p_min))/2.   (21)

The symbol w ∈ [0, 1] (in this paper, we used 0.9) denotes the crossover weight (the closer w is to 1, the greater is the direct contribution from the parents). The symbols p_1 ∨ p_2 and p_1 ∧ p_2 denote the vectors whose elements are respectively the element-wise maximum and minimum of p_1 and p_2. The terms p_max and p_min denote the vectors with the maximum and minimum possible gene values, respectively. After the offspring generation by the crossover operators, the offspring exhibiting the greatest fitness value is chosen as the result of the crossover process. The resulting vector is denoted by C_best, and it replaces the individual of the population with the smallest fitness value.

After conclusion of the crossover process, three new mutated offspring MC^1, MC^2, and MC^3 are generated from C_best as follows [20]:

MC^j = C_best + Γ^j ΔM^j, j = 1, 2, 3.   (22)

Here, the vectors ΔM^j satisfy the inequalities p_min ≤ C_best + ΔM^j ≤ p_max for j = 1, 2, 3. The vectors Γ^j have entries in {0, 1} and satisfy the following additional
conditions: The vector Γ^1 has only one randomly chosen non-zero entry, Γ^2 represents a random binary vector, and Γ^3 is the constant vector 1 (consisting only of ones). The mutated offspring are incorporated into the population according to the following scheme. We generate a random element r of the unit interval [0, 1] and compare it with 0.1. If r < 0.1, then the mutated offspring exhibiting the largest fitness replaces the individual of the current population that has the smallest fitness value. Otherwise, we perform the following steps for j = 1, 2, 3: if the fitness value of MC^j exceeds that of the least fit individual (the one that yields the smallest fitness value) of the current population, then we substitute the latter with MC^j.

Finally, in order to automatically adjust time phase distortions, we included a phase-fix procedure in the training algorithm. The phase-fix procedure has two steps. In the first step, an application of the IHMP to an input pattern x = (x_1, ..., x_n)^T produces the output y_1. Then the value y_1 is attached to the shortened vector (x_1, ..., x_{n−1})^T ∈ R^{n−1}, yielding the pattern (y_1, x_1, ..., x_{n−1})^T. This modified pattern is now fed to the same IHMP, which generates the phase-corrected prediction y_2.

Three stopping criteria are used in the proposed evolutionary training algorithm: i) the maximum generation number, gen = 10000; ii) the decrease in the training error, measured by the training progress (Pt) [22] of the fitness function: Pt ≤ 10^{−6}; and iii) an increase of the validation error or generalization loss (Gl) [22] of the fitness function beyond 5%. The entries of the weight vectors a and b of each individual of the population are randomly initialized within the range [−1, 1]. The initial mixture coefficient λ of each individual of the population is randomly chosen in the interval [0, 1].
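The four crossover operators of Equations 18–21 can be sketched as follows; the gene range and vector length are illustration values, and the fitness-based selection of C_best is omitted.

```python
import numpy as np

def crossover(p1, p2, p_min, p_max, w=0.9):
    # The four crossover offspring of Equations 18-21 (w is the crossover weight).
    c1 = (p1 + p2) / 2.0
    c2 = w * np.maximum(p1, p2) + (1.0 - w) * p_max
    c3 = w * np.minimum(p1, p2) + (1.0 - w) * p_min
    c4 = (w * (p1 + p2) + (1.0 - w) * (p_max + p_min)) / 2.0
    return c1, c2, c3, c4

rng = np.random.default_rng(1)
p_min, p_max = np.full(4, -1.0), np.full(4, 1.0)   # permitted gene range
p1 = rng.uniform(-1.0, 1.0, size=4)
p2 = rng.uniform(-1.0, 1.0, size=4)
offspring = crossover(p1, p2, p_min, p_max)
# All four offspring stay inside the permitted range [p_min, p_max].
```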
The choice of k (the number of erosion or dilation operations used in the morphological module) varies for the prediction problems that we considered in this paper. It is important to mention that, due to the genetic operators, the resulting chromosome gene values may exceed their valid boundary values. Whenever this happens, the corresponding genes are truncated to remain within the permitted interval.
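The two-step phase-fix procedure described above can be sketched as follows. The averaging `model` below is only a hypothetical stand-in for a trained IHMP, used to keep the example self-contained.

```python
import numpy as np

def phase_fix_predict(model, x):
    # Step 1: the model's prediction y1 on the input pattern.
    y1 = model(x)
    # Step 2: prepend y1 to the shortened input (y1, x_1, ..., x_{n-1})
    # and feed it back to the same model for the phase-corrected prediction.
    shifted = np.concatenate(([y1], x[:-1]))
    return model(shifted)

model = lambda x: float(np.mean(x))   # hypothetical stand-in predictor
x = np.array([1.0, 2.0, 3.0, 4.0])
y2 = phase_fix_predict(model, x)
```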
5 Simulations and Experimental Results

A set of two real-world financial time series (the Dow Jones Industrial Average (DJIA) index and the Standard & Poor 500 (S&P500) index) was used as a test bed for the evaluation of the proposed model. All time series were normalized to lie within the range [0, 1] and divided into three sets according to Prechelt [22]. In order to establish a performance study, previously published results obtained with the ARIMA [23], modular morphological neural network (MMNN) [12], multi-layer perceptron (MLP) [19], and morphological-rank-linear (MRL) perceptron [13] models on the same time series and under the same conditions are employed for comparative studies. To ensure a fair comparison, we applied the phase-fix procedure to all of these models [13, 14]. As a global indicator of the prediction performances, we employed the following evaluation function (EF), which combines five well-known performance measures defined in [13, 14]:

EF = POCID / (1 + MSE + MAPE + THEIL + ARV).   (23)
For the DJIA index series prediction, we utilized the same time lags presented in [12] to create the input vectors (in this case, lags 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11) and the same number of operations in the morphological module as in [12] (in this case, 8). Table 1 shows the results for all the performance measures.
Table 1. Results of the test set for the DJIA index series

Metrics  ARIMA      MMNN       MLP        MRL        D-IHMP     E-IHMP
MSE      5.8033e-4  8.3236e-4  8.3000e-2  8.2148e-4  1.6044e-4  1.7619e-4
MAPE     8.3200e-2  9.6700e-2  9.3788e-2  9.6578e-2  5.7717e-2  6.0262e-2
THEIL    1.2649     0.9945     0.9885     0.9916     0.4965     0.5094
ARV      3.9200e-2  3.4423e-2  3.4204e-2  3.3981e-2  6.5683e-3  7.2129e-3
POCID    46.10      50.85      46.59      46.82      100.00     100.00
EF       19.3058    23.9130    21.1822    22.0539    64.0637    63.4095
For the S&P500 index series prediction, we used the same time lags as in [12] to create the input vectors (in this case, lags 2, 3, 4, 5, and 6) and the same number of operations in the morphological module as in [12] (in this case, 10). Table 2 shows the results for all the performance measures.

Table 2. Results of the test set for the S&P500 index series

Metrics  ARIMA      MMNN       MLP        MRL        D-IHMP     E-IHMP
MSE      2.1447e-5  9.7451e-5  9.6000e-3  1.0982e-4  3.8909e-5  2.9857e-5
MAPE     1.2400e-2  9.2000e-2  1.0103e-2  1.0214e-2  7.2277e-3  6.2731e-3
THEIL    1.4090     0.9498     0.9179     1.0397     0.6184     0.5388
ARV      0.1374     7.4749e-3  7.2875e-3  8.4926e-2  2.9930e-3  2.2967e-3
POCID    47.22      81.31      50.98      52.18      100.00     100.00
EF       18.4538    39.6756    26.2123    24.4409    61.4002    64.6245
6 Conclusion

This paper introduces the increasing hybrid morphological-linear perceptron (IHMP) for financial time series prediction. The IHMP training algorithm makes use of a modified genetic algorithm (MGA) to determine the IHMP parameters. We also added an automatic phase correction step that is geared at eliminating the time phase distortions in financial time series. The performance of the proposed IHMP in comparison to a number of competitive neural and statistical models was assessed in terms of five well-known performance measures in two experiments using real-world financial time series: DJIA and S&P500. In addition, an evaluation function that combines the five aforementioned performance measures served as a global indicator of the quality of prediction achieved by a given model. The experimental results demonstrated a consistently better performance of the proposed IHMP model in comparison to the other models found in the literature.

With the inclusion of the phase correction step, the IHMP was able to escape the so-called random walk dilemma [16] in our simulations. In other words, the IHMP model succeeded in automatically correcting the time phase distortions that typically occur in financial forecasting. Despite the incorporation of the same phase correction procedure, the other models tested in this paper were unable to cope as well with these time phase distortions.

Finally, we would like to clarify that the excellent performance of the IHMP does not depend on the use of a genetic algorithm-based training method. Instead, the main advantage of the IHMP in comparison to other models is its capability of modeling the combination of the linear and nonlinear components that determine financial time series in terms of a combination of a linear module and a morphological or lattice-based module. The main purpose of the phase correction procedure is to adjust the nonlinear component which enters the final prediction.
References

1. Pessoa, L.F.C., Maragos, P.: Neural networks with hybrid morphological rank linear nodes: a unifying framework with applications to handwritten character recognition. Pattern Recognition 33, 945–960 (2000)
2. Gader, P.D., Khabou, M.A., Koldobsky, A.: Morphological regularization neural networks. Pattern Recognition, Special Issue on Mathematical Morphology and Its Applications 33(6), 935–945 (2000)
3. Khabou, M.A., Gader, P.D., Keller, J.M.: LADAR target detection using morphological shared-weight neural networks. Machine Vision and Applications 11(6), 300–305 (2000)
4. Sussner, P., Esmi, E.L.: Introduction to morphological perceptrons with competitive learning. In: Proceedings of the International Joint Conference on Neural Networks, Atlanta, GA, pp. 3024–3031 (2009)
5. Sussner, P., Esmi, E.L.: Morphological perceptrons with competitive learning: Lattice-theoretical framework and constructive learning algorithm. Information Sciences (2009) (accepted for publication)
6. Serra, J.: Image Analysis and Mathematical Morphology, Theoretical Advances, vol. 2. Academic Press, New York (1988)
7. Ronse, C.: Why mathematical morphology needs complete lattices. Signal Processing 21(2), 129–154 (1990)
8. Heijmans, H.J.A.M.: Morphological Image Operators. Academic Press, New York (1994)
9. Matheron, G.: Random Sets and Integral Geometry. Wiley, New York (1975)
10. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, London (1982)
11. Sussner, P., Esmi, E.L.: Constructive morphological neural networks: some theoretical aspects and experimental results in classification. In: Kacprzyk, J. (ed.) Constructive Neural Networks. Studies in Computational Intelligence. Springer, Heidelberg (2009)
12. de A. Araújo, R., Madeiro, F., de Sousa, R.P., Pessoa, L.F.C., Ferreira, T.A.E.: An evolutionary morphological approach for financial time series forecasting. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 2467–2474 (2006)
13. de A. Araújo, R., Ferreira, T.A.E.: An intelligent hybrid morphological-rank-linear method for financial time series prediction. Neurocomputing 72(10-12), 2507–2524 (2009)
14. de A. Araújo, R., Ferreira, T.A.E.: A morphological-rank-linear evolutionary method for stock market prediction. Information Sciences (in press, 2010)
15. Zhang, G., Patuwo, B.E., Hu, M.Y.: Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14, 35–62 (1998)
16. Sitte, R., Sitte, J.: Neural networks approach to the random walk dilemma of financial time series. Applied Intelligence 16(3), 163–171 (2002)
17. Zhang, G.P., Kline, D.M.: Quarterly time-series forecasting with neural networks. IEEE Transactions on Neural Networks 18(6), 1800–1814 (2007)
18. Banon, G.J.F., Barrera, J.: Decomposition of mappings between complete lattices by mathematical morphology, part 1: general lattices. Signal Processing 30(3), 299–327 (1993)
19. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, New Jersey (1998)
20. Leung, F.H.F., Lam, H.K., Ling, S.H., Tam, P.K.S.: Tuning of the structure and parameters of the neural network using an improved genetic algorithm. IEEE Transactions on Neural Networks 14(1), 79–88 (2003)
21. Birkhoff, G.: Lattice Theory, 3rd edn. American Mathematical Society, Providence (1993)
22. Prechelt, L.: Proben1: A set of neural network benchmark problems and benchmarking rules. Technical Report 21/94 (1994)
23. Box, G.E.P., Jenkins, G.M., Reinsel, G.C.: Time Series Analysis: Forecasting and Control, 3rd edn. Prentice Hall, New Jersey (1994)
Lattice Associative Memories for Segmenting Color Images in Different Color Spaces

Gonzalo Urcid¹*, Juan Carlos Valdiviezo-N¹, and Gerhard X. Ritter²

¹ Optics Department, INAOE, Tonantzintla, Pue. 72000, Mexico
{gurcid,jcvaldiviezo}@inaoep.mx
² CISE Department, University of Florida, Gainesville, FL 32611–6120, USA
[email protected]
Abstract. This paper describes a technique for segmenting color images in different color spaces based on lattice auto-associative memories. Basically, the min- or max-auto-associative memories can be used to determine tetrahedra enclosing different subsets of image pixels. The column vectors of either memory, additively scaled, correspond to the most saturated color pixels that are the vertices of a specified tetrahedron, and any other color pixel can be considered a linear mixture of these points. The non-negative least squares method is used to linearly unmix color pixels and provides the fundamental step in the unsupervised segmentation of a given input color image. We give illustrative examples to demonstrate the effectiveness of our method as well as the color separation results in four different color spaces.
1 Introduction
Color image segmentation has been approached from several perspectives that currently are categorized as pixel, area, edge, and physics based segmentation [1]. For example, pixel based segmentation includes histogram techniques and cluster analysis in color spaces. Optimal thresholding [2] and the use of a perceptually uniform color space [3] are examples of histogram based techniques. Area based segmentation contemplates region growing as well as split-and-merge techniques whereas edge based segmentation embodies local methods and extensions of the morphological watershed transformation [4] such as the flat zone approach [5]. A seminal work employing Markov random fields for splitting and merging color regions was proposed in [6]. Other recent developments contemplate the fusion of various segmentation techniques such as the application of morphological closing and adaptive dilation to color histogram thresholding [7], the use of the watershed algorithm for color clustering with Markovian labeling [8], or fuzzy principal component analysis coupled with clustering based on recursive one-dimensional histogram analysis [9]. For a recent systematic exposition of color image segmentation methods the interested reader may see [10].
* Corresponding author. Fax: +52 (222) 247-2940; Tel: +52 (222) 266-3100 Ext. 8205. G. Urcid and J.C. Valdiviezo-N. are grateful to SNI-CONACYT for partial financial support through grant # 22036 and doctoral scholarship # 175027.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 359–366, 2010. c Springer-Verlag Berlin Heidelberg 2010
In this paper, we describe a lattice algebra based technique for image segmentation and apply it to RGB (Red-Green-Blue) color images transformed to other representative systems such as, HSI (Hue-Saturation-Intensity), I1 I2 I3 (principal components approximation), and L*a*b*(Luminance - redness/greenness - yellowness/blueness) color spaces. The proposed method relies on the min WXX and max MXX lattice auto-associative memories, where X is the set formed by all different colors or 3-dimensional pixel vectors contained in the input image. The scaled column vectors of either memory together with the minimum or maximum vector bounds of X may form the vertices of tetrahedra enclosing subsets of X, and correspond to the most saturated color pixels in the image. Image partition into regions of similar colors is realized by linearly unmixing pixels belonging to tetrahedra determined by WXX and MXX , and then by scaling pixel color fractions obtained with the non-negative least squares numerical method. Thus our approach to color image segmentation can be classified as a pixel based unsupervised clustering technique. Section 2 presents background material on image segmentation and a brief overview of minimax algebra and lattice associative memories; Section 3 describes the segmentation technique based on the scaled column vectors of WXX and MXX including the linear mixing model used to determine the color fractions composing any pixel vector in the input image. In Section 4, we give the segmentation results for images represented in the color spaces previously mentioned and, finally, Section 5 gives the conclusions concerning this research.
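The linear unmixing step can be sketched as follows. The vertex-color matrix S below is a hypothetical stand-in for the additively scaled columns of W_XX or M_XX, and the projected-gradient loop is only a stand-in for the non-negative least squares method named in the text.

```python
import numpy as np

def nnls_pg(S, y, iters=5000, lr=0.01):
    # Projected-gradient stand-in for non-negative least squares:
    # minimize ||S f - y||^2 subject to f >= 0.
    f = np.zeros(S.shape[1])
    for _ in range(iters):
        grad = S.T @ (S @ f - y)
        f = np.maximum(f - lr * grad, 0.0)
    return f

# Hypothetical tetrahedron vertex colors as columns (RGB values in [0, 1]).
S = np.array([[1.0, 0.0, 0.0, 0.2],
              [0.0, 1.0, 0.0, 0.2],
              [0.0, 0.0, 1.0, 0.2]])
pixel = np.array([0.5, 0.3, 0.2])

fractions = nnls_pg(S, pixel)
fractions = fractions / fractions.sum()   # scale the color fractions, as in the text
```

Each pixel's scaled fraction vector then indicates the region (vertex color) to which it predominantly belongs.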
2 Background
Intuitively, to segment an image is to divide it into a finite set of disjoint regions whose pixels share well-defined attributes. Perceptually, the segmentation process must convey the necessary information to visually recognize or identify the prominent features contained in the image, such as color hue, brightness, or texture. Let X be a finite set with k elements and p a logical predicate about a quantifiable attribute. A segmentation of X is a family {Ri} of subsets of X, each with ki elements for i = 1, ..., q, that satisfies the following properties: 1) Ri ∩ Rj = ∅ for i ≠ j (pairwise disjoint subsets), 2) for any i, Ri is a connected subset, 3) R1 ∪ · · · ∪ Rq = X and k1 + · · · + kq = k (whole set covering), 4) ∀i, p(Ri) = true (elements in a single subset share the same attribute), and 5) for i ≠ j, p(Ri ∪ Rj) = false (elements in a pairwise union of subsets do not share the same attribute). It should be clear that color image segmentation needs additional computational effort, due to its vectorial nature, when compared to the scalar nature of grayscale image segmentation.

The maximum and minimum of two numbers, usually denoted as functions max(x, y) and min(x, y), will be written with the "join" and "meet" binary operators employed in lattice theory, x ∨ y = max(x, y) and x ∧ y = min(x, y). Lattice matrix operations are defined componentwise, e.g., the maximum of two matrices X, Y of the same size m × n is computed as (X ∨ Y)_ij = x_ij ∨ y_ij for i = 1, ..., m and j = 1, ..., n. Inequalities between matrices are also verified
Lattice Associative Memories for Segmenting Color Images
elementwise, for example, X ≤ Y if and only if x_ij ≤ y_ij. Also, the conjugate matrix X* is defined as −X^t, where X^t denotes usual matrix transposition. The max-of-sums X ∨ Y and the min-of-sums X ∧ Y of appropriately sized matrices are defined, for i = 1, ..., m and j = 1, ..., n, respectively, as (X ∨ Y)_ij = ⋁_{k=1}^{p} (x_ik + y_kj) and (X ∧ Y)_ij = ⋀_{k=1}^{p} (x_ik + y_kj). For p = 1 these lattice matrix operations reduce to the outer sum of two vectors x = (x_1, ..., x_n)^t ∈ IR^n and y = (y_1, ..., y_m)^t ∈ IR^m, which is the m × n matrix (y × x^t)_ij = y_i + x_j.

Let (x^1, y^1), ..., (x^k, y^k) be k vector pairs with x^ξ ∈ IR^n and y^ξ ∈ IR^m for each ξ, with corresponding associated matrices (X, Y), where X = (x^1, ..., x^k) and Y = (y^1, ..., y^k). Then X is of dimension n × k with i, jth entry x_i^j, and Y is of dimension m × k with i, jth entry y_i^j. To store the k vector pairs (x^1, y^1), ..., (x^k, y^k) in an m × n lattice associative memory (LAM), vector encoding uses the outer sums y^ξ × (−x^ξ)^t for all ξ [11]. The network weights, w_ij of the min-memory WXY and m_ij of the max-memory MXY, for i = 1, ..., m and j = 1, ..., n, are given by

    w_ij = ⋀_{ξ=1}^{k} (y_i^ξ − x_j^ξ) ;  m_ij = ⋁_{ξ=1}^{k} (y_i^ξ − x_j^ξ).    (1)
We speak of a lattice hetero-associative memory (LHAM) if X ≠ Y and of a lattice auto-associative memory (LAAM) if X = Y. In this paper we will use LAAMs only, i.e., WXX and MXX of size n × n; in particular, the main diagonals of both matrices, i.e., the entries w_ii and m_ii, consist entirely of zeros.
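As an illustration of eq. (1) in the auto-associative case, the following minimal NumPy sketch (toy data; the function name is ours) computes WXX and MXX and checks the zero-diagonal property just mentioned:

```python
import numpy as np

def lattice_memories(X):
    """Min-memory W_XX and max-memory M_XX of eq. (1) with y = x;
    X is n x k, one stored vector per column."""
    D = X[:, None, :] - X[None, :, :]   # D[i, j, xi] = x_i^xi - x_j^xi
    return D.min(axis=2), D.max(axis=2)

# Three stored 3-dimensional vectors (invented values) as columns of X.
X = np.array([[0., 4., 2.],
              [1., 3., 5.],
              [2., 0., 6.]])
W, M = lattice_memories(X)
# The main diagonals of W_XX and M_XX consist entirely of zeros,
# and W <= M holds componentwise.
print(np.diag(W), np.diag(M))
```

The broadcasted difference array realizes all outer sums x^ξ × (−x^ξ)^t at once; the min/max reductions over ξ give the two memories.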
3 Segmenting Images with the WM Method
From a given color image A of size p × q pixels, the set X contains all different colors, or 3-dimensional vectors, present in A. If |X| = k is the number of elements in X, then k ≤ pq = |A|, where pq is the maximum number of possible color pixels available in A. Using (1) with y_i^ξ = x_i^ξ for all i ∈ {1, ..., n}, the memory matrices min-WXX and max-MXX are computed and written, respectively, as W = (w_1, w_2, w_3) and M = (m_1, m_2, m_3) to make explicit their column vectors. By construction, vector entries of W or M may not necessarily belong to the numerical range of a given color space; for example, W usually has negative entries. The next transformation puts these vectors in the appropriate range. The minimum and maximum vector bounds of X = (x^1, ..., x^k) are given by v = ⋀_{ξ=1}^{k} x^ξ and u = ⋁_{ξ=1}^{k} x^ξ, respectively. Let W = (w_1, ..., w_n) and M = (m_1, ..., m_n) be the min- and max-memory matrices; then additive scaling results in two scaled matrices, denoted W̄ and M̄, whose column vectors are defined by

    w̄_i = w_i + u_i = w_i + ⋁_{ξ=1}^{k} x_i^ξ ;  m̄_i = m_i + v_i = m_i + ⋀_{ξ=1}^{k} x_i^ξ.    (2)
Notice that w̄_ii = u_i and m̄_ii = v_i, hence diag(W̄) = u and diag(M̄) = v. Each set of scaled vectors, {w̄_1, w̄_2, w̄_3} or {m̄_1, m̄_2, m̄_3}, makes it possible to
determine several tetrahedra enclosing specific subsets of X. Recall that X is said to be a convex set if the straight line joining any two points in X lies completely within X; also, an n-dimensional simplex is the minimal convex set whose n + 1 vertices are vectors in IR^n. Since the color solid is a subspace of IR^3, a 3-dimensional simplex corresponds to a tetrahedron. Hence, considering pixel vectors in a color image enclosed by some tetrahedron whose base face is determined by its most saturated colors, an estimation of the fractions in which those colors appear at any other color pixel can be made. A model commonly used for the analysis of spectral mixtures in hyperspectral images, known as the linear mixing (LM) model, can be used to unmix noiseless color images by representing each pixel vector x as a linear combination of the most saturated colors. Thus,

    x = Sψ = ψ_1 s_1 + ψ_2 s_2 + ψ_3 s_3,    (3)

where x is a 3 × 1 pixel vector, S = (s_1, s_2, s_3) is a square matrix of size 3 × 3 whose columns are the most saturated colors, and ψ is the 3 × 1 vector of "color fractions" present in x. The components of ψ must satisfy the relations ψ_1, ψ_2, ψ_3 ≥ 0 (non-negativity) and ψ_1 + ψ_2 + ψ_3 = 1 (full additivity). Solving (3) to find the vector ψ, given that S = W̄ or S = M̄, for every x ∈ X is the process known as constrained linear unmixing. For this task, we apply the non-negative least squares (NNLS) numerical method, which relaxes the full additivity condition. Once (3) is solved for every color pixel x, all ψ vector values are reassembled into grayscale fraction images for s_1, s_2, s_3, and a thresholding procedure can be applied to get a coarser color segmentation depicting the corresponding image partition. Additional material and discussion on the WM method has been presented earlier [12,13].

Figure 1 shows, in the top left, the "peppers" RGB color image of size 128 × 128 pixels, its HSI transformation displayed as a false RGB color image, and the extreme color pixels determined from W̄ (upper row) and M̄ (lower row) in the HSI color space. Here, X = {x^1, ..., x^13844} (from a total of 16,384 pixel vectors). A 3D scatter plot of X is depicted to the left of Fig. 2. The computed scaled memory matrices and vector bounds are given by

    W̄ = ( 255  100   36 )        ( 255 )        (   0   67  140 )        ( 0 )
         ( 188  255   16 ) ,  u = ( 255 ) ;  M̄ = ( 155    0  152 ) ,  v = ( 0 ) .
         ( 115  103  255 )        ( 255 )        ( 219  239    0 )        ( 0 )

Figure 2 illustrates four tetrahedra enclosing different subsets of X, namely W̄ ∪ {v} and W̄ ∪ {u} shown in the middle, or M̄ ∪ {v} and M̄ ∪ {u} displayed to the right. Equation (3), implemented with the NNLS method, is applied to find a fraction solution vector ψ for each one of the 16,384 color pixels, first taking S = W̄ and then S = M̄. The 2nd and 3rd rows in Fig.
1, display the fraction maps obtained from the HSI saturated colors displayed in the top right, whose associated column vectors correspond, respectively, to W̄ and M̄. Each segmented image s_j is linearly scaled from the subinterval [0, μ] to the dynamic range [0, 255], where μ = ⋁_{ξ=1}^{k} ψ_j^ξ and k = 16,384. The fraction threshold φ and the grayscale threshold τ are related by the expression φ = μτ/256.
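As a toy illustration of the scaling step (2) and the NNLS unmixing of (3), the following NumPy/SciPy sketch (invented pixel values; variable names ours) uses `scipy.optimize.nnls`, which enforces non-negativity while relaxing full additivity, exactly as described above:

```python
import numpy as np
from scipy.optimize import nnls

# X: 3 x k matrix whose columns are the k distinct color pixels (toy values).
X = np.array([[200.,  40.,  10.,  90.],
              [ 30., 180.,  20.,  80.],
              [ 10.,  20., 210.,  70.]])

# Min- and max-memories of eq. (1) in the auto-associative case.
D = X[:, None, :] - X[None, :, :]
W, M = D.min(axis=2), D.max(axis=2)

# Vector bounds of X and the additive scaling of eq. (2).
u, v = X.max(axis=1), X.min(axis=1)
Wbar = W + u[None, :]          # column i of W shifted by u_i
Mbar = M + v[None, :]          # column i of M shifted by v_i
assert np.allclose(np.diag(Wbar), u) and np.allclose(np.diag(Mbar), v)

# Semi-constrained linear unmixing of eq. (3): x ~ S @ psi with psi >= 0.
S = Wbar
fractions = np.array([nnls(S, x)[0] for x in X.T])
print(fractions.round(3))      # one non-negative fraction vector per pixel
```

In the paper's setting the columns of X would be all 13,844 distinct colors of the "peppers" image, and each row of `fractions` would be reassembled into a grayscale fraction map before thresholding.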
Fig. 1. 1st row: RGB color image, transformed HSI color image, saturated colors obtained from W̄ (upper line) and M̄ (lower line); 2nd and 3rd rows: grayscale segmented images derived from w̄_j and m̄_j, respectively, for j = 1, 2, 3, showing "red/green" pepper regions and bright reflected light regions. Brighter gray tones correspond to higher fractions of saturated colors.
Fig. 2. Left: 3D scatter plot of X showing all different colors present in the HSI representation of the "peppers" RGB color image; right: tetrahedra determined from W̄ = {w̄_1, w̄_2, w̄_3} and M̄ = {m̄_1, m̄_2, m̄_3} enclosing four different subsets of X
4 Segmentation Results in Other Color Spaces
To test the performance of the WM method in different color spaces, besides the standard non-normalized correlated RGB space, we selected as representative alternatives Ohta's I1I2I3 linearly decorrelated RGB color space [1,6], the HSI non-linear and non-uniform color space [10,14], and the perceptually uniform color space L*a*b* [3,10]. The "peppers" RGB color image and its transformations to the I1I2I3, HSI, and L*a*b* color spaces, rendered as false color RGB images, are displayed in the first four columns of row one of Fig. 3. In the 2nd row, below each color image, composed thresholded fraction maps selected from W̄ ∪ M̄ depict the best
segmentation obtained in the corresponding color space; e.g., the vectors and fraction thresholds used in the RGB color space were w̄_1 (0.454), w̄_2 (0.363), and m̄_1 (1.561); similarly, for the I1I2I3 color space, m̄_3 (0.389), w̄_3 (0.384), and w̄_1 (0.347) were chosen. The 3rd row displays Sobel gradient edge images corresponding to the segmentations produced by the WM method in the RGB and I1I2I3 color spaces, a clustering method based on Mahalanobis distance, and a hybrid technique employing histograms and morphological watersheds. The 5th column of Fig. 3 shows, from top to bottom, the NTSC grayscale version of the original color image, a 16-level quantization produced by an optimized octree nearest color algorithm, and its corresponding Sobel edge image used as reference for quantitative comparisons (see Table 1).

Table 1. Segmentation performance for the "peppers" color image

    Segmentation Method               Corr. Coef.      SNR
    WM in RGB                            0.707       14.179
    WM in I1I2I3                         0.717       14.931
    WM in HSI                            0.708       14.124
    WM in L*a*b*                         0.675       14.006
    Mahalanobis distance clustering      0.632       12.917
    Histograms + Morph. Watersheds       0.594        9.814

Similarly, Figure 4 displays the segmentation results of additional color images. In each row, the source color image in RGB format is shown at the left, and to the right follows the segmentation obtained in the RGB, I1I2I3, HSI, and L*a*b* color spaces, shown as quantized grayscale images. For example, the corresponding "bear" grayscale image in the I1I2I3 color space (2nd row, 3rd column) was
Fig. 3. Top row: color image in RGB, I1 I2 I3 , HSI, and L*a*b* color spaces; 2nd row: segmented images of “red/green” peppers and bright portions of reflected light; 3rd row: Sobel edge images of different segmentation methods
Fig. 4. 1st column: sample RGB color images; 2nd to 5th columns: compound segmented images obtained with the W M method, respectively, in the RGB, I1 I2 I3 , HSI, and L*a*b* color spaces, main regions of interest are quantized
generated by composing the fraction maps obtained from w̄_2 and m̄_2 after thresholding, respectively, at φ = 0.387 and φ = 0.326. Based on the examples given here, the best segmentation results produced by applying the WM method and the semi-constrained LM model occur in the I1I2I3 space (cf. 2nd column in Fig. 3 and 3rd column in Fig. 4).
5 Conclusions
This work describes a segmentation method for color images in different color spaces, based on the lattice auto-associative memories W and M, whose scaled column vectors define the most saturated pixels. These extreme points are suitable for performing semi-constrained linear unmixing to determine the color fractions of any other pixel. Granular segmented images of all saturated pixels are produced by scaling the fraction data computed with the NNLS method, and coarse segmented images can be obtained by thresholding the corresponding color fraction maps. Examples are given to illustrate visually the results of segmentation, and a preliminary comparison was made against two other segmentation techniques. We remark that the LAAM-based approach can be classified as an unsupervised pixel clustering technique. Future work contemplates additional quantitative evaluation of the proposed method.
References

1. Cheng, H.D., Jain, X.H., Sun, Y., Wang, J.: Color Image Segmentation: Advances and Prospects. Pattern Rec. 34(12), 2259–2281 (2001)
2. Celenk, M., de Haag, M.U.: Optimal Thresholding for Color Images. In: Proc. SPIE, Nonlinear Image Processing IX, San Jose, CA, vol. 3304, pp. 250–259 (1998)
3. Shafarenko, L., Petrou, M., Kittler, J.: Histogram-based Segmentation in a Perceptually Uniform Color Space. IEEE Trans. on Image Processing 7(9), 1354–1358 (1998)
4. Meyer, F.: Color Image Segmentation. In: Proc. IEEE 4th Inter. Conf. on Image Processing and its Applications, pp. 303–306 (1992)
5. Crespo, J., Schafer, R.W.: The Flat Zone Approach and Color Images. In: Serra, J., Soille, P. (eds.) Mathematical Morphology and Its Applications to Image Processing, pp. 85–92. Kluwer Academic, Dordrecht (1994)
6. Liu, J., Yang, Y.-H.: Multiresolution Color Image Segmentation. IEEE Trans. on Pattern Anal. and Mach. Int. 16(7), 689–700 (1994)
7. Park, S.H., Yun, I.D., Lee, S.U.: Color Image Segmentation based on 3-D Clustering: Morphological Approach. Pattern Rec. 31(8), 1061–1076 (1998)
8. Géraud, T., Strub, P.-Y., Darbon, J.: Color Image Segmentation Based on Automatic Morphological Clustering. In: Proc. IEEE Inter. Conf. on Image Processing, Thessaloniki, Greece, vol. 3, pp. 70–73 (2001)
9. Essaqote, H., Zahid, N., Haddaoui, I., Ettouhami, A.: Color Image Segmentation Based on New Clustering Algorithm and Fuzzy Eigenspace. Research Journal of Applied Sciences 2(8), 853–858 (2007)
10. Koschan, A., Abidi, M.: Digital Color Image Processing, pp. 149–174. John Wiley & Sons, Hoboken (2008)
11. Ritter, G.X., Gader, P.: Fixed Points of Lattice Transforms and Lattice Associative Memories. In: Hawkes, P. (ed.) Advances in Imaging and Electron Physics, vol. 144, pp. 165–242. Elsevier, San Diego (2006)
12. Ritter, G.X., Urcid, G., Schmalz, M.S.: Autonomous Single-Pass Endmember Approximation using Lattice Auto-Associative Memories. Neurocomputing 72(10-12), 2101–2110 (2009)
13. Urcid, G., Valdiviezo-N., J.C.: Color Image Segmentation Based on Lattice Auto-Associative Memories. In: Proc. 13th IASTED Inter. Conf. on Artificial Intelligence and Soft Computing, pp. 166–173 (2009)
14. Zhang, C., Wang, P.: A New Method for Color Image Segmentation Based on Intensity and Hue Clustering. In: Proc. 15th IEEE Inter. Conf. on Pattern Recognition, vol. 3, pp. 613–616 (2000)
Lattice Neural Networks with Spike Trains

Gerhard X. Ritter¹ and Gonzalo Urcid²

¹ CISE Department, University of Florida, Gainesville, FL 32611-6120, USA
[email protected]
² Optics Department, INAOE, Tonantzintla, Pue. 72000, Mexico
Fax: +52 (222) 247-2940; Tel.: +52 (222) 266-3100 Ext. 8205
[email protected]
Abstract. Lattice based neural networks have proven their capability of resolving difficult non-linear problems and have been successfully employed in real-world applications. In this paper we introduce a novel lattice neural net that generalizes previous dendritic models. The new model employs the biological notions of dendritic spines and spike trains. We show by example that it can accomplish tasks that previous lattice neural networks were incapable of achieving.
1 Introduction
Despite major advances in artificial intelligence, humans and other primates easily outperform the best machine vision systems with respect to most measures. For this reason, emulating object and pattern recognition processes in the cortex remains a fascinating and challenging area of research. Early attempts at constructing artificial neural networks (ANNs) were only partially successful and over time diverged into mathematical systems, such as radial basis function neural nets and support vector machines, that have little in common with biological neural nets. One reason for this divergence is that advances in the neurobiology and biophysics of neural information transfer have either not been taken into serious consideration or have been too difficult to implement in practical ANNs. Recent advances in neurobiology have brought to the foreground the importance of dendritic trees, axonal arborization, and spike trains. Less than a decade ago, we started to incorporate some of these concepts into ANNs. One of our early attempts concerned the incorporation of dendritic structures and axonal trees into ANNs [1]. Various researchers consider these structures to be the primary basic computational units of the neuron, capable of realizing logical operations. Neurons with dendrites can function as many, almost independent, functional subunits, with each being able to implement a rich repertoire of such logic operations as Xor, And, Not, and Or [2,3,4,5]. These logic operations, the speed of computation, as well as work by Poggio and colleagues on the Max operator [6,7], are some of the reasons that we used lattice algebra as the main tool for mathematical modeling.
Corresponding author. G. Urcid is grateful to CONACYT for partial financial support, grant # 22036.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 367–374, 2010. c Springer-Verlag Berlin Heidelberg 2010
In this paper we extend our earlier model based on dendritic computing to include the concepts of spike trains and spines. In Section 2 we provide a short background of the biological processes necessary for understanding the proposed mathematical model. In Section 3 we define the dendritic model and discuss the rationale and need for generalizing it. Section 4 introduces the dendritic model with spike trains and provides an example for a better understanding of the new model. We conclude this research work with a few pertinent observations.
2 Elements of Neurobiology
To understand the computational neural network models discussed in the next section, some basic knowledge of neurobiology is necessary. We assume that the reader has some basic knowledge of the morphology of a neuron and its processes, namely the axon and its arborization, dendrites, and synapses. A neuron sending an electric impulse to another neuron is called the presynaptic neuron and the neuron receiving the impulse is the postsynaptic neuron. The impulse travels along the presynaptic neuron's axon and its branches, which terminate on the dendrites of postsynaptic neurons. The sites where the axonal fibers terminate are called the synaptic sites or synapses. These are the sites where information from the presynaptic neuron is transferred to the postsynaptic neurons. Dendrites are usually (but not always) studded with large numbers of tiny branches called spines. Dendritic spines are major postsynaptic targets of presynaptic input. The number of synapses on a single neuron ranges from 500 to 200,000, and the number of synapses in the human brain has been estimated to be between 60 and 240 trillion (240 × 10^12), residing on 10 to 20 billion neurons. These numbers provide a feel for the scope and immensity of the computational and information processing power of the human brain. After receiving impulses from presynaptic neurons, the dendrites generate smaller impulses to the postsynaptic neural cell body. The impulses received from the dendrites will change the electric potential of the postsynaptic neuron, possibly turning it into a presynaptic neuron for another set of postsynaptic neurons. The impulses or action potentials traveling along the axon of the presynaptic neuron are also known as spikes. As a spike travels along the axon of the presynaptic neuron it will be automatically duplicated at each branch of the axonal tree. Thus, each spike reaches all its targeted synaptic sites on the postsynaptic neuron.
A spike train is the time-series of spikes recorded from an individual neuron of the brain within some time interval Δt. Since spikes are measured in milliseconds, the number of spikes in a one second interval can be fairly large, with some spikes bunched closely together, so-called spike bursts, followed by gaps. This is also known as high frequency firing rate fluctuation. A graphical interpretation of a spike train is shown in Fig. 1. The spikes in Fig. 1 are displayed as vertical segments of the same height. The reason for this is that the actual spikes generated by a neuron have basically all the same height and shape. There are no 1/4, 1/2, or 3/4 spikes.
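The spike-train notions above can be made concrete with a small NumPy sketch (the spike times are invented): a train is stored as a list of event times within Δt = 1 s, and counting the spikes that fall in each equal subinterval Δt_h, as used later in Section 4, is a simple histogram:

```python
import numpy as np

# A one-second spike train from one neuron: spike times in seconds,
# with an early burst, two isolated spikes, and gaps in between.
spikes = np.array([0.013, 0.015, 0.018, 0.240, 0.245, 0.610, 0.930])

# Subdivide dt = 1 s into m equal subintervals dt_h and count the
# spikes in each; bursts show up as large counts, gaps as zeros.
m = 10
counts, edges = np.histogram(spikes, bins=m, range=(0.0, 1.0))
print(counts)
```

These per-subinterval counts are exactly the kind of quantity the model of Section 4 denotes by s_i(Δt_h).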
Fig. 1. A one second spike train. The vertical line segments are just symbolic markers of action potentials.
Fig. 2. When the totality of the EPSPs and IPSPs exceeds the neuron’s firing threshold, the neuron fires and sends a spike along its axon. Here t is in milliseconds and V in millivolts.
When a spike reaches a synaptic site, it produces a postsynaptic potential (PSP). If this potential results in an increase of the membrane potential of the postsynaptic neuron, then it is called an excitatory PSP (EPSP), and if it leads to hyperpolarization of the membrane potential, then it is called an inhibitory PSP (IPSP). An IPSP moves the postsynaptic cell's potential away from its firing threshold. These two reactions of the postsynaptic neuron to spikes are illustrated in Fig. 2. Recent research supports the idea that it is not the firing rate, which corresponds to the number of spikes in a spike train, that is the key to the coding and decoding of signals, but rather the position of spikes, gaps, and spike bursts within the time interval Δt before a targeted postsynaptic neuron fires. Moreover, it is the totality of all spike trains generated during Δt by all presynaptic neurons with terminal axonal fibers on a given postsynaptic neuron that is key to understanding the language by which neurons communicate. Figure 3 illustrates this rhythm for three presynaptic neurons with synapses on a given postsynaptic neuron.
3 Lattice Neural Networks
ANNs whose major computational components are derived from lattice theory are collectively known as lattice neural networks (LNNs). In the past ten years, these networks have become an extremely active area of research. A model of
Fig. 3. Spike trains of three presynaptic neurons N1, N2, and N3 for the time interval Δt = Δt1 + Δt2 + Δt3
LNNs with dendritic structures was first described in [1]. In this model, as well as in later refinements and modifications, a postsynaptic neuron with dendritic structure receives input from n presynaptic input neurons, N1, ..., Nn, whose axons have multiple terminal fibers with knobs on synaptic sites on dendritic branches of their target postsynaptic neurons. The input neurons carry the information of a pattern vector x ∈ IR^n by assigning the pattern feature x_i to N_i. The computation at the kth dendritic branch is given by

    τ_k(x) = p_k ⋀_{i∈I(k)} ⋀_{ℓ∈L(i)} (−1)^{1−ℓ} (x_i + w_{ik}^ℓ),    (1)

where I(k) ⊆ {1, ..., n} corresponds to the set of all input neurons with terminal fibers that synapse on the kth dendritic branch of the neuron, L(i) ⊆ {0, 1} corresponds to the set of terminal fibers of N_i that synapse on the kth dendrite of the neuron, and p_k ∈ {−1, 1} denotes the EPSP (p_k = 1) or IPSP (p_k = −1) response of the kth dendrite's membrane that will affect the total membrane potential of the neuron. The superscript ℓ on the additive synaptic weight w_{ik}^ℓ can only be '0' or '1' since L(i) ⊆ {0, 1}. Thus, if ℓ = 0, then (−1)^{1−ℓ} = −1 provides an additional inhibitory effect, while ℓ = 1 provides for an excitatory effect since (−1)^0 = 1. However, it also means that in this model at most two synapses are allowed on a given dendritic branch for a presynaptic neuron. The kth dendrite response τ_k(x) is passed to the cell body, and the state of the postsynaptic cell is a function of the input received from all its dendritic branches. Then, the overall neural response is given by

    τ(x) = p ⋀_{k=1}^{K} τ_k(x),    (2)
where K denotes the total number of dendritic branches of the neuron and p = ±1 denotes the response of the cell body to the received dendritic input. Here again, p = 1 means that the input is accepted, while p = −1 means that the cell rejects the received input. The appeal of this model is that no multiplication comes into play and the max and min operators provide another aid for extremely fast convergence of training algorithms. At first glance, (2) seems to be based
Fig. 4. A shape that cannot be exactly modeled by a finite number of rectangles with sides that are orthogonal to the x1 and x2 axes
on only minimums. However, due to the relations −(x ∧ y) = −x ∨ −y and −x ∧ −y = −(x ∨ y), and the use of p = ±1, p_k = ±1, and (−1)^{1−ℓ}, the maximum function is automatically built into the operations expressed by (1) and (2). Just about all of the techniques for training single layer and multilayer LNNs based on this model focus on the training patterns being enclosed by a series of hyperboxes that are orthogonal to the axes of the data space. These techniques have many desirable properties, including fast convergence, clear geometric interpretation, and 100% accurate classification of the training data. Boundaries of these hyperboxes are established in the dendrites, and this information flows into the neural body, which recognizes the full geometric configuration established by the boundary pieces. A result is that a single layer LNN with only one output neuron can approximate any compact geometric shape in n-dimensional Euclidean space to any degree ε > 0 of accuracy. This includes connected as well as disconnected configurations. As most algorithms are derived from slight modifications of the algorithm given in [1], we will simply refer to them collectively as Algorithm A. A major problem of the various Algorithm A approaches is that many shapes cannot be exactly modeled by hyperboxes, but can only be approximated. Consider the triangle in Fig. 4. The region is described by only three lines, but there is no finite number of rectangles, with sides orthogonal to the x1 and x2 axes, whose unions and/or intersections form the triangle. The only way to solve this problem exactly with the use of rectangles whose sides are orthogonal to one of the x1 or x2 axes is through the use of a postsynaptic neuron with an infinite number of synapses. In order to get around this problem of dealing only with hyperboxes whose faces are orthogonal to the standard basis axes, we constructed orthonormal basis LNNs (OB-LNNs) [8].
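To make the single-neuron dendritic computation of eqs. (1) and (2) concrete, here is a minimal Python sketch (the function names and the nested-dict weight encoding are ours); with one branch per input it encodes the hyperbox [0, 2] × [0, 1], and τ(x) ≥ 0 exactly for points inside the box:

```python
def dendrite(x, weights, p_k=1):
    """tau_k(x) of eq. (1): weights maps input index i to the dict
    {ell: w_ik^ell} of its synapses on this branch, ell in {0, 1}."""
    terms = [(-1) ** (1 - ell) * (x[i] + w)
             for i, syn in weights.items() for ell, w in syn.items()]
    return p_k * min(terms)

def neuron(x, branches, p=1):
    """tau(x) of eq. (2): min over all dendritic branches."""
    return p * min(dendrite(x, wk) for wk in branches)

# ell = 1 gives the lower-bound term x_i + w >= 0,
# ell = 0 gives the upper-bound term -(x_i + w) >= 0.
branches = [{0: {1: 0.0, 0: -2.0}},    # 0 <= x_0 <= 2
            {1: {1: 0.0, 0: -1.0}}]    # 0 <= x_1 <= 1
print(neuron([1.0, 0.5], branches) >= 0)   # inside the box  -> True
print(neuron([3.0, 0.5], branches) >= 0)   # outside the box -> False
```

Only additions, negations, and minimums appear, which is the point of the lattice formulation.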
In this scheme, rotation matrices are used in order to find the best fit hyperbox enclosing the data or training set of interest. The lattice computation of the kth orthonormal basis dendrite can be expressed, based on (1), as follows:

    τ_k(x) = p_k ⋀_{i∈I(k)} ⋀_{ℓ∈L(i)} (−1)^{1−ℓ} [(R_k x)_i + w_{ik}^ℓ],    (3)
Fig. 5. Two rectangular boxes containing the triangle from Fig. 4. The box with the dotted boundaries is orthogonal with respect to its dotted basis, which is obtained from a 45° rotation of the standard basis. The triangle is the intersection of two rectangular boxes.
where R_k is a square matrix whose columns are unit vectors forming an orthonormal basis. Each dendrite now works with its own orthonormal basis defined by the matrix R_k. Figure 5 provides a simple visual example. Observe that each rectangular box is orthogonal with respect to its orthonormal basis and is the smallest box containing the (not rotated) data set, which in this case is the triangle, in that basis. The training algorithm for an OB-LNN, here referred to as Algorithm B and given in [8], proved superior to Algorithm A on three data sets, namely the dual spiral separation problem, the Iris data set, and the separation of an ellipse from its complement. However, it also has its own problems, in that no single box can classify the triangle in Fig. 4. Nevertheless, eliminating some rotated boxes in its complement allows one to carve out the shape of the triangle. Additionally, having a rotation matrix as part of a dendritic operation is somewhat difficult to explain from a biological standpoint. A slightly different approach becomes apparent if one looks at each step of Algorithms A and B. In the first step, each finds the smallest box containing the data set of interest, Algorithm A in the standard basis and Algorithm B in another basis obtained via a rotation. It is important to note that both obtain an optimal box with respect to its basis, and each box contains the test data. Hence their intersection contains the test data and provides a better solution for the first step. In a similar way, the next step, namely the elimination of points belonging to another class that may be in the configuration obtained from the intersection, can be handled by each algorithm separately and their results again combined. Thus, for instance, the triangle problem is solved in one step by taking the intersection of the first step of Algorithms A and B, as illustrated in Fig. 5.
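A sketch of the orthonormal-basis dendrite of eq. (3), assuming the same weight encoding as for eq. (1) (names and toy box ours): the dendrite tests the rotated coordinates R_k x against box bounds, so the box it recognizes is tilted in the original coordinates.

```python
import numpy as np

def ob_dendrite(x, R_k, weights, p_k=1):
    """tau_k(x) of eq. (3): eq. (1) applied to the rotated input R_k @ x."""
    y = R_k @ x
    terms = [(-1) ** (1 - ell) * (y[i] + w)
             for i, syn in weights.items() for ell, w in syn.items()]
    return p_k * min(terms)

# 45-degree rotation: the dendrite's box is orthogonal in the rotated basis.
c = np.cos(np.pi / 4)
s = np.sin(np.pi / 4)
R = np.array([[c, s], [-s, c]])

# Box -1 <= (R x)_i <= 1 for i = 0, 1, i.e. a square tilted by 45 degrees.
weights = {0: {1: 1.0, 0: -1.0}, 1: {1: 1.0, 0: -1.0}}
print(ob_dendrite(np.array([0.0, 0.0]), R, weights) >= 0)   # center: inside
print(ob_dendrite(np.array([1.5, 0.0]), R, weights) >= 0)   # corner cut off
```

Intersecting such a rotated box with a standard-basis box, as in the combined first step described above, is just one more ∧ at the cell body.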
Since the surfaces thus obtained are piecewise linear, an LNN can be quickly constructed with dendritic structure and synapses growing or being removed according to the results of each combined step. As it turns out, the notion of spike trains is an ideal way to obtain such networks.
4 Single Neuron with Spike Trains
For networks involving dendritic branches, spines, spike trains, synapses, and time delays, we will use the following assumptions. The synaptic weights for two terminal fibers whose terminal knobs impinge on each other or share the same synaptic site are the same. When we refer to spines, we assume that they contain one or more synapses and that the information transfer that occurs on the spine within a small time interval (say 1 ms to 5 ms) gets summed before flowing down the dendritic branch toward the soma of the neuron. We assume that the time interval Δt has been subdivided into m smaller time intervals Δt_h of equal length. Additionally, we assume that the postsynaptic neuron has K distinct dendritic branches d_1, ..., d_K, and each branch d_k has r_k spines, where the rth spine on d_k is denoted by σ(r, k). Finally, the input x ∈ IR^n resides in the input neurons N_i with x_i ∈ N_i. The lattice algebraic formulation for dendritic computing with spike trains is expressed as

    τ_k(x, Δt) = p_k ⋀_{h=1}^{m} ⋀_{r=1}^{r_k} ⋀_{i∈I(k,r)} (−1)^{1−ℓ(r,i)} s_i(Δt_h)(x_i + w_{ik}^r),    (4)
where m denotes the number of subintervals Δt_h, I(k, r) is the set of all integers i for which the presynaptic neuron N_i has a synapse on σ(r, k), w_{ik}^r is the additive weight associated with this synapse, and s_i(Δt_h) denotes the number of spikes generated by N_i during the time Δt_h. The PSP factor p_k = ±1 is determined during training, and so is ℓ(r, i) ∈ {0, 1}, which depends on both r and i. The postsynaptic neuron collects the information generated by its dendrites over the time interval Δt and computes the output

    τ(x, Δt) = p ⋀_{k=1}^{K} τ_k(x, Δt),    (5)
where p = ±1 is determined during training. Training is accomplished by applying Algorithms A and B as outlined in the last paragraph of the preceding section. First trials on the Iris data set showed that when using 60% of the data for training, an error rate of 5.3% resulted when testing the full data set. This compares with error rates of 6.41%, 10%, and 12.21% for Algorithm B, Algorithm A, and a multilayer perceptron, respectively, when using the same training data. The triangle problem provides a simple toy example. Since the algorithm stops after step 1, Algorithms A and B each produce an LNN having two input neurons and one output neuron with two dendrites. These are combined into one LNN with two dendrites. However, the number of axonal fibers changes and so do some synaptic weights. Since the new configuration has only three sides, only three boundaries need to be encoded, thus reducing the number of synapses. The step number of the algorithm provides the delay time interval, which in this case is Δt1 = Δt, and during this time only one spike from each N_i is needed, as each variable x_i is used only once in the step 1 computation. In many other cases more than one spike is needed (a spike burst) within a small interval. With this in mind,
Fig. 6. A single LNN that solves the triangle problem. We assume that the two terminal fibers synapsing on d2 have synapses on the same spine. [The diagram shows input neurons N1, N2 feeding dendrites d1, d2 of the output neuron M, with synaptic weights 0, −2 on d2 and 0 on d1, and output y.]
and the fact that r_1 = 2, dendrite d_1 computes τ_1(x, Δt_1) = (x_1 − 0) ∧ [−(x_1 − 2)], while d_2 computes τ_2(x, Δt_1) = (x_1 − 0) + [−(x_2 + 0)] = x_1 − x_2. Hence τ(x, Δt) ≥ 0 if and only if x_1 ≥ 0, x_1 ≤ 2, and x_1 ≥ x_2. That is, τ(x, Δt) ≥ 0 if and only if x is in the triangle. This means that the LNN depicted in Fig. 6 can recognize the triangle exactly.
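The dendritic computation for the triangle problem can be sketched in a few lines; this is a hedged illustration of Eqs. (4)–(5) for the single time step Δt_1 with one spike per input, read off the weights described above, not the authors' implementation.

```python
# Sketch of the triangle-problem LNN: tau >= 0 exactly when (x1, x2)
# lies in the region 0 <= x1 <= 2 and x2 <= x1 described in the text.

def tau1(x1, x2):
    # Dendrite d1: excitatory/inhibitory synapse pair on x1 (min of both).
    return min(x1 - 0, -(x1 - 2))

def tau2(x1, x2):
    # Dendrite d2: both terminal fibers share one spine, so their
    # postsynaptic potentials are summed: (x1 - 0) + (-(x2 + 0)).
    return (x1 - 0) + (-(x2 + 0))

def tau(x1, x2):
    # Soma output: p = +1 and the minimum over dendrites, as in Eq. (5).
    return min(tau1(x1, x2), tau2(x1, x2))

def in_triangle(x1, x2):
    return tau(x1, x2) >= 0

print(in_triangle(1.0, 0.5))   # inside: True
print(in_triangle(3.0, 0.5))   # x1 > 2: False
print(in_triangle(1.0, 1.5))   # x2 > x1: False
```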
5 Conclusions
We have developed a novel LNN that incorporates the notions of dendrites, dendritic spines, and spike trains. This brings ANNs a little closer to the biological model. Initial testing shows superiority over preceding LNN models as well as over perceptrons. Further testing and comparisons remain for future work.
Detecting Features from Confusion Matrices Using Generalized Formal Concept Analysis

Carmen Peláez-Moreno and Francisco J. Valverde-Albacete

Dpto. de Teoría de la Señal y de las Comunicaciones, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, Leganés 28911, Spain
{carmen,fva}@tsc.uc3m.es
Abstract. We claim that the confusion matrices of multiclass problems can be analyzed by means of a generalization of Formal Concept Analysis to obtain symbolic information about the feature sets of the underlying classification task. We prove our claims by analyzing the confusion matrices of human speech perception experiments and comparing our results to those elicited by experts.
1 Motivation
For n, p ∈ N, let G = {g_i}_{i=1}^n be a set of input labels or stimuli and M = {m_j}_{j=1}^p a set of output labels or responses for a multiclass classifier task embodied in a human or artificial agent. Consider the joint event "presenting a stimulus g_i to a classifier and obtaining response m_j," (G = g_i, M = m_j). A contingency table or confusion matrix (CM) for the classifier, C ∈ N^{n×p}, is a record of the decisions of N repetitions of such an experiment¹. Confusion matrices are rich summaries of how the classifier performed on a test set. This is usually transformed into an aggregate figure of merit, like accuracy, or a visual depiction, like a multi-class ROC, thereby losing information about the particular errors the classifier may commit. We contend that some information about the underlying task can be obtained from the numerical data in the confusion matrix via a special type of biclustering scheme, a concept lattice, from Formal Concept Analysis (FCA) [1]. Furthermore, concept lattices allow us both to observe the global behavior of classifiers and to analyze their confusions in detail. FCA, unfortunately, cannot deal in an automatic way with non-binary incidences, but generalizations of it that cater for the notion of degree of incidence have been developed [2,3,4,5,6]. In this paper, we use K-Formal Concept Analysis (kFCA) [5,7], which enables the analysis of practical real-valued CM by embedding them into an idempotent
* This work has been supported by Spanish Government (Comisión Interministerial de Ciencia y Tecnología) projects TEC2008-02473/TEC and TEC2008-06382/TEC.
¹ We consider here the general case where the labels used in the training speech samples differ from those considered by the recognizer.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 375–382, 2010. © Springer-Verlag Berlin Heidelberg 2010
semifield K (actually a bounded lattice-ordered group [8]), to try to prove that a concept lattice can elicit a symbolic description of the features being used in the classification process and of how they are misused by the classifier.
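The starting point of the analysis, recording N repetitions of the joint event (G = g_i, M = m_j) into a count matrix C ∈ N^{n×p}, can be sketched as follows; the label sets and the trial list here are illustrative, not data from the paper.

```python
# Minimal sketch: accumulating repetitions of the joint event
# (G = g_i, M = m_j) into a count confusion matrix C.
stimuli   = ["/p/", "/t/", "/k/"]     # G: input labels (illustrative)
responses = ["/p/", "/t/", "/k/"]     # M: output labels (illustrative)

# Each trial is a (stimulus, response) pair observed in the experiment.
trials = [("/p/", "/p/"), ("/p/", "/t/"), ("/t/", "/t/"), ("/k/", "/k/")]

C = [[0] * len(responses) for _ in stimuli]
for g, m in trials:
    C[stimuli.index(g)][responses.index(m)] += 1

print(C)  # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
```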
2 Generalized Formal Concept Analysis of Confusion Matrices
From count matrix to ϕ-confusion lattice. To illustrate K-Formal Concept Analysis of confusion matrices, consider that of Fig. 1(a). The first design choice is to find an adequate domain to express the strength of confusions. From a count matrix N_GM we may obtain an estimate of the mutual information distribution for the events, Ĉ_GM, like that of Fig. 1(b). A proper choice for the semiring in K-Formal Concept Analysis is R̄_max,+ (read "completed max-plus"). This is the completed set of reals with the "max" operation used as addition and ordinary addition as multiplication.
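The completed max-plus structure just described can be sketched directly; this is a hedged illustration of the semifield operations, with a small matrix fragment whose values are taken from Fig. 1(b), not the paper's kFCA implementation.

```python
# Sketch of the completed max-plus semifield R_max,+ used by kFCA:
# "addition" is max (neutral element -inf) and "multiplication" is
# ordinary + (neutral element 0).
import math

NEG_INF = -math.inf      # the zero of the semifield
UNIT = 0.0               # the one of the semifield

def mp_add(a, b):        # semifield addition
    return max(a, b)

def mp_mul(a, b):        # semifield multiplication
    return a + b

def mp_matvec(C, v):
    # (max, +) matrix-vector product, the basic way a real-valued
    # context C acts on a multi-valued set of responses or stimuli.
    return [max(mp_mul(C[i][j], v[j]) for j in range(len(v)))
            for i in range(len(C))]

C = [[2.851, NEG_INF], [0.761, 3.401]]   # a 2x2 corner of Fig. 1(b)
print(mp_matvec(C, [0.0, 0.0]))          # [2.851, 3.401]
```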
(a) N_GM, the count confusion matrix (stimuli in rows, responses in columns):

          p    m    t    f   th    k    s
    p   150    0   38    7   13   88    0
    m     0  201    0    0    0    0    0
    t    30    0  193    1    0   28    0
    f     4    1    3  199   46    5    4
    th   11    0    6   85  114    4   10
    k    86    0   45    4    1  138    0
    s     0    0    2    5   38    1  170

(b) Ĉ_GM, its mutual information distribution:

           p       m       t       f      th       k       s
    p    2.851    −∞     0.824  −1.717  −0.305   2.155    −∞
    m     −∞     4.202    −∞      −∞      −∞      −∞      −∞
    t    0.761    −∞     3.401  −4.292    −∞     0.735    −∞
    f   −2.213  −3.793  −2.674   3.277   1.683  −1.817  −1.626
    th  −0.567    −∞    −1.487   2.236   3.179  −1.953  −0.117
    k    2.149    −∞     1.169  −2.424  −3.904   2.905    −∞
    s     −∞      −∞    −3.047  −1.826   1.619  −3.928   3.995

(c) the structural (binary) incidence (G, M, I⁺_{C,ϕ}):

         p  m  t  f  th k  s
    p    ×     ×        ×
    m       ×
    t    ×     ×        ×
    f             ×  ×
    th            ×  ×
    k    ×     ×        ×
    s                ×     ×

(d) the structural lattice B(G, M, I⁺_{C,ϕ}), a Hasse diagram not reproducible in text.

Fig. 1. Example analysis using kFCA: (a) count confusion matrix, obtained from the Miller and Nicely experiments [9] for SNR = 0 dB; only the phonemes G = M = {/m/, /p/, /t/, /k/, /f/, /s/, /th/} have been retained as both stimuli (rows) and responses (columns); (b) its mutual information distribution; (c) structural matrix and (d) structural lattice for ϕ = 0.056585
Detecting Features from Confusion Matrices Using Generalized FCA
377
For n, p ∈ N, given two sets of stimuli G = {g_i}_{i=1}^n and responses M = {m_j}_{j=1}^p, and an R̄_max,+-valued matrix C ∈ R̄_max,+^{n×p}, the triple (G, M, C)_{R̄max,+} is called an R̄_max,+-valued formal context, where C(i, j) = λ reads as "stimulus g_i is confused with response m_j to degree λ" and, dually, "response m_j is evoked by stimulus g_i to degree λ". We may associate multi-valued sets of stimuli A and responses B by means of a pair of functions (·)⁺_{C,ϕ} : R̄_max,+^n → R̄_max,+^p and ⁺_{C,ϕ}(·) : R̄_max,+^p → R̄_max,+^n forming a Galois connection [1,7] as follows: define ϕ-concepts as pairs (A, B)_ϕ such that (A)⁺_{C,ϕ} = B ⇐⇒ A = ⁺_{C,ϕ}(B). The Basic Theorem of K-Formal Concept Analysis asserts that the set of formal ϕ-concepts is a complete lattice B_ϕ(G, M, C)_{R̄max,+} (see [5,7] for details). The parameter ϕ ∈ R is called the threshold of existence; it describes the minimum degree of confusion required for concepts to be considered members of B_ϕ(G, M, C)_{R̄max,+}.

Structural Confusion Lattices. The ϕ-concept lattice B_ϕ(G, M, C)_{R̄max,+} has a huge number of concepts (infinite, in the typical case) and is hard to visualize. Therefore, for each choice of ϕ deemed interesting, we introduce its structural (confusion) lattice B(G, M, I⁺_{C,ϕ}), the (standard) confusion lattice of the binary incidence I⁺_{C,ϕ}, depicting only those concepts above the fixed threshold of existence ϕ. The following lattice exploration algorithm must be carried out once for each choice of ϕ²:
1. Work out the concepts γ(g_i)⁺_{C,ϕ} and μ(m_j)⁺_{C,ϕ} associated to singleton stimuli and responses, respectively.
2. Build a binary incidence I⁺_{C,ϕ} associated to those concepts by adequately comparing them, creating the binary context (G, M, I⁺_{C,ϕ}) with the binary incidence g_i I⁺_{C,ϕ} m_j ⇐⇒ γ(g_i)⁺_{C,ϕ} ≤ μ(m_j)⁺_{C,ϕ}.
3. Use a standard tool for Formal Concept Analysis, ConExp [11], to build and visualize the structural concept lattice at ϕ, B(G, M, I⁺_{C,ϕ}).
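Once the binary incidence of step 2 is available, step 3 reduces to standard FCA derivation operators. A minimal stand-in sketch (the paper uses ConExp for this), with the incidence transcribed from Fig. 1(c):

```python
# Standard FCA derivation operators on the binary context of Fig. 1(c),
# mapping each stimulus to the responses it is confused with.
I = {
    "p": {"p", "t", "k"}, "m": {"m"}, "t": {"p", "t", "k"},
    "f": {"f", "th"}, "th": {"f", "th"}, "k": {"p", "t", "k"},
    "s": {"s", "th"},
}
G = set(I)                       # stimuli
M = set().union(*I.values())     # responses

def up(A):
    # A': responses shared by all stimuli in A.
    return set.intersection(*(I[g] for g in A)) if A else set(M)

def down(B):
    # B': stimuli confused with every response in B.
    return {g for g in G if B <= I[g]}

# The formal concepts c1 and c2 discussed in the text below:
assert up({"s"}) == {"s", "th"} and down({"s", "th"}) == {"s"}
assert down({"th"}) == {"s", "f", "th"} and up({"s", "f", "th"}) == {"th"}
```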
Structural confusion lattice interpretation. For a boolean confusion matrix I, such as that of Fig. 1(c), the triple (G, M, I) is called a formal context and is assumed to encode all information pertaining to the phenomenon being analyzed. Pairs of a particular set of stimuli that are all confused with a particular set of responses, and vice versa, are called formal concepts. For instance, c1 = ({/s/}, {/s/, /th/}) is one such pair for the context above, and c2 = ({/s/, /f/, /th/}, {/th/}) another. The set of stimuli in a concept is called the extent and the set of responses the intent of the concept: {/s/} and {/s/, /th/} are the extent and intent, respectively, of c1, meaning that stimulus /s/ is confused with responses /s/ and /th/. To distinguish between stimuli and responses, boldface characters will be used for the former throughout the text. Concepts are partially ordered by inclusion of extents or, equivalently, reverse inclusion of intents: if (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 ⇔ B1 ⊇ B2, we say that
² An on-line demonstration of this can be accessed in [10].
the first concept is more specific (less general) than the second. For instance, c1 is more specific than c2. The Basic Theorem of Formal Concept Analysis asserts that the set of formal concepts of a formal context, as related by this order relation, is a complete lattice called the concept lattice B(G, M, I). In the Hasse diagram of a confusion lattice, stimulus labels appear in white boxes just below the corresponding concept and response labels usually appear in gray boxes just above. To diminish visual clutter, instead of completely labeling each node with all labels of either sort, we put the label of each response only at the highest (most abstract) concept in which it appears, and the label of each stimulus only at the lowest (most specific) concept in which it appears. This is the reduced labeling shown in Fig. 1(d). In this labeling scheme, concepts capture the confusions of more phones than those that actually appear attached to the concept. To recover the confusion extent, the set of stimuli being confused at a particular concept, we take the union of all stimulus labels found from the node downwards in the lattice. Similarly, to build the confusion intent, we take the union of all response labels found from the node upwards in the lattice. In the example, if we go from c1 downwards in the lattice collecting stimulus labels (below the nodes) we obtain its extent {/s/}, and if we go upwards we find the labels in its intent, /s/ (above c1 itself) and /th/ (above c2). There are two types of complementary, domain-specific information that can be gleaned from a lattice: specific concept information and overall lattice information. As to the first, the most interesting concepts are the join-irreducible concepts (bottom half filled in black in Fig. 1(d)) and the meet-irreducible concepts (top half filled in gray; blue online).
Call the rest of the concepts in the example lattice c_ptk = ({/p/, /t/, /k/}, {/p/, /t/, /k/}), c_m = ({/m/}, {/m/}), and c_fth = ({/f/, /th/}, {/f/, /th/}). The set of join-irreducibles is J = {c_ptk, c_fth, c1, c_m}, and the set of meet-irreducibles is M = {c_ptk, c_fth, c1, c2, c_m}. In confusion lattices, the join-irreducibles, always annotated with a stimulus label, are the concepts to peruse in order to know what responses each individual stimulus invokes. Likewise, the meet-irreducibles, annotated with response labels, show what set of stimuli evokes a particular response. Regarding overall information about the matrix, consider the three separate sublattices of Fig. 1(d): the first comprising the top, the bottom and c_ptk, to the left; the second, the top, the bottom and c_m, to the right; and the third, the top, the bottom, c1, c2 and c_fth, at the center. Concepts in different sublattices are incomparable, except for the top and bottom. We say that such sublattices are adjoined factor sublattices of the confusion lattice. Notice that stimuli and responses that lie in different adjoined factor sublattices are never confused, hence the presence of adjoined sublattices in the confusion lattice is essentially the lattice-theoretic manifestation of as many different virtual channels in the classifier system. By this we mean that the classifier succeeds in conveying definite information from input to output without error. In the example, the channels for {/m/}, {/f/, /s/, /th/} and {/p/, /t/, /k/} seem evident.
3 The Elicitation of Symbolic Knowledge from Phonetic Confusion Matrices
Confusion matrices have been a key tool for the analysis of human speech perception since the Miller & Nicely experiments [9]. After a thorough analysis, their major conclusion was that phone recognition is grounded in hierarchic categorical discrimination, that is, English consonant sounds form groups identified in terms of hierarchical clusters of articulatory features. They introduced the notion of virtual articulatory communication channels, according to such clusterings, and posited that the channels were characterized by five distinctive acoustic-articulatory features, namely voicing, nasality, affrication, duration and place. In the following we will try to reproduce these results using K-Formal Concept Analysis of perceptual confusion matrices. To assess the complexity of the structural confusion lattices, we have worked out the concept counts at different thresholds ϕ. The concept count represents the number of nodes present in the corresponding structural lattice and provides a rough measure of the complexity of the resulting representation. Small values of the threshold ϕ bring into the picture non-systematic, difficult-to-explain confusions, evident in the analysis of structural lattices with a high number of nodes. On the contrary, if a larger value of ϕ is chosen, the number of concepts is reduced, offering a much simpler structural lattice showing the most prominent confusions. The different plots of Fig. 2 represent the evolution of the number of concepts for several Signal-to-Noise Ratios (SNR) and the full 200–6500 Hz band of the Miller and Nicely experiments. We can clearly notice how the maximum number of concepts attained by each plot is inversely related to the SNR of the emitted syllables. Therefore, the confusion lattice analysis captures the complexity of the CM that corresponds to each SNR: as the speech signal quality gets better, the errors become more systematic or structured and therefore the number of concepts decreases.
The evolution of the number of concepts suggests a method for describing the information in structural lattices:
1. Begin by observing the most salient properties of the system, that is, those lattices obtained with higher values of the threshold ϕ.
2. Subsequently, try to bring more detail into the picture by sweeping from higher to lower values of ϕ (from right to left in Fig. 2).
We thus obtain a sequence of structural lattices starting from the least complex (with the smallest number of concepts) and gradually increasing in complexity as new concepts appear. Figure 3 is a typical structural lattice for the Miller & Nicely experiments at 0 dB and a particular ϕ, where six adjoined factor sublattices can be observed. To the left, the voiced phonemes, with /m/ and /n/, the nasals, represented in two further separate sublattices. To the right, three sublattices representing unvoiced phonemes: the (oral) stops /p/-/t/-/k/, the fricative /sh/ and the rest of the fricatives.
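The ϕ-sweep of steps 1–2 can be sketched by brute force on small contexts. Note that thresholding the real-valued matrix into a binary context, as below, is a simplification of the full kFCA construction, used here only to illustrate how the concept count varies with ϕ; the example matrix is illustrative.

```python
# Hedged sketch of the phi-sweep: for each threshold, form a binary
# context (here simply C[i][j] >= phi, a simplification of the full
# kFCA construction) and count its formal concepts by brute force.
from itertools import combinations

def concept_count(C, phi):
    n, p = len(C), len(C[0])
    I = [[C[i][j] >= phi for j in range(p)] for i in range(n)]

    def up(A):
        # Attributes shared by all objects in A (A' operator).
        return frozenset(j for j in range(p) if all(I[i][j] for i in A))

    # Every intent is up(A) for some object subset A, so enumerating all
    # subsets (feasible for small n) yields the number of concepts.
    intents = {up(A) for r in range(n + 1)
                     for A in combinations(range(n), r)}
    return len(intents)

C = [[0.9, 0.2], [0.1, 0.8]]      # illustrative 2x2 real-valued matrix
for phi in (0.0, 0.5, 1.0):
    print(phi, concept_count(C, phi))
```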
Fig. 2. (Color online) Number of concepts vs. ϕ for HSR confusion matrices (data from [9]). The maximum number of concepts attained by each plot is inversely related to the SNR of the emitted syllables.
Fig. 3. Phonetic confusion lattice at ϕ = 0.11716 and SNR = 0dB (data from [9])
Hence our hypothesis is that adjoined factor sublattices in a structural confusion lattice reflect virtual feature transmission channels. Since this has to be contrasted with the Miller and Nicely findings, a direct method to elicit what phonetic knowledge the sublattices reflect would be to show the stimuli and responses in each lattice. This would afterwards demand the concourse of a phonetic expert to elicit the features. However, a clustering of phonemes in terms of their voicing, manner and place of articulation can also be cast into a Formal Concept Analysis concept lattice, as shown at the top of Fig. 4(a), with two phonemes for each feature that
Fig. 4. Phonemes vs. articulatory features concept lattices: (a) canonical clustering with unvoiced sound concepts on the left and voiced ones on the right; (b) clustering elicited from the confusion lattice of Fig. 3; (c) id. including the place feature
correspond to unvoiced (on the left) and voiced sounds (on the right). We may use this knowledge to label the structural lattices automatically, by selecting the feature label adequate to each phonemic concept extent. The lattices at the bottom of Fig. 4 demonstrate which part of the clustering can actually be elicited from the confusion matrix of Fig. 3. Voicing and manner of articulation (stop, nasal, fricative) can be obtained almost without error, as shown in Fig. 4(b), although clear mismatches between the canonical and the empirically induced representations can be observed: /b/ and /g/ are perceived as fricatives, /z/ as a stop. But place of articulation is hopeless, as Fig. 4(c) shows. In fact, labiodental and velar cannot be defined at all. This agrees fully with the Miller & Nicely conclusions, except for the result on place of articulation, which has often been disputed.
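The automatic labelling step just described can be sketched as a table lookup: a concept extent receives a feature label when it is contained in that feature's phoneme class. The small feature table below is standard articulatory phonetics for the consonants retained in Fig. 1, assumed here for illustration, not data from the paper.

```python
# Hedged sketch of automatic feature labelling of concept extents.
# The feature table is an assumption (standard articulatory phonetics
# for the retained consonants), not the paper's canonical clustering.
features = {
    "unvoiced":  {"p", "t", "k", "f", "th", "s"},
    "voiced":    {"m"},            # of the phonemes retained in Fig. 1
    "stop":      {"p", "t", "k"},
    "nasal":     {"m"},
    "fricative": {"f", "th", "s"},
}

def label(extent):
    # A concept extent gets every feature whose class contains it.
    ext = set(extent)
    return sorted(f for f, cls in features.items() if ext <= cls)

print(label({"p", "t", "k"}))   # ['stop', 'unvoiced']
print(label({"m"}))             # ['nasal', 'voiced']
print(label({"f", "th"}))       # ['fricative', 'unvoiced']
```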
4 Conclusions
We have provided evidence that R̄_max,+-Formal Concept Analysis of confusion data for a multiple-classification task can identify features present in the classification act. Since our generalization considers non-binary matrices in the analysis, it is ideally suited to the analysis of count confusion matrices.
After a preprocessing stage, which amounts to considering the confusion matrix as a joint distribution of input stimuli and output responses, we are able to pinpoint adjoined sublattices in the concept lattice, which we take as evidence that some definite feature is being transmitted. For assessment purposes, we also elicited these features using conventional articulatory-acoustic knowledge. Our results agree with expert-drawn conclusions in all but the most contested ones, which we take to reflect the robustness of the elicitation process.
References
1. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Heidelberg (1999)
2. Burusco, A., Fuentes-González, R.: The study of the L-fuzzy Concept Lattice. Mathware and Soft Computing 1(3), 209–218 (1994)
3. Bělohlávek, R.: Lattice generated by binary fuzzy relations. Tatra Mt. Mathematical Publications 16, 11–19 (1999)
4. Krajci, S.: A generalized concept lattice. Logic Journal of the IGPL 13, 543 (2005)
5. Valverde-Albacete, F.J., Peláez-Moreno, C.: Towards a generalisation of Formal Concept Analysis for data mining purposes. In: Missaoui, R., Schmidt, J. (eds.) Formal Concept Analysis. LNCS (LNAI), vol. 3874, pp. 161–176. Springer, Heidelberg (2006)
6. Medina, J., Ojeda-Aciego, M., Ruiz-Calviño, J.: Formal concept analysis via multi-adjoint concept lattices. Fuzzy Sets and Systems 160, 130–144 (2009)
7. Valverde-Albacete, F.J., Peláez-Moreno, C.: Galois connections between semimodules and applications in data mining. In: Kuznetsov, S.O., Schmidt, S. (eds.) ICFCA 2007. LNCS (LNAI), vol. 4390, pp. 181–196. Springer, Heidelberg (2007)
8. Cuninghame-Green, R.: Minimax Algebra. Lecture Notes in Economics and Mathematical Systems, vol. 166. Springer, Heidelberg (1979)
9. Miller, G.A., Nicely, P.E.: An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America 27, 338–352 (1955)
10. Esteban-Alonso, V., Valverde-Albacete, F.J., Peláez-Moreno, C.: Generalised Formal Concept Analysis demo (2008) (date last viewed 28/02/2010)
11. Yevtushenko, S.A.: System of data analysis "Concept Explorer". In: Proceedings of the 7th National Conference on Artificial Intelligence KII 2000, pp. 127–134 (2000) (in Russian), http://sourceforge.net/projects/conexp
Reconciling Knowledge in Social Tagging Web Services

Gonzalo A. Aranda-Corral¹ and Joaquín Borrego-Díaz²

¹ Universidad de Huelva, Department of Information Technology, Crta. Palos de La Frontera s/n, 21819 Palos de La Frontera
² Universidad de Sevilla, Department of Computer Science and Artificial Intelligence, Avda. Reina Mercedes s/n, 41012 Sevilla, Spain
[email protected], [email protected]
Abstract. Sometimes we want to search for new information about a topic but cannot find relevant results using our own knowledge (for example, our personal bookmarks). A potential solution is to use knowledge from other users to find what we are searching for. This solution implies that we can achieve some agreement on the implicit semantics used by the other users; we call this Reconciliation of Knowledge. The aim of this paper is to present an agent-based method which lets us reconcile two different knowledge bases (associated with tagging systems) into a common language, obtaining a new one that allows the reconciliation of (part of) this knowledge. The agents use Formal Concept Analysis concepts and tools, and the method has been implemented on the JADE multiagent platform.
1 Introduction
The amazing growth of the Web 2.0 provides powerful technologies for sharing information among users (members of social networks), for example, the social indexing of the digital objects of the Web. Collaborative tagging represents a very useful process for users who aim to add metadata to documents, objects, resources, urls, etc. Among other applications, tagging enables users to achieve a personal knowledge organization according to their own interests. Additionally, Web 2.0 systems can extract (by means of Collective Intelligence methods) some global organization of the information (from a user's personal point of view). In this way, collaborative tagging offers a pragmatic alternative to semantic web ontologies. However, the gap between the personal organization of information and the global one (as well as between those of different users) makes the use of automated methods to reconcile them difficult. These different ways of organizing information are combined in the tagging tools that tag-based platforms facilitate. This situation leads to a crowd of tagging systems. Moreover, inside a platform and due to the preferences of the users, different tagging
Partially supported by the TIN2009-09492 project of the Spanish Ministry of Science and Innovation, co-financed with FEDER funds.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 383–390, 2010. © Springer-Verlag Berlin Heidelberg 2010
behaviours exist that actually obstruct the automated interoperability among tag sets, despite the fact that the systems offer solutions to aid the understanding of the folksonomy that the users collectively build (tag clouds, tools based on related-tag ideas, collective intelligence methods, data mining, etc.). Although tagging shows potential benefits, the personal organization of information leads to implicit logical conditions that often differ from the global ones. Tagging provides a sort of weak organisation of the information, very useful, but mediated by the user's behaviour. Therefore, it is also possible that the tags a user associates with an object do not agree with the tags of other users. Formal Concept Analysis (FCA) is a mathematical tool that, applied to tagging systems, makes explicit the set of concepts that the user manages in tagging, as well as the structure of the relationships between them [7]. The concept lattice (a mathematical structure extracted by FCA methods) represents an intermediate structure between tagging (non-hierarchical and inclusive) and classical taxonomies (hierarchical and exclusive). Thus, FCA is useful for bridging the semantic gap, providing a solid mathematical theory for tagging [7].

1.1 Motivation
Since users' taggings reflect their own sets of concepts about the documents, two of the main tools of FCA, namely the concept lattice and the Stem basis, show distinct results for different users (semantic heterogeneity). From the point of view of navigation by means of tags, this semantic heterogeneity makes the activity unreliable. Thus, to ensure an efficient use of another user's tags, some reasoning on tags must be performed in order to achieve some consensus (also represented by FCA tools) that allows navigation between the different concept structures. In this scenario, it could be very important to delegate these tasks to intelligent agents (see Fig. 1). Our aim is to show how the authors have solved this problem. The solution presented in this paper was designed in the framework of the Mobile Web 2.0 project (Mowento), although it is valid for any tagging system (in fact, in this paper the method is applied to a well-known social bookmarking platform, Delicious¹). The aim of Mowento is that anyone can publish content (on the WWW), both videos and photos, from anywhere at anytime, without needing a next-generation mobile device [2]. Mowento allows users to annotate basic semantic information. This annotation is in principle very limited due to the usability of non-advanced mobile devices, which do not allow the use of complex applications for tagging. We address the challenge of creating a simple and effective labeling method for the content, which should allow it to be properly labeled with a few clicks. The method consists of a series of hierarchically arranged menus whose construction algorithm is based on Formal Concept Analysis [1]. From the point of view of the Mowento server, the information is received and automatically entered into a database, pending processing. From there, the multiagent system (programmed on JADE²) takes control of the process and performs its
¹ http://delicious.com
² http://jade.tilab.com/
Fig. 1. Knowledge conciliation in social bookmarking represented by concept lattices
tasks. In this context, several tagging problems have been solved by means of agents. Among them, the agent-based reconciliation method is the main aim of this paper. The solution presented here is also applicable to platforms with a tag-based organization of information, such as Delicious, which will be used as an example. Mowento is in an experimental phase, and the user-generated content allocated in the project does not yet provide representative examples, while the authors' personal bookmarks in Delicious represent a good sample for showing results.
2 Tagging and Heterogeneity
In the case of bookmarking systems such as Delicious, different features and users' behaviours pose a problem similar to the one faced in the Mowento project: how to organise folksonomies by means of formal ontologies (or better, ontologies on users' tags). Although tagging is useful to navigate among pages on the WWW, it cannot be considered a robust knowledge organization method. Some methods exist to integrate this kind of knowledge organization into the Semantic Web realm [10]. These methods can be classified according to the semantics associated with tag sets (or folksonomies). For example, there are methods based on an ontological definition of tags, which use ad hoc ontologies in order to formally describe the properties of tags (see [8]). Other methods are based on transforming folksonomies into ontologies (see, e.g., [12]), including ontologies designed for dealing with folksonomies [6] or more concrete proposals, as in [9].

2.1 Heterogeneity
As is argued in [5], tagging is fundamentally about sensemaking, a process in which information is categorized, labeled and, critically, through which meaning emerges [13]. Even in a personal tagging structure, the boundaries of concepts and categories are vague, so some items are doubtfully labeled. Lastly, users tag for their own benefit, but their tags nevertheless constitute a useful public good [5].
There exist several limitations to collaborative tagging in sites such as Delicious. The first one is that a tag can be used to refer to different concepts; that is, there is a context-dependent feature of the tag associated with the user. This dependence limits both the effectiveness and the adequacy of collaborative tagging. We call this limitation "Context Dependent Knowledge Heterogeneity" (CDKH). A second one is the Classical Ambiguity (CA) of terms, inherited from natural language and/or the consideration of different "basic levels" among users [11][5]. CA would not be critical when users work with urls (the content of a url in fact induces a disambiguation of terms because of its specific topic). In this case, the contextualization of tags in a graph structure (by means of clustering analysis) distinguishes the different terms associated with the same tag [3]. However, CDKH is associated with concept structures that users do not represent in the system, but that FCA can extract. It is also possible that CDKH is associated with the potential future use of the tagging (it can be used for classifying documents, for facilitating navigation among visited urls, for collecting specific and temporal urls, etc.). Thus, navigation among the concept structures of different users has to face CDKH. In the case of platforms such as Mowento, CDKH is a less important problem than in collaborative tagging systems such as Delicious. This is due both to the specific scope of the activities (reporting testimonials about events) and to the common language represented by the tags offered by Mowento's mobile tagging widget. In Mowento, CDKH can occur only in specific concepts of the personal concept lattice. Thus, reconciliation is easier than in collaborative tagging systems. In sites such as Delicious, however, CDKH represents the main problem, because tags perform several different functions as bookmarks (see [5]).
3 Agent-Based Reconciliation Knowledge Algorithm
To implement the algorithm, a solution based on a multiagent system has been chosen, which makes extending and distributing our algorithm a small effort. The multiagent system has to satisfy some requirements, such as being FIPA compliant, in order to facilitate communication and integration with other multiagent systems. We also thought that it should be, as far as possible, open source. JADE was selected since it is composed of a set of tools for developing agents and an execution platform where the agents can live. Another major point in this decision was that it is developed in Java, a multiplatform language. The reconciliation algorithm consists of the following sequence of steps (see Fig. 2):
1. Agent creation step: It starts by creating two JADE agents, passing the agent names and Delicious data as parameters. They know of each other's existence within the platform, so it is not necessary to search (at the Service Directory level, managed by the Directory Facilitator agent) for another agent that offers the reconciliation service. White Pages registration is transparent to developers because it is already implemented in the JADE toolkit.
2. Building formal contexts and Stem basis: In this step, the agents work in parallel, with no interaction, by loading and setting up their own knowledge base (KB). They work with the formal context which is built from the
Reconciling Knowledge in Social Tagging Web Services
Fig. 2. Reconciliation algorithm
Delicious downloaded information, where the objects are the URLs and the attributes are the associated tags. From these data the context is built, and its concepts and Stem Basis (SB) are extracted. To obtain these elements we integrated the Concept Explorer tool, ConExp (http://sourceforge.net/projects/conexp/), which provides all the FCA algorithms that we need. It is developed in Java, which allows a fast deployment of FCA algorithms. ConExp comes as a ".jar" file to be included in the application classpath, and from there we can instantiate the objects needed for the computations.

3. Initializing agent dialogue step: Once an agent is initialized, it has to execute a double task related to communication. On one side, the agent sends its own language (attribute set) to the other agent. On the other side, the agent prepares itself to receive the same kind of message from the other agent. For agent communication, we try to match the intention to the FIPA performatives and their meaning, so that each message is associated with the best-fitting performative according to its content. Specifically, the sending of one agent's language to another is done through the INFORM performative.

4. Restrictions of own formal contexts: After this brief communication, the agents reduce their languages (attribute sets) to the common language, restricting their formal contexts to it. This restriction also leaves many objects outside the restricted context, because those not labelled with any common tag are discarded; they contribute nothing to the common knowledge base. With the restricted contexts, the agents compute the new concept lattices, as well as their concepts and the Stem basis.

5. Synthesizing the production system from the Stem Basis: From the Stem basis calculated in the previous step, the agents keep the rules that have a support greater than zero. In this paper we call this set of implications the Stem Kernel Basis (SKB).
Based on the SKB, a production system is synthesized,
G.A. Aranda-Corral and J. Borrego-Díaz
which will later be used to suggest to the other agent changes to objects so that they can be accepted into the common ground. This production system (used for the new tag suggestions) has been implemented from scratch, because the inference engine requirements were modest and not worth the effort of integrating another engine, such as Jess (http://www.jessrules.com/) or Drools (http://www.jboss.org/drools/).

6. Knowledge negotiation between agents: This step required an implementation phase that is clearly multiagent in character, in which deep agent communication/negotiation takes place. Although a turn-based (alternating shifts) communication could have been implemented, a more asynchronous one, which better respects the multiagent philosophy, is preferred. The reason is that the usual scenario consists of agents with KBs of different sizes, so the communication needs of each agent will differ.

– The negotiation begins with the creation, by each agent, of a new context where the common knowledge will be stored and which will hold the results of the reconciliation. Then each agent performs a massive sending of all its objects (associated tags included) to the other agent and waits for the objects and responses from it. All of these messages are sent with the PROPOSE performative.

– When an agent receives an object from the other one, it checks whether the object satisfies all the implications of the agent's SKB; if so, it includes it in the common context and also sends an acceptance message to the other agent (ACCEPT PROPOSAL performative) so that it too can include it in its common context.

– If the object does not satisfy the SKB, the agent feeds it into the production system created from the SKB, and checks whether any of the suggested attributes can be added to the object so that the SKB accepts it. Such a repaired object is then sent back to the other agent as a "new object", restarting the negotiation about this object.
If no suggestions are returned by the production system, a rejection message (REJECT PROPOSAL performative) is sent to the other agent so that it removes the object as well.

– Once the whole process of message exchange and negotiation has finished, the agents hold a common context, from which new concepts and suggestions can be extracted via the Stem basis. These represent a shared conceptualization.

3.1 Example
As explained above, Delicious has been chosen as the test environment to illustrate the method. For reasons of paper length, it is not possible to show the full trace of the method. For the experiment, the authors' accounts in Delicious were selected (http://delicious.com/garanda and http://delicious.com/jborrego), which share common interests. This lets us find a significant common language, and
User       jborrego  garanda
Language        381      137
Bookmarks       358      536
User         jborrego  garanda
Common Tags        19
Bookmarks         131      114
Implications       11       11
Fig. 3. User data statistics before (left) and after (right) reducing to the common language

User A
(tutorial) (robotics) --> (ai)
(twitter) --> (social) (web2.0) (socialnetworking)
(facebook) --> (haskell) (tutorial)

User B
(twitter) (blog) --> (social) (web2.0)
(tutorial) (twitter) --> (web2.0)
(facebook) --> (twitter)

Fig. 4. Rules before conciliating

(tutorial) (robotics) --> (programming) (ai) (haskell) (blog)
(tutorial) (programming) (haskell) (blog) --> (ai)
(twitter) --> (social) (web2.0) (blog) (socialnetworking)
(facebook) --> (social) (twitter) (tutorial) (haskell) (web2.0) (blog) (socialnetworking)

Fig. 5. Some rules produced by the reconciliation process
              Language  Bookmarks  Implications
Conciliation        19        245            21

Fig. 6. Size of the conciliated knowledge
the conciliated knowledge could be more interesting. Fig. 3 (left) shows the size of the data in the users' accounts, in attributes (language) and objects (bookmarks). Following the multiagent protocol, the agents establish the common language and reduce their contexts, keeping the common attributes and removing the objects with no tags in the common language. The results are in fig. 3, right (step 4). Fig. 4 depicts part of the rule sets of both agents, and fig. 5 shows some of the rules after reconciliation. Finally, we obtain a common context with a small number of objects and a greatly reduced number of implications with support greater than zero (last step); fig. 6 presents some information on this context.
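The core operations of steps 4 and 6 can be sketched in a few lines of Python; the function names, toy context, and toy implications below are our own illustration, not the authors' implementation:

```python
def restrict_context(context, common_tags):
    """Step 4: keep only the common attributes and drop any object
    that is left with no tags in the common language."""
    out = {}
    for url, tags in context.items():
        kept = tags & common_tags
        if kept:
            out[url] = kept
    return out

def satisfies(tags, skb):
    """Step 6 acceptance test: a tag set respects an implication
    A --> B whenever A inside tags forces B inside tags."""
    return all(concl <= tags for prem, concl in skb if prem <= tags)

def suggest(tags, skb):
    """Forward-chain the SKB to suggest the extra tags that would make
    an object acceptable: the closure of tags under the implications."""
    closed, changed = set(tags), True
    while changed:
        changed = False
        for prem, concl in skb:
            if prem <= closed and not concl <= closed:
                closed |= concl
                changed = True
    return closed

# toy data (ours): two implications and one initially rejected object
skb = [({"facebook"}, {"twitter"}), ({"twitter"}, {"web2.0"})]
assert not satisfies({"facebook"}, skb)
assert satisfies(suggest({"facebook"}, skb), skb)
assert restrict_context({"u1": {"ai", "robotics"}, "u2": {"cooking"}},
                        {"ai"}) == {"u1": {"ai"}}
```

The closure computed by `suggest` plays the role of the repaired "new object" that re-enters the negotiation.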
4 Conclusions and Future Work
In this paper, a method to reconcile the knowledge bases associated with tagging systems has been presented. The method is based on FCA and designed on top of a multiagent system, where agents collaborate in order to establish common knowledge, represented both as a new tagging and as a concept lattice. Since it is based on dialogues, it is
interesting to compare them with standard protocols, such as contract-net or similar, and in the near future to adopt one of them. The knowledge reconciliation method can be applied to any tagging-based system. Experiments on Delicious show that after a small number of taggings of the same item, a nascent consensus seems to form, and this consensus is not affected by the addition of new tags [5]. This stabilisation implies, for the conciliation method presented here, that the intents of objects tend to be similar among users. Future work will focus on extending the algorithm to find consensus ontologies (with a crowd of users) and, if possible, in a semi-automatic way.
References

1. Alonso-Jiménez, J.A., Aranda-Corral, G.A., Borrego-Díaz, J., Fernández-Lebrón, M.M., Hidalgo-Doblado, M.J.: Extending Attribute Exploration by Means of Boolean Derivatives. In: Proc. 6th Int. Conf. on Concept Lattices and Their Applications. CEUR Workshop Proc., vol. 433 (2008)
2. Aranda-Corral, G.A., Borrego-Díaz, J., Gómez-Marín, F.: Toward Semantic Mobile Web 2.0 through Multiagent Systems. In: Håkansson, A., Nguyen, N.T., Hartung, R.L., Howlett, R.J., Jain, L.C. (eds.) KES-AMSTA 2009. LNCS, vol. 5559, pp. 400–409. Springer, Heidelberg (2009)
3. Yeung, C.M.A., Gibbins, N., Shadbolt, N.: Contextualising Tags in Collaborative Tagging Systems. In: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia (2009)
4. Ganter, B., Wille, R.: Formal Concept Analysis - Mathematical Foundations. Springer, Heidelberg (1999)
5. Golder, S., Huberman, B.A.: The structure of collaborative tagging systems. Journal of Information Science 32(2), 198–208 (2006)
6. Gruber, T.: Ontology of Folksonomy: A Mash-up of Apples and Oranges. Int'l. Journal on Semantic Web & Information Systems 3(2) (2007)
7. Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Discovering shared conceptualizations in folksonomies. Journal of Web Semantics 6(1), 38–53 (2008)
8. Kim, H.-L., Scerri, S., Breslin, J., Decker, S., Kim, H.-G.: The state of the art in tag ontologies: A semantic model for tagging and folksonomies. In: International Conference on Dublin Core and Metadata Applications, Berlin, Germany (2008)
9. Knerr, T.: Tagging ontology - towards a common ontology for folksonomies (2006), http://tagont.googlecode.com/files/TagOntPaper.pdf (June 14, 2008)
10. Smith, G.: Tagging: People-Powered Metadata for the Social Web, 1st edn. New Riders Publishing, Indianapolis (2007)
11. Tanaka, J.W., Taylor, M.: Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology 23(3), 457–482 (1991)
12. Van Damme, C., Hepp, M., Siorpaes, K.: FolksOntology: An Integrated Approach for Turning Folksonomies into Ontologies. In: ESWC 2007 Workshop Bridging the Gap between Semantic Web and Web 2.0, May 2007, pp. 57–70 (2007)
13. Weick, K.E., Sutcliffe, K.M., Obstfeld, D.: Organizing and the Process of Sensemaking. Organization Science 16(4), 409–421 (2005)
2-D Shape Representation and Recognition by Lattice Computing Techniques

V.G. Kaburlasos, A. Amanatiadis, and S.E. Papadakis

Technological Educational Institution of Kavala, Department of Industrial Informatics, 65404 Kavala, Greece
{vgkabs,aamanat,spap}@teikav.edu.gr
Abstract. We consider binary images such that an image includes a single 2-D shape, from which we extract three populations of three different (shape) descriptors, respectively. Each population is represented by an Intervals' Number, or IN for short, in the mathematical lattice (F, ⪯) of INs. In conclusion, a 2-D shape is represented in the Cartesian product lattice (F³, ⪯). We present a 2-D shape classification scheme based on fuzzy lattice reasoning (FLR). Preliminary experimental results have been encouraging. We discuss the potential of Lattice Computing (LC) techniques in image representation and recognition applications.

Keywords: 2-D shape classification, Fuzzy lattice reasoning (FLR), Inclusion measure, Intervals' number (IN), Lattice computing.
1 Introduction
In a recent work [1], we evaluated three different 2-D shape descriptors, namely Fourier descriptors (FD), angular radial transform (ART) descriptors, and image moments (IM) descriptors, for 2-D shape retrieval, as follows. A 2-D shape was represented by an Nd-dimensional vector per shape descriptor d ∈ {FD, ART, IM}. Then, 2-D shape retrieval was pursued in the Euclidean space R^Nd by k-Nearest Neighbor (kNN), for k = 1. In conclusion, the aforementioned shape descriptors were evaluated comparatively. Building on [1], this work deals with a different problem, namely 2-D shape recognition/classification, using novel techniques as described next. A population of shape descriptors, instead of an Nd-dimensional vector, is represented here by an Intervals' Number (IN) in the complete lattice (F, ⪯) of INs [4,6,7]. In conclusion, a 2-D shape is represented here by three INs, respectively, for the three shape descriptors d ∈ {FD, ART, IM}. Finally, we apply a Fuzzy Lattice Reasoning (FLR) scheme for classification in the lattice (F³, ⪯). Preliminary experimental results have been encouraging. The potential of Lattice Computing (LC) in image representation/recognition is also discussed. We remark that the term Lattice Computing (LC) was originally coined by Manuel Graña [2,3] to denote a Computational Intelligence branch, which develops algorithms in an algebra (R, ∨, ∧, +), where R is the set of real numbers.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 391–398, 2010. © Springer-Verlag Berlin Heidelberg 2010
Later work proposed a wider definition as follows: LC is an evolving collection of tools and methodologies that can process disparate types of data including logic values, numbers, sets, symbols, and graphs based on mathematical lattice theory with emphasis on clustering, classification, regression, pattern analysis, and knowledge representation applications [5]. The work here is organized as follows. Section 2 presents the mathematical background. Section 3 presents a FLR scheme for classification. Section 4 formulates the 2-D shape recognition problem. Section 5 describes preliminary experimental results. Section 6 concludes by summarizing the contribution.
2 Mathematical Background
For basic definitions regarding lattice theory the interested reader may refer elsewhere [4,5]. This section summarizes useful mathematical tools. Note that "curly" symbols such as ⪯, ⋎, ⋏, etc. are employed between general lattice elements, whereas "straight" symbols such as ≤, ∨, ∧, etc. are employed between real numbers. Consider the following definition.

Definition 1. Let (L, ⪯) be a complete lattice with least and greatest elements O and I, respectively. An inclusion measure in (L, ⪯) is a function σ : L × L → [0, 1], which satisfies the following conditions

C0. σ(x, O) = 0, ∀x ≠ O.
C1. σ(x, x) = 1, ∀x ∈ L.
C2. x ⋏ y ≺ x ⇒ σ(x, y) < 1.
C3. u ⪯ w ⇒ σ(x, u) ≤ σ(x, w).
We remark that σ(x, y) can be interpreted as the fuzzy degree to which x is less than y; therefore the notation σ(x ⪯ y) may be used instead of σ(x, y). Two different inclusion measure functions are presented next, based on a positive valuation¹ function.

Theorem 1. If the function v : L → R is a positive valuation in a complete lattice (L, ⪯) then both the sigma-meet function σ⋏(x, y) = v(x ⋏ y)/v(x) and the sigma-join function σ⋎(x, y) = v(y)/v(x ⋎ y) are inclusion measures.
Since condition C0 in Definition 1 requires σ(x, O) = 0, ∀x ≠ O, our interest here is in positive valuation functions v : L → R≥0 such that v(O) = 0, as explained by Kaburlasos and Papadakis [6] in Theorem A.10. A four-level hierarchy of complete lattices is presented progressively, next.

2.1 Hierarchy Level-0: The Lattice (R, ≤) of Real Numbers
The set R of real numbers ordered by the conventional order (≤) relation is a complete, totally-ordered lattice (R, ≤) with least and greatest elements denoted, respectively, by O = −∞ and I = +∞.

¹ A positive valuation in a lattice (L, ⪯) is a real function v : L → R that satisfies both v(x) + v(y) = v(x ⋏ y) + v(x ⋎ y) and x ≺ y ⇒ v(x) < v(y).
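Since (R, ≤) is a chain, ⋏ and ⋎ reduce to min and max, so the valuation identity of the footnote holds for any real function, and positivity reduces to strict monotonicity. A quick numerical check of this observation (the function name and grid are our own):

```python
def valuation_identity_holds(v, xs):
    """On the chain (R, <=), meet and join are min and max, so
    v(x) + v(y) == v(x meet y) + v(x join y) holds for any v;
    positivity additionally needs v to be strictly increasing."""
    identity = all(v(x) + v(y) == v(min(x, y)) + v(max(x, y))
                   for x in xs for y in xs)
    increasing = all(v(a) < v(b) for a, b in zip(xs, xs[1:]))
    return identity and increasing

grid = [x / 10 for x in range(-30, 31)]
print(valuation_identity_holds(lambda t: t ** 3, grid))   # True
```

A strictly decreasing function such as t ↦ −t still satisfies the identity but fails positivity, which is exactly why condition x ≺ y ⇒ v(x) < v(y) is required separately.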
2.2 Hierarchy Level-1: The Lattice (Δ, ⪯) of Generalized Intervals
A generalized interval is defined next.

Definition 2. A generalized interval is an element of the lattice (R, ≤∂) × (R, ≤).

We remark that ≤∂ in Definition 2 denotes the dual (i.e. converse) of the order relation ≤ in the lattice (R, ≤), i.e. ≤∂ ≡ ≥. The complete product lattice (R, ≤∂) × (R, ≤) ≡ (R × R, ≥ × ≤) will be denoted, simply, by (Δ, ⪯). A generalized interval will be denoted by [x, y], where x, y ∈ R. The meet (⋏) and join (⋎) in the lattice (Δ, ⪯) are given, respectively, by [a, b] ⋏ [c, d] = [a ∨ c, b ∧ d] and [a, b] ⋎ [c, d] = [a ∧ c, b ∨ d]. The set of positive (negative) generalized intervals [a, b], characterized by a ≤ b (a > b), is denoted by Δ+ (Δ−). It turns out that (Δ+, ⪯) is a poset, namely the poset of positive generalized intervals. Furthermore, the poset (Δ+, ⪯) is isomorphic² to the poset (τ(R), ⊆) of conventional intervals (sets) in R, i.e. (τ(R), ⊆) ≅ (Δ+, ⪯). We augmented the poset (τ(R), ⊆) by a least (empty) interval, denoted by O = [+∞, −∞]. Hence, the complete lattice (τO(R) = τ(R) ∪ {O}, ⊆) ≅ (Δ+ ∪ {O}, ⪯) emerged. A strictly decreasing bijective, i.e. one-to-one, function θ : R → R implies the isomorphism (R, ≤) ≅ (R, ≥). Furthermore, a strictly increasing function v : R → R is a positive valuation in the lattice (R, ≤). It follows that the function vΔ : Δ → R given by vΔ([a, b]) = v(θ(a)) + v(b) is a positive valuation in the lattice (Δ, ⪯). In general, parametric functions θ(.) and v(.) may introduce tunable nonlinearities. Two different inclusion measures, namely sigma-meet and sigma-join, have been proposed in the lattice (τO(R), ⊆) as follows:

1) σ⋏([a, b] ⪯ [c, d]) = (v(θ(a ∨ c)) + v(b ∧ d)) / (v(θ(a)) + v(b)), if a ∨ c ≤ b ∧ d; otherwise, σ⋏([a, b] ⪯ [c, d]) = 0, and
2) σ⋎([a, b] ⪯ [c, d]) = (v(θ(c)) + v(d)) / (v(θ(a ∧ c)) + v(b ∨ d)).

2.3 Hierarchy Level-2: The Lattice (F, ⪯) of Intervals' Numbers
A generalized interval number is defined in the first place, next.

Definition 3. A generalized interval number (GIN) is a function G : (0, 1] → Δ.

Let G denote the set of GINs. It follows that (G, ⪯) is a complete lattice, as the Cartesian product of complete lattices (Δ, ⪯). Our interest here focuses on the sublattice³ of Intervals' Numbers defined next.

Definition 4. An Intervals' Number, or IN for short, is a GIN F such that both F(h) ∈ (Δ+ ∪ {O}) and h1 ≤ h2 ⇒ F(h1) ⪰ F(h2).

² A map ψ : (P, ⪯) → (Q, ⪯) is called an (order) isomorphism iff both "x ⪯ y ⇔ ψ(x) ⪯ ψ(y)" and "ψ is onto Q". Two posets (P, ⪯) and (Q, ⪯) are called isomorphic, symbolically (P, ⪯) ≅ (Q, ⪯), iff there is an isomorphism between them.
³ A sublattice of a lattice (L, ⪯) is another lattice (S, ⪯) such that S ⊆ L.
Let F denote the set of INs. It follows that (F, ⪯) is a complete lattice with least element O = O(h) = [+∞, −∞] and greatest element I = I(h) = [−∞, +∞], h ∈ (0, 1]. An IN will be denoted by a capital letter in italics, e.g. F ∈ F. Given the two inclusion measures σ⋏(., .) and σ⋎(., .) in (Δ, ⪯), the following two inclusion measures emerge, respectively, in (F, ⪯):

1) σ⋏(F1 ⪯ F2) = ∫₀¹ σ⋏(F1(h) ⪯ F2(h)) dh.
2) σ⋎(F1 ⪯ F2) = ∫₀¹ σ⋎(F1(h) ⪯ F2(h)) dh.
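As a hedged sketch of how these integrals can be computed (our own representation, using the linear valuation v(x) = x with θ(x) = −x, so that vΔ([a, b]) = b − a), an IN can be stored as its intervals at L uniformly spaced h-levels and the integral approximated by an average over those levels; the size of Definition 5, further below, is discretized the same way:

```python
def sigma_meet_interval(ab, cd):
    # sigma-meet on intervals (Section 2.2) with v(x) = x, theta(x) = -x,
    # so v_delta([a, b]) = b - a; an empty meet gives 0
    (a, b), (c, d) = ab, cd
    lo, hi = max(a, c), min(b, d)
    return (hi - lo) / (b - a) if lo <= hi else 0.0

def sigma_meet_IN(F, G):
    # discretized version of the integral over h in (0, 1]
    return sum(sigma_meet_interval(f, g) for f, g in zip(F, G)) / len(F)

def size_IN(F):
    # Definition 5 discretized: mean of v(b_h) - v(a_h) over the levels
    return sum(b - a for a, b in F) / len(F)

# intervals shrink as h grows, per Definition 4
F1 = [(0.0, 4.0), (0.5, 3.5), (1.0, 3.0), (1.5, 2.5)]
F2 = [(0.0, 4.0)] * 4
print(sigma_meet_IN(F1, F2), size_IN(F1))   # 1.0 2.5
```

Since every level of F1 is contained in the corresponding level of F2, the discretized σ⋏(F1 ⪯ F2) is exactly 1, while σ⋏(F2 ⪯ F1) falls below 1.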
2.4 Hierarchy Level-3: The Cartesian Product Lattice (F^N, ⪯)
The Cartesian product lattice (F^N, ⪯) is the "fourth level" in our proposed hierarchy of complete lattices. An element of the complete lattice (F^N, ⪯) will be denoted by a capital letter in bold, e.g. F = (F1, ..., FN) ∈ F^N.

2.5 Additional Definitions
The size of an IN is defined as follows.

Definition 5. The size of an IN F = F(h) = [a_h, b_h], h ∈ (0, 1], with respect to a positive valuation function v : R → R, is defined as the nonnegative function S : F → R≥0 given by S(F) = ∫₀¹ [v(b_h) − v(a_h)] dh.
We remark that the size of an interval-IN δ = [A, B] equals S(δ) = S(B) − S(A).
3 A Fuzzy Lattice Reasoning (FLR) Classifier
Algorithm 1 (BIINtrn) induces L interval-INs from a set {F1, ..., Fntrn} of (labelled) INs for training, whereas Algorithm 2 (BIINtst) assigns classes to a set {E1, ..., Entst} of INs for testing. We remark that, on the one hand, algorithm BIINtrn is an agglomerative clustering scheme, which proceeds by conditionally merging "nearby" INs until a user-defined maximum threshold size Sθ = 0.5 is met. On the other hand, algorithm BIINtst assigns an IN Ei to the class of the interval-IN in which Ei is included most, in the sense of an inclusion measure function (σ). Note that the employment of an inclusion measure function is called Fuzzy Lattice Reasoning (FLR).
4 The Problem and Its Mathematical Formulation
We considered the MPEG-7 benchmark data set of binary images including 2-D shapes [1]. In particular, we used the 1,400-image data set divided into 70 classes with 20 images per class. Sample images are shown in Fig. 1. In a data preprocessing step, from each image, we extracted Fourier descriptors (FD), angular radial transform (ART) descriptors, and image moments
Algorithm 1. BIINtrn: Batch Interval-IN algorithm for training
1: Let {F1, ..., Fntrn} be a set of labelled INs for training. Furthermore, let c(Fℓ) denote the class label of IN Fℓ, where ℓ ∈ {1, ..., ntrn}.
2: Consider the set {δ1, ..., δntrn} of labelled (trivial) interval-INs δℓ = [Fℓ, Fℓ]; moreover, let c(δℓ) denote the class label of interval-IN δℓ, where ℓ ∈ {1, ..., ntrn}.
3: L ← ntrn.
4: Consider a user-defined size threshold Sθ = 0.5.
5: Let (I, J) := arg min{Size(δi ⋎ δj)}, i, j ∈ {1, ..., L}: I ≠ J and c(δI) = c(δJ).
6: while Size(δI ⋎ δJ) < Sθ do {learn by merging interval-INs of the same class}
7:   Replace both δI and δJ by δI ⋎ δJ.
8:   L ← L − 1.
9:   Let (I, J) := arg min{Size(δi ⋎ δj)}, i, j ∈ {1, ..., L}: I ≠ J and c(δI) = c(δJ).
10: end while
Algorithm 2. BIINtst: Batch Interval-IN algorithm for testing
1: Consider a set {δ1, ..., δL} of labelled interval-INs, where both δℓ ∈ F³ × F³ and c(δℓ) denotes the class label of interval-IN δℓ, ℓ ∈ {1, ..., L}.
2: for i = 1 to ntst do {for each testing datum Ei ∈ F³ do}
3:   J := arg max{σ([Ei, Ei] ⪯ δℓ)}, ℓ ∈ {1, ..., L}.
4:   Assign IN Ei to class c(δJ).
5: end for
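To make the two algorithms concrete, the following minimal sketch simplifies from (F³ × F³) to plain real intervals with the linear valuation v(x) = x, so that Size([a, b]) = b − a and σ⋎(x ⪯ w) = Size(w)/Size(x ⋎ w); all names, data, and the simplification itself are ours, not the authors' implementation:

```python
def join(p, q):                 # lattice join of two real intervals
    return (min(p[0], q[0]), max(p[1], q[1]))

def size(p):                    # Size with the linear valuation v(x) = x
    return p[1] - p[0]

def train(points, labels, s_theta):
    """BIINtrn, simplified: agglomeratively merge same-class trivial
    intervals while the smallest same-class join stays below s_theta."""
    boxes = [((x, x), c) for x, c in zip(points, labels)]
    while True:
        best = None
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes[i][1] != boxes[j][1]:
                    continue            # only same-class merges
                s = size(join(boxes[i][0], boxes[j][0]))
                if best is None or s < best[0]:
                    best = (s, i, j)
        if best is None or best[0] >= s_theta:
            return boxes
        _, i, j = best
        merged = (join(boxes[i][0], boxes[j][0]), boxes[i][1])
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]

def classify(x, boxes):
    """BIINtst: assign x to the class of the box including it most,
    in the sigma-join sense size(w) / size([x,x] join w)."""
    def sigma(box):
        j = size(join((x, x), box))
        return 1.0 if j == 0.0 else size(box) / j
    return max(boxes, key=lambda bc: sigma(bc[0]))[1]

boxes = train([0.0, 0.1, 0.2, 1.0, 1.1], ["a", "a", "a", "b", "b"], 0.5)
print(classify(0.15, boxes), classify(1.05, boxes))
```

With the toy data above, training yields one box per class, and test points inside a box reach the maximal inclusion value 1.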
Fig. 1. Samples of 2-D shapes regarding shape classes (a) "chicken", and (b) "bird"
(IM) descriptors [1]. A 2-D shape was represented by Nd descriptors, where d ∈ {FD, ART, IM}; in particular, N_FD = 32, N_ART = 112, and N_IM = 6. A population of Nd descriptors was represented by one IN induced from the aforementioned population by the algorithm CALCIN [4]. For example, Fig. 2 and Fig. 3 show INs induced from FD and ART descriptors, respectively, for two different classes, namely "chicken" and "bird". Fig. 4 shows interval-INs computed by the lattice-meet and lattice-join operations of the INs they contain.
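The details of CALCIN are in [4]; as a hedged sketch only, one quantile-based construction (our own simplification, not the actual algorithm) induces nested intervals from a sample population, so that the result satisfies Definition 4:

```python
def induce_in(samples, levels=4):
    """Induce an IN from a population of descriptor values: at level h
    the interval spans the central (1 - h) mass of the sorted samples.
    This is a quantile construction standing in for CALCIN [4]."""
    xs = sorted(samples)
    n = len(xs)
    intervals = []
    for k in range(levels):
        h = (k + 1) / levels
        lo = int((h / 2) * (n - 1))          # lower quantile index
        hi = int((1 - h / 2) * (n - 1))      # upper quantile index
        intervals.append((xs[lo], xs[hi]))
    return intervals

pop = [0.1, 0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.9]
F = induce_in(pop)
# intervals shrink as h grows, as Definition 4 requires
assert all(a1 <= a2 and b2 <= b1
           for (a1, b1), (a2, b2) in zip(F, F[1:]))
```

The top level collapses toward the sample median, while the bottom level spans nearly the whole population, mirroring the membership-function shape of the INs in Fig. 2 and Fig. 3.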
Fig. 2. Each IN, above, was induced from one population of Fourier Descriptors (FD) regarding shape classes (a) "chicken", and (b) "bird" (horizontal axis: FD magnitude)
Fig. 3. Each IN, above, was induced from one population of Angular Radial Transform (ART) descriptors regarding shape classes (a) "chicken", and (b) "bird" (horizontal axis: ART)
5 Preliminary Computational Experiments
We have developed reliable software, which implements our algorithms for both training and testing. In our computational experiments we employed sigmoid positive valuation functions v(x; λ, μ0) = 1/(1 + e^(−λ(x−μ0))) with tunable parameters λ and μ0; furthermore, we used the function θ(x) = −x.
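Putting this together with the interval inclusion measures of Section 2.2, a small sketch (our own naming, not the authors' software) of σ⋎ under the sigmoid valuation and θ(x) = −x:

```python
import math

def v(x, lam=1.0, mu0=0.0):
    # sigmoid positive valuation from this section; strictly increasing
    return 1.0 / (1.0 + math.exp(-lam * (x - mu0)))

def theta(x):
    # strictly decreasing bijection used throughout
    return -x

def v_delta(a, b):
    # valuation of a generalized interval [a, b]: v(theta(a)) + v(b)
    return v(theta(a)) + v(b)

def sigma_join(ab, cd):
    # sigma-join of Section 2.2: v_delta([c, d]) / v_delta([a,b] join [c,d])
    (a, b), (c, d) = ab, cd
    return v_delta(c, d) / v_delta(min(a, c), max(b, d))

x, w = (0.2, 0.4), (0.0, 1.0)
print(sigma_join(x, w))   # x inside w, so the join is w itself and sigma = 1
```

The parameters λ and μ0 shift and sharpen the sigmoid, which is what makes the valuation, and hence the inclusion measure, tunable during training.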
Fig. 4. Two interval-INs, drawn in thick lines, computed from ART-descriptor-induced INs regarding, respectively, shape classes (a) “chicken”, and (b) “bird”. Note that an interval-IN (drawn in thick lines) envelops a cluster of INs drawn in thin lines.
Preliminary experimental results have been encouraging. For instance, in certain experiments, we have recorded recognition rates well over 90%. Comparative experimental work is currently under way.
6 Conclusion
This work has presented preliminary experimental evidence that distributions of measurements, represented by INs, can be used for effective 2-D shape representation towards recognition. Further research on IN-based LC techniques in image representation and recognition applications is a topic for future work.

Acknowledgement. This work has been supported, in part, by an Archimedes-III project contract.
References

1. Amanatiadis, A., Kaburlasos, V.G., Gasteratos, A., Papadakis, S.E.: A comparative study of invariant descriptors for shape retrieval. In: Proc. 2009 IEEE Intl. Conf. on Imaging Systems & Techniques (IST 2009), pp. 391–394 (2009)
2. Graña, M.: State of the art in lattice computing for artificial intelligence applications. In: Nadarajan, R., Anitha, R., Porkodi, C. (eds.) Mathematical and Computational Models, pp. 233–242 (2007)
3. Graña, M.: Lattice computing: lattice-theory-based computational intelligence. In: Matsuhisa, T., Koibuchi, H. (eds.) Proc. Kosen Workshop on Mathematics, Technology, and Education (MTE), pp. 19–27 (2008)
4. Kaburlasos, V.G.: Towards a Unified Modeling and Knowledge-Representation Based on Lattice Theory. SCI, vol. 27. Springer, Heidelberg (2006)
5. Kaburlasos, V.G.: Granular fuzzy inference system (FIS) design by lattice computing. In: Corchado Rodriguez, E.S., et al. (eds.) HAIS 2010, Part II. LNCS (LNAI), vol. 6077, pp. 410–417. Springer, Heidelberg (2010)
6. Kaburlasos, V.G., Papadakis, S.E.: A granular extension of the fuzzy-ARTMAP (FAM) neural classifier based on fuzzy lattice reasoning (FLR). Neurocomputing 72(10-12), 2067–2078 (2009)
7. Papadakis, S.E., Kaburlasos, V.G.: Induction of classification rules from histograms. In: Proc. 8th Intl. Conf. on Natural Computing, Joint Conf. on Information Sciences (JCIS 2007), pp. 1646–1652 (2007)
Order Metrics for Semantic Knowledge Systems

Cliff Joslyn¹ and Emilie Hogan²

¹ National Security Directorate, Pacific Northwest National Laboratory, Seattle, Washington, 98109, USA
[email protected]
² Mathematics Department, Rutgers University
Abstract. Knowledge systems technologies, as derived from AI methods and used in the modern Semantic Web movement, are dominated by graphical knowledge structures such as ontologies and semantic graph databases. A critical but typically overlooked aspect of all of these structures is their admission to analyses in terms of formal hierarchical relations. The partial order representations of whatever hierarchy is present within a knowledge structure afford opportunities to exploit these hierarchical constraints to facilitate a variety of tasks, including ontology analysis and alignment, visual layout, and anomaly detection. We introduce the basic concepts of order metrics and address the impact of a hierarchical (order-theoretical) analysis on knowledge systems tasks.
1 Introduction
Knowledge systems technologies are dominated by graphical structures. Semantic graph databases [15] take the form of labeled directed graphs implemented in RDF [9]. Their OWL [10] ontological typing systems are also labeled directed graphs, frequently dominated by directed acyclic graph (DAG) and other hierarchical structures. Fig. 1 shows a toy example, where the ontology of classes on the left forms the typing system for the semantic graph of node and link instances on the right. But where semantic taxonomies such as the Gene Ontology [2] include hierarchical class structures, other portions can be non-hierarchical. And more general knowledge structures like semantic graphs are not explicitly or necessarily hierarchical, but may contain large hierarchical components. In practice, ontologies are dominated by their "hierarchical cores", specifically their class hierarchies connected by is-a subsumptive and has-part compositional links. And many of the most common links in RDF graphs are transitive, including causes, implies, and precedes. The partial order representation of whatever hierarchy is present within a knowledge structure affords opportunities to exploit these hierarchical constraints for a variety of tasks, including:

Clustering and Classification: Characterizing a portion of a hierarchy (e.g. groups of ontology nodes) to identify common characteristics [12,18].

Alignment: Casting ontology matching [7] as mappings between hierarchical structures [11,14].

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 399–409, 2010. © Springer-Verlag Berlin Heidelberg 2010
Fig. 1. Toy model of a semantic graph database. (Left) Ontological typing system as a labeled, directed graph of classes (sample instances shown below dashed links). (Right) Conforming instance sub-graph.
Visualization: Exploiting, among other things, the level structure of hierarchies to achieve a satisfactory layout [13].

In general, such a hierarchical analysis, when available, promises complexity reduction, improved user interaction with the knowledge base, and improved layout and visual analytics.
2 DAGs and Partial Orders
Hierarchies are represented as partially ordered sets (posets), which are reflexive, anti-symmetric, and transitive binary relations P = ⟨P, ≤⟩ on an underlying finite set of nodes P [6]. While we typically think of hierarchies as tree structures, more general kinds of hierarchies have "multiple inheritance", where nodes can have more than one parent. These include lattice structures, where pairs of nodes have unique least common subsumers (and unique greatest lower bounds as well); partial orders, where pairs of nodes can have an indefinite number of least common subsumers and greatest lower bounds; and finally general DAGs, which can also include "transitive links" that form shortcuts across paths. Consider the simple DAG at the top of Fig. 2. The two transitive links 1 → H, 1 → E shortcut the two paths 1 → K → H and 1 → C → I → E respectively. Given a DAG D, the DAG P(D) produced by including all possible transitive links consistent with its paths is its transitive closure, and determines an ordered set P(D) = ⟨P, ≤⟩ where a ≤ b for a, b ∈ P iff there is a directed path from a to b in D. The graph V(D) produced from a DAG D by removing all its transitive links (its transitive reduction [1]) determines a cover relation or Hasse diagram. Thus each cover relation V determines a unique poset P(V), and vice versa a poset P determines a unique cover V(P); each DAG D determines a unique poset P(D) and cover V(D); and each unique poset-cover pair determines a class of DAGs equivalent up to transitive links.
Fig. 2. (Top) A DAG D. (Left) Transitive reduction V(D). (Right) Transitive closure P(D).
For a DAG D we can measure its degree of transitivity as

TR(D) := |D \ V(D)| / |P(D) \ V(D)|,   (1)
where \ is set subtraction, we interpret each structure as the binary relation on P² of its incidence matrix, and | · | is cardinality, so that | · | is the number of links in its argument, seen as a graph. TR(D) measures the number |D \ V(D)| of transitive links in D relative to the total possible number |P(D) \ V(D)| in its transitive closure P(D). In Fig. 2 we have TR(D) = 2/11, indicating a relatively low degree of transitivity. In knowledge systems such as ontologies, our interpretation of the presence or absence of transitive links in DAGs is significant. If the link-type in question is anti-transitive, so that transitive links are disallowed, then clearly the presence of transitive links is in error. If, on the other hand, the link-type in question is atransitive, so that transitive links are allowed but not required, then TR(D) measures their extent. But finally, if, as is the case with our subsumption and composition types, the link type represents a fully transitive property, then the presence of transitive links is irrelevant or erroneous. Effectively, such link types live in the transitively equivalent class of DAGs, that is, in the partial order P(D), and TR(D) can be used as an aid to the user or engineer to identify issues with the underlying ontology.
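Eq. (1) can be computed mechanically from an edge set. The sketch below (our own naming; the toy DAG is an assumption for illustration, not necessarily the exact DAG of Fig. 2) derives P(D), V(D), and TR(D):

```python
def closure(edges):
    """Transitive closure P(D): (a, c) whenever a path joins a to c."""
    reach = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(reach):
            for b2, c in list(reach):
                if b == b2 and (a, c) not in reach:
                    reach.add((a, c))
                    changed = True
    return reach

def reduction(edges):
    """Transitive reduction V(D): drop edges implied by a longer path."""
    reach = closure(edges)
    nodes = {x for e in edges for x in e}
    return {(a, b) for a, b in edges
            if not any((a, m) in reach and (m, b) in reach for m in nodes)}

def TR(edges):
    P, V = closure(edges), reduction(edges)
    return len(set(edges) - V) / len(P - V)   # Eq. (1)

# a small DAG with two transitive links, 1 -> H and 1 -> E
D = {("1", "C"), ("C", "I"), ("I", "E"), ("1", "K"), ("K", "H"),
     ("1", "E"), ("1", "H")}
print(TR(D))   # 0.5
```

For this toy DAG the closure has 9 links and the reduction 5, so both transitive links present out of the 4 possible give TR(D) = 2/4 = 0.5.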
3 Measures on Hierarchical Graphs
Given a hierarchical DAG structure represented by its transitive closure poset P, tools are available to measure this hierarchical structure. Here we discuss
C. Joslyn and E. Hogan
interval-valued rank, measuring the vertical level of nodes, and order metrics, measuring the distances between nodes. See [13] for more details. Consider the hierarchy shown in Fig. 3. We are concerned with the proper representation of the vertical level of each node, as represented by its positioning in a layout. We note that all children of the root have the same "distance" from the root, but if these are also leaves then they should be positioned further down. In other words, we need to exploit the vertical distance from both the top and a global bottom, including a virtual node 0 ∈ P inserted below all the leaves.
Fig. 3. A DAG displayed as a hierarchy
For a, b ∈ P, let h∗(a, b) be the length of the maximum path from a to b. Then the distance of a node a ∈ P from the root 1 ∈ P is the top rank r^t(a) := h∗(a, 1). Dually we define the bottom rank r^b(a) := h∗(0, 1) − h∗(0, a), where h∗(0, 1) is the overall height of the structure. Then the interval rank R(a) := [r^t(a), r^b(a)] becomes available as an interval-valued measure of the vertical levels over which a can range, while the rank width W(a) := r^b(a) − r^t(a) is a measure of that range [13]. We can exploit this vertical rank for hierarchical layout and visualization, as shown for our example in Fig. 4. Each node which sits on a complete chain (a path from 1 down to 0) of maximal size is placed horizontally at the center of the page. Nodes are laid out horizontally according to the size of their largest maximal chains. The result is to place maximal complete chains along a central axis, and short complete chains towards the outer edges. Nodes are placed vertically at the midpoint of their interval rank, but are free to move between top rank r^t(a) and bottom rank r^b(a). The result is that while nodes on maximal complete chains (all those intersecting the chain 0 → D → E → I → X → 1 in the example) sit at a single level, some (for example K) do not. While Fig. 4 shows a 2D layout, we have also deployed this concept in a 3D layout [13].
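The rank computations above can be sketched as follows (Python; the small cover relation is an illustrative assumption, not the hierarchy of Fig. 3). Node E is both a child of the root and a leaf, so its interval rank is genuinely wide:

```python
from functools import lru_cache

# Illustrative cover relation: root '1' on top, virtual bottom '0' below the leaves.
children = {'1': ['A', 'B', 'E'], 'A': ['C'], 'B': ['C', 'D'],
            'C': ['0'], 'D': ['0'], 'E': ['0'], '0': []}
parents = {n: [p for p in children if n in children[p]] for n in children}

@lru_cache(maxsize=None)
def down(a):
    """h*(0, a): length of the maximum path from a down to the virtual bottom."""
    return 0 if a == '0' else 1 + max(down(b) for b in children[a])

@lru_cache(maxsize=None)
def top_rank(a):
    """r^t(a) = h*(a, 1): length of the maximum path from the root down to a."""
    return 0 if a == '1' else 1 + max(top_rank(p) for p in parents[a])

H = down('1')                     # overall height h*(0, 1)

def interval_rank(a):
    """R(a) = [r^t(a), r^b(a)], with r^b(a) = h*(0, 1) - h*(0, a)."""
    return (top_rank(a), H - down(a))

# E may sit anywhere between levels 1 and 2; C is pinned to level 2.
print(interval_rank('E'), interval_rank('C'))  # → (1, 2) (2, 2)
```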
4 Order Metrics
Given the need to perform operations like clustering or alignment on ontologies represented as ordered sets P = (P, ≤), it is essential to have a general sense of
Fig. 4. Chain layout of the cyclic decomposition of the network in Fig. 3
distance d(a, b) between two nodes a, b ∈ P. The knowledge systems literature has focused on semantic similarities to perform a similar function, which are available when P is equipped with a probability distribution, derived, for example, from the frequency with which terms appear in documents (for the Wordnet [8] thesaurus), or genes are annotated to GO nodes. So assume a poset (P, ≤) with a base probability distribution p : P → [0, 1], Σ_{a∈P} p(a) = 1, and a "cumulative" function β(a) := Σ_{b≤a} p(b). We then generalize the join (least upper bound) and meet (greatest lower bound) operations in lattices as follows. Let ↑a := {b : b ≥ a} and ↓a := {b : b ≤ a} be the up-set (filter) and down-set (ideal), respectively, of a node a ∈ P. Then for two nodes a, b ∈ P, let a∇b := ↑a ∩ ↑b and aΔb := ↓a ∩ ↓b be the set of nodes above or below, respectively, both of them. Then the generalized join a ∨ b is the set of minimal (lowest) nodes of a∇b, and the generalized meet a ∧ b is the set of maximal (highest) nodes of aΔb. When P is a lattice, then |a ∨ b| = |a ∧ b| = 1, recovering traditional join and meet. Common choices for the semantic similarity S(a, b) between two nodes include the measures of Resnik, Lin, and Jiang and Conrath [5]:

S(a, b) = max_{c∈a∨b} [−log₂(β(c))]   (2)

S(a, b) = 2 max_{c∈a∨b} [log₂(β(c))] / (log₂(β(a)) + log₂(β(b)))   (3)

S(a, b) = 2 max_{c∈a∨b} [log₂(β(c))] − log₂(β(a)) − log₂(β(b))   (4)
respectively. But most of these are not metrics (not satisfying the triangle inequality), and all of them lack a general mathematical grounding and require a probabilistic weighting. We use ordered set metrics [16,17], which are preferable to semantic similarities because, while they can use a quantitative weighting such as β, they do not require one; and because they always yield a metric. They are based on valuation functions v : P → ℝ⁺ which are, first, either isotone (a ≤ b → v(a) ≤ v(b)) or antitone (a ≤ b → v(a) ≥ v(b)); and then semimodular, in that

v(a) + v(b) ◇ v∇(a, b) + vΔ(a, b),   (5)

where ◇ ∈ {≤, ≥, =}, yielding super-modular, sub-modular, and modular valuations respectively; and

v∇(a, b) := min_{c∈a∇b} v(c),  vΔ(a, b) := max_{c∈aΔb} v(c).   (6)
Whether a valuation v is antitone or isotone, and then sub- or super-modular, determines which of four distance functions is generated; e.g. the antitone, super-modular case yields d(a, b) = v(a) + v(b) − 2v∇(a, b). When P is a lattice, then this simplifies to d(a, b) = v(a) + v(b) − 2v(a ∨ b). See [17] for full details and proofs. Typical valuations v include the cardinality of up-sets and down-sets, v(a) = |↑a|, v(a) = |↓a|, and the cumulative probabilities used in semantic similarities, v(a) = β(a). In this way, poset metrics generalize semantic similarities and provide a strong basis for various analytical tasks.
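The generalized join and the similarities (2)-(4) are easy to compute on a small example. In the sketch below, the five-node poset and its uniform base probability are illustrative assumptions, not drawn from the text; nodes c and d have two least common subsumers, a and b:

```python
import math

# Toy poset: r on top; a, b below r; c, d each below both a and b.
# The ≤ relation is listed reflexively and transitively; p is uniform.
P = {'r', 'a', 'b', 'c', 'd'}
le = {(x, x) for x in P} | {('a','r'), ('b','r'), ('c','r'), ('d','r'),
                            ('c','a'), ('c','b'), ('d','a'), ('d','b')}
p = {x: 1.0 / len(P) for x in P}

up   = lambda a: {b for b in P if (a, b) in le}      # ↑a (filter)
down = lambda a: {b for b in P if (b, a) in le}      # ↓a (ideal)
beta = lambda a: sum(p[b] for b in down(a))          # cumulative β(a)

def gen_join(a, b):
    """a ∨ b: minimal (lowest) nodes of the common up-set a∇b = ↑a ∩ ↑b."""
    common = up(a) & up(b)
    return {c for c in common
            if not any(x != c and (x, c) in le for x in common)}

def resnik(a, b):                                    # eq. (2)
    return max(-math.log2(beta(c)) for c in gen_join(a, b))

def lin(a, b):                                       # eq. (3)
    num = 2 * max(math.log2(beta(c)) for c in gen_join(a, b))
    return num / (math.log2(beta(a)) + math.log2(beta(b)))

def jiang_conrath(a, b):                             # eq. (4)
    return (2 * max(math.log2(beta(c)) for c in gen_join(a, b))
            - math.log2(beta(a)) - math.log2(beta(b)))

print(sorted(gen_join('c', 'd')))   # → ['a', 'b']: two least common subsumers
```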
5 Order Metrics in Ontology Alignment
A good example of the utility of this order-theoretical technology in knowledge systems tasks is ontology alignment [7,11]. An ontology alignment is a mapping f : P → P′ taking anchors a ∈ P in one semantic hierarchy P = (P, ≤) into anchors a′ ∈ P′ in another P′ = (P′, ≤′). In seeking a measure of the structural properties of the mapping f, our primary criterion is that f should not distort the metric relations of concepts, taking nodes that are close together and making them farther apart, or vice versa. It should be noted that a "smooth" mapping f is neither necessary nor sufficient for a good alignment: on the one hand, a good structural mapping may be available between structures from different domains; and on the other, differences in semantic intent between the two structures may be irreconcilable. Nonetheless, other things being equal, it is preferable to have a smoother mapping than not.
So, for two ontology nodes a, b ∈ P, consider the lower cardinality distance dl(a, b) := |↓a| + |↓b| − 2 max_{c∈a∧b} |↓c|. We can measure the change in distance between a, b ∈ P induced by f as the distance discrepancy

δ(a, b) := |d̄l(a, b) − d̄l(f(a), f(b))|,   (7)

where d̄l(a, b) := dl(a, b)/diam_d(P) ∈ [0, 1] is the normalized lower distance between a and b in P given the diameter diam_d(P) := max_{a,b∈P} d(a, b). We can measure the entire amount of distance discrepancy at a node a ∈ P compared to all the other anchors b ∈ P by summing

δf(a) := Σ_{b∈P} δ(a, b) = Σ_{b∈P} |d̄l(a, b) − d̄l(f(a), f(b))|,   (8)

yielding the discrepancy δ(f) := Σ_{a∈P} δf(a) of the alignment. Consider the example in Fig. 5, with the partial alignment function f as shown, mapping only certain nodes {B, E, G} from P to P′. Then we have, e.g., the lower normalized distance between nodes E and G as d̄l(E, G) = 1/3; the distance discrepancy between the two nodes E, G in virtue of f as δ(E, G) = |1/3 − 3/5| = .267; the entire distance discrepancy at the node E as δf(E) = 2/5; and finally the distance discrepancy for the entire alignment as δ(f) = .47.
Fig. 5. An example alignment
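The discrepancy computation (7)-(8) can be sketched as follows. The two toy ordered sets (a diamond and a chain), the anchor map f, and all node names below are illustrative assumptions, not the structures of Fig. 5:

```python
# Two toy ordered sets: a diamond P and a chain Pp ("P prime");
# f maps the anchors {a, b} of P onto {x, y} of Pp.
P   = {'0', 'a', 'b', '1'}
leP = {('0','0'), ('a','a'), ('b','b'), ('1','1'),
       ('0','a'), ('0','b'), ('0','1'), ('a','1'), ('b','1')}
Pp   = {'O', 'x', 'y', 'I'}
lePp = {('O','O'), ('x','x'), ('y','y'), ('I','I'),
        ('O','x'), ('O','y'), ('O','I'), ('x','y'), ('x','I'), ('y','I')}
f = {'a': 'x', 'b': 'y'}

def norm_lower_dist(P, le):
    """Return the normalized lower cardinality distance d̄_l on (P, le)."""
    down = lambda a: {b for b in P if (b, a) in le}
    def dl(a, b):
        m = max(len(down(c)) for c in down(a) & down(b))   # max |↓c|, c ∈ a∧b
        return len(down(a)) + len(down(b)) - 2 * m
    diam = max(dl(a, b) for a in P for b in P)             # diam_d(P)
    return lambda a, b: dl(a, b) / diam

d1, d2 = norm_lower_dist(P, leP), norm_lower_dist(Pp, lePp)
anchors = list(f)
delta   = lambda a, b: abs(d1(a, b) - d2(f[a], f[b]))      # eq. (7)
delta_f = lambda a: sum(delta(a, b) for b in anchors)      # eq. (8)
total   = sum(delta_f(a) for a in anchors)                 # δ(f)
print(total)   # ≈ 0.667: the diamond's incomparable pair becomes a chain pair
```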
6 Order Metrics for Ontology Clustering
Consider the following question. Assume a (portion of a) taxonomy is represented as a finite, non-empty poset P = (P, ≤), and we are then given a collection of nodes Q ⊆ P. How "big an area" does Q "implicate" or "delineate" or "occupy" in the hierarchy? We are pursuing this question now in the context of determining the quality of ontology query returns: the "tighter" a set of ontology nodes returned from an ontology query, the stronger the quality of that set. Two nodes a, b ∈ P are comparable, a ∼ b, if a ≤ b or a ≥ b. If a ≤ b in P, then define the order interval [a, b] := {c ∈ P : a ≤ c ≤ b}. Note
that [a, b] = ↑a ∩ ↓b. Now consider two typical order metrics, the upper and lower cardinality metrics:

du(a, b) := |↑a| + |↑b| − 2|↑a ∩ ↑b|,  dl(a, b) := |↓a| + |↓b| − 2|↓a ∩ ↓b|,   (9)
for a, b ∈ P. From the triangle inequality of d, we know that ∀a, b, c ∈ P, d(a, b) ≤ d(a, c) + d(c, b). So following [3,4], for two points a, b ∈ P and metric d, we can define the segment [[a, b]]_d as the set of all nodes which are "between" them in the metric sense:

[[a, b]]_d := {c ∈ P : d(a, b) = d(a, c) + d(c, b)}.   (10)
We know that ∀a, b ∈ P, [[a, b]] ≠ ∅, since a, b ∈ [[a, b]]; and when nodes are comparable, segments collapse to order intervals: a ∼ b → [[a, b]] = [a, b]. Consider the three-cube in Fig. 6, with du shown in Table 1: we have du(B, G) = 4, and [[B, G]] = {a ∈ P : du(a, B) + du(a, G) = 4} = {A, B, C, D, G}.
Fig. 6. The Boolean 3-cube

Table 1. Upper distance matrix du in the 3-cube

du(a,b)  A  B  C  D  E  F  G  H
A        0  1  1  1  3  3  3  7
B        1  0  2  2  2  2  4  6
C        1  2  0  2  2  4  2  6
D        1  2  2  0  4  2  2  6
E        3  2  2  4  0  4  4  4
F        3  2  4  2  4  0  4  4
G        3  4  2  2  4  4  0  4
H        7  6  6  6  4  4  4  0
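The entries of Table 1 and the segment [[B, G]] can be reproduced directly. The sketch below assumes one labeling of the 3-cube consistent with the table (A the top, H the bottom, B/C/D the coatoms, E/F/G the atoms), since the figure itself is not reproduced here:

```python
# One labeling of the Boolean 3-cube consistent with Table 1 (an assumption):
# A on top; B, C, D coatoms; E, F, G atoms; H at the bottom.
covers = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['E', 'G'],
          'D': ['F', 'G'], 'E': ['H'], 'F': ['H'], 'G': ['H'], 'H': []}
P = set(covers)

def up(a):
    """↑a: reflexive-transitive ancestors of a in the cover relation."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for parent, kids in covers.items():
            if x in kids and parent not in seen:
                seen.add(parent); stack.append(parent)
    return seen

def du(a, b):                 # upper cardinality metric, eq. (9)
    return len(up(a)) + len(up(b)) - 2 * len(up(a) & up(b))

def segment(a, b):            # [[a, b]]_du, eq. (10)
    return {c for c in P if du(a, b) == du(a, c) + du(c, b)}

print(du('B', 'G'), sorted(segment('B', 'G')))  # → 4 ['A', 'B', 'C', 'D', 'G']
```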
Convexity is the idea that any nodes between other nodes in a collection Q ⊆ P are also in that collection, so that a subset of nodes Q ⊆ P is convex if ∀a, b ∈ Q, [[a, b]] ⊆ Q. We can then define C(Q), the convex hull of Q, by the following iterative algorithm using the function K(Q) := ∪_{a,b∈Q} [[a, b]]_d.
Q̂ := Q
while Q̂ is not convex:
    Q̂ := K(Q̂)
return Q̂

The convex hull C(Q) is clearly convex, and includes the original set: Q ⊆ C(Q). Consider again a poset P = (P, ≤) with metric d, and a subset of nodes Q ⊆ P. Then we can define the exterior points as those outside the convex hull, E(Q) := P \ C(Q), and the interior points as those inside the convex hull but not in the original collection, I(Q) := C(Q) \ Q. For a subset of nodes Q ⊆ P we have its size

S(Q) := |Q|,  S̄(Q) := S(Q)/S(P)   (11)

in both un-normalized and normalized forms, and similarly the dispersion

D(Q) := Σ_{a,b∈C(Q)} d(a, b),  D̄(Q) := D(Q)/D(P).   (12)
Continuing our example from Fig. 6, still using du, consider the set Q = {B, E, G}. Then we have

C(Q) = [[B, E]] ∪ [[B, G]] ∪ [[E, G]] = {B, E} ∪ {A, B, C, D, G} ∪ {E, C, G} = {A, B, C, D, E, G}.   (13)

This is shown in Fig. 7. We have exterior points E(Q) = {F, H} and interior points I(Q) = {A, C, D}. We also have D−(Q) = 10, D(Q) = 35, and D(P) = D−(P) = 91. So, note that while the normalized dispersion is D̄(Q) = 35/91 = 0.385, the relative size is S̄(Q) = 3/8 = 0.375 ≤ 0.385 = D̄(Q).
Fig. 7. The 3-cube identifying C({B, E, G})
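The hull iteration and the size/dispersion numbers above can be checked on the same cube (again assuming the Table 1 labeling with A on top and H at the bottom):

```python
from itertools import combinations

# Boolean 3-cube, labeled consistently with Table 1 (an assumption).
covers = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['E', 'G'],
          'D': ['F', 'G'], 'E': ['H'], 'F': ['H'], 'G': ['H'], 'H': []}
P = set(covers)

def up(a):
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for parent, kids in covers.items():
            if x in kids and parent not in seen:
                seen.add(parent); stack.append(parent)
    return seen

du  = lambda a, b: len(up(a)) + len(up(b)) - 2 * len(up(a) & up(b))
seg = lambda a, b: {c for c in P if du(a, b) == du(a, c) + du(c, b)}

def hull(Q):
    """Convex hull C(Q) via the iteration Q̂ := K(Q̂) until a fixpoint."""
    Qh = set(Q)
    while True:
        K = set().union(*(seg(a, b) for a in Qh for b in Qh))
        if K == Qh:               # Q̂ is convex exactly when K(Q̂) = Q̂
            return Qh
        Qh = K

# Pairwise distance sum over a node set: D(S). Applied to the hull C(Q) it is
# the dispersion D(Q) of eq. (12); applied to Q itself it is D−(Q).
D = lambda S: sum(du(a, b) for a, b in combinations(sorted(S), 2))

Q = {'B', 'E', 'G'}
C = hull(Q)
exterior, interior = P - C, C - Q
print(sorted(C), sorted(exterior), sorted(interior), D(Q), D(C), D(P))
# → ['A','B','C','D','E','G'] ['F','H'] ['A','C','D'] 10 35 91
```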
Now consider Q′ = {E, G} ⊆ Q; then we have:

C(Q′) = [[E, G]] = {E, C, G},  E(Q′) = {A, B, D, F, H},   (14)
I(Q′) = {C},  V(Q′) = C(Q′) = {E, C, G}.   (15)

Note that this last result V(Q′) = C(Q′) holds whenever |Q′| = 2. Finally we have

S(Q′) = 2,  S̄(Q′) = 0.25,   (16)
D(Q′) = D−(Q′) = 3,  D̄(Q′) = D̄−(Q′) = 3/91 = 0.033.   (17)
It is valuable to compare the above approach to a typical approach used in semantic analysis, which is to work not within the poset P as a directed graph, but rather within the undirected, symmetrically-closed version of P, wherein link directions, and thus hierarchical structure, are not recognized. Let G(P) := (P, R), where R ⊆ P² and ∀a, b ∈ P, (a, b) ∈ R ↔ a ∼ b. The metric dp(a, b) is then the minimum path length between a and b in G(P). In our original example with Q = {B, E, G}, we have E ∼ B, so that [[B, E]]_du = [[B, E]]_dp = {B, E}. But [[E, G]]_dp = {E, C, G, H}, and [[B, G]]_dp = P, because B and G are inverses. Thus the convex hull is C_dp(Q) = P, and Q can be said to be of maximal size. This is clearly inadequate.
References

1. Aho, A.V., Garey, M.R., Ullman, J.D.: The Transitive Reduction of a Directed Graph. SIAM Journal of Computing 1(2), 131–137 (1972)
2. Ashburner, M., Ball, C.A., Blake, J.A., et al.: Gene Ontology: Tool for the Unification of Biology. Nature Genetics 25(1), 25–29 (2000)
3. Bandelt, H.J.: Centroids and Medians of Finite Metric Spaces. J. Graph Theory 16(4), 305–317 (1992)
4. Bandelt, H.J., Chepoi, V.: Metric Graph Theory and Geometry: A Survey. In: Surveys on Discrete and Computational Geometry: Twenty Years Later, vol. 453, pp. 49–86. American Math. Soc., Providence (2008)
5. Budanitsky, A., Hirst, G.: Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics 32(1), 13–47 (2006)
6. Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order, 2nd edn. Cambridge UP, Cambridge (1990)
7. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Heidelberg (2007)
8. Fellbaum, C. (ed.): Wordnet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
9. http://www.w3.org/RDF
10. http://www.w3.org/TR/owl-features
11. Joslyn, C., Baddeley, B., Blake, J., Bult, C., et al.: Automated Annotation-Based Bio-Ontology Alignment with Structural Validation. In: Smith, B. (ed.) Proc. Int. Conf. on Biomedical Ontology (ICBO 2009), pp. 75–78 (2009), doi:10.1038/npre.2009.3518.1
12. Joslyn, C., Mniszewski, S., Fulmer, A., Heaton, G.: The Gene Ontology Categorizer. Bioinformatics 20(s1), 169–177 (2004)
13. Joslyn, C., Mniszewski, S.M., Smith, S.A., Weber, P.M.: SpindleViz: A Three Dimensional, Order Theoretical Visualization Environment for the Gene Ontology. In: Joint BioLINK and 9th Bio-Ontologies Meeting, JBB 2006 (2006), http://www.bio-ontologies.org.uk/2006/download/Joslyn2EtAlSpindleviz.pdf
14. Joslyn, C., Paulson, P., White, A.: Measuring the Structural Preservation of Semantic Hierarchy Alignments. In: Proc. 4th Int. Wshop. on Ontology Matching (OM 2009), CEUR, vol. 551 (2009), http://ceur-ws.org/Vol-551/om2009_Tpaper6.pdf
15. McBride, B.: Jena: A Semantic Web Toolkit. IEEE Internet Computing 6(6), 55–59 (2002)
16. Monjardet, B.: Metrics on Partially Ordered Sets - A Survey. Discrete Mathematics 35, 173–184 (1981)
17. Orum, C., Joslyn, C.A.: Valuations and Metrics on Partially Ordered Sets (2009) (submitted), http://arxiv.org/abs/0903.2679v1
18. Verspoor, K.M., Cohn, J.D., Mniszewski, S.M., Joslyn, C.A.: A Categorization Approach to Automated Ontological Function Annotation. Protein Science 15, 1544–1549 (2006)
Granular Fuzzy Inference System (FIS) Design by Lattice Computing

Vassilis G. Kaburlasos

Technological Educational Institution of Kavala, Department of Industrial Informatics, 65404 Kavala, Greece
[email protected]
Abstract. Information granules are partially/lattice-ordered. Therefore, lattice computing (LC) is proposed for dealing with them. The granules here are Intervals’ Numbers (INs), which can represent real numbers, intervals, fuzzy numbers, probability distributions, and logic values. Based on two novel theoretical propositions introduced here, it is demonstrated how LC may enhance popular fuzzy inference system (FIS) design by the rigorous fusion of granular input data, the sensible employment of sparse rules, and the introduction of tunable nonlinearities. Keywords: Fuzzy inference system (FIS), Granular data, Inclusion measure, Intervals’ number (IN), Lattice computing.
1 Introduction
An information granule [11] can be thought of as a (local) cluster. It turns out that clusters are partially ordered (for a formal definition of partial order see below). Under certain conditions, a partially ordered set is a lattice. Hence, mathematical lattice theory emerges naturally in granular computing. The term Lattice Computing (LC) was coined by Graña [3,4] to denote a Computational Intelligence branch which develops algorithms in an algebra (R, ∨, ∧, +), where R is the set of real numbers. Later work [5,9] proposed the following, wider definition: Lattice computing (LC) is an evolving collection of tools and methodologies that can process disparate types of data including logic values, numbers, sets, symbols, and graphs based on mathematical lattice theory. Note that the former LC definition was motivated mainly by mathematical morphology for image processing [12], whereas the latter LC definition has a wider motivation including, in addition, formal concept analysis [1], general clustering/classification/regression techniques [7], logic and reasoning [15], etc. A popular family of algorithms is Fuzzy Inference Systems (FISs) [6], whose inputs typically consist of vectors in the Euclidean space R^N. Recent work described an approach to FIS design based on mathematical morphology [13]. This work proposes a rigorous extension of conventional FIS techniques towards computing with (information) granules, namely Intervals' Numbers (INs).

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 410–417, 2010.
© Springer-Verlag Berlin Heidelberg 2010
More specifically, this work builds on an established mathematical result, namely "the resolution identity theorem", which specifies that a fuzzy set can (equivalently) be represented either by its membership function or by its α-cuts. In conclusion, based on two novel mathematical propositions, an inclusion measure function emerges here as an instrument towards substantial FIS improvements, including the rigorous fusion of granular input data, the sensible employment of sparse rules, and the introduction of tunable nonlinearities. The work here is organized as follows. Section 2 presents the mathematical background. Section 3 introduces novel mathematical tools. Section 4 demonstrates granular FIS. Section 5 concludes by summarizing the contribution.
2 Mathematical Preliminaries
This section summarizes a hierarchy of lattices [7,10] using an improved mathematical notation introduced recently [8,10].

2.1 The Complete Lattice (Δ, ⪯) of Generalized Intervals
There is no unanimous opinion on whether the lattice (R, ≤) is complete or not [7,10]. Here, we assume that lattice (R, ≤) is complete with least and greatest elements O = −∞ and I = +∞, respectively. We define a generalized interval next.

Definition 1. A generalized interval is an element of the lattice (R, ≤∂) × (R, ≤).

We remark that ≤∂ in Definition 1 denotes the dual (i.e. converse) of the order relation ≤, i.e. ≤∂ ≡ ≥. The product lattice (R, ≤∂) × (R, ≤) ≡ (R × R, ≥ × ≤) will be denoted, simply, by (Δ, ⪯). Note that curly symbols ⪯, ⋎, ⋏ are used for general lattice elements, whereas straight symbols ≤, ∨, ∧ are used for real numbers. A generalized interval will be denoted by [x, y], where x, y ∈ R. The meet (⋏) and join (⋎) in the lattice (Δ, ⪯) are given, respectively, by [a, b] ⋏ [c, d] = [a∨c, b∧d] and [a, b] ⋎ [c, d] = [a∧c, b∨d]. The set of positive (negative) generalized intervals [a, b], characterized by a ≤ b (a > b), is denoted by Δ+ (Δ−). It turns out that (Δ+, ⪯) is a poset, namely the poset of positive generalized intervals. Poset (Δ+, ⪯) is isomorphic¹ to the poset (τ(R), ⪯) of intervals (sets) in R, i.e. (τ(R), ⪯) ≅ (Δ+, ⪯). We augment poset (τ(R), ⪯) by a least (empty) interval, denoted by O = [+∞, −∞]. Hence, the complete lattice (τO(R) = τ(R) ∪ {O}, ⪯) ≅ (Δ+ ∪ {O}, ⪯) emerges. A strictly decreasing bijective, i.e. one-to-one, function θ : R → R implies an isomorphism (R, ≤) ≅ (R, ≥). Furthermore, a strictly increasing function v : R → R is a positive valuation² in the lattice (R, ≤). It follows that the function vΔ : Δ → R given by vΔ([a, b]) = v(θ(a)) + v(b) is a positive valuation in the lattice (Δ, ⪯). Parametric functions θ(.) and v(.) may introduce tunable nonlinearities.
¹ A map ψ : (P, ⪯) → (Q, ⪯) is called an (order) isomorphism iff both "x ⪯ y ⇔ ψ(x) ⪯ ψ(y)" and "ψ is onto Q". Two posets (P, ⪯) and (Q, ⪯) are called isomorphic, symbolically (P, ⪯) ≅ (Q, ⪯), iff there is an isomorphism between them.
² A positive valuation in a general lattice (L, ⪯) is a real function v : L → R that satisfies both v(x) + v(y) = v(x ⋎ y) + v(x ⋏ y) and x ≺ y ⇒ v(x) < v(y) [2].
2.2 The Complete Lattice (F, ⪯) of Intervals' Numbers (INs)
Based on generalized intervals, this subsection presents intervals' numbers (INs). A more general number type is defined in the first place, next.

Definition 2. A generalized interval number (GIN) is a function G : (0, 1] → Δ.

Let G denote the set of GINs. There follows the complete lattice (G, ⪯), as the Cartesian product of complete lattices (Δ, ⪯). Our interest here focuses on the sublattice³ of intervals' numbers, defined next.

Definition 3. An Intervals' Number, or IN for short, is a GIN F such that both F(h) ∈ (Δ+ ∪ {O}) and h1 ≤ h2 ⇒ F(h1) ⪰ F(h2).

Let F denote the set of INs. It follows that (F, ⪯) is a complete lattice with least element O = O(h) = [+∞, −∞] and greatest element I = I(h) = [−∞, +∞], ∀h ∈ (0, 1]. Conventionally, an IN will be denoted by a capital letter in italics, e.g. F ∈ F. Moreover, an N-tuple of INs will be denoted by a capital letter in bold, e.g. F = (F1, ..., FN) ∈ F^N. Lattice (F^N, ⪯) is the fourth level in a hierarchy of complete lattices whose first, second and third levels include the lattices (R, ≤), (Δ, ⪯) and (F, ⪯), respectively. An IN is a mathematical object which admits different interpretations, as follows. First, based on the "resolution identity theorem", an IN F(h), h ∈ (0, 1] may be interpreted as a fuzzy number, where F(h) is the corresponding α-cut for α = h. Hence, an IN F : (0, 1] → τO(R) may, equivalently, be represented by an upper-semicontinuous membership function mF : R → (0, 1]; that is the membership-function representation of an IN. Moreover, an IN F(h), h ∈ (0, 1] is represented by a set of intervals; that is the interval representation of an IN. There follows the equivalence mF1(x) ≤ mF2(x) ⇔ F1(h) ⪯ F2(h), where x ∈ R, h ∈ (0, 1]. Second, an IN F(h), h ∈ (0, 1] may also be interpreted as a probability distribution such that interval F(h) includes 100(1 − h)% of the distribution, whereas the remaining 100h% is split evenly both below and above interval F(h).
3 Novel Mathematical Tools
Consider the following definition [7,8,10].

Definition 4. Let (L, ⪯) be a complete lattice with least and greatest elements O and I, respectively. An inclusion measure in (L, ⪯) is a function σ : L × L → [0, 1] which satisfies the following conditions:

C0. σ(x, O) = 0, ∀x ≠ O.
C1. σ(x, x) = 1, ∀x ∈ L.
C2. x ⋏ y ≺ x ⇒ σ(x, y) < 1.
C3. u ⪯ w ⇒ σ(x, u) ≤ σ(x, w).

³ A sublattice of a lattice (L, ⪯) is another lattice (S, ⪯) such that S ⊆ L.
We remark that σ(x, y) can be interpreted as the fuzzy degree to which x is less than y; therefore the notation σ(x ⪯ y) may be used instead of σ(x, y). Two inclusion measures, namely sigma-meet (σ⋏) and sigma-join (σ⋎), respectively, have been proposed [7,8] in the complete lattice (τO(R), ⪯) of intervals, as follows.

1) σ⋏([a, b] ⪯ [c, d]) = (v(θ(a∨c)) + v(b∧d)) / (v(θ(a)) + v(b)), if a∨c ≤ b∧d; otherwise, σ⋏([a, b] ⪯ [c, d]) = 0, and
2) σ⋎([a, b] ⪯ [c, d]) = (v(θ(c)) + v(d)) / (v(θ(a∧c)) + v(b∨d)),

where the function v : R → R is strictly increasing, whereas the function θ : R → R is strictly decreasing. In conclusion, as detailed in [7], the following two inclusion measures emerge, respectively, in the complete lattice (F, ⪯) of INs.

1) σ⋏(F1 ⪯ F2) = ∫₀¹ σ⋏(F1(h) ⪯ F2(h)) dh.
2) σ⋎(F1 ⪯ F2) = ∫₀¹ σ⋎(F1(h) ⪯ F2(h)) dh.
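The per-level interval measures are easily sketched in code. The snippet below assumes the default choices v(x) = x and θ(x) = −x (used later in the paper), for which v(θ(a)) + v(b) is just the interval length b − a; the handling of trivial intervals [u, u] (a 0/0 ratio taken as containment) is an assumption consistent with Proposition 1 below:

```python
# Interval inclusion measures with the defaults v(x) = x, θ(x) = -x.
v = lambda x: float(x)
theta = lambda x: -x

def sigma_meet(ab, cd):
    (a, b), (c, d) = ab, cd
    if b == a:                               # trivial [u,u]: 0/0, taken as
        return 1.0 if c <= a <= d else 0.0   # containment (cf. Proposition 1)
    lo, hi = max(a, c), min(b, d)            # [a,b] ⋏ [c,d] = [a∨c, b∧d]
    if lo > hi:
        return 0.0
    return (v(theta(lo)) + v(hi)) / (v(theta(a)) + v(b))

def sigma_join(ab, cd):
    (a, b), (c, d) = ab, cd
    return (v(theta(c)) + v(d)) / (v(theta(min(a, c))) + v(max(b, d)))

print(sigma_meet((1, 2), (0, 3)))   # [1,2] inside [0,3] → 1.0
print(sigma_meet((0, 3), (1, 2)))   # → 0.3333...
print(sigma_join((0, 3), (1, 2)))   # → 0.3333...
```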
The following proposition can be interpreted with reference to Fig. 1.

Proposition 1. Consider a continuous dual isomorphic function θ : R → R and a continuous positive valuation function v : R → R. Let U0(h) = [u0, u0], h ∈ (0, 1] be a trivial IN and let W(h), h ∈ (0, 1] be an IN with upper-semicontinuous membership function mW : R → R. Then σ⋏(U0 ⪯ W) = mW(u0).
Fig. 1. The sigma-meet σ⋏(U0 ⪯ W) degree of inclusion of the trivial IN U0 = [u0, u0], h ∈ (0, 1] in the IN W = W(h) = [ah, bh], h ∈ (0, 1] equals mW(u0), where mW : R → R is the membership function of the IN W
We remark that Proposition 1 couples an IN's two different representations, namely the interval representation and the membership-function representation. Note that the principal advantage of the former (interval) representation is that it enables useful algebraic operations, whereas the principal advantage of the latter (membership function) representation is that it enables convenient fuzzy logic interpretations. The practical significance of Proposition 1, as well as of the following proposition, is demonstrated below.
Proposition 2. Consider complete lattices (Li, ⪯), i ∈ {1, ..., N}, each equipped with an inclusion measure function σi : Li × Li → [0, 1]. Consider N-tuples x = (x1, ..., xN) and y = (y1, ..., yN) such that x, y ∈ L = L1 × ... × LN. Furthermore, consider the conventional lattice ordering x ⪯ y ⇔ xi ⪯ yi, ∀i ∈ {1, ..., N}. Then both functions

1) σ∧ : L × L → [0, 1] given by σ∧(x ⪯ y) = min_i {σi(xi ⪯ yi)}, and
2) σΠ : L × L → [0, 1] given by σΠ(x ⪯ y) = Π_i σi(xi ⪯ yi), i ∈ {1, ..., N},

are inclusion measures in (L, ⪯).
4 Computational Experiments

A FIS includes K rules (implications) Rk, k = 1, ..., K, of the following form:

Rule Rk: IF (variable V1 is Fk,1) .and. ... .and. (variable VN is Fk,N) THEN ck

This work is not concerned with the consequents ck, k = 1, ..., K of the rules. Instead, the interest here focuses exclusively on rule antecedents. Furthermore, unless otherwise stated, this work employs the functions v(x) = x and θ(x) = −x. Fig. 2 displays the antecedent of a FIS rule R with only two INs W1 and W2 having parabolic membership functions mW1(x) = −x² + 6x − 8 and mW2(x) = −0.25x² + 3.5x − 11.25, respectively. Let an input [u1,0, u2,0] = [3.5, 5.5] be presented, as shown in Fig. 3(a). Using conventional FIS techniques, the activation mR(u1,0, u2,0) of rule R is a function of both mW1(u1,0) = 0.75 and mW2(u2,0) = 0.4375. For instance, it may be either mR(u1,0, u2,0) = min{mW1(u1,0), mW2(u2,0)} or mR(u1,0, u2,0) = mW1(u1,0) mW2(u2,0). Identical results are obtained by the inclusion measure σ⋏(·, ·), as explained next. Let the trivial INs U1,0 = U1,0(h) = [u1,0, u1,0] = [3.5, 3.5], h ∈ (0, 1] and U2,0 = U2,0(h) = [u2,0, u2,0] = [5.5, 5.5], h ∈ (0, 1] represent the real numbers u1,0 = 3.5 and u2,0 = 5.5, respectively. Then, based on Proposition 1, it follows that both σ⋏(U1,0 ⪯ W1) = mW1(u1,0) = 0.75 and σ⋏(U2,0 ⪯ W2) = mW2(u2,0) = 0.4375. Finally, based on Proposition 2, the degree of inclusion of U0 = [U1,0, U2,0] in W = [W1, W2] may be either σ∧(U0 ⪯ W) = min{σ⋏(U1,0 ⪯ W1), σ⋏(U2,0 ⪯ W2)} = min{mW1(u1,0), mW2(u2,0)} or σΠ(U0 ⪯ W) = σ⋏(U1,0 ⪯ W1) σ⋏(U2,0 ⪯ W2) = mW1(u1,0) mW2(u2,0).
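The quoted activation values follow directly from evaluating the two parabolas and fusing per Proposition 2:

```python
# Rule activation for the trivial input (3.5, 5.5): by Proposition 1 the
# per-antecedent degrees are just membership values; fuse them per Proposition 2.
mW1 = lambda x: -x**2 + 6*x - 8
mW2 = lambda x: -0.25*x**2 + 3.5*x - 11.25

s1, s2 = mW1(3.5), mW2(5.5)
print(s1, s2)          # → 0.75 0.4375
print(min(s1, s2))     # σ∧ fusion → 0.4375
print(s1 * s2)         # σΠ fusion → 0.328125
```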
Fig. 2. A FIS rule R antecedent: "variable V1 is W1" and "variable V2 is W2". The membership functions of INs W1 and W2 are parabolas mW1(x1) and mW2(x2) with maxima at x1 = 3 and x2 = 7, respectively.
A first substantial advantage of an inclusion measure is its capacity to accommodate "in principle" granular input INs for representing uncertainty/vagueness in practice [14]. For instance, consider the granular input INs U1 and U2 shown in Fig. 3(b), each with an isosceles (triangular) membership function of width 2 ∗ 0.2 = 0.4 centered at x1 = 3.5 and x2 = 5.5, respectively. Inclusion measure σ⋏(·, ·) computes the activation of rule R in Fig. 3(b) as follows. On the one hand, it is

σ⋏(U1 ⪯ W1) = ∫₀^0.6825 1 dh + ∫_0.6825^0.7902 (−0.2h − 0.3 + √(1−h)) / (−0.4h + 0.4) dh + ∫_0.7902^1 0 dh ≈ 0.7456.

On the other hand, it is

σ⋏(U2 ⪯ W2) = ∫₀^0.3331 1 dh + ∫_0.3331^0.5088 (2√(1−h) − 0.2h − 1.3) / (−0.4h + 0.4) dh + ∫_0.5088^1 0 dh ≈ 0.4321.
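These two activation values can be cross-checked by numerical integration over h (a sketch, not the author's code): the triangular input has cut U(h) = [center − 0.2(1−h), center + 0.2(1−h)], each parabolic antecedent has cut W(h) = [peak − s√(1−h), peak + s√(1−h)], and with v(x) = x, θ(x) = −x the per-level sigma-meet reduces to overlap length over input length.

```python
# Numerical check of the granular activations via the interval representations.
def act(center, half, wpeak, wscale, n=50000):
    total = 0.0
    for i in range(n):
        h = (i + 0.5) / n                                    # midpoint rule
        a, b = center - half*(1 - h), center + half*(1 - h)  # triangular U(h)
        c, d = wpeak - wscale*(1 - h)**0.5, wpeak + wscale*(1 - h)**0.5  # W(h)
        overlap = max(0.0, min(b, d) - max(a, c))
        total += overlap / (b - a)       # per-level sigma-meet, v(x)=x, θ(x)=-x
    return total / n

s1 = act(3.5, 0.2, 3, 1)   # σ⋏(U1 ⪯ W1), should land near the quoted 0.7456
s2 = act(5.5, 0.2, 7, 2)   # σ⋏(U2 ⪯ W2), should land near the quoted 0.4321
print(s1, s2)
```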
Fig. 3. Consider the antecedent of rule R in Fig. 2. (a) Rule R is activated by the trivial IN U0 = [U1,0, U2,0]. (b) Rule R is activated by the IN U = [U1, U2], where both INs U1 and U2 have an isosceles (triangular) membership function of width 2 ∗ 0.2 = 0.4.
A second substantial advantage, of σ⋎(·, ·) in particular, is its capacity to deal with nonoverlapping INs towards sensibly employing a sparse rule base. For instance, on the one hand, Fig. 4(a) shows a trivial IN input U0 = [U0, U0], where U0 = U0(h) = [4.5, 4.5], h ∈ (0, 1], presented to rule R. It follows that

σ⋎(U0 ⪯ W1) = ∫₀¹ 2√(1−h) / (1.5 + √(1−h)) dh ≈ 0.5974,

moreover

σ⋎(U0 ⪯ W2) = ∫₀¹ 4√(1−h) / (2.5 + 2√(1−h)) dh ≈ 0.6737.

On the other hand, Fig. 4(b) shows a nontrivial IN input U = [U, U] presented to rule R, where the IN U has an isosceles (triangular) membership function of width 2 ∗ 0.2 = 0.4 centered at 4.5. It follows that

σ⋎(U ⪯ W1) = ∫₀¹ 2√(1−h) / (1.7 − 0.2h + √(1−h)) dh ≈ 0.5693,

and

σ⋎(U ⪯ W2) = ∫₀¹ 4√(1−h) / (2.7 − 0.2h + 2√(1−h)) dh ≈ 0.6555.
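The four sigma-join values can likewise be checked numerically: per level h, assuming v(x) = x and θ(x) = −x, σ⋎ reduces to length(W(h)) over length(U(h) ⋎ W(h)) (a sketch under the same interval cuts as in the text; half = 0 gives the trivial input).

```python
# Numerical check of the sparse-rule sigma-join activations.
def act_join(center, half, wpeak, wscale, n=50000):
    total = 0.0
    for i in range(n):
        h = (i + 0.5) / n
        a, b = center - half*(1 - h), center + half*(1 - h)  # input U(h)
        c, d = wpeak - wscale*(1 - h)**0.5, wpeak + wscale*(1 - h)**0.5
        total += (d - c) / (max(b, d) - min(a, c))  # len(W(h)) / len(join)
    return total / n

j1 = act_join(4.5, 0.0, 3, 1)   # trivial U0 vs W1, quoted ≈ 0.5974
j2 = act_join(4.5, 0.0, 7, 2)   # trivial U0 vs W2, quoted ≈ 0.6737
j3 = act_join(4.5, 0.2, 3, 1)   # triangular U vs W1, quoted ≈ 0.5693
j4 = act_join(4.5, 0.2, 7, 2)   # triangular U vs W2, quoted ≈ 0.6555
print(j1, j2, j3, j4)
```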
Fig. 4. Consider the antecedent of rule R in Fig. 2. (a) A trivial IN input U0 = [U0, U0] is presented. (b) A granular IN input U = [U, U] is presented. Only the inclusion measure σ⋎(·, ·) can activate "in principle" rule R.
Finally, a third substantial advantage of an inclusion measure is its capacity to employ alternative positive valuation functions, whereas, in stark contrast, the majority of FISs in the literature (implicitly) employ solely the positive valuation v(x) = x. In the following we demonstrate the effects of the (parametric) sigmoid positive valuation function vs(x; λ, μ0) = 1/(1 + e^(−λ(x−μ0))), x ∈ R, where λ ∈ R>0, μ0 ∈ R. Consider the INs W1 and W2 of Fig. 2, the trivial IN U0 of Fig. 4(a), and the triangular IN U of Fig. 4(b). Then, for the sigmoid function vs(x; 1, 4.5) shown in Fig. 5, it was computed that σ⋎(U0 ⪯ W1) ≈ 0.6114 and σ⋎(U0 ⪯ W2) ≈ 0.9999; furthermore, σ⋎(U ⪯ W1) ≈ 0.5803 and σ⋎(U ⪯ W2) ≈ 1. Hence, a positive valuation can be used as an instrument for tunable decision-making.
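The sigmoid-valuation figures can be reproduced by the same numerical scheme, swapping the valuation into the σ⋎ ratio (assuming θ(x) = −x is kept as the dual isomorphism, as stated earlier in the text):

```python
import math

# Sigma-join with the sigmoid positive valuation v_s(x; λ=1, μ0=4.5).
vs = lambda x: 1.0 / (1.0 + math.exp(-(x - 4.5)))

def sigma_join_sigmoid(center, half, wpeak, wscale, n=50000):
    total = 0.0
    for i in range(n):
        h = (i + 0.5) / n
        a, b = center - half*(1 - h), center + half*(1 - h)
        c, d = wpeak - wscale*(1 - h)**0.5, wpeak + wscale*(1 - h)**0.5
        # (v(θ(c)) + v(d)) / (v(θ(a∧c)) + v(b∨d)), with θ(x) = -x
        total += (vs(-c) + vs(d)) / (vs(-min(a, c)) + vs(max(b, d)))
    return total / n

g1 = sigma_join_sigmoid(4.5, 0.0, 3, 1)   # σ⋎(U0 ⪯ W1), quoted ≈ 0.6114
g2 = sigma_join_sigmoid(4.5, 0.0, 7, 2)   # σ⋎(U0 ⪯ W2), quoted ≈ 0.9999
g3 = sigma_join_sigmoid(4.5, 0.2, 3, 1)   # σ⋎(U ⪯ W1),  quoted ≈ 0.5803
g4 = sigma_join_sigmoid(4.5, 0.2, 7, 2)   # σ⋎(U ⪯ W2),  quoted ≈ 1
print(g1, g2, g3, g4)
```

Note how the sigmoid, centered at μ0 = 4.5, nearly saturates the W2 scores: the valuation compresses distances far from μ0, which is exactly the "tunable decision-making" effect described above.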
Fig. 5. The INs W1 and W2 of Fig. 2 are displayed, as well as both the trivial IN U0 and the triangular IN U of Fig. 4. Inclusion measures σ⋎(·, ·) were computed using the displayed sigmoid positive valuation vs(x; λ, μ0) = 1/(1 + e^(−λ(x−μ0))) with λ = 1, μ0 = 4.5.
5 Discussion and Conclusion
This work introduced two major theoretical results, presented as Proposition 1 and Proposition 2, relating, on the one hand, inclusion-measure-based algebraic operations in the lattice (F, ⪯) and, on the other hand, membership-function-based fuzzy logic operations. In conclusion, significant improvements were demonstrated in FIS design, including the rigorous fusion of granular input data, the sensible employment of sparse rules, and the introduction of tunable nonlinearities.

Acknowledgement. This work has been supported, in part, by a project Archimedes-III contract.
References

1. Belohlavek, R.: Fuzzy Relational Systems: Foundations & Principles. Springer, Heidelberg (2002)
2. Birkhoff, G.: Lattice Theory. AMS, Colloquium Publications 25 (1967)
3. Graña, M.: State of the art in lattice computing for artificial intelligence applications. In: Nadarajan, R., Anitha, R., Porkodi, C. (eds.) Mathematical and Computational Models, pp. 233–242 (2007)
4. Graña, M.: Lattice computing: lattice-theory-based computational intelligence. In: Matsuhisa, T., Koibuchi, H. (eds.) Proc. Kosen Workshop on Mathematics, Technology, and Education (MTE), pp. 19–27 (2008)
5. Graña, M., Villaverde, I., Maldonado, J.O., Hernandez, C.: Two lattice computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72(10-12), 2111–2120 (2009)
6. Guillaume, S.: Designing fuzzy inference systems from data: an interpretability-oriented review. IEEE Trans. Fuzzy Systems 9(3), 426–443 (2001)
7. Kaburlasos, V.G.: Towards a Unified Modeling and Knowledge-Representation Based on Lattice Theory. SCI, vol. 27. Springer, Heidelberg (2006)
8. Kaburlasos, V.G., Hatzimichailidis, A.G.: Improved fuzzy inference system (FIS) design based on fuzzy lattice reasoning (FLR) (submitted)
9. Kaburlasos, V.G., Papadakis, S.E.: Piecewise-linear approximation of nonlinear models based on interval numbers (INs). In: Kaburlasos, V.G., Priss, U., Graña, M. (eds.) Proc. Lattice-Based Modeling (LBM 2008) Workshop, pp. 13–22 (2008)
10. Papadakis, S.E., Kaburlasos, V.G.: Piecewise-linear approximation of nonlinear models based on probabilistically/possibilistically interpreted intervals' numbers (INs). Information Sciences (to be published)
11. Pedrycz, W., Skowron, A., Kreinovich, V. (eds.): Handbook of Granular Computing. John Wiley & Sons, Chichester (2008)
12. Ritter, G.X., Wilson, J.N.: Handbook of Computer Vision Algorithms in Image Algebra, 2nd edn. CRC Press, Boca Raton (2000)
13. Sussner, P., Valle, M.E.: Morphological and certain fuzzy morphological associative memories for classification and prediction. In: Kaburlasos, V.G., Ritter, G.X. (eds.) Computational Intelligence Based on Lattice Theory. SCI, vol. 67, pp. 149–171. Springer, Heidelberg (2007)
14. Wang, P.P.: Mathematics of Uncertainty - Guest Editorial. Information Sciences 177(23), 5141–5142 (2007)
15. Xu, Y., Ruan, D., Qin, K., Liu, J.: Lattice-Valued Logic. Studies in Fuzziness and Soft Computing, vol. 132. Springer, Heidelberg (2003)
Median Hetero-Associative Memories Applied to the Categorization of True-Color Patterns Roberto A. Vázquez1 and Humberto Sossa2 1
Escuela de Ingeniería – Universidad La Salle Benjamín Franklin 47 Col. Condesa CP 06140 México, D.F. 2 Centro de Investigación en Computación – IPN Av. Juan de Dios Batiz, esquina con Miguel de Othon de Mendizábal Ciudad de México, 07738, México [email protected], [email protected]
Abstract. Median associative memories (MED-AMs) are a special type of associative memory based on the median operator. This type of associative model has been applied to the restoration of gray scale images and provides better performance than other models, such as morphological associative memories, when the patterns are altered with mixed noise. Despite their power, MED-AMs have not been applied to problems involving true-color patterns. In this paper we describe how a median hetero-associative memory (MED-HAM) could be applied to problems that involve true-color patterns. A complete study of the behavior of this associative model in the restoration of true-color images is performed using a benchmark of 14400 images altered by different types of noise. Furthermore, we describe how this model can be applied to an image categorization problem.
1 Introduction

The concept of associative memory (AM) emerges from psychological theories of human and animal learning. These memories store information by learning correlations among different stimuli. When a stimulus is presented as a memory cue, the other is retrieved as a consequence; this means that the two stimuli have become associated with each other in the memory. An AM can be seen as a particular type of neural network designed to recall output patterns in terms of input patterns that can appear altered by some kind of noise. Several AMs have been proposed in the last 50 years (refer for example to [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11] and [12]). Some of these AMs have several constraints that limit their applicability to complex problems. Most of these constraints are related to storage capacity, the type of patterns handled (only binary or bipolar), see for example [4], and robustness to noise (additive, subtractive, mixed, Gaussian noise, deformations, etc.), see for example [8] and [12]. In 1998, Ritter et al. [8] proposed the concept of morphological associative memories (MAMs), which exhibit optimal absolute storage capacity and one-step convergence. Basically, the authors substituted the outer product by max and min E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 418–428, 2010. © Springer-Verlag Berlin Heidelberg 2010
operations. This type of associative model has been applied to different pattern recognition problems, including face localization and reconstruction of gray scale [9] and true-color images [17]. However, the morphological associative model alone is incapable of dealing with patterns distorted with additive and subtractive noise at the same time (mixed noise). A solution to this problem was proposed in [6]. There are other approaches based on fuzzy theory and lattice theory, see for example [7], [13], [14] and [16]. Kosko's model [7] describes an associative memory in terms of a nonlinear matrix-vector product called max-min composition, where the synaptic weight matrix is given by fuzzy Hebbian learning; however, it exhibits a low storage capacity (one rule per FAM matrix). Later, in 1996, Chung and Lee [13] presented a generalization of this model and demonstrated that a perfect recall of multiple rules per FAM matrix is possible if the input fuzzy sets are normal and max-t composition orthogonal. Recently, Sussner and Valle [14] generalized the implicative learning rules to include any max-t composition based on a continuous t-norm. On the other hand, the associative memory based on a dendritic single layer morphological perceptron is robust under different types of noise [16]. Despite the robustness of these models under noisy patterns, they do not present one-step convergence as morphological associative memories do. Another interesting one-step approach was introduced by Sossa et al. [12]. In this model, the authors substituted the max-min operator by the med operator. By using this new operator, the median associative model (MED-AM) was capable of dealing with patterns which include additive and subtractive noise at the same time. Despite the power of recent models, they have not been applied to problems that involve true-color patterns, nor has a deep study of this associative model on true-color image patterns been performed.
In this paper we describe how a MED-AM could be applied to problems that involve true-color patterns. Furthermore, a complete study of the behavior of this associative model in the reconstruction of true-color images is performed using a benchmark of 14400 images altered by different types of noise. In addition, we describe how this model could be applied to an image categorization problem.
2 Basics on Median Associative Memories

An associative memory is a device designed to recall patterns. These patterns might appear altered by noise. An associative memory M can be viewed as an input-output system as follows: x → M → y, with x and y, respectively, the input and output pattern vectors. Each input vector forms an association with a corresponding output vector. The associative memory M is represented by a matrix whose ij-th component is m_ij. M is generated from a finite a priori set of known associations, known as the fundamental set of associations, or simply the fundamental set (FS). If ξ is an index, the fundamental set is represented as: {(x^ξ, y^ξ) | ξ = 1, 2, …, p}, with p the cardinality of the set. The patterns that form the fundamental set are called fundamental patterns. If it holds that x^ξ = y^ξ ∀ξ ∈ {1, 2, …, p}, then M is auto-associative,
otherwise it is hetero-associative. A distorted version of a pattern x to be recovered will be denoted as x̃. If, when feeding a distorted version of x^w with w ∈ {1, 2, …, p} to an associative memory M, the output corresponds exactly to the associated pattern y^w, we say that recalling is perfect. Let P = [p_ij]_{m×r} and Q = [q_ij]_{r×n} be two matrices.
Definition 1. The following two matrix operations are defined to recall integer-valued patterns:

1. Operation ◊_Α: P_{m×r} ◊_Α Q_{r×n} = [f_ij^Α]_{m×n}, where f_ij^Α = ⊗_{k=1}^{r} Α(p_ik, q_kj).
2. Operation ◊_Β: P_{m×r} ◊_Β Q_{r×n} = [f_ij^Β]_{m×n}, where f_ij^Β = ⊗_{k=1}^{r} Β(p_ik, q_kj).

According to the operators ⊗, Α and Β, different results can be obtained. If we want, for example, to compensate for additive or subtractive noise, operator ⊗ should be replaced by the median operator (med) because it provides excellent results in the presence of mixed noise. It can be easily shown that if x ∈ Z^n and y ∈ Z^m, then y ◊_Α x^t is a matrix of dimension m × n.
Relevant simplifications are obtained when operations ◊_Α and ◊_Β are applied between vectors:

1. If x ∈ Z^n and y ∈ Z^m, then y ◊_Α x^t is a matrix of dimensions m×n, and it also holds that:

   y ◊_Α x^t = [ Α(y_1, x_1)  Α(y_1, x_2)  …  Α(y_1, x_n)
                 Α(y_2, x_1)  Α(y_2, x_2)  …  Α(y_2, x_n)
                 ⋮            ⋮            ⋱  ⋮
                 Α(y_m, x_1)  Α(y_m, x_2)  …  Α(y_m, x_n) ]_{m×n}    (1)

2. If x ∈ Z^n and P is a matrix of dimensions m×n, operation P_{m×n} ◊_Β x gives as a result one vector of dimension m, with i-th component given as:

   (P_{m×n} ◊_Β x)_i = med_{j=1}^{n} Β(p_ij, x_j)    (2)

In particular, if x ∈ Z^n and M is a matrix of dimensions m × n, then operation M_{m×n} ◊_Β x outputs an m-dimensional column vector, with i-th component given as:

   (M_{m×n} ◊_Β x)_i = med_{j=1}^{n} Β(m_ij, x_j)    (3)
Operators Α and Β are defined as follows:

   Α(x, y) = x − y    (4)
   Β(x, y) = x + y    (5)
2.1 Memory Construction

Two steps are required to build the MED-AM:

Step 1: For each ξ = 1, 2, …, p, from each couple (x^ξ, y^ξ) build the matrix [y^ξ ◊_Α (x^ξ)^t]_{m×n} as in equation 1.

Step 2: Apply the median operator to the matrices obtained in Step 1 to get matrix M as follows:

   M = med_{ξ=1}^{p} [y^ξ ◊_Α (x^ξ)^t]    (6)
2.2 Pattern Recall

A pattern x̃^w (altered version of a pattern x^w with w ∈ {1, 2, …, p}) is presented to the HAM memory M and the following operation is done: M ◊_Β x̃, using equation 3.
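The construction (eq. 6) and recall (eq. 3) steps, with Α(x, y) = x − y and Β(x, y) = x + y, can be sketched in NumPy as follows. This is a minimal illustrative sketch: the function names and the single-association example are our own choices, not from the original paper.

```python
import numpy as np

def train_med_am(X, Y):
    """Build the median memory M (eq. 6): M_ij = med over xi of (y_i - x_j)."""
    # One outer-difference matrix y^xi (diamond_A) (x^xi)^t per association
    # (eq. 1 with A(x, y) = x - y), then the component-wise median (eq. 6).
    outer_diffs = np.array([np.subtract.outer(y, x) for x, y in zip(X, Y)])
    return np.median(outer_diffs, axis=0)

def recall_med_am(M, x):
    """Recall (eq. 3): (M diamond_B x)_i = med over j of (m_ij + x_j)."""
    return np.median(M + x[np.newaxis, :], axis=1)

# With a single stored association, recall is perfect and robust to mixed
# noise as long as fewer than half of the input components are corrupted.
x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([10., 20., 30.])
M = train_med_am([x], [y])
x_noisy = x.copy()
x_noisy[0] += 50   # additive noise on one component
x_noisy[1] -= 30   # subtractive noise on another
print(recall_med_am(M, x_noisy))   # -> [10. 20. 30.]
```

The median over j absorbs the per-component corruptions, which is exactly why the med operator compensates for mixed noise where max/min operators do not.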
The complete set of theorems which guarantee perfect recall, and their corresponding proofs, are presented in [12]. However, in practice most fundamental sets of patterns do not satisfy the restricted conditions imposed by the authors. For that reason the authors propose the following procedure to perfectly recall a general FS.

TRAINING PHASE:
Step 1. Transform the FS into an auxiliary fundamental set (FS') satisfying Theorem 1:
1) Make d = const.
2) Make (x̄^1, ȳ^1) = (x^1, y^1).
3) For the remaining couples do:
   For ξ = 2 to p
     For i = 1 to n { x̄_i^ξ = x̄_i^{ξ−1} + d; x̂_i^ξ = x̄_i^ξ − x_i^ξ; ȳ_i^ξ = ȳ_i^{ξ−1} + d; ŷ_i^ξ = ȳ_i^ξ − y_i^ξ }

Step 2. Build matrix M in terms of set FS': apply to FS' steps 1 and 2 of the training procedure described at the beginning of Section 2.1.

Remark 1. After this transformation, patterns from the auxiliary fundamental set are equidistant among themselves at a distance of d. This value also determines the noise supported by the model. Originally the authors decided to use the difference between the first components; however, in [10] the authors proposed another technique to compute d.
RECALLING PHASE:
Recalling of a pattern y^ξ from an altered version x̃^ξ of its key:
1) Transform x̃^ξ to x̄^ξ by applying the following transformation: x̄^ξ = x̃^ξ + x̂^ξ.
2) Apply equation 3 to x̄^ξ to get ȳ^ξ, and
3) Anti-transform ȳ^ξ as y^ξ = ȳ^ξ − ŷ^ξ to get y^ξ.
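The whole transform-train-recall pipeline for a general FS can be sketched as below. The sign convention for the offsets (x̂^ξ = x̄^ξ − x^ξ, ŷ^ξ = ȳ^ξ − y^ξ) is our reconstruction of the garbled original, chosen so that the recall formulas stay consistent; variable names and the choice of d are illustrative.

```python
import numpy as np

def train_general_fs(X, Y, d):
    """Transform the FS into an equidistant auxiliary FS' and train on it."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    p = len(X)
    Xb, Yb = X.copy(), Y.copy()          # auxiliary patterns x-bar, y-bar
    for xi in range(1, p):
        Xb[xi] = Xb[xi - 1] + d          # equidistant shifted copies
        Yb[xi] = Yb[xi - 1] + d
    Xhat, Yhat = Xb - X, Yb - Y          # per-pattern offsets x-hat, y-hat
    # Median memory built from FS' (Section 2.1, eq. 6)
    diffs = np.array([np.subtract.outer(yb, xb) for xb, yb in zip(Xb, Yb)])
    M = np.median(diffs, axis=0)
    return M, Xhat, Yhat

def recall_general_fs(M, Xhat, Yhat, x_noisy, xi):
    """Recall y^xi from a (possibly noisy) version of its key x^xi."""
    xb = x_noisy + Xhat[xi]                          # 1) transform
    yb = np.median(M + xb[np.newaxis, :], axis=1)    # 2) eq. 3
    return yb - Yhat[xi]                             # 3) anti-transform

X = [[3., 1., 4.], [1., 5., 9.]]
Y = [[2., 7.], [8., 2.]]
M, Xhat, Yhat = train_general_fs(X, Y, d=10.0)
print(recall_general_fs(M, Xhat, Yhat, np.array([1., 5., 9.]), 1))  # -> [8. 2.]
```

Because the auxiliary patterns are shifted copies of the first couple, every per-association outer-difference matrix is identical, which is what makes perfect recall of an arbitrary FS possible.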
It is important to mention that this MED-AM is robust only to mixed noise.
3 Behavior of the MED-AM

In this section a behavioral study of the MED-HAM using true-color noisy patterns is presented. The benchmark used in this set of experiments is composed of 14400 color images of 63 × 43 pixels and 24 bits in bmp format [15]. This benchmark is composed of 40 classes of flowers and animals. For each class, there are 90 images altered with additive noise (0% to 90% of the pixels), 90 images altered with subtractive noise (0% to 90% of the pixels), 90 images altered with mixed noise (0% to 90% of the pixels) and 90 images altered with Gaussian noise (0% to 90% of the pixels). In addition, one image of each class was altered by removing some parts of the image. Some images which compose this benchmark are shown in Fig. 1.
Fig. 1. Some images from the benchmark used to train and test the MED-AM
In order to generate the images altered with additive noise, we follow the next procedure: 1) use a uniform distribution to select at random k percent of the RGB pixels which compose the image; 2) set each component of these RGB pixels to the maximum grey level value (L − 1), in this case 255, to produce a white color. As with the previous procedure, in order to generate the images altered with subtractive noise, we use a uniform distribution to select at random k percent of the RGB pixels which compose the image, and then each component of these RGB pixels is set to the minimum grey level value, in this case 0, to produce a black color. For the case of mixed noise, we combine the two previous procedures as follows: randomly select k percent of the RGB pixels which compose the image, then generate a random number between 0 and 1; if the number is greater than 0.5, alter the RGB pixels with additive noise, otherwise alter them with subtractive noise. In order to generate the images altered with Gaussian noise, we use a uniform distribution to select k percent of the RGB pixels which compose the image, then set each component of these RGB pixels to a random number sampled from a normal distribution with μ = (1/L) Σ_{i=0}^{L−1} i and σ = (1/L) Σ_{i=0}^{L−1} (i − μ)².
Although it seems that there is not much difference between gray level images and true-color images, MED-AMs are not designed to cope with multivariable patterns (three channels per pixel). Instead of training one memory per color channel and then deciding how to combine the information recalled by each memory to restore the true-color image, we propose to transform these three channels into one channel. Before the MED-HAM was trained, each image was transformed into an image pattern. To build an image pattern from a bmp file, the image was read left-to-right and top-to-bottom. Each RGB pixel (hexadecimal value) was transformed into a decimal value, and this information was stored into an array. For example, suppose that the value of an RGB pixel is "0x3E53A1"; its corresponding decimal value is "4084641". Note that if we transformed the RGB channels into one channel by computing the average of the three channels (in other words, transforming the true-color image into a gray level image), we would not be able to recover the information of the RGB channels from the average channel. Once the images were transformed, the MED-HAM was trained using a set of associations composed of the 40 image patterns which are not altered with any type of noise. Each image pattern is composed of 2709 pixels, which implies that this MED-HAM has 2709 × 2709 synaptic weights. First of all, we verified whether the MED-HAM was able to recall the complete set of associations. Then we verified the behavior of the MED-HAM using noisy versions of the images used to train it. After that, we performed a study concerning how the number of associations influences the behavior of the MED-HAM. In order to measure the accuracy of the MED-HAM we counted the percentage of pixels correctly recalled. Once the associative memory was trained, we proceeded to evaluate its behavior.
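The hexadecimal-to-decimal packing of a pixel can be sketched as below. The channel order (R in the high byte) is our assumption, since the paper only gives the combined value; the example pixel 0x3E53A1 is the one from the text.

```python
def pack_rgb(r, g, b):
    # Pack three 8-bit channels into a single 24-bit integer,
    # e.g. the pixel 0x3E53A1 becomes 4084641, as in the paper.
    return (r << 16) | (g << 8) | b

def unpack_rgb(v):
    # Recover the three channels; a grey-level average would lose them.
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF

print(pack_rgb(0x3E, 0x53, 0xA1))  # -> 4084641
```

Unlike channel averaging, this packing is invertible, which is exactly the property the authors need to restore the true-color image from the recalled pattern.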
It is important to remark that, even when storing true-color patterns, the MED-HAM was capable of recalling the complete set of associations.
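The accuracy measure used in these experiments (percentage of pixels correctly recalled) can be sketched as follows; the function name is our own.

```python
import numpy as np

def pixel_recall_accuracy(recalled, original):
    """Percentage of pixels whose packed value was recalled exactly."""
    recalled, original = np.asarray(recalled), np.asarray(original)
    return 100.0 * np.mean(recalled == original)

print(pixel_recall_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # -> 75.0
```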
424
R.A. Vázquez and H. Sossa
In summary, we can conclude that the accuracy of the MED-HAM is not susceptible to the number of associations stored. The associative model presents the same behavior even if the number of associations is increased or decreased. The general behavior of the MED-HAM is shown in Fig. 2, where we can clearly observe the robustness of this memory with patterns altered with mixed noise. Although we already knew that the MED-HAM is robust to mixed noise, nobody had reported results using color images. These results are acceptable and support the applicability of this MED-HAM to restore true-color images from noisy versions altered with mixed noise. On the other hand, we expected the worst results for the cases of additive, subtractive and Gaussian noise; however, the accuracy obtained with images altered with additive and subtractive noise was highly acceptable. In this experiment we could also observe that the MED-HAM was more robust to Gaussian noise than to additive and subtractive noise. Although the authors in [12] said that the MED-HAM is only robust to mixed noise, we experimentally showed that this model is also robust to true-color patterns altered with additive, subtractive, and Gaussian noise.
Fig. 2. General behavior of the MED-HAM tested with different types of noise
Contrary to the behavior of morphological associative memories (best accuracy for the auto-associative version, low accuracy for the hetero-associative version) [17], the same accuracy with the same experiments was observed in the MED-AAM and MED-HAM versions. No comparison against other models was performed because
authors do not report results related to the relationship between storage capacity and amount of noise with more than 5 associations and with more than 40% of noise added to the image.

3.1 A Real Application: Image Categorization Using MED-HAMs

Image categorization is not a trivial problem when pictures are taken from real life situations. This implies that categorization must be invariant to several image transformations such as translations, rotations, scale changes, illumination changes, orientation changes, noise, and so on [18]. In this section we describe how images can be categorized using the MED-HAM already described and the methodology proposed in [18]. Suppose that we feed a MED-HAM with a picture and expect it to respond with something indicating the content of the picture. For example, if the picture contains a lion, we would expect the MED-HAM to respond with the word "lion". A first step to solve this problem was reported in [18]; now we will use that approach to show the applicability of this median associative model, giving a solution to this image categorization problem when the concerned images are distorted only by additive noise. Following the procedure described in [18], we first selected a set of images, in this case the benchmark used in the previous experiment. Then, we associated these images with describing words. The images and the describing words are our fundamental set of associations, where x^k is the k-th image and y^k the k-th describing word. With this set of associations we proceed to train the MED-HAM.
Fig. 3. Fundamental set of associations composed of 40 associations used to train the MED-HAM applied to an image categorization problem (describing words include Lion, Leopard, Peacock, Tiger, Turtle, Zebra, Wild dog, Domestic dog, Rhinoceros, and Flamingo)
Before the MED-HAM was trained, each image was transformed into an image pattern. The elements y_r^k, r = 1, …, R of vector y^k correspond to the ASCII codes of the letters of each describing word, where R is the number of letters of a given word. In Fig. 3, we show the information used to train the associative memory. By using this set of associations we expect that, when feeding the MED-HAM with the image which contains an agapanthus, we will recall the word "agapanthus", even if the image is altered with additive noise. Once the associative model was trained, we proceeded to test the accuracy of the proposal, altering the images with additive noise.
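Encoding a describing word as an ASCII output pattern can be sketched as follows. The fixed length and zero padding are our assumptions, since the paper does not state how words of different lengths are aligned.

```python
def word_to_pattern(word, length):
    """ASCII codes of the letters, zero-padded to a fixed length."""
    codes = [ord(c) for c in word]
    return codes + [0] * (length - len(codes))

def pattern_to_word(pattern):
    """Inverse mapping: drop the padding and decode the ASCII codes."""
    return ''.join(chr(v) for v in pattern if v != 0)

print(word_to_pattern('lion', 6))                    # -> [108, 105, 111, 110, 0, 0]
print(pattern_to_word([108, 105, 111, 110, 0, 0]))   # -> lion
```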
426
R.A. Vázquez and H. Sossa
On average, the accuracy of the proposal in this image categorization task was 88.27%. As can be appreciated from Fig. 4, not all of the 14400 images were correctly categorized; in other words, not all the words associated with the images were correctly recalled. From this figure we can also observe that if the quantity of noise added to the image is less than 70%, all the noisy images are correctly categorized or classified. If the quantity of noise surpasses this threshold, the accuracy starts to decrease.
Fig. 4. Accuracy of the MED-HAM when it is applied to an image categorization problem
On the other hand, when images miss some data (see Fig. 1), the complete set of associations was perfectly recalled, i.e. all images were correctly categorized or classified.
4 Conclusions

In this paper, a complete behavioral study of the median hetero-associative memory in the restoration of true-color images was performed using a benchmark of 14400 images altered by different types of noise. Furthermore, we described how this associative model could be applied to an image categorization or classification problem, using the same benchmark. Because this associative model had previously been applied only to gray level patterns, this paper is useful to really understand the power and limitations of this model. Through several experiments, we found some interesting properties of this associative model. MED-HAMs present robust recall even if patterns are altered by some kind of noise. Furthermore, MED-HAMs are not sensitive to the amount of noise; however, after a certain amount of additive and subtractive noise, the accuracy of the model tends to decrease; in this case this threshold is reached at 73% of noise. We also observed that the model is not only robust to mixed noise but to additive, subtractive and Gaussian noise too. Regarding the storage capacity, we found that the accuracy of the model is not sensitive to the number of stored associations. In general we can say that when the number of associations is increased, the accuracy of the memory remains almost stable. On average, MED-HAMs correctly recall 79.9% of the pixels when patterns are altered by additive noise. A correct recall of 79.4% of the pixels is obtained when
patterns are altered by subtractive noise; for the case of mixed and Gaussian noise, a correct recall of 100% of the pixels is obtained. These results are highly acceptable compared against the results provided by the morphological associative model [17], which were on average 77.4% using the same benchmark. Concerning the image categorization problem, the accuracy of the proposal was on average 88.27%. We observed that if the quantity of noise added to the image to be classified is less than 70%, all the noisy images are correctly categorized or classified. On the other hand, if the quantity of noise surpasses this threshold, the accuracy starts to decrease. Other possible applications of MED-HAMs (not developed here due to space limitations) are the following: recall of the word that best describes a given image corrupted by mixed noise, finding the index class of an object given an image of it, retrieving an associated image from a corrupted version of its associated image, and so on. Acknowledgements. The authors thank the SIP-IPN for support under grant 20091421. H. Sossa thanks CINVESTAV-GDL for the support to do a sabbatical stay from December 1, 2009 to May 31, 2010. The authors also thank the European Union, the European Commission and CONACYT for the economical support. This paper has been prepared with economical support of the European Commission under grant FONCICYT 93829. The content of this paper is an exclusive responsibility of the CIC-IPN and it cannot be considered that it reflects the position of the European Union. We also thank the reviewers for their comments for the improvement of this paper.
References [1] Steinbuch, K.: Die Lernmatrix. Kybernetik 1, 26–45 (1961) [2] Anderson, J.A.: A simple neural network generating an interactive memory. Math. Biosci. 14, 197–220 (1972) [3] Kohonen, T.: Correlation matrix memories. IEEE Trans. on Comp. 21, 353–359 (1972) [4] Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79, 2554–2558 (1982) [5] Sussner, P.: Generalizing operations of binary auto-associative morphological memories using fuzzy set theory. J. Math. Imaging Vis. 19, 81–93 (2003) [6] Ritter, G.X., et al.: Reconstruction of patterns from noisy inputs using morphological associative memories. J. Math. Imaging Vis. 19, 95–111 (2003) [7] Kosko, B.: Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice- Hall, Englewood Cliffs (1992) [8] Ritter, G.X., Sussner, P., Diaz de Leon, J.L.: Morphological associative memories. IEEE Trans. Neural Networks 9, 281–293 (1998) [9] Sussner, P., Valle, M.: Gray-Scale Morphological Associative Memories. IEEE Trans. on Neural Netw. 17, 559–570 (2006) [10] Vazquez, R.A., Sossa, H.: A new associative memory with dynamical synapses. Neural Processing Letters 28(3), 189–207 (2008) [11] Vazquez, R.A., Sossa, H.: A Bidirectional Heteroassociative Memory for True Color Patterns. Neural Processing Letters 28(3), 131–153 (2008)
[12] Sossa, H., Barron, R., Vazquez, R.A.: New associative memories to recall real-valued patterns. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds.) CIARP 2004. LNCS, vol. 3287, pp. 195–202. Springer, Heidelberg (2004) [13] Chung, F.-L., Lee, T.: On fuzzy associative memory with multiple-rule storage capacity. IEEE Trans. Fuzzy Syst. 4(4), 375–384 (1996) [14] Sussner, P., Valle, M.E.: Implicative fuzzy associative memory. IEEE Trans. on Fuzzy Systems 14(6), 793–807 (2006) [15] Sossa, H., Vazquez, R.A.: Flower and Animals Database, http://roberto.a.vazquez.googlepages.com [16] Ritter, G.X., Urcid, G.: Learning in Lattice Neural Networks that Employ Dendritic Computing. In: Kaburlasos, V.G., Ritter, G.X. (eds.) Computational Intelligence based on Lattice Theory, vol. 67, pp. 25–44 (2007) [17] Vazquez, R.A., Sossa, H.: Behavior of morphological associative memories with true-color image patterns. Neurocomputing 73, 225–244 (2009) [18] Vazquez, R.A., Sossa, H.: Associative memories applied to image categorization. In: Martínez-Trinidad, J.F., Carrasco Ochoa, J.A., Kittler, J. (eds.) CIARP 2006. LNCS, vol. 4225, pp. 549–558. Springer, Heidelberg (2006)
A Comparison of VBM Results by SPM, ICA and LICA Darya Chyzyk, Maite Termenon, and Alexandre Savio Computational Intelligence Group Dept. CCIA, UPV/EHU, Apdo. 649, 20080 San Sebastian, Spain www.ehu.es/ccwintco
Abstract. Lattice Independent Component Analysis (LICA) approach consists of a detection of independent vectors in the morphological or lattice theoretic sense that are the basis for a linear decomposition of the data. We apply it in this paper to a Voxel Based Morphometry (VBM) study on Alzheimer’s disease (AD) patients extracted from a well known public database. The approach is compared to SPM and Independent Component Analysis results.
1 Introduction
Morphometry analysis has become a common tool for computational brain anatomy studies. It allows a comprehensive measurement of structural differences within a group or across groups, not just in specific structures, but throughout the entire brain. Voxel-based Morphometry (VBM) is a computational approach to neuroanatomy that measures differences in local concentrations of brain tissue through a voxel-wise comparison of multiple brain images [3]. For instance, VBM has been applied to study volumetric atrophy of the grey matter (GM) in areas of neocortex of AD patients vs. control subjects [4,17,6]. The procedure involves the spatial normalization of subject images into a standard space, segmentation of tissue classes using a priori probability maps, smoothing to correct noise and small variations, and voxel-wise statistical tests. Statistical analysis is based on the General Linear Model (GLM) to describe the data in terms of experimental and confounding effects, and residual variability. Classical statistical inference is used to test hypotheses that are expressed in terms of GLM estimated regression parameters. This computation is specified as a contrast that produces a scalar estimate which the Statistical Parametric Map (SPM) thresholds according to the Random Field theory to obtain clusters of significant voxels. SPM has been also widely applied to fMRI voxel activation analysis. Alternative works on fMRI analysis are based on the Independent Component Analysis (ICA) [18] assuming that the time series observations are linear mixtures of independent sources which can not be observed. This leads us to consider here ICA and other approaches for VBM on transversal data. ICA assumes that the source signals are non-Gaussian and that the linear mixing process is unknown. The approaches to solve the ICA problem obtain both the independent sources E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 429–435, 2010. 
© Springer-Verlag Berlin Heidelberg 2010
and the linear unmixing matrix. These approaches are unsupervised because no a priori information about the sources or the mixing process is included, hence the alternative name of Blind Deconvolution. Sources in VBM correspond to the pattern of intensities of a voxel across the population of subjects. We have used the FastICA algorithm implementation available at [2]. We have also used the implementations of Maximum Likelihood ICA [14] (which is equivalent to Infomax ICA), Mean Field ICA [13], and Molgedey and Schouster ICA based on dynamic decorrelation [15], which are available at [1]. We have proposed [11,9] a Lattice Computing [8] approach that we call Lattice Independent Component Analysis (LICA) that consists of two steps. First, it selects Strong Lattice Independent (SLI) vectors from the input dataset using an incremental algorithm, the Incremental Endmember Induction Algorithm (IEIA) [10]. Second, because of the conjectured equivalence between SLI and Affine Independence [12], it performs the linear unmixing of the input dataset based on these endmembers. Therefore, the approach is a mixture of linear and nonlinear methods. We assume that the data are generated as a convex combination of a set of endmembers which are the vertices of a convex polytope covering some region of the input data. This assumption is similar to the linear mixture assumed by the ICA approach; however, we do not impose any probabilistic assumption on the data. The endmembers discovered by the IEIA are equivalent to the GLM design matrix columns, and the unmixing process is identical to the conventional least squares estimator, so LICA is a kind of unsupervised GLM whose regressor functions are mined from the input dataset. If we try to establish correspondences with ICA, the endmembers correspond to the unknown sources and the mixing matrix is the one given by the abundance coefficients computed by least squares estimation.
The outline of the paper is as follows: Section 2 overviews the LICA. Section 3 presents results of the proposed approach on a VBM case study on an Alzheimer’s Disease population with paired controls. Section 4 provides some conclusions.
2 The Lattice Independent Component Analysis
The linear mixing model can be expressed as follows: x = Σ_{i=1}^{M} a_i e_i + w = Ea + w, where x is the d-dimensional pattern vector corresponding to the fMRI voxel time series vector, E is a d×M matrix whose columns are the d-dimensional vectors; when these vectors are the vertices of a convex region covering the data they are called endmembers* e_i, i = 1, …, M; a is the M-dimensional vector of linear mixing coefficients, which correspond to fractional abundances in the convex case, and w is the d-dimensional additive observation noise vector. (*The original works were devoted to unsupervised hyperspectral image segmentation, hence the use of the name endmember for the selected vectors.) The linear mixing model is subjected to two constraints on the abundance coefficients when the data points fall into a simplex whose vertices are the endmembers: all abundance coefficients must be non-negative, a_i ≥ 0, i = 1, …, M, and normalized to unity summation, Σ_{i=1}^{M} a_i = 1. Under this circumstance, we expect that the vectors in E are affinely independent and that the convex region defined by them includes all the data points. Once the endmembers have been determined, the unmixing process is the computation of the matrix inversion that gives the coordinates of the point relative to the convex region vertices. The simplest approach is the unconstrained least squared error (LSE) estimation given by: a = (E^T E)^{-1} E^T x. Even when the vectors in E are affinely independent, the coefficients that result from this estimation do not necessarily fulfill the non-negativity and unity normalization. Ensuring both conditions is a complex problem. We call Lattice Independent Component Analysis (LICA) the following approach:

1. Induce from the given data a set of Strongly Lattice Independent vectors. In this paper we apply the Incremental Endmember Induction Algorithm (IEIA) [10,9]. These vectors are taken as a set of affine independent vectors. The advantages of this approach are (1) that we are not imposing statistical assumptions, (2) that the algorithm is one-pass and very fast because it only uses comparisons and addition, (3) that it is unsupervised and incremental, and (4) that it naturally detects the number of endmembers.
2. Apply the unconstrained least squares estimation to obtain the mixing matrix. The detection results are based on the analysis of the coefficients of this matrix.

Therefore, the approach is a combination of linear and lattice computing: a linear component analysis where the components have been discovered by non-linear, lattice theory based, algorithms.
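The unconstrained LSE unmixing step, a = (EᵀE)⁻¹Eᵀx, can be sketched as below. The endmember matrix here is a toy example, not the output of the IEIA.

```python
import numpy as np

def unmix(E, x):
    """Unconstrained least-squares abundances: a = (E^T E)^{-1} E^T x."""
    # Solve the normal equations instead of forming the explicit inverse.
    return np.linalg.solve(E.T @ E, E.T @ x)

# Toy endmembers (columns of E) and a point inside their convex hull.
E = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
a_true = np.array([0.3, 0.7])
x = E @ a_true
a_est = unmix(E, x)   # recovers a_true up to floating point
```

Note that, as the text states, nothing in this estimator enforces non-negativity or the unit-sum constraint; for noisy data the estimated abundances may leave the simplex.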
3 A VBM Case Study
3.1 Experimental Data
Ninety-eight right-handed women (aged 65-96 yr) were selected from the Open Access Series of Imaging Studies (OASIS) database (http://www.oasis-brains.org) [16]. The OASIS data set is a cross-sectional collection of 416 subjects covering the adult life span (aged 18 to 96), including individuals with early-stage Alzheimer's Disease. We ruled out a set of 200 subjects whose demographic, clinical or derived anatomic volume information was incomplete. For the present study there are 49 subjects who have been diagnosed with very mild to mild AD and 49 nondemented. A summary of subject demographics and dementia status is shown in Table 1. Multiple (three or four) high-resolution structural T1-weighted magnetization-prepared rapid gradient echo (MP-RAGE) images were acquired [5] on a 1.5-T Vision scanner (Siemens, Erlangen, Germany) in a single imaging session. Image parameters: TR = 9.7 msec, TE = 4.0 msec, flip angle = 10, TI = 20 msec, TD = 200 msec, 128 sagittal 1.25 mm slices without gaps and a pixel resolution of 256×256 (1×1 mm).
D. Chyzyk, M. Termenon, and A. Savio
Table 1. Summary of subject demographics and dementia status. Education codes correspond to the following levels of education: 1: less than high school grad., 2: high school grad., 3: some college, 4: college grad., 5: beyond college. Categories of socioeconomic status range from 1 (highest) to 5 (lowest). MMSE score ranges from 0 (worst) to 30 (best).
                      Very mild to mild AD   Normal
No. of subjects       49                     49
Age                   78.08 (66-96)          77.77 (65-94)
Education             2.63 (1-5)             2.87 (1-5)
Socioeconomic status  2.94 (1-5)             2.88 (1-5)
CDR (0.5 / 1 / 2)     31 / 17 / 1            0
MMSE                  24 (15-30)             28.96 (26-30)

3.2 Algorithms Applied
We have applied both the SPM and FSL approaches to this data. Figure 1 shows the activation results from an FSL study on this data. We used the preprocessed volumes as inputs for the ICA and LICA algorithms. Detection of significant voxels in the ICA and LICA approaches is performed by setting the threshold on the mixing/abundance coefficients to the 95% percentile of the empirical distribution (histogram) of these coefficients. We present in Figure 2 the activation results corresponding to the third endmember detected by the LICA algorithm, for comparison with the FSL results. A great agreement can be appreciated. Because both ICA and LICA are unsupervised, in the sense that the pattern searched for is not prescribed, they suffer from the identifiability problem: we do not know beforehand which of the discovered sources/endmembers corresponds to the sought significant pattern. In contrast, the SPM and FSL approaches are supervised in the sense that we provide the a priori identification of controls and patients, searching for voxels that correlate well with this indicative variable. In order to provide a quantitative assessment of the agreement between the discoveries of ICA and LICA and the statistical significances computed by SPM and FSL, we computed the correlations between the abundance/mixture matrices of the two approaches. Table 2 shows the correlation between the mixing coefficients of the corresponding ICA ML algorithm sources (the one with the best results) and the abundance coefficients of the LICA endmembers, both before (left) and after (right) the application of the 95% percentile threshold to determine the significant voxels. We decide that the best relation is between the third LICA endmember and the second ICA source, because their correlation does not drop after thresholding, contrary to LICA #4 with ICA #1, whose correlation drops dramatically after thresholding for significance detection.
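The detection rule described above (thresholding the coefficients at the 95% percentile of their empirical distribution) can be sketched as follows; the function name and the synthetic coefficients are assumptions made for the example:

```python
import numpy as np

def significant_voxels(coeffs, percentile=95.0):
    """Mark as significant the voxels whose mixing/abundance coefficient
    exceeds the given percentile of the empirical distribution."""
    threshold = np.percentile(coeffs, percentile)
    return coeffs > threshold

rng = np.random.default_rng(0)
coeffs = rng.normal(size=10000)   # stand-in for one endmember's abundances
mask = significant_voxels(coeffs)
# By construction, roughly 5% of the voxels survive the threshold.
```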
To give some measure of the meaningfulness of the unsupervised approaches, we must find out if they are able to uncover something that has a good agreement with the findings of either SPM or FSL approaches. Therefore we compute the correlation between the mixing/abundance coefficients of ICA/LICA and the
Fig. 1. FSL significant voxel detection
Fig. 2. LICA activation results for the endmember #3
Table 2. Correlation among ICA and LICA mixing coefficients, before (left) and after (right) thresholding for activation detection

Before thresholding            ICA ML
LICA      #1       #2       #3       #4
#1        0.05     0.24     0.44    -0.01
#2        0.19     0.12    -0.28    -0.60
#3        0.38     0.67     0.30     0.24
#4        0.69     0.04     0.26    -0.18

After thresholding             ICA ML
LICA      #1       #2       #3       #4
#1        0.003    0.09     0.34     0.03
#2        0.15     0.05    -0.02    -0.02
#3        0.01     0.66     0.007    0.08
#4        0.26    -0.01     0.13    -0.00
Table 3. Agreement between SPM, FSL, ICA and LICA

              #1       #2       #3       #4
ICA vs SPM   -0.11     0.32    -0.02     0.02
LICA vs SPM  -0.03    -0.03     0.23    -0.06
ICA vs FSL    0.08     0.56     0.03     0.07
LICA vs FSL   0.07     0.02     0.58     0.20
statistics computed by SPM and FSL. Table 3 shows these correlations. Here the agreement between the third endmember of LICA and the second source of ICA ML receives further support, because both are the ones that show maximal agreement with SPM and FSL, and in both ICA and LICA the agreement with FSL is greater than with the SPM results.
4 Summary and Conclusions
We have proposed and applied Lattice Independent Component Analysis (LICA) to model-free (unsupervised) VBM analysis. LICA is based on the application of a Lattice Computing algorithm, the IEIA, for the selection of the endmembers, and on the linear unmixing of the data based on these endmembers. We compare our results with those obtained by the conventional SPM and FSL algorithms, as well as the unsupervised ICA approach. We find a strong agreement between the LICA results and those of ICA, and we can identify endmembers and sources that correspond closely to the significant detections of SPM and FSL, providing a validation of the approach. The problem with VBM and similar morphometric approaches is that we need to be able to give some interpretation to the findings of the ICA and LICA algorithms; that is, besides the obvious identification of voxels that correlate well with the indicative variable, the problem is to find additional regularities and give them some sense. Some kind of hierarchical analysis [7] could be advantageous in future work.
References
1. http://isp.imm.dtu.dk/toolbox/ica/index.html
2. http://www.cis.hut.fi/projects/ica/fastica/
3. Ashburner, J., Friston, K.J.: Voxel-based morphometry: The methods. Neuroimage 11(6), 805–821 (2000)
4. Busatto, G.F., Garrido, G.E.J., Almeida, O.P., Castro, C.C., Camargo, C.H.P., Cid, C.G., Buchpiguel, C.A., Furuie, S., Bottino, C.M.: A voxel-based morphometry study of temporal lobe gray matter reductions in Alzheimer's disease. Neurobiology of Aging 24(2), 221–231 (2003)
5. Fotenos, A.F., Snyder, A.Z., Girton, L.E., Morris, J.C., Buckner, R.L.: Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD. Neurology 64(6), 1032–1039 (2005)
6. Frisoni, G.B., Testa, C., Zorzan, A., Sabattoli, F., Beltramello, A., Soininen, H., Laakso, M.P.: Detection of grey matter loss in mild Alzheimer's disease with voxel based morphometry. Journal of Neurology, Neurosurgery & Psychiatry 73(6), 657–664 (2002)
7. Graña, M., Torrealdea, F.J.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
8. Graña, M.: A brief review of lattice computing. In: Proc. WCCI, pp. 1777–1781 (2008)
9. Graña, M., Chyzyk, D., García-Sebastián, M., Hernández, C.: Lattice independent component analysis for fMRI. Information Sciences (in press, 2010)
10. Graña, M., Villaverde, I., Maldonado, J.O., Hernandez, C.: Two lattice computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72(10-12), 2111–2120 (2009)
11. Graña, M., Savio, A.M., Garcia-Sebastian, M., Fernandez, E.: A lattice computing approach for on-line fMRI analysis. Image and Vision Computing (in press, 2009)
12. Schmalz, M.S., Ritter, G.X., Urcid, G.: Autonomous single-pass endmember approximation using lattice auto-associative memories. Neurocomputing 72(10-12), 2101–2110 (2009)
13. Højen-Sørensen, P., Winther, O., Hansen, L.K.: Mean field approaches to independent component analysis. Neural Computation 14, 889–918 (2002)
14. Kolenda, T., Hansen, L.K., Larsen, J.: Blind detection of independent dynamic components. In: Proc. IEEE ICASSP 2001, vol. 5, pp. 3197–3200 (2001)
15. Schuster, H., Molgedey, L.: Separation of independent signals using time-delayed correlations. Physical Review Letters 72(23), 3634–3637 (1994)
16. Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L.: Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience 19(9), 1498–1507 (2007)
17. Scahill, R.I., Schott, J.M., Stevens, J.M., Rossor, M.N., Fox, N.C.: Mapping the evolution of regional atrophy in Alzheimer's disease: Unbiased analysis of fluid-registered serial MRI. Proceedings of the National Academy of Sciences 99(7), 4703 (2002)
18. Calhoun, T.V.D., Adali, T.: Unmixing fMRI with independent component analysis. IEEE Engineering in Medicine and Biology Magazine 25(2), 79–90 (2006)
Fusion of Single View Soft k-NN Classifiers for Multicamera Human Action Recognition Rodrigo Cilla, Miguel A. Patricio, Antonio Berlanga, and Jose M. Molina Computer Science Department, Universidad Carlos III de Madrid Avda. de la Universidad Carlos III, 22. 28270 Colmenarejo, Madrid. Spain {rcilla,mpatrici}@inf.uc3m.es, {aberlan,molina}@ia.uc3m.es
Abstract. This paper presents two different classifier fusion algorithms applied in the domain of Human Action Recognition from video. A set of cameras observes a person performing an action from a predefined set. For each camera view a 2D descriptor is computed, and a posterior on the performed activity is obtained using a soft classifier. These posteriors are combined using voting and a Bayesian network to obtain a single belief measure to use for the final decision on the performed action. Experiments are conducted with different low level frame descriptors on the IXMAS dataset, achieving results comparable to state-of-the-art 3D proposals while performing only 2D processing.
1 Introduction
Human Action Recognition (HAR) from video is one of the most active research areas in computer vision. Several surveys of the work in the area have been published in recent years [1]. Applications of HAR systems range from video surveillance [2] and Ambient Assisted Living [3] to automatic annotation of video contents [4]. The recognition of human actions from video may be considered a pattern recognition problem [5]. First, a low level descriptor is computed to capture the variance in the input frames. Popular choices at this level are motion templates [6], optical flow descriptors [7], spatio-temporal interest points [4], trajectories [8] or a combination of them [9,2]. This computed descriptor is fed into a classifier to obtain the action category it belongs to. Common choices include Mixtures of Gaussians [8], Support Vector Machines [10], database searches [7,9,2] or Hierarchical Bayesian Models [11]. A particular feature of the recognition of human actions is that actions do not happen in isolation; they happen in a temporal sequence. The most popular technique to model the temporal sequence statistics has been Hidden Markov Models [12]. Other proposed techniques have been Context Free Grammars [13] and Conditional Random Fields [14]. In this work we assume that actions happen in isolation, focusing on the descriptor classification level. Most of the existing approaches to HAR have considered a single video sensor to perceive the environment where the actions take place. A single sensor may not be enough to accurately perceive the actions, due to the presence of occlusions. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 436–443, 2010. © Springer-Verlag Berlin Heidelberg 2010
These occlusions may be caused by the relative position of the human body and the camera (self-occlusions) or by the presence of walls and furniture in the environment. To deal with these problems, HAR systems may be improved using a Visual Sensor Network (VSN) [15] with overlapped cameras. In this paper we study how to obtain a single classification of the action perceived by all the cameras from the outputs of a set of single camera soft classifiers. Single camera soft classifiers provide a posterior for the performed activity based on the frame descriptor previously computed. We try two different approaches to solve the problem: the first is based on a weighted voting scheme; the second is based on using a Bayesian network to model the error produced by each of the single view classifiers. Our approach avoids computing the 3D visual hull, an expensive and centralized task used by state-of-the-art methods for multiple view human action recognition [16,17], using only 2D pattern recognition techniques. The paper is organized as follows: in Section 2 the problem to solve is formally defined; in Section 3, the classifier fusion algorithms to be tested are presented; in Section 4, we present the single view soft classifier we use to test the classifier fusion algorithms; in Section 5, the results of applying the proposed algorithms to classify the IXMAS dataset are shown; finally, in Section 6, the conclusions of this work are presented.
2 Problem Statement
Let f_t = (f_t^1, ..., f_t^C) be a set of action descriptors computed by a set of C cameras at an arbitrary instant t. The posterior probability p(y_n | f_t^c) of action y_n, y_n ∈ Y = {y_1, ..., y_N}, is obtained by applying a soft classifier to the descriptor f_t^c. Let B = {p(y_n | f_t^c)} ∀n, c be the set of all the posterior probabilities obtained after applying the soft classifier to each one of the views. The problem we want to solve is how to combine the posteriors in B into a single posterior for all the cameras, p(y_n | f_t^1, ..., f_t^C), y_n ∈ Y, in order to decide which activity y_n is being performed.
3 Fusion of Soft Classifiers
Two different algorithms are tested for this task: the first, a voting scheme; the second, a Bayesian network modeling the errors of the local classifications.
3.1 Voting
The first algorithm we test for the fusion of the single view soft classifications is defined as the sum of the posterior probabilities:

p(a_i | f_t^1, ..., f_t^C) ∝ Σ_{c=1}^{C} p(a_i | f_t^c)    (1)
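A minimal sketch of this voting rule, assuming the per-camera posteriors are stacked in a C×N array (the function and variable names are illustrative):

```python
import numpy as np

def fuse_by_voting(posteriors):
    """Combine per-camera posteriors by summing them (equation 1).

    posteriors : (C, N) array, row c holding p(a_i | f_t^c) for camera c.
    Returns the normalized fused posterior over the N actions.
    """
    fused = posteriors.sum(axis=0)
    return fused / fused.sum()

# Three cameras, three candidate actions.
B = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.2, 0.7, 0.1]])
p = fuse_by_voting(B)
decision = int(np.argmax(p))   # index of the selected action
```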
3.2 Bayesian Network
The second algorithm we test for the fusion of the single view soft classifications is based on the Bayesian network shown in Figure 1. The network is composed of observation nodes f_t^c, representing the observation at instant t in camera c, a node α_t representing the activity at time t, and a set of latent nodes v_t^c that model the single view classifications. Given a set of frame descriptors f_t = (f_t^1, ..., f_t^C), a set of latent variables v_t = (v_t^1, ..., v_t^C), and the activity label α_t, their joint probability is factorized as:

p(α_t, v_t, f_t) = p(α_t | v_t) Π_{c=1}^{C} p(v_t^c) p(f_t^c | v_t^c)    (2)
The probability of α_t is defined as a product of independent factors, assuming independence between the hidden variables v_t^c:

p(α_t | v_t) = Π_{c=1}^{C} p(α_t | v_t^c)    (3)
With this assumption we choose not to model correlations between local classification errors. In this way, when adding a new camera to the system only two conditional probability distributions need to be estimated, instead of the exponential number needed if the assumption were not made. Thus, equation 2 can be rewritten as:

p(α_t, v_t, f_t) = Π_{c=1}^{C} p(α_t | v_t^c) p(v_t^c) p(f_t^c | v_t^c)    (4)
The posterior probability of an activity label α_t and a set of hidden variables v_t is proportional to the joint probability:

p(α_t, v_t | f_t) ∝ p(α_t, v_t, f_t)    (5)
Given a set of frame descriptors f_t, the posterior probability of the activity label α_t is obtained by marginalizing equation 5 over the set of latent variables v_t:

p(α_t = a_i | f_t) ∝ Π_{c=1}^{C} Σ_{j=1}^{N} p(α_t = a_i | v_t^c = a_j) p(v_t^c = a_j) p(f_t^c | v_t^c = a_j)    (6)
p(f_t^c | v_t^c = a_j) may be computed in terms of p(v_t^c = a_j | f_t^c) using Bayes' theorem:

p(f_t^c | v_t^c = a_j) = p(v_t^c = a_j | f_t^c) p(f_t^c) / p(v_t^c = a_j) ∝ p(v_t^c = a_j | f_t^c) / p(v_t^c = a_j)    (7)

The term p(f_t^c) vanishes assuming that f_t^c ∼ Uniform. The final expression for the posterior is obtained by introducing the RHS of equation 7 into equation 6:
Fig. 1. Plate model of the Bayesian Network used to combine the outputs from the classifiers at each camera
p(α_t = a_i | f_t) ∝ Π_{c=1}^{C} Σ_{j=1}^{N} p(α_t = a_i | v_t^c = a_j) p(v_t^c = a_j | f_t^c)    (8)
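Equation 8 can be sketched as follows, assuming the conditional tables p(α_t = a_i | v_t^c = a_j) have already been estimated; the array layout and names are illustrative assumptions, not taken from the authors' implementation:

```python
import numpy as np

def fuse_bayes_net(posteriors, cpt):
    """Fusion with the Bayesian network of equation 8.

    posteriors : (C, N) per-camera soft classifications p(v_t^c = a_j | f_t^c).
    cpt        : (C, N, N) tables, cpt[c, i, j] = p(alpha_t = a_i | v_t^c = a_j).
    Returns the normalized posterior p(alpha_t = a_i | f_t) over the N actions.
    """
    C, N = posteriors.shape
    result = np.ones(N)
    for c in range(C):
        # Sum over the latent single-view label j for camera c, then
        # multiply the per-camera contributions together.
        result *= cpt[c] @ posteriors[c]
    return result / result.sum()

# Toy setup: 2 cameras, 2 actions, slightly noisy single-view classifiers.
cpt = np.tile(np.array([[0.9, 0.2],
                        [0.1, 0.8]]), (2, 1, 1))
posteriors = np.array([[0.7, 0.3],
                       [0.6, 0.4]])
p = fuse_bayes_net(posteriors, cpt)
```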
Network parameters are estimated using labeled training samples. p(v_t^c | f_t^c) is known, being provided by the single view soft classifiers, so only p(α_t | v_t^c) needs to be estimated. Let O^c = {o_1^c, ..., o_K^c} be the set of K training frame descriptors computed at camera c, with their corresponding activity labels Y^c = {y_1^c, ..., y_K^c}, y_k^c ∈ A. Model parameters are estimated as:

p(α_t = a_i | v_t^c = a_j) = [Σ_{k=1}^{K} γ_k p(v_t^c = a_j | o_k^c)] / [Σ_{l=1}^{N} Σ_{k=1}^{K} γ_k p(v_t^c = a_l | o_k^c)]    (9)

where γ_k = 1 if y_k = a_j and γ_k = 0 otherwise.
4 Soft Classifier
The classifier we use to obtain the probability of each single frame being an instance of each action category is based on a k-Nearest Neighbor (kNN) setting. Let D = {x_i, y_i}, 1 ≤ i ≤ M, be a set of M training samples, where y_i ∈ {y_1, ..., y_N} is the label corresponding to the instance x_i. The posterior probability p(y | x^j) of a new sample x^j is decided by sampling from its neighborhood, transforming the distances to the k nearest neighbors into likelihood values:

p(y = y_n | x^j) ∝ Σ_{k=1}^{K} γ_k (ρ_j − ||x^j − x_k||)    (10)

where ρ_j = Σ_{k=1}^{K} ||x^j − x_k||, i.e., the sum of the distances to the k nearest neighbors of x^j; γ_k = 1 if y_n = y_k and γ_k = 0 otherwise. The main advantage of this classifier is
that it captures the local structure of the data, being able to model multimodal distributions. Training is also very fast because it only requires storing the samples in the database.
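A minimal sketch of this soft kNN posterior (equation 10), with illustrative names and a toy training set; note that the uniform fallback for a zero score total is an assumption added for robustness, not part of the paper:

```python
import numpy as np

def soft_knn_posterior(X, y, x_new, k=3, n_classes=None):
    """Soft k-NN posterior of equation 10: the likelihood of class y_n is the
    sum of (rho_j - distance) over the k nearest neighbors labeled y_n,
    where rho_j is the sum of the k nearest distances."""
    if n_classes is None:
        n_classes = int(y.max()) + 1
    d = np.linalg.norm(X - x_new, axis=1)   # distances to all training samples
    nn = np.argsort(d)[:k]                  # indices of the k nearest neighbors
    rho = d[nn].sum()
    scores = np.zeros(n_classes)
    for idx in nn:
        scores[y[idx]] += rho - d[idx]      # closer neighbors contribute more
    total = scores.sum()
    return scores / total if total > 0 else np.full(n_classes, 1.0 / n_classes)

# Toy example: two well-separated classes in 2D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
post = soft_knn_posterior(X, y, np.array([0.05, 0.0]), k=3)
```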
5 Experiments
5.1 Experimental Setup
Experiments are conducted on a state-of-the-art testbed for human action recognition: the Inria IXMAS dataset (http://charibdis.inrialpes.fr/). The dataset includes samples of eleven action categories performed 3 times each by 12 different actors (36 clips), recorded from 5 different camera views. The actions are: check watch, cross arms, scratch head, sit down, get up, turn around, walk, wave, punch, kick and pick up. Two different frame descriptors are used to model these actions and test our algorithms. The first one is the popular Motion History Image (MHI) [6]. This descriptor is based on a temporal accumulation of the human body shape. The computed descriptors are resized to a box of 35x20 pixels, obtaining a feature vector of length l_MHI = 700. The second is the one proposed by Tran et al. [9], including both shape and optical flow information. The extracted descriptor can be obtained from their web page (http://vision.cs.uiuc.edu/projects/activity/), its length being l_Tran = 286. The evaluation protocol to test the classification and fusion algorithms is Leave-One-Clip-Out Cross Validation: the algorithms are trained with all the action clips except one, which is used for testing. The procedure is repeated until all the clips have been used for testing. The kNN classification algorithms are tested using neighborhood values of k = 3 and k = 5. As the length of the descriptors is too large for practical usage, the well known Principal Component Analysis is applied to obtain reduced descriptors ranging from l = 10 to l = 45 with a stepsize of 5.
5.2 Results and Discussion
Single camera classification. Figure 2 shows the result of classifying each of the camera views with the soft kNN classifier. It is clear that the Tran descriptor predicts the activity performed in a single frame better than the MHI. This behavior was expected to some extent, because Tran's descriptor includes shape and local motion information, while the MHI only includes shape. The classifiers with k = 5 always work better than those with k = 3. Single camera classification results also show that while the accuracy obtained from cameras 1-4 is similar, camera 5 accuracy drops by about 10%. Camera 5 provides a top view of the action, preventing the descriptors from accurately capturing the dynamics of the performed action.
Fig. 2. Classification results at each camera ((a)-(e): Cameras 1-5) before and after the fusion. The first number stands for the number of nearest neighbors used. The suffix stands for the fusion algorithm used: V for voting and C for the Bayesian network.
Fusion results. The plots in Figure 2 also show the results obtained after applying the fusion algorithms to the single camera soft classifications. The weighted voting proposed in Section 3.1 and the Bayesian network proposed in Section 3.2 give similar results, voting being slightly better. The fusion algorithms improve the classification based on MHI descriptors more. This is
Table 1. Comparison of the accuracy of our method to others

Method                  Accuracy  Type
Tran et al. [9]         81        2D
Srivastava et al. [18]  81.4      2D Multicamera
Our                     92.01     Multicamera
Weinland et al. [16]    93.33     3D
Peng et al. [17]        94.59     3D
probably because, as the initial result was worse than when using Tran descriptors, it is easier to improve the results using fusion. Comparison to other proposals. Finally, in Table 1, we compare the results obtained by our method to those obtained by other state-of-the-art approaches. Our method performs better than other 2D multicamera approaches, obtaining results comparable to proposals based on computing the 3D visual hull. Results in the table are for sequence classification. To obtain them, each frame in a sequence votes with its posterior distribution to obtain the majority classification.
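The sequence-level decision described above (each frame voting with its posterior distribution) can be sketched as follows; the function name is an illustrative assumption:

```python
import numpy as np

def classify_sequence(frame_posteriors):
    """Sequence-level decision: each frame votes with its posterior
    distribution, and the action accumulating the most mass wins."""
    votes = np.asarray(frame_posteriors).sum(axis=0)
    return int(np.argmax(votes))

# Three frames of one sequence, three candidate actions.
frames = [[0.5, 0.4, 0.1],
          [0.3, 0.6, 0.1],
          [0.2, 0.7, 0.1]]
label = classify_sequence(frames)
```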
6 Conclusions
In this paper we have shown how the accuracy of the human action classification task can be improved by combining the results of single view classifiers. We want to remark that our method avoids visual hull computation, making it very easy to implement in a distributed environment. Another advantage of the proposed method is that it can integrate other sensors without much effort, because the fusion level is independent of the type of sensor used. If a posterior for the activity can be obtained from the hypothetical sensor, it can be used in our system. Future work will explore how to model the correlations between the soft classifications from each camera. We suspect that the independence assumption made between sensor values is too strong, and that fusion results may be greatly improved by introducing dependencies between sensors in our fusion model. Acknowledgment. This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
References
1. Lavee, G., Rivlin, E., Rudzsky, M.: Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video. IEEE Transactions on Systems, Man and Cybernetics - Part C: Applications and Reviews 39(5), 489–504 (2009)
2. Robertson, N., Reid, I.: A general method for human activity recognition in video. Computer Vision and Image Understanding 104(2-3), 232–248 (2006)
3. Cilla, R., Patricio, M., Berlanga, A., Molina, J.: Non-supervised discovering of user activities in visual sensor networks for ambient intelligence applications. In: 2nd International Symposium on Applied Sciences in Biomedical and Communication Technologies, ISABEL 2009, November 2009, pp. 1–6 (2009)
4. Marszalek, M., Laptev, I., Schmid, C.: Actions in context. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Miami, USA (2009)
5. Bishop, C., et al.: Pattern recognition and machine learning. Springer, New York (2006)
6. Bobick, A., Davis, J.: The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(3), 257–267 (2001)
7. Efros, A., Berg, A., Mori, G., Malik, J.: Recognizing action at a distance. In: IEEE International Conference on Computer Vision, vol. 2, pp. 726–733 (2003)
8. Ribeiro, P., Santos-Victor, J.: Human activity recognition from video: modeling, feature selection and classification architecture. In: International Workshop on Human Activity Recognition and Modeling, HAREM (2005)
9. Tran, D., Sorokin, A., Forsyth, D.: Human activity recognition with metric learning. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 548–561. Springer, Heidelberg (2008)
10. Cao, D., Masoud, O., Boley, D., Papanikolopoulos, N.: Human motion recognition using support vector machines. Computer Vision and Image Understanding 113(10), 1064–1075 (2009)
11. Niebles, J., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision 79(3), 299–318 (2008)
12. Oliver, N., Rosario, B., Pentland, A.: A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 831–843 (2000)
13. Guerra-Filho, G., Aloimonos, Y.: A Language for Human Action. Computer 40(5), 42–51 (2007)
14. Quattoni, A., Wang, S., Morency, L., Collins, M., Darrell, T.: Hidden-state conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence (2007)
15. Cucchiara, R., Prati, A., Vezzani, R.: Making the home safer and more secure through visual surveillance. In: Symposium on Automatic detection of abnormal human behaviour using video processing of Measuring Behaviour, Wageningen, The Netherlands (2005)
16. Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding 104(2-3), 249–257 (2006)
17. Peng, B., Qian, G., Rajko, S.: View-Invariant Full-Body Gesture Recognition via Multilinear Analysis of Voxel Data. In: Third ACM/IEEE Conference on Distributed Smart Cameras (September 2009)
18. Srivastava, C., Iwaki, H., Park, J., Kak, A.C.: Distributed and Lightweight Multi-Camera Human Activity Classification. In: Third ACM/IEEE Conference on Distributed Smart Cameras (September 2009)
Self-adaptive Coordination for Organizations of Agents in Information Fusion Environments Sara Rodríguez, Belén Pérez-Lancho, Javier Bajo, Carolina Zato, and Juan M. Corchado University of Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain {srg,lancho,jbajope,carol_zato,corchado}@usal.es
Abstract. Each organization of agents needs to be supported by a coordinated effort that explicitly determines how the agents should be organized and how they should carry out the actions and tasks assigned to them. The interactions of a multi-agent system cannot be described only in terms of the agents and their communication skills; it is also necessary to use the concepts of organizational engineering. This research presents a new global coordination model for an agent organization. The innovation of the model consists of its dynamic and adaptive planning capability to distribute tasks among the agent members of the organization as effectively as possible. Keywords: Multi-agent systems; virtual organizations; dynamic architectures; adaptive environments.
1 Introduction
Open MAS should allow the participation of heterogeneous agents with different architectures and even different languages [17][5]. The development of open MAS is still a recent field of the multi-agent system paradigm, and its development will allow the agent technology to be applied in new and more complex application domains. However, this makes it impossible to trust agent behavior unless certain controls based on norms or social rules are imposed. To this end, developers have focused on the organizational aspects of agent societies, using the concepts of organization, norms, roles, etc. to guide the development process of the system. Virtual organizations [9] are a means of understanding system models from a sociological perspective. From a business perspective, a virtual organization model is based on the principles of cooperation among businesses within a shared network, and exploits the distinguishing elements that provide the flexibility and quick response capability that form the strategy aimed at customer satisfaction. Even so, within the development of organizations, both at the business and agent level, we find a set of requirements [15] that call for the use of new social models in which the use of open and adaptive systems is possible [17]. Given the advantages provided by the unique characteristics found in the development of MAS from an organizational perspective, and the absence of an adaptive E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 444–451, 2010. © Springer-Verlag Berlin Heidelberg 2010
planning process for any social model, this study proposes a model that can coordinate a dynamic and adaptive planning system in an agent organization. The development of the model enables the use of information to improve the allocation of tasks. The proposed notions will be validated via the development of an experimental system consisting of a small-scale fusion and planning model located at each of the participating agents. The article is structured as follows: Section 2 describes the state of the art of agent organizations and their adaptation. Section 3 presents the proposed planning model. Finally, Section 4 demonstrates how the model can be used in a case study in an information fusion environment and presents some conclusions and experimental results.
2 Organizational Approaches
There are several different organizational approaches [7][17]. However, while these studies provide mechanisms for creating coordination among participants, there is much less work focused on adapting, at execution time, organizational structures or norms defined at design time. For example, [12] proposes a model for controlling adaptation by creating new norms. [10] proposes a distributed model for reorganizing the architecture. [1] requires agents to follow a protocol to adapt the norms. Each of these studies focuses on adapting the structure and/or norms that coordinate the participants. Another possibility is the development of a MAS that focuses on the concept of organization/institution. An electronic institution [8] should be considered a social middleware between the external participating agents and the selected communication layer, responsible for accepting or rejecting the agent actions. The primary difference with the other proposals is that the adaptation is carried out by the institution instead of by the agents. Lastly, there are approaches that focus on social group mechanisms based on the social information gathered during the interactions [16]. None of these approaches is capable of coordinating the tasks of the member agents of the organization to solve a common problem, nor do they consider that task planning should adapt to changes in the environment. The architecture selected for this study is OVAMAH [3][11], which focuses on defining the structure and norms. OVAMAH (Adaptive Virtual Organizations: Mechanisms, Architectures and Tools) is the evolution of the THOMAS architecture (MeTHods, techniques and tools for Open Multi-Agent Systems) [6][11]. The following section presents the planning model integrated into OVAMAH, whose goal is to carry out an adaptive planning process within an agent organization. The architecture is essentially formed by a set of services that are modularly structured.
OVAMAH uses the FIPA architecture, expanding its capabilities with respect to the design of the organization, while also expanding its service capacity. OVAMAH has a module whose sole objective is to manage the organizations that have been introduced into the architecture, and it incorporates a new definition of the FIPA Directory Facilitator that is capable of handling services in a much more elaborate way, following service-oriented architecture directives. From a global perspective, the OVAMAH architecture offers total integration, enabling agents to transparently offer and request services from other agents
446
S. Rodríguez et al.
or entities, at the same time allowing external entities to interact with agents in the architecture by using the services provided.
3 Description of the Model

This research proposes a planning model that provides a self-adaptation capability within an agent society. We use a cooperative MAS in which each agent is capable of establishing plans dynamically in order to reach its objectives. The global mechanism considers the global objective of the society, as well as its norms and roles. The result is a planning model that can, within an architecture geared towards the development of agent organizations (OVAMAH [3][11]), take into account the changes that are produced within an environment during the execution of a plan. The planning process defines the actions that the society of agents will have to execute and should therefore also take into account the particular circumstances of each of its members. To achieve this, a CBP-BDI (Case-Based Planning) agent is used, applying the planning model presented in this section, which is particularly suited to organizations. A CBP-BDI agent is a specialization of a CBR-BDI agent [6]. A CBP-BDI agent calculates the plan or intention that is easiest to replan: the Most RePlannable Intention (MRPI). This is the plan that can most easily be replaced by another plan if it is interrupted (for example, if a user changes preferences while the plan is being executed). A plan p within an organization is defined as p = <E, O, O', R, R'>, where: E is the environment, which represents the type of problem that the organization solves, and is characterized by a set of states E = {e0, e*} for each agent, where e0 represents the initial state of the agent when the plan begins and e* is the state or set of states that the agent tries to achieve; O represents the set of objectives of the individual agent and O' is the set of objectives reached once the plan has been executed; R is the set of resources available to the given agent and R' is the set of resources that the agent has used during the execution of the plan.
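The plan tuple and the MRPI criterion can be sketched in code. This is a minimal illustration, not OVAMAH's actual API: the names `Plan` and `most_replannable` and the neighbour-counting heuristic are assumptions introduced here. Replannability is approximated by counting how many alternative plans connect the same initial state e0 to the same goal e*, following the intuition that the MRPI is the plan with the most plans surrounding it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    """A plan p = <E, O, O', R, R'> (illustrative encoding)."""
    e0: str                 # initial state of the agent when the plan begins
    e_star: str             # state the agent tries to achieve
    objectives: frozenset   # O: objectives of the individual agent
    reached: frozenset      # O': objectives reached after execution
    resources: frozenset    # R: resources available to the agent
    used: frozenset         # R': resources consumed during execution

def most_replannable(plans):
    """Pick the Most RePlannable Intention: the plan with the largest
    number of alternative plans linking the same e0 to the same e*."""
    def neighbours(p):
        return sum(1 for q in plans
                   if q is not p and q.e0 == p.e0 and q.e_star == p.e_star)
    return max(plans, key=neighbours)
```

A plan surrounded by many alternatives is cheap to abandon mid-execution, which is exactly why the CBP-BDI agent prefers it.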
Fig. 1. Planning Model
Given the initial state of the organization, the term global planning is used to describe the search for a solution that can reach the final state, all the while complying with a series of requirements for the organization. The problem can be represented in
Self-adaptive Coordination for Organizations of Agents
447
a planning space that is delimited by the restrictions imposed by the requirements. Given a common objective, the available resources and the tasks to perform, the aim is to find a global plan that allows the organization to reach the optimal solution. To this end, the planning agent should bear in mind the optimal plans p*(t) obtained for each individual agent. It is not necessary for all of the agents within the organization to know how to meet the objectives, but they should know how to perform some of the tasks that contribute towards reaching those objectives for the organization. Upon initiating the process, certain agents will be retrieved from the case memory to perform at least one of the problem tasks. For each task that is not completed by any of the retrieved agents, at least one new agent will be incorporated: the agent with the greatest probability of successfully completing the given task. The idea is to count on the necessary agents so that no task is left unassigned. Let us assume that the common objective of the m agents has n states or tasks, with m, n ∈ ℕ. Each agent has its own characteristics with regard to which tasks it can perform, which resources to use, and the amount of time available to perform the tasks; in other words, each agent has its own profile. Given a state j, each agent i, with i ∈ {1, …, m}, can be defined by a tuple z_ij, where each coordinate of the tuple refers to a characteristic that defines it. The following binary variables are defined:

    a_ij = 1 if agent i is assigned to task j, and 0 otherwise.
For each task-assignment problem, an objective function is defined whose goal is to minimize or maximize the cost incurred by the m agents in achieving the common objective: for example, to minimize the cost of using one of the agents to reach an objective, or to maximize an efficiency function, as needed in each case. An efficiency function is introduced in order to assign tasks to the agents; its aim is to visit the greatest number of points at the lowest possible cost. The cost c_{t_ij r_ij} is a function that depends on the time t_ij that agent i spends working on task j, on the resources r_ij used, and on the type of agent assigned to each task. The efficiency function is defined as:

    Efficiency = (number of points visited) / ∑_{i=1}^{m} ∑_{j=1}^{n} c_{t_ij r_ij} · a_ij

and we want to maximize it:

    max (number of points visited) / ∑_{i=1}^{m} ∑_{j=1}^{n} c_{t_ij r_ij} · a_ij

where t_ij is the time it takes agent i to perform task j, and

    t_ij = max_k { t_ijk }

where t_ijk indicates the time it takes agent i to perform task j for tourist type k. Taking the maximum over k (the type of tourist), we can ensure that the guide has time to perform the necessary task regardless of the type of tourist. These times are initially estimated. Let us now define the restrictions of the problem.

1. We want each state to be completed by exactly one agent, which in mathematical terms can be stated, for each state k, as:

    ∑_{i=1}^{m} a_ik = 1,  ∀k ∈ {1, …, n}
2. We want each state to be completed within a specified period of time. Let us assume that state k should be completed within time t_k. The restriction is:

    ∑_{i=1}^{m} t_ik · a_ik ≤ t_k,  ∀k ∈ {1, …, n}

3. Each state k needs a set of resources to be executed, and there is no reason for all of the agents to have these resources. Given state k, we need the resources {r_k^x}, x ∈ {1, …, w}, where w is the maximum number of resources required by any state k = 1, …, n. The variables r_k^x are defined in binary form:

    r_k^x = 1 if state k needs resource x, and 0 otherwise.

The agent that performs state k must at the very least have at its disposal the resources needed to perform it. Therefore, given state k and each resource of the set {r_k^x}, x ∈ {1, …, w}, we can define the restriction:

    ∑_{i=1}^{m} r_i^x · a_ik ≥ r_k^x,  ∀k ∈ {1, …, n}, ∀x ∈ {1, …, w}

The variables {r_i^x}, x ∈ {1, …, w}, ∀i ∈ {1, …, m}, are binary:

    r_i^x = 1 if agent i has resource x, and 0 otherwise.
4. Each agent i has a minimum and a maximum working time, depending on the type of agent. These times are represented as t_i^{Turn on} and t_i^{Turn off} respectively:

    t_i^{Turn on} ≤ ∑_{j=1}^{n} t_ij · a_ij ≤ t_i^{Turn off},  ∀i ∈ {1, …, m}

For the majority of agents, as we will see in the case study, the maximum number of working hours is equal to a regular 8-hour work day.

5. Every time we assign tasks to an agent, we want it to perform a minimum number of tasks, which varies according to the type of agent:

    ∑_{j=1}^{n} a_ij ≥ NumberTask_i,  ∀i ∈ {1, …, m}

If the resulting non-linear programming problem were infeasible, we would add agents to make it feasible; the agent added would be the one with the highest a priori probability of performing the necessary tasks. If a norm (restriction) changes, the tasks must be assigned again. This allows us to obtain a plan for the tasks that need to be performed by the agent organization. In other words, we can obtain a global plan composed of all the tasks and of the agents in the organization that will carry them out. Every agent in the organization recognizes the tasks that it needs to perform. These agents, which are CBP-BDI agents, integrate the four phases of a CBR system (retrieve, reuse, revise and retain).
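As an illustration, the assignment problem above can be prototyped by exhaustive search over the binary variables a_ij. The sketch below encodes only constraints 1, 2 and 5 for brevity; the function name and data layout are assumptions introduced here, and a real system would use an integer-programming solver rather than enumeration.

```python
from itertools import product

def assign(m, n, t, t_max, min_tasks, cost):
    """Search binary assignments a_ij satisfying:
    (1) each task j is done by exactly one agent,
    (2) each task j finishes within its deadline t_max[j],
    (5) each agent i performs at least min_tasks[i] tasks,
    and return the feasible assignment with the lowest total cost."""
    best, best_cost = None, float("inf")
    # enumerating one responsible agent per task enforces constraint 1
    for owners in product(range(m), repeat=n):
        if any(t[i][j] > t_max[j] for j, i in enumerate(owners)):
            continue  # constraint 2 violated
        load = [sum(1 for i in owners if i == ag) for ag in range(m)]
        if any(load[ag] < min_tasks[ag] for ag in range(m)):
            continue  # constraint 5 violated
        c = sum(cost[i][j] for j, i in enumerate(owners))
        if c < best_cost:
            best, best_cost = owners, c
    return best, best_cost
```

The enumeration grows as m^n, so this only illustrates the feasibility check the text describes; if no assignment survives the filters, `best` stays `None`, which corresponds to the infeasible case where a new agent must be added.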
4 Experimental System and Results

This section presents a case study that tests the defined model. The case study delineates the scope and potential of virtual organizations in the design and development
of information fusion processors for deployment in multi-agent environments. An organization is implemented using the model proposed in Section 3 and is represented in a virtual world [14] containing a set of cultural heritage sites. The simulation within the virtual world represents a tourist environment in which there are guides and tourists, and in which the tour guide's tasks will be performed in adherence to a defined set of norms. The roles identified within the case study are: Tourist, Monument, Guide, Visitor, Coordinator, Notification and Manager.

The agents that take on the role of Guide are those that will carry out dynamic planning according to the tasks they need to carry out for each group of tourists. The generated plans should ensure that all of the visitors assigned to a tour guide are able to follow their tourist route. They will be personalized according to the Guide's profile and work habits, and should take into account the restrictions directly related to each agent on an individual basis, as well as the restrictions of the organization itself. These restrictions are imposed according to the norms for the society of agents: (i) the work schedule for a Guide agent (8 hours); (ii) the maximum number of Tourist agents assigned to a guide; (iii) visiting days and hours for certain monuments; (iv) the maximum number of Guide agents that can participate on a route; (v) the minimum number of points to visit on a route.

Once the Coordinator has identified all of the agents in the organization that are needed to carry out the plan, it assigns each task to the agent responsible for completing it. At that moment each Guide agent becomes aware of its tasks and designs an individual plan. Each Guide agent is a type of CBP-BDI agent capable of providing efficient plans at execution time. The following paragraph provides a detailed example. Let
E_g = {e_0^g, …, e_h^g} be the tasks carried out by a group of tourists and visitors g, in order of priority. We have the problem E = ∪_g E_g = {e_0, …, e_n}, where E represents the complete set of tasks that must be completed (for this reason they are not superscripted). Let us assume there are 10 guides. Randomly selecting a Guide i ∈ {1, …, 10} (specifically, i = 3), the task assignment according to its profile is: (1) Agent Task: Visit the cathedral with tourist group 2 ≡ e_1^2; t_31 = 30 min. (2) Agent Task: Take tourist group 2 to the aqueduct ≡ e_2^2; t_32 = 15 min. (3) Agent Task: Take tourist group 2 to the hermitage ≡ e_3^2; t_33 = 10 min. (4) Agent Task: Visit the hermitage ≡ e_4^2; t_34 = 10 min. (5) Agent Task: Take tourist group 2 to the Roman city ≡ e_5^2; t_35 = 20 min. (6) Agent Task: Visit the Roman city ≡ e_6^2; t_36 = 30 min. (7) Agent Task: Take tourist group 2 to the ravine ≡ e_7^2; t_37 = 50 min. (8) Agent Task: Hike along the ravine with group 2 ≡ e_8^2; t_38 = 20 min. (9) Agent Task: Return to the cathedral with group 2 ≡ e_9^2; t_39 = 10 min.

Calculating the assigned tasks ensures both that the total amount of time assigned to a Guide does not exceed 8 hours, and that any other restrictions corresponding to the norms of the organization are also respected. Each task has a set of objectives that must be met so that the global plan can be successfully completed. To perform each task, the Guide agent should have the necessary resources available. For example, the task "Buy tickets for museum 1" corresponds to the objective "Visit museum 1" ≡ O_0, and "breakfast, lunch, tea and dinner" correspond to the objectives ≡ O_{2,4,6,7} (task 2 indicates breakfast, task 4 indicates lunch, task 6 indicates tea, and task 7 indicates dinner). A similar coding is used for resources. As shown in Fig. 2a, the value 1 indicates that the resource is needed or the objective is to be met, while zero denotes the contrary. Fig. 2a shows the representation of a space ℝ³ for tasks according to the following three coordinates: time, number of objectives achieved, and number of resources used (coordinates taken from similar retrieved cases). Specifically, Fig. 2a shows a hyperplane of restrictions and the plan followed for a case retrieved from the beliefs base, considered to be similar to the new case. There are 120 possible routes, not all of which are viable because of the previously mentioned restrictions. In a simulated scenario where the Coordinator assigned this group of tourists to a Guide, the planning process used by the Guide for the tasks it needed to perform is the same as that shown in Fig. 2a.
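The time budget in the worked example can be checked directly; the task durations below are the nine values assigned to Guide i = 3 above.

```python
# durations in minutes of tasks (1)-(9) assigned to Guide i = 3
task_minutes = [30, 15, 10, 10, 20, 30, 50, 20, 10]

total = sum(task_minutes)
workday = 8 * 60  # norm (i): an 8-hour work schedule per Guide agent

assert total <= workday
print(f"assigned: {total} min of {workday} min available")
# prints: assigned: 195 min of 480 min available
```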
Fig. 2. Representation of a space ℝ³ for tasks (a) and replanned tasks (b). Number of agents working simultaneously (c).
Figure 2 illustrates the plan as it was carried out. To understand the graphical representation, let us focus on the initial task e1 and the final task e9. In between these two tasks, the Guide agent could carry out other tasks that would involve the same or different tourists and visitors. The idea presented in the planning model is to select the optimal plan, the one with the most plans surrounding it, as the solution. The following studies were carried out: given the same tourist attractions to be visited on the same day, and the same number of tourists per group, one group used the planner and the other did not. The results for different days, in terms of the number of Guide agents used, can be observed in Fig. 2c. The color blue represents the average number of guides needed each day using the planner, and red the number without using it. The proposed model helps the organization utilize fewer guides, thus minimizing its costs. In conclusion, we can affirm that we have achieved our stated objectives: (i) develop agent societies; (ii) simulate the behavior of an organization in a specific case involving the coordination and adaptation of its agents; and (iii) validate the proposed planning model through a simulation of the organization in a case study. As previously mentioned, it is increasingly common to model a MAS not only from the perspective of the agent and its communication capabilities, but by including organizational engineering as well. Acknowledgments. This work has been supported by the MICINN TIN 2009-13839-C03-03 project.
References

[1] Argente Villaplana, E.: GORMAS: Guías para el desarrollo de sistemas multi-agente abiertos basados en organizaciones. Ph.D. thesis, Universidad Politécnica de Valencia (2008)
[2] Artikis, A., Kaponis, D., Pitt, J.: Dynamic Specifications of Norm-Governed Systems. In: Multi-Agent Systems: Semantics and Dynamics of Organisational Models. IGI Global (2009)
[3] Carrascosa, C., Giret, A., Julian, V., Rebollo, M., Argente, E., Botti, V.: Service Oriented MAS: An Open Architecture (Short Paper). In: Decker, S., Sierra, C. (eds.) Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, May 10-15, pp. 1291–1292 (2009)
[4] Castanedo, F., Patricio, M.A., García, J.M., Molina, J.: Data Fusion to Improve Trajectory Tracking in a Cooperative Surveillance Multi-Agent Architecture. Information Fusion 11, 243–255 (2010), special issue on Agent-based Information Fusion
[5] Corchado, E., Pellicer, M.A., Borrajo, M.L.: A MLHL Based Method to an Agent-Based Architecture. International Journal of Computer Mathematics 86(10-11), 1760–1768 (2008)
[6] Corchado, J.M., Glez-Bedia, M., de Paz, Y., Bajo, J., de Paz, J.F.: Concept, Formulation and Mechanism for Agent Replanification: MRP Architecture. In: Computational Intelligence. Blackwell Publishers, Malden (2008)
[7] Dignum, V.: A Model for Organizational Interaction: Based on Agents, Founded in Logic. Ph.D. thesis (2004)
[8] Esteva, M.: Electronic Institutions: From Specification to Development. Ph.D. thesis, Technical University of Catalonia (2003)
[9] Ferber, J., Gutknecht, O., Michel, F.: From Agents to Organizations: An Organizational View of Multi-Agent Systems. In: Giorgini, P., Müller, J.P., Odell, J.J. (eds.) AOSE 2003. LNCS, vol. 2935, pp. 214–230. Springer, Heidelberg (2004)
[10] Gasser, L., Ishida, T.: A Dynamic Organizational Architecture for Adaptive Problem Solving. In: Proc. of AAAI 1991, pp. 185–190 (1991)
[11] Giret, A., Julian, V., Rebollo, M., Argente, E., Carrascosa, C., Botti, V.: An Open Architecture for Service-Oriented Virtual Organizations. In: Seventh International Workshop on Programming Multi-Agent Systems, PROMAS 2009, pp. 23–33 (2009)
[12] Hubner, J.F., Sichman, J.S., Boissier, O.: Using the Moise+ for a Cooperative Framework of MAS Reorganisation. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 506–515. Springer, Heidelberg (2004)
[13] Huhns, M., Stephens, L.: Multiagent Systems and Societies of Agents. In: Weiss, G. (ed.) Multi-agent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)
[14] http://repast.sourceforge.net (2009)
[15] Rodríguez, S., Pérez-Lancho, B., De Paz, J.F., Bajo, J., Corchado, J.M.: OVAMAH: Multiagent-based Adaptive Virtual Organizations. In: 12th International Conference on Information Fusion, Seattle, Washington, USA (July 2009)
[16] Villatoro, D., Sabater-Mir, J.: Categorizing Social Norms in a Simulated Resource Gathering Society. In: Hübner, J.F., Matson, E., Boissier, O., Dignum, V. (eds.) COIN@AAMAS 2008. LNCS, vol. 5428, pp. 235–249. Springer, Heidelberg (2009)
[17] Zambonelli, F., Jennings, N.R., Wooldridge, M.: Developing Multiagent Systems: The Gaia Methodology. ACM Transactions on Software Engineering and Methodology 12, 317–370 (2003)
Sensor Management: A New Paradigm for Automatic Video Surveillance Lauro Snidaro, Ingrid Visentini, and Gian Luca Foresti Department of Mathematics and Computer Science University of Udine Udine, Italy [email protected]
Abstract. In this paper we discuss the new paradigm of Sensor Management that could be taken into consideration for the design of next-generation surveillance systems. The paradigm is meant to optimize the sensing capabilities of a system by taking into account the state of the environment being observed, along with contextual information that can drive the choice of sensing modalities and platforms. We thus provide a brief account of how the Sensor Management concepts developed within the data fusion community could be applied in the design of the next generation of surveillance systems. Keywords: Data fusion; Video Surveillance; Sensor Management.
1 Introduction
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 452–459, 2010. © Springer-Verlag Berlin Heidelberg 2010

Video surveillance systems have been based on multiple sensors since their first generation (CCTV systems) [1]. Video streams from analog cameras were multiplexed on video terminals in control rooms to help human operators monitor entire buildings or wide open areas. The latest generation makes use of digital equipment to capture and transmit images that can be viewed virtually anywhere over the Internet. Initially, multi-sensor systems were employed to extend surveillance coverage over wide areas. Recent advances in sensor and communication technology, in addition to lower costs, allow multiple sensors to be used for monitoring the same area [2,3]. This has opened new possibilities in the field of surveillance, as multiple and possibly heterogeneous sensors observing the same scene provide redundant, and possibly improved, data that can be exploited to improve detection accuracy and robustness, enlarge monitoring coverage, and reduce uncertainty [2]. While the advantages of using multiple sources of information are well known to the data fusion community [4], the full potential of multi-sensor surveillance is yet to be discovered. In particular, the enrichment of available sensor assets has made it possible to take advantage of data fusion techniques for solving specific tasks such as target localization and tracking [5], or person identification [6]. This can be formalized as the application of JDL Level 1 and 2 fusion techniques [4] to surveillance, strictly following a processing stream that exploits multi-sensor
data to achieve better system perception performance and, in the end, improved situational awareness. A brief exemplification of the techniques that can be employed at Levels 1 and 2 will be presented in Section 3. While many technical problems remain to be solved for integrating heterogeneous suites of sensors for wide-area surveillance, a principled top-down approach is probably still unexplored. Given the acknowledged complexity of the architectures that can be developed nowadays, full exploitation of this potential is probably beyond the possibilities of a human operator. Think, for example, of all the possible combinations of configurations made available by modern sensors: Pan-Tilt-Zoom (PTZ) cameras can be controlled to cover different areas, day/night sensors offer different sensing modalities, radars can operate at different frequencies, etc. The larger the system, the more likely it will be called upon to address many different surveillance needs. A top-down approach is needed in order to develop surveillance systems that can automatically manage large arrays of sensors so as to enforce surveillance directives provided by the operator, which in turn translate the security policies of the owning organization. Therefore, a new paradigm is needed to guide the design of architectures and algorithms for the next generation of surveillance systems, systems able to organize themselves to collect the data relevant to the objectives specified by the operator. This new paradigm would probably need to take inspiration from the principles behind the Sensor Management policies foreseen by JDL Level 4 [7]. JDL Level 4 is also called the Process Refinement step, as it implies adaptive data acquisition and processing to support mission objectives. Conceptually, this refinement step should be able to manage the system in its entirety: from controlling hardware resources (e.g. sensors, processors, storage, etc.)
to adjusting the processing flow in order to optimize the behaviour of the system to best achieve mission goals. It is therefore apparent that the Process Refinement step encompasses a broad spectrum of techniques and algorithms that operate at very different logical levels. In this regard, a fully implemented Process Refinement would provide the system with a form of awareness of its own capabilities and of how they relate and interact with the observed environment.
2 The JDL Fusion Process Model
Several fusion process models have been developed over the years. The first and best known originates from the US Joint Directors of Laboratories (JDL) in 1985, under the guidance of the Department of Defense (DoD). The JDL model [8] comprises five levels of data processing and a database, all interconnected by a bus. The five levels are not meant to be processed in a strict order and can also be executed concurrently. Steinberg and Bowman proposed revisions and expansions of the JDL model that broaden the functional model, relate the taxonomy to fields beyond the original military focus, and integrate a data fusion tree architecture model for system description, design, and development [9]. This updated model, sketched in Figure 1, is composed of the following levels:
Fig. 1. The JDL data fusion process model
– Level 0 - Sub-Object Data Assessment: estimation and prediction of signal/object observable states on the basis of pixel/signal-level data association and characterization;
– Level 1 - Object Assessment: estimation and prediction of entity states on the basis of observation-to-track association, continuous state estimation (e.g. kinematics) and discrete state estimation (e.g. target type and ID);
– Level 2 - Situation Assessment: estimation and prediction of relations among entities, to include force structure and cross-force relations, communications and perceptual influences, physical context, etc.;
– Level 3 - Impact Assessment: estimation and prediction of effects on situations of planned or estimated/predicted actions by the participants, including interactions between the action plans of multiple players (e.g. assessing susceptibilities and vulnerabilities to estimated/predicted threat actions given one's own planned actions);
– Level 4 - Process Refinement: adaptive data acquisition and processing to support mission objectives.

The model is deliberately very abstract, which sometimes makes it difficult to properly interpret its parts and to appropriately apply it to specific problems. However, as already mentioned, it was originally conceived more as a basis for common understanding and discussion between scientists than as a real guide for developers in identifying the methods that should be used [8]. A recent paper by Llinas et al. [7] suggests revisions and extensions of the model in order to cope with the issues and functions of today's applications.
In particular, further extensions of the JDL model are proposed, with an emphasis on four areas: (1) remarks on issues related to quality control, reliability, and consistency in data fusion (DF) processing; (2) assertions about the need for co-processing of abductive/inductive and deductive inferencing processes; (3) remarks about the need for and exploitation of an ontologically-based approach to DF process design; and (4) discussion of the role for Distributed Data Fusion (DDF).
3 Data Fusion and Surveillance
While in the past ambient security systems focused on the extensive use of arrays of single-type sensors [10,11,12,13], modern surveillance systems aim to combine information coming from different types of sources. Multi-modal systems [6,14], increasingly used in biometrics, and multi-sensor multi-cue approaches [15,16] fuse heterogeneous data in order to provide a more robust response and enhance situational awareness.
Fig. 2. Example of contextualization of the JDL scheme of Figure 1 to a surveillance scenario
The JDL model presented in Section 2 can be contextualized and fitted to a surveillance setting. In particular, we can imagine a typical surveillance scenario where multiple cameras monitor a wide area. A concrete example of how the levels of the JDL scheme can be reinterpreted is shown in Figure 2. In the proposed example, the levels correspond to specific video-surveillance tasks or patterns as follows:

– Level 0 - Sub-Object Data Assessment: the raw data streams coming from the cameras can be individually pre-processed. For example, they can be filtered to reduce noise, processed to increase contrast, or scaled down to reduce the processing time of subsequent elaborations.
– Level 1 - Object Assessment: multiple objects in the scene (typically pedestrians, vehicles, etc.) can be detected, tracked, classified and recognized.
The objects are the entities of the process, but no relationships are involved yet at this point. Additional data, such as the map or the sensitive areas, are a priori contextual information.
– Level 2 - Situation Assessment: spatial or temporal relationships between entities are drawn here: a target moving, for instance, from a sensitive Zone1 to another zone Zone2 can constitute an event. Simple atomic events are built from brief stand-alone actions, while more complex events are obtained by joining several simple events. Possible alarms to the operator are raised at this point.
– Level 3 - Impact Assessment: the prediction of an event is an example of what, in practice, may happen at this step. The estimation of the trajectory of a potential target, or the prediction of the behaviour of an entity, can be the topic of this level. For instance, knowing that an object crossed Zone1 heading to Zone2, we can presume it will also cross Zone3 according to its current trajectory.
– Level 4 - Process Refinement: after the prediction given by Level 3, several countermeasures can be taken in this phase, regarding all the previous levels. For instance, the sensors can be relocated to better monitor Zone3, new thresholds can be imposed in Level 0 procedures, or different algorithms can be employed in Level 1.
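The Zone1/Zone2/Zone3 reading of Levels 2 and 3 above can be sketched as follows; the function names and the zone-adjacency table are illustrative assumptions, not part of the JDL model.

```python
def atomic_event(track, src, dst):
    """Level 2 (Situation Assessment): a target moving from zone `src`
    to zone `dst` constitutes a simple atomic event."""
    return (src in track and dst in track
            and track.index(src) < track.index(dst))

def predict_next(track, transitions):
    """Level 3 (Impact Assessment): naive prediction of the next zone
    from the current trajectory, given known zone adjacencies."""
    return transitions.get(track[-1])

track = ["Zone1", "Zone2"]                       # trajectory observed at Level 1
adjacent = {"Zone1": "Zone2", "Zone2": "Zone3"}  # assumed zone layout

assert atomic_event(track, "Zone1", "Zone2")     # the Zone1 -> Zone2 event fired
assert predict_next(track, adjacent) == "Zone3"  # Level 3 presumes Zone3 is next
```

Level 4 would then act on the prediction, for example by retasking a sensor to cover Zone3.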
4 Sensor Management
The Process Refinement part dedicated to sensors and data sources is often called Sensor Management, and it can be defined as "a process that seeks to manage, or coordinate, the use of a set of sensors in a dynamic, uncertain environment, to improve the performance of the system" [17]. In other words, a Sensor Management process should be able, given the current state of affairs of the observed environment, to translate mission plans or human directives into sensing actions directed at acquiring needed additional or missing information, in order to improve situational awareness and fulfill the objectives. Sensor Management has been reasonably well studied in the Data Fusion community, but the focus of adaptive sensor control has largely been on improved estimation [18]. In the USA, a large DARPA program called Dynamic Tactical Targeting addressed Sensor Management at a high level, but again largely with a single objective in mind [19]. A five-layered procedure has been proposed in [17] and is reproduced in Figure 3. The chart schematizes a general sensor management process that can be used to guide the design of a real sensor management module. In the following, the different levels are contextualized for the case of a surveillance system.

4.1 Mission Planning
Fig. 3. Five-layered sensor managing process [17]

This level takes as input the current situation and the requests from the human operator, and performs a first breakdown of the objectives by trying to match them with the available services and functionalities of the system. In a surveillance system, the requests from the operator can be events of interest to be detected (e.g. a vehicle being stationary outside a parking slot) and alarm conditions (e.g. a person trespassing into a forbidden area). Each of the events should be given a priority by the operator. The Mission Planning module is in charge of selecting the functions to be used in order to detect the required events (e.g. target tracking, classification, plate reading, face recognition, trajectory analysis, etc.). This module should in fact work much like a compiler: starting from a description of the events of interest expressed in a high-level language, it parses the description and determines the relevant services to be employed. The module will also identify the areas to be monitored, the targets to look for, the frequency of measurements and the accuracy level.

4.2 Resource Deployment
This level identifies the sensors to be used among the available ones. If mobile and/or active sensors are available, their repositioning may be needed [20]. In particular, this level takes into consideration aspects such as coverage and sensing modality. For example, depending on the time of day at which a certain event is to be detected, one sensor may be preferred to another.

4.3 Resource Planning
This level is in charge of tasking the individual sensors (e.g. movement planning for active sensors [21,20]) and coordinating them (e.g. sensor hand-overs) in order to carry out a certain task (e.g. tracking). The level also deals with sensor selection techniques that can choose, for every instant and every target, the optimal subset of sensors for tracking or classifying it. Several approaches to sensor selection have been proposed in the literature, for example information-gain based [22] and detection-quality based [5].

4.4 Sensor Scheduling
Depending on the planning and requests coming from the Resource Planning level, this level is in charge of determining a detailed schedule of commands for each sensor. This is particularly appropriate for active (e.g. PTZ cameras), mobile (e.g. robots) and multi-mode (day/night cameras, multi-frequency radar) sensors. The problem of sensor scheduling has been addressed in [23], and a recent contribution on the scheduling of visual sensors can be found in [24].
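The scheduling idea can be illustrated with a toy greedy scheduler. This is a hypothetical sketch in Python, not the algorithms of [23] or [24]: pending sensing tasks carry operator-assigned priorities, and commands for a single PTZ camera are emitted highest-priority first until the scheduling window is full.

```python
import heapq

def schedule(tasks, horizon):
    """Greedy priority schedule for one sensor.

    tasks: list of (priority, duration, name); higher priority goes first.
    horizon: total time available in the scheduling window.
    Returns an ordered list of (start_time, name) commands.
    """
    # Max-heap on priority (negated, since heapq is a min-heap).
    heap = [(-p, d, n) for p, d, n in tasks]
    heapq.heapify(heap)
    t, plan = 0, []
    while heap and t < horizon:
        _, dur, name = heapq.heappop(heap)
        if t + dur <= horizon:       # skip tasks that no longer fit
            plan.append((t, name))
            t += dur
    return plan

# Hypothetical task names and durations for a single PTZ camera.
plan = schedule([(1, 5, "sweep area A"),
                 (3, 2, "zoom on target 7"),
                 (2, 4, "read plate")], horizon=10)
```

Here the low-priority sweep is dropped because it does not fit in the remaining window; a real scheduler would also handle preemption and re-insertion, as in the enhanced dynamic preemptive algorithm of [23].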
L. Snidaro, I. Visentini, and G.L. Foresti

4.5 Sensor Control
This is the lowest level and possibly also the simplest. Its purpose is to optimize sensor parameters given the current commands imposed by Levels 1 and 2. For video sensors this may involve regulating iris and focus to optimize image quality. Although this is performed automatically by the sensor hardware in most cases, it can be beneficial to manage sensor parameters directly, according to some figure of merit that depends on the content of the image. For example, contrast and focus may be adjusted specifically for a given target. An early treatment of the subject may be found in [25], while a recent survey may be found in [26].
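As a toy illustration of content-driven parameter control, one can pick the focus setting that maximizes a figure of merit computed from the image. This is a Python sketch under invented assumptions: the merit function below is a made-up stand-in, not a real image-sharpness measure.

```python
def contrast(focus, best=0.6):
    # Hypothetical figure of merit: peaks when focus equals `best`.
    # A real system would compute e.g. image contrast for the target region.
    return 1.0 / (1.0 + (focus - best) ** 2)

def autofocus(settings, merit):
    # Exhaustively pick the focus setting that maximizes the figure of merit.
    return max(settings, key=merit)

settings = [i / 10 for i in range(11)]   # candidate focus positions 0.0..1.0
best = autofocus(settings, contrast)
```

In practice the search would be incremental (hill climbing over the motor range) rather than exhaustive, but the principle of optimizing a content-dependent merit is the same.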
5 Conclusions
We discussed how the combination of heterogeneous data can lead to better situation awareness in a surveillance scenario. We also presented a possible new paradigm for sensor management that could be taken into account in the design of next-generation surveillance systems. In particular, sensor management should provide a principled way of exploiting sensory information in light of the contextual information available to the system. We believe that this new paradigm will drive the development of the next generation of surveillance systems. In this way, the new systems will include a form of awareness of their own capabilities and of how these relate and interact with the observed environment. This will in turn provide better system perception performance and, in the end, improved situational awareness.
References

1. Regazzoni, C.S., Visvanathan, R., Foresti, G.L.: Scanning the issue / technology - Special Issue on Video Communications, Processing and Understanding for Third Generation Surveillance Systems. Proceedings of the IEEE 89(10), 1355–1367 (2001)
2. Snidaro, L., Niu, R., Foresti, G., Varshney, P.: Quality-Based Fusion of Multiple Video Sensors for Video Surveillance. IEEE Trans. Syst. Man, Cybern. B 37(4), 1044–1051 (2007)
3. Aghajan, H., Cavallaro, A. (eds.): Multi-Camera Networks. Elsevier, Amsterdam (2009)
4. Liggins, M.E., Hall, D.L., Llinas, J.: Multisensor Data Fusion: Theory and Practice. The Electrical Engineering & Applied Signal Processing Series, vol. 2. CRC Press, Boca Raton (2008)
5. Snidaro, L., Visentini, I., Foresti, G.: Quality based multi-sensor fusion for object detection in video-surveillance. In: Intelligent Video Surveillance: Systems and Technology, pp. 363–388. CRC Press, Boca Raton (2009)
6. Ross, A., Jain, A.: Multimodal biometrics: An overview. In: Proc. XII European Signal Processing Conf., pp. 1221–1224 (2004)
7. Llinas, J., Bowman, C.L., Rogova, G.L., Steinberg, A.N., Waltz, E.L., White, F.E.: Revisiting the JDL data fusion model II. In: Svensson, P., Schubert, J. (eds.) Proceedings of the Seventh International Conference on Information Fusion, Stockholm, Sweden, International Society of Information Fusion, June 2004, vol. II, pp. 1218–1230 (2004)
8. Hall, D.L., Llinas, J.: An introduction to multisensor data fusion. Proceedings of the IEEE 85(1), 6–23 (1997)
9. Steinberg, A.N., Bowman, C.: Revisions to the JDL data fusion process model. In: Proceedings of the 1999 National Symposium on Sensor Data Fusion (May 1999)
10. Javed, O., Rasheed, Z., Shafique, K., Shah, M.: Tracking across multiple cameras with disjoint views. In: Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV), vol. 2, pp. 952–957 (2003)
11. Valin, J.M., Michaud, F., Rouat, J.: Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering. Robot. Auton. Syst. 55(3), 216–228 (2007)
12. Kang, J., Cohen, I., Medioni, G.: Multi-views tracking within and across uncalibrated camera streams. In: IWVS 2003: First ACM SIGMM International Workshop on Video Surveillance, pp. 21–33. ACM, New York (2003)
13. Monekosso, D., Remagnino, P.: Monitoring behavior with an array of sensors. Computational Intelligence 23(4), 420–438 (2007)
14. Jain, A., Hong, L., Kulkarni, Y.: A multimodal biometric system using fingerprints, face and speech. In: 2nd International Conference on Audio- and Video-based Biometric Person Authentication, pp. 182–187 (1999)
15. Liu, H., Yu, Z., Zha, H., Zou, Y., Zhang, L.: Robust human tracking based on multi-cue integration and mean-shift. Pattern Recognition Letters 30(9), 827–837 (2009)
16. Gavrila, D.M., Munder, S.: Multi-cue pedestrian detection and tracking from a moving vehicle. Int. J. Comput. Vision 73(1), 41–59 (2007)
17. Xiong, N., Svensson, P.: Multi-sensor management for information fusion: issues and approaches. Information Fusion 3(2), 163–186 (2002)
18. Hero, A.O., Castan, D.A., Cochran, D., Kastella, K.: Foundations and Applications of Sensor Management. Springer, Heidelberg (2008)
19. Hanselman, P.B., Lawrence, C., Fortunato, E., Tenney, R.R., Blasch, E.P.: Dynamic tactical targeting. In: Suresh, R. (ed.) Battlespace Digitization and Network-Centric Systems IV, July 2004. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 5441, pp. 36–47 (2004)
20. Mittal, A., Davis, L.: A general method for sensor planning in multi-sensor systems: Extension to random occlusion. International Journal of Computer Vision 76(1), 31–52 (2008)
21. Denzler, J., Brown, C.: Information theoretic sensor data selection for active object recognition and state estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(2), 145–157 (2002)
22. Kreucher, C., Kastella, K., Hero III, A.: Sensor management using an active sensing approach. Signal Processing 85(3), 607–624 (2005)
23. McIntyre, G., Hintz, K.: Sensor measurement scheduling: an enhanced dynamic, preemptive algorithm. Optical Engineering 37, 517 (1998)
24. Qureshi, F., Terzopoulos, D.: Surveillance camera scheduling: A virtual vision approach. Multimedia Systems 12(3), 269–283 (2006)
25. Tarabanis, K., Allen, P., Tsai, R.: A survey of sensor planning in computer vision. IEEE Transactions on Robotics and Automation 11(1), 86–104 (1995)
26. Abidi, B.R., Aragam, N.R., Yao, Y., Abidi, M.A.: Survey and analysis of multimodal sensor planning and integration for wide area surveillance. ACM Computing Surveys 41(1), 1–36 (2008)
A Simulation Framework for UAV Sensor Fusion

Enrique Martí, Jesús García, and Jose Manuel Molina

Group of Applied Artificial Intelligence, Universidad Carlos III de Madrid, Av. de la Universidad Carlos III, 22, 28270 Colmenarejo, Madrid (Spain)
[email protected], [email protected], [email protected]
Abstract. The design of complex fusion systems requires experimental analysis, following the classical structure of experiment design, data acquisition, experiment execution and analysis of the obtained results. We present here a framework with simulation capabilities for sensor fusion in aerial vehicles. Thanks to its abstraction level, it only requires a few high-level properties to define a whole experiment. Its modular design offers flexibility and makes it easy to extend its functionality. Finally, it includes a set of tools for fast development and more accurate analysis of the experimental results.

Keywords: sensor fusion, simulation framework, unmanned air vehicle.
1 Introduction

The research of fusion solutions to real-world complex problems is a time-costly process, plagued by ancillary tasks that demand great effort. The market offers powerful tools for accelerating some of the more generic parts, such as data analysis or visualization. Nonetheless, as we focus on a narrower and more specialized field, it is common to find that one has to perform the expensive task of implementing one's own tools. Having an effective piece of software really makes a difference. Apart from the time saving, a good toolbox for data representation and visualization can mean detecting an otherwise ignored problem, or knowing how to improve the analyzed algorithms. This paper presents a generic framework for experimentation on unmanned air vehicle (UAV) sensor fusion. Bearing in mind the way in which such a tool is used, the whole system has been implemented in MATLAB™ to make it flexible and easily modifiable, as well as to speed up data visualization [1]. For illustrative purposes, this document shows its application to the multisensor navigation subsystem of a vehicle performing maneuvers related to air traffic management (ATM) [2]. Nonetheless, it can manage any other type of flight trajectory and even other kinds of vehicles (such as maritime or terrestrial ones). The structure of the simulator will be reviewed, with special attention to the design and functioning details of each module. We will also present our simulation and experimentation methodology, illustrating the process with some of the figures generated for results analysis and validation.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 460–467, 2010. © Springer-Verlag Berlin Heidelberg 2010
2 Simulator Architecture
The simulator is composed of three modules for data generation (see Fig. 1), plus an additional module for fusion algorithms and another one for performance evaluation. Data generation begins by creating the specification of a vehicle trajectory. The resulting data is then fed to the aerodynamic model, generating the flight simulation. That process results in a set of values related to the dynamics of the UAV (such as position, attitude or accelerations) that can be easily used to synthesize realistic sensor measurements. The fusion module takes the outputs of the selected sensors and processes them sequentially using the desired technique from the available library of implemented algorithms: (Extended) Kalman Filters, Particle Filters, etc.
Fig. 1. Framework schematic view. Specification for a desired trajectory (1), sensor measurement models (2), fusion process (3), performance evaluation (4).
2.1 Trajectory Generation

This module is composed of several scripts and functions, focused on generating the input for the aerodynamic simulation from a high-level specification of the trajectory. The framework is provided with a set of functions for simple maneuvers, such as straight flight at constant speed or with longitudinal accelerations/decelerations, or turns around a single axis of the vehicle's local coordinate system. These basic pieces can then be combined and concatenated to generate more complex trajectories. In the case of ATM trajectories, we have created scripts for typical scenarios such as the
racetrack, performed during the waiting time before the landing of an aircraft in order to fit the scheduled time. Racetracks have the shape of a hippodrome: a rectangle with two semicircles attached to its shorter sides (Fig. 2).
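The composition of basic maneuvers can be sketched as follows. This is illustrative Python rather than the framework's MATLAB code, and the force values and durations are placeholders, not aerodynamically meaningful numbers.

```python
def straight(duration, dt):
    """Straight segment at constant speed: zero net force at every step."""
    return [(0.0, 0.0, 0.0)] * round(duration / dt)

def turn(duration, dt, lateral_force):
    """Turn segment approximated by a constant lateral force."""
    return [(0.0, lateral_force, 0.0)] * round(duration / dt)

def concat(*segments):
    """Concatenate basic maneuvers into a complex trajectory specification."""
    out = []
    for seg in segments:
        out.extend(seg)
    return out

# A toy racetrack: two straights joined by two half-turns.
dt = 0.1
traj = concat(straight(10, dt), turn(5, dt, 2.0),
              straight(10, dt), turn(5, dt, 2.0))
```

The output of each basic function is a per-step sequence of body-frame forces; concatenation simply chains them in time, mirroring how the framework builds complex trajectories from simple pieces.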
Fig. 2. 3D view of simulated racetrack+landing trajectory
The output of the system consists of six arrays of instant–value pairs. The first three arrays contain the forces in the body-fixed frame of reference of the vehicle, which determine the translation. The remaining three are the moments in the same frame, which determine the vehicle rotations.

2.2 Aerodynamic Simulation

The following step is the simulation of the vehicle dynamics. The selected model in our case has been, for the sake of simplicity, a rigid body with six degrees of freedom (6DoF from now on). Our reference implementation uses the MATLAB™ Aerosim Aeronautical Simulation Block Set [3][4], which provides a complete set of tools for rapid development of detailed 6DoF nonlinear generic aerial vehicle models, as well as a graphical view to check the behavior of the system under test. This generic motion model can be substituted by a more detailed scheme. The only requirement for the replacement is to generate all the real data needed to synthesize the measurements in the next phase. In our example we store the position, speed, attitude (in quaternion and Euler angles), accelerations and angular rates of the body. This information is enough for simulating all the common sensors. As shown in Fig. 3, the 6DoF dynamic model integrates both the ideal segments composing the simulated trajectory and the simulated noisy sensor data (in its lower part). This is especially useful for online simulations, but is not recommended for experiments, because the result will be sensitive to the particular generation of random noise injected in the data. The "real" data of the flight is stored in separate files for later use. This is useful for creating persistent datasets and for saving computation time, because the simulation of a flight is usually a costly process involving complex numerical operations. Even in the case of a standard 6DoF system, some differential equations must be solved at each time step, and the typical time step resolution is on the order of a few milliseconds.
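To make the per-step cost concrete, here is a minimal explicit-Euler step for the translational part of a rigid body only. This is an illustrative Python sketch, not the Aerosim model; the mass, forces and step count are made-up numbers.

```python
def euler_step(state, force, mass, dt):
    """One explicit-Euler step of the translational dynamics.

    state: (position, velocity) as 3-vectors (tuples), body forces in newtons.
    """
    pos, vel = state
    acc = tuple(f / mass for f in force)                        # a = F / m
    new_vel = tuple(v + a * dt for v, a in zip(vel, acc))       # v += a*dt
    new_pos = tuple(p + v * dt for p, v in zip(pos, vel))       # p += v*dt
    return (new_pos, new_vel)

# 1 s of level flight at 50 m/s, integrated at a 1 ms time step.
state = ((0.0, 0.0, 100.0), (50.0, 0.0, 0.0))
for _ in range(1000):
    state = euler_step(state, force=(0.0, 0.0, 0.0), mass=750.0, dt=0.001)
```

Even this stripped-down step runs a thousand times per simulated second; a full 6DoF model with rotational dynamics, quaternion attitude and aerodynamic coefficients is correspondingly more expensive, which is why the flight data is computed once and stored.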
Fig. 3. Simulation of ideal trajectory and IMU measures with the MATLAB Aerospace blockset
2.3 Generating Realistic Sensor Data

It is commonly known that sensors do not provide perfect information, because they suffer from different effects such as inappropriate calibration, time drifts or interferences from external entities. Instead, their measures of real magnitudes are usually altered with added random noise and systematic effects such as biases. Sensor measures can be generated from "ideal" data quite straightforwardly, because all the aforementioned effects can be subsequently incorporated through simple operations, for instance addition or matrix multiplication. We can find an example in Fig. 4, where the ideal flight altitude (the ascending line of crosses in the bottom part) is taken as the starting point for simulating the measure of a barometric altimeter. The process consists of three consecutive steps:

• Simulate noise, by perturbing each value with a random sample drawn from a Gaussian distribution. The result is shown as hollow circles, also in the bottom part.
• Add a constant value to mimic the altimeter bias. The bias is an effect mainly caused by differences between the sea-level atmospheric pressure assumed by the device and its real value. The hollow triangles in the upper part are the result.
• The last effect to be simulated is the quantization step. Altimeters based on a barometer do not provide continuous measures: output values are quantized (here, the step is 50 meters). The final measures are the solid squares.

Thus, the reason for separating flight simulation from sensor measurement synthesis is quite simple: it simplifies the production of several generations of measurements for the same trajectory, and swapping among different sensor models.
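The three steps above can be sketched in a few lines. This is an illustrative Python version of the barometric-altimeter model; the noise sigma, bias and seed are invented values, while the 50 m quantization step matches the example in the text.

```python
import random

def altimeter(ideal_alt, sigma=5.0, bias=30.0, step=50.0, rng=random):
    """Synthesize a barometric-altimeter reading from the ideal altitude."""
    noisy = ideal_alt + rng.gauss(0.0, sigma)   # step 1: additive Gaussian noise
    biased = noisy + bias                        # step 2: constant bias
    return step * round(biased / step)           # step 3: quantization to `step` m

rng = random.Random(42)                          # stored seed for reproducibility
measures = [altimeter(h, rng=rng) for h in (100.0, 120.0, 140.0)]
```

Every reading is, by construction, a multiple of the quantization step, and re-running with the same stored seed reproduces the exact same measures.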
Fig. 4. Example of measurement generation using the model of a barometric altimeter. The starting point is the original ideal flight altitude.
The implemented framework applies the noise model of each sensor to every available sample of unaltered data, disregarding its temporal resolution. The produced values are individually marked with a timestamp, resulting in a very dense (and unrealistic) set of measures. This allows later selection and tuning of the sensor update ratio, and the emulation of more advanced effects such as measurement loss. Each sensor model is implemented as a separate function. It receives the ideal flight data and the parameters of the measurement model, and returns the simulated values.
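The dense-sampling idea can be sketched like this. It is illustrative Python; the 1 kHz dense rate, the 1 Hz GPS-like period and the loss probability parameter are invented for the example.

```python
import random

def subsample(samples, period, loss_prob=0.0, rng=random):
    """Pick measures from a dense (timestamp, value) list at a given update
    period, optionally dropping some to emulate measurement loss."""
    out, next_t = [], samples[0][0]
    for t, v in samples:
        if t >= next_t:
            next_t = t + period
            if rng.random() >= loss_prob:    # survives the loss model
                out.append((t, v))
    return out

dense = [(i * 0.001, i) for i in range(5000)]   # 5 s of 1 kHz dense samples
gps = subsample(dense, period=1.0)              # 1 Hz GPS-like stream
```

Because the dense set carries a timestamp on every value, the same data can be re-subsampled at any rate, which is exactly why the framework keeps it despite being unrealistic on its own.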
3 Sensor Fusion

After generating all the necessary data, experiments can be performed. The first step consists of defining the set of available sensors and their features (including noise model and update ratio), the fusion architecture and the concrete algorithms to be used. Once the architecture is defined [5], the great majority of the experiments can be run using a fixed scheme. All our experiments have been implemented over a few script templates. The next step is to configure the fusion algorithms. One of the available tracking techniques has to be selected and configured to make use of the selected inputs. The result at each time step is registered together with its timestamp. The integration of GPS with inertial sensors attracts a considerable amount of research attention [6]. Plenty of classical and advanced techniques to increase robustness have been applied, such as unscented Kalman filters, particle filters or soft computing paradigms [7][8][9]. Many algorithms are available for direct integration in MATLAB. At this moment, our framework includes the Kalman Filter, Extended Kalman Filter, Unscented Kalman Filter and Particle Filter [10][11]. As the references show, we have adapted existing libraries to work with our framework, so that including them in the code takes just a few lines.
Once the whole trajectory has been filtered, we obtain a different interpretation of the flight trajectory. It can be directly compared with both the real data and the sensor measurements, because the three sets provide values for the same sequence of time instants. Back to our illustrative example, we have performed several experiments of interest, as shown in the next subsections. A centralized processing architecture is the selected option for all the single-vehicle problems, given the coupling requirements of estimating sensor corrections together with trajectory parameters: a single algorithm of loosely coupled type will track the whole UAV state using the information from all the available sensors. Typically, it is interesting to generate the measures in the same script where the fusion is performed, because it also allows experimenting with sensor features, such as the noise models to be used. One of the problems of dealing with noisy data is that a certain scenario can be particularly favorable (or unfavorable) for the applied algorithm, leading to non-representative results. To avoid this, the whole trajectory is not filtered just once per experiment. We follow a Monte Carlo approach instead. This means that for the same trajectory, several sets of measures will be generated. The noise used in the synthesis of each set of measures is, by definition, random and different on each generation, so each final set will be unique. The random number generator seed can be stored to assure experiment reproducibility. With the simulated data sets, the Monte Carlo methodology allows different experiments to be run in order to perform rigorous statistical analysis on the output (root mean squared errors, t-tests for performance comparison, integrity analysis, etc.).
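The seeded Monte Carlo loop can be sketched as follows. This is illustrative Python: the filter is replaced by a trivial placeholder, and the base seed, noise sigma and run count are made-up parameters.

```python
import math
import random

def run_monte_carlo(true_traj, n_runs, base_seed=1234, sigma=5.0):
    """One noise realization per run, each seeded for reproducibility.

    Returns the per-run RMSE list and the stored seeds."""
    rmses, seeds = [], []
    for k in range(n_runs):
        seed = base_seed + k
        rng = random.Random(seed)
        # A real experiment would synthesize measures and run the chosen
        # filter here; as a placeholder, the "estimate" is truth plus noise.
        est = [x + rng.gauss(0.0, sigma) for x in true_traj]
        rmse = math.sqrt(sum((e - x) ** 2
                             for e, x in zip(est, true_traj)) / len(true_traj))
        rmses.append(rmse)
        seeds.append(seed)
    return rmses, seeds

rmses, seeds = run_monte_carlo([float(i) for i in range(100)], n_runs=20)
```

Because the seed of every run is stored, any single realization (for instance an outlier run found during t-test analysis) can be regenerated exactly for closer inspection.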
4 Results Analysis and Validation

Evaluating the performance of a solution can be a complex task. In order to facilitate it, our framework provides tools for supervising the process while it is executed, and for analyzing the results once the trajectory has been filtered. As an example of a tool of the first category, Fig. 5 shows a 3D plot obtained for a Particle Filter (PF) during the tracking of a trajectory using a GPS, an accelerometer and a gyroscope as sensors. Each particle (the faded cloud on the right of the figure) is drawn as an arrow, with color intensity and size directly proportional to the weight of the particle. Note how only a few particles at the bottom are considered important after the last GPS measure is received (just over the X axis, near the -3590 meters mark), while the vast majority of the population is represented in a very light pale tone. The estimation of the filter is the wide arrow with triangles at its extremes. Intermediate figures of this kind have multiple uses, such as visualizing each step in high detail to diagnose the causes of a previously detected problem. For instance, we can supervise whether the resampling stage is introducing enough variability in the population of particles. The overall performance of a certain solution, however, can only be evaluated after the experiment is finished. MATLAB™ makes the calculation of different statistics and the plotting of the desired variables very easy. The real challenge is to select the appropriate quality indicators. The next figures are some examples obtained using our framework.
Fig. 5. Auxiliary plot to help visualize the current system state during the fusion process
Fig. 6. Comparison between the position estimation accuracy of a filtering algorithm and the raw GPS error. Gyroscope output gives context about turns and straight segments.

Fig. 7. Gyroscope bias estimation for an EKF-based solution of GPS+IMU fusion

Fig. 8. Unstable gyroscope bias estimation for an EKF-based solution of GPS+IMU fusion
5 Conclusions

A framework for experimenting on sensor fusion has been presented in this document. Apart from detailing its structure, we have shown how it can be used for creating a
whole experiment, starting with the generation of a simulated flight trajectory and ending with graphical descriptions of the results. Using this software, we have reduced the implementation time of an already designed experiment to just a few minutes. Evaluating how a change in a variable affects the result is trivial, as is changing the value of any set of configuration parameters. If required, the functionality can be extended by adding new algorithms, measurement models or trajectory definitions.
Acknowledgements

This work was supported in part by Projects ATLANTIDA, CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, SINPROB, CAM CONTEXTS S2009/TIC-1485 and DPS2008-07029-C02-02.
References

[1] Gade, K.: NAVLAB, a Generic Simulation and Post-processing Tool for Navigation. European Journal of Navigation 2(4), 51–59 (2004)
[2] Rodriguez, A.L., et al.: Real time sensor acquisition platform for experimental UAV research. In: IEEE/AIAA 28th DASC 2009, pp. 5.C.5-1–5.C.5-10 (October 2009)
[3] Aerospace Toolbox - MATLAB. The MathWorks, http://www.mathworks.com/products/aerotb/ (Cited: 03 15, 2010)
[4] Kurnaz, S., Cetin, O., Kaynak, O.: Fuzzy Logic Based Approach to Design of Flight Control and Navigation Tasks for Autonomous Unmanned Aerial Vehicles. Journal of Intelligent and Robotic Systems 54(1-3), 229–244 (2009)
[5] García, J., et al.: Data fusion architectures for autonomous vehicles using heterogeneous sensors. In: 1st ESA NAVITEC, Noordwijk, Holland (December 2006)
[6] Wagner, J.F., Wieneke, T.: Integrating satellite and inertial navigation - conventional and new fusion approaches. Control Engineering Practice 11(5), 543–550 (2003)
[7] van der Merwe, R., Wan, E., Julier, S.: Sigma Point Kalman Filters for Nonlinear Estimation and Sensor Fusion: Applications to Integrated Navigation. In: AIAA Guidance, Navigation and Control Conference, Providence, USA (August 2004)
[8] Crassidis, J.: Sigma-Point Kalman Filtering for Integrated GPS and Inertial Navigation. IEEE Trans. on AES 42(2) (April 2006)
[9] Chiang, K.W., Huang, Y.W.: An intelligent navigator for seamless INS/GPS integrated land vehicle navigation applications. Applied Soft Computing 8(1), 722–733 (2008)
[10] Hartikainen, J., Sarkka, S.: Optimal filtering with Kalman filters and smoothers - a Manual for Matlab toolbox EKF/UKF (2007), http://www.lce.hut.fi/research/mm/ekfukf/
[11] Chen, L., et al.: PFLib - An Object Oriented MATLAB Toolbox for Particle Filtering. Department of Statistics, Colorado State University (2007), http://www.stat.colostate.edu/~chihoon/paper-6567-25-revised.pdf (Cited: 03 14, 2010)
An Embeddable Fusion Framework to Manage Context Information in Mobile Devices

Ana M. Bernardos, Eva Madrazo, and José R. Casar

Telecommunications Engineering School, Technical University of Madrid, Av. Complutense 30, 28040, Madrid, Spain
{abernardos,eva.madrazo,jramon}@grpss.ssr.upm.es
Abstract. Conveniently fused and combined with data from external sources, information from sensors embedded in a mobile device may offer a dynamic view of the user's situation, sufficient to build adaptive context-aware services. In order to shorten the development cycle of these applications, an embeddable framework to acquire, fuse and reason on context information is hereby described. 'CASanDRA Mobile' is designed to work autonomously in resource-constrained devices, offering application developers transparent management of context information. Based on a service-oriented architecture implemented in mobile OSGi, it offers a scalable infrastructure of bundles which decouples context acquisition and automated context inference from application development. 'CASanDRA Mobile' aims at providing the user with full control over his private context data, by using privacy policies suitable for handling P2P context sharing. To exemplify how to use the framework features, the design procedure for a context-aware wellness application is described.

Keywords: Context-aware system; data fusion; mobile reasoning; activity recognition; mobile framework.
1 Introduction

Since 1994, when Schilit and Theimer coined the term 'context-awareness' [1], a good number of architectures to handle context information have been described in the literature [2]. Most of these proposals, which aim at decoupling the process of data acquisition from application development, base their performance on the existence of a centralized infrastructure-based module capable of fusing data to extract context information. Mobile devices are increasingly equipped with better sensing, processing and storage capabilities, so it is possible to design a light infrastructure-independent middleware to infer context information. Light context-aware systems enabling distributed peer-to-peer context sharing provide an important feature for implementing the 'Internet of Things' concept, by which different types of objects and devices are able to opportunistically communicate among themselves. It is important to note that this autonomous approach to mobile computing is not opposed to the 'cloud computing' trend, but complements it (e.g. by exploiting short-distance data connections and the device's off-line autonomous capabilities).

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 468–477, 2010. © Springer-Verlag Berlin Heidelberg 2010
In the following we describe our embeddable framework to provide Context Acquisition Services anD Reasoning Algorithms, from now on called 'CASanDRA Mobile'. This middleware is the result of a learning process that started with the design and development of our infrastructure-based system for context-awareness, CASanDRA [3], which implements the fusion architecture described in [4]. Of course, the light version of the system presents different challenges, but it is capable of offering a significant collection of the features already considered in CASanDRA, while including domain-specific ones. CASanDRA Mobile bundles a set of standard services, conceptually layered (from data acquisition to high-level fusion), but functionally sharing the same communication and inference resources. The developer can use context information at different levels of abstraction, as the middleware provides APIs to access all the available data. The middleware, developed using an implementation of a Service Oriented Architecture [5], is prepared to handle P2P context sharing and, in the future, it will provide Quality of Context management in order to auto-regulate its performance. Section II reviews previous approaches to mobile computing middleware. Section III lists CASanDRA Mobile's standard functionalities and justifies our choice of a SOA-based implementation. Section IV describes the middleware itself and Section V explains a practical implementation of a context-aware application using CASanDRA Mobile's activity inference capabilities. Finally, Section VI considers open issues which need to be addressed in further developments.
2 A Review of Light Architectures for Context Management

Numerous context-aware frameworks have been described in the literature during the last decade; some of them are autonomous embeddable systems, designed to be installed in mobile resource-constrained devices. What follows is a short chronological review of some of these proposals; their features, architectural approaches and learned lessons have inspired the current design of CASanDRA Mobile. Released in 2003 (when developments such as LIME, XMIDDLE or Mobiware had already addressed general aspects of middleware development for mobile computing), MobiPADS [6] appears as one of the first designs considering context-awareness in mobile middleware. The platform enables active service deployment and dynamic service composition depending on the context, in order to optimize, in terms of resources, the overall operation of mobile applications. MobiPADS is composed of two parts: the system components, providing essential management services for deployment and configuration, and the service space, where a series of mobilets (MobiPADS' services) can be chained to provide the applications with aggregated functionalities. Mobilets access the components through the mobilet API, which also has interfaces to allow communication and configuration of the system's components. MobiPADS software claims to be reflective. Reflection [7] as a design paradigm is also considered to build CARISMA [8]. This middleware offers customized services to applications by using policies, which depend on the context configuration. The middleware provides applications with an API (meta-interface) to inspect and alter the middleware behavior, as encoded in application profiles. CORTEX [9] focuses on the configuration of a distributed network for context management. The system is composed of sentient objects, independent software components capable of acquiring context data and performing inference.
Information sharing among neighbors and dynamic resource discovery are two of its important features. CASanDRA Mobile also bases its architecture on the composition of dynamic software
A.M. Bernardos, E. Madrazo, and J.R. Casar
structures, adopting the 'reflection' paradigm: the middleware is able to manage and change its internal composition to provide the applications on top of it with the information they need, while minimizing its memory footprint and resource consumption.

The ReMMoC proposal [10] addresses the platform heterogeneity problem by proposing a web services-based reflective middleware that allows mobile clients to be developed independently of both discovery and interaction mechanisms. In the same direction, the Obje infrastructure [11] uses abstract models common to every object/device in the network, which expose information about how a device can connect to another, provide metadata about itself, be controlled or provide references to other devices. CASanDRA Mobile extends these concepts to its components, which can be automatically discovered in the system. The middleware can then dynamically create and manage the life cycle of customized aggregation services, ready to perform unplanned tasks for context-aware applications.

The ContextPhone [12] is a prototyping platform running on mobile phones using Symbian OS. It consists of four interconnected modules: Sensors, Communications, Customizable applications (which can seamlessly augment or replace built-in applications) and System Services. The architecture mirrors the widget approach of the well-known Context Toolkit, using a publish-subscribe model for its components. It is possible to add new data types for context extension at compile time.

Citron [13] presents an alternative for internal data sharing. The framework, conceived to be fully operative in the mobile device, uses a blackboard approach to handle information tuples gathered from 'workers' (components which handle access to sensors). CASanDRA Mobile opts for the publish-subscribe model, implemented by using the standard features of a SOA platform.
MADAM [14] employs a component framework to design applications that can be adapted by reconfiguration; an application is assembled from a recursive structure of component frameworks. The middleware is based on extended goal policies expressed as utility functions, leaving the system to reason on the actions required to implement those policies. MADAM needs a description of the application structure, the application's variability and distribution aspects, the properties of each variant and the utility functions for comparing variants.

MARKS [15] addresses context management in ad-hoc networks. MARKS' authors claim to incorporate some unexplored attributes – such as 'knowledge usability', 'resource discovery' and 'self-healing' – into a pervasive middleware for mobile devices, in order to optimize the use of physical resources while also ensuring security and privacy. MARKS is composed of core components and services: the former include the object request broker, resource discovery, trust management and universal service access unit.

CASanDRA Mobile brings concepts such as 'reflection', 'resource discovery', 'service chain', 'privacy policies' and 'event-based architectures' together. Additionally, it aims at managing Quality of Context by using bottom-up probabilistic methods. As far as we know, it is the first attempt to benefit from a Service Oriented Architecture [5] based on mobile OSGi (mOSGi), which is highly modular and dynamically configurable at run time. mOSGi enables the design of a middleware capable of handling asynchronous (event-based) context communications and supporting off-line operation.
An Embeddable Fusion Framework to Manage Context Information
3 CASanDRA Mobile: Functionalities and Approach

CASanDRA Mobile aims at offering a set of standard off-the-shelf features for the development of context-aware applications, in order to accelerate the application design and development life cycle. The framework, designed to be easily and modularly scalable over time, is component-based and infrastructure-independent. It is ready to maximize its functionality even when no data connection with external infrastructures is available. For this reason, the middleware needs to offer an off-line functional alternative to services which may need infrastructure support (e.g. indoor positioning systems). CASanDRA Mobile can be used by both native and in-the-cloud applications; it is the developer who determines which kind of context information the application needs to handle.

The middleware includes P2P capabilities, mainly intended to enable context-information sharing among different devices equipped with CASanDRA Mobile. Context data will be labeled according to privacy policies controlled by the user; as a consequence, the P2P sharing functionality will handle and expose context data taking these privacy restrictions into account.

The framework will have internal procedures for handling the quality of the context it manages, in order to optimize the acquisition and processing procedures while considering the computational and resource costs. The applications will be aware of the quality of the information they receive, being able to adapt their behavior depending on accuracy, up-to-dateness and other QoC features. This implies that uncertainty needs to be controlled throughout the context composition life cycle in a coordinated and reliable way.

The framework will deliver access-to-sensor and fusion features, but also reasoning tools for the automatic processing of complex context information. This means that an inference engine will be available inside the framework.
Initially, this inference engine will take the form of a rule-based engine, but the final objective for CASanDRA Mobile is to expand its capabilities and manage more elaborate data models (e.g. light ontologies). CASanDRA Mobile aims at adapting its components' behavior at run time, starting, stopping and hibernating components when needed. Real-time scalability (dynamic discovery of context sources) and easy maintenance are also key features. This means that, for example, hot installation and remote update of components (done without restarting the framework) will be possible. Finally, components are to be loosely coupled, allowing modular and independent software development.

Considering these requirements, we have opted to build the architecture of CASanDRA Mobile on a SOA implementation (mOSGi). Service Oriented Architectures handle 'services' as software units and use them to implement the key concepts of 'visibility', 'interaction' and 'effect'. As defined in the standard [5], a 'service' must be able to perform work for another party, specify the work it offers, and offer to perform that work. For this reason, services need interfaces through which they can be externally invoked, and they must publish their functionality so that applications can use it. In SOA architectures, modularization improves the reusability of software components and makes parallel development and testing
simpler (each service may be independently developed and mocked up when not available). With respect to its core functionalities, mobile OSGi enables automatic service registration and component management, and allows hot deployment of new services, requiring no stop-start procedure when the service offering is updated.
4 Description of CASanDRA Mobile Middleware

4.1 Introduction to CASanDRA Mobile's Design

The architecture of CASanDRA Mobile is composed of three building blocks: the Acquisition Layer, the Context Inference Layer and the Core System. The Acquisition Layer decouples the access to embedded and external sensors from upper processing levels by using software 'Sensors', which deal with low-level hardware information retrieval. The Context Inference Layer gathers a number of 'Enablers', modules that process data coming from 'Sensors', fuse them, and infer complex context parameters. Finally, the Core System provides several features to integrate these components in the middleware, such as discovery and registry management of new elements and some common utility libraries. Both 'Sensors' and 'Enablers' publish their output data in the middleware through an event manager. The main difference between these two types of components is that the former act only as data providers and are not ready to consume data from other components. Applications run on top of the CASanDRA Mobile middleware, consuming context information provided by Enablers and Sensors and using its standard features.

Regarding information retrieval, CASanDRA Core offers a set of APIs to handle 'subscription-based' and 'on-demand' information queries. In the first case, a component (or an application) can subscribe to a set of context parameters: this method makes it possible to receive periodic updates for the selected context data via asynchronous communications (events). Consumer elements are able to configure context patterns in order to combine and filter notification events.

CASanDRA Mobile middleware is implemented in Java; it runs on a mobile OSGi platform based on J9 inside a Windows Mobile (WM) device. OSGi is a Dynamic Module System for Java, handling modules called 'bundles': cohesive, self-contained units which explicitly define their dependencies on other modules/services and their external service API.
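As a rough illustration of the two retrieval styles described above, the sketch below models context measures as aggregated name-value tuples and a manager that serves both subscription-based (event) queries and on-demand queries. All names (`Measure`, `ContextManager`, `subscribe`, `publish`, `on_demand`) are hypothetical illustrations, not CASanDRA's actual Java API.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Measure:
    """A context measure: a named set of name-value pairs plus a timestamp."""
    name: str
    values: dict
    timestamp: float = field(default_factory=time.time)

class ContextManager:
    def __init__(self):
        self._store = {}        # latest measure per name (serves on-demand queries)
        self._subscribers = {}  # measure name -> callbacks (serves async events)

    def subscribe(self, name, callback):
        self._subscribers.setdefault(name, []).append(callback)

    def publish(self, measure):
        # store for on-demand access, then notify subscribers asynchronously
        self._store[measure.name] = measure
        for cb in self._subscribers.get(measure.name, []):
            cb(measure)

    def on_demand(self, name):
        return self._store.get(name)

# The three-axis acceleration values compose one 'measure':
cm = ContextManager()
received = []
cm.subscribe("acceleration", received.append)
cm.publish(Measure("acceleration", {"x": 0.1, "y": 9.8, "z": 0.2}))
print(cm.on_demand("acceleration").values["y"])  # 9.8
```

Aggregating values into measure objects, as in the text, keeps storage in an object database straightforward: each `Measure` maps naturally to one stored object.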
OSGi improves encapsulation and reusability, simplifies the implementation of a modular system, and provides a very useful set of standardized optional services, including the logging service and the eventAdmin service to manage events. Additionally, several implementations of mOSGi frameworks are available for mobile operating systems such as Symbian or Android, so the CASanDRA Mobile concept can be adapted to other types of devices beyond WM.

4.2 CASanDRA Mobile's Components

Fig. 1 shows the general architecture of CASanDRA Mobile and its APIs.
[Figure: the Core System (Context Manager, Subscription Manager, Component Manager, Communications Manager with BT, COM and HTTP interfaces, Privacy Manager, Registry, BBDD/History, Logging and Discovery modules), the Acquisition Layer (Sensors such as Accel. and GPS), and the Context Inference Layer (Context Enablers, Inference Engine, P2P Context Sharing and Performance Manager), connected through APIs and sensor/context events.]

Fig. 1. Software components
CASanDRA Core System. CASanDRA Mobile's main modules are encapsulated together in the Core System bundle, which controls the components' integration and life cycle. In brief, the Core System is composed of:

1. The Context Manager, the main module in the Core bundle, stores and manages the publication of context parameters. It controls when a subscription is created or removed and accordingly asks the Component Manager to start or stop the needed components. Components are started and stopped in a lazy manner; that is, the middleware only starts a component when necessary and stops it when no other component in the middleware needs it. The objective is to adapt the structure of the framework to the consumer application's needs, keeping the component deployment as simple as possible in order to improve the middleware's performance. This module provides one API for access to stored measures/context parameters and another to request on-demand measures. Context information is stored in name-value tuples aggregated to compose measure objects (for instance, the three-axis acceleration values compose an acceleration 'measure'). This facilitates the management and storage of measures in an object-oriented database (db4o).
2. The Component Manager manages the components' life cycle according to the needs of active applications. It is able to start, stop or configure components under the supervision of the Context Manager.
3. The Subscription Manager allows components and applications to subscribe to a measure/context parameter. It is aware of every active subscription (holding information about the subscribed component/application) and provides an API to retrieve subscription data.
4. The Registry gathers all the available context measure names, together with the component that publishes each measure. The Registry API allows components and applications to ask for the available measures/context parameters at any moment.
5. The Discovery module listens for new component registration queries and adds these components to the Registry. Components must use some special parameters
when registering, so that the Discovery module can effectively make them available in the middleware.
6. The Communications Manager centralizes the access to the communication interfaces available in the mobile device: every component needing a Bluetooth connection, COM ports or HTTP connections will use this library. It also includes additional features; for instance, the Bluetooth Manager performs the periodic search for new nearby devices and provides an API used by the P2P Context Sharing enabler for this purpose.
7. The Logging module includes some basic logging facilities.
8. The Privacy Manager controls the access to the user's private data. It stores credentials and privacy profiles, and manages application authorizations. This is especially useful for sharing context parameters with other devices via P2P context sharing.

CASanDRA Core is packaged in a single OSGi bundle. This core API is sufficient to develop sensor components, enablers and applications, and to make them work together.

CASanDRA Mobile Sensors and Enablers. 'Sensors' and 'Enablers' are CASanDRA's components. A component is a bundle that implements the ComponentInterface interface and offers it as a service by registering itself in the middleware. When a component is started, it subscribes to all the components providing the data needed to compute its output; the middleware automatically starts every component that generates those data. This chain reaction ends with the startup of the sensors that directly acquire raw data from the hardware. CASanDRA Mobile aims at providing a complete set of 'sensors': software pieces that access, in a customizable way, the sensing resources embedded in (or attached to) the mobile device, but that also manage connections to retrieve data from external sensors connected through Bluetooth or ZigBee interfaces.
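The lazy start-up 'chain reaction' can be sketched as a recursive dependency start: launching an enabler first launches the components that provide its inputs, bottoming out at the hardware sensors. The component names follow the scenario of Sect. 5, but the manager API is hypothetical.

```python
class Component:
    """A middleware component and the names of the components it consumes."""
    def __init__(self, name, needs=()):
        self.name, self.needs, self.running = name, tuple(needs), False

class ComponentManager:
    def __init__(self, components):
        self._by_name = {c.name: c for c in components}
        self.start_order = []  # records the order in which components start

    def start(self, name):
        comp = self._by_name[name]
        if comp.running:        # lazy: never start a component twice
            return
        for dep in comp.needs:  # providers start first (the chain reaction)
            self.start(dep)
        comp.running = True
        self.start_order.append(name)

mgr = ComponentManager([
    Component("accelerometerSensor"),
    Component("activityEnabler", needs=["accelerometerSensor"]),
])
mgr.start("activityEnabler")
print(mgr.start_order)  # ['accelerometerSensor', 'activityEnabler']
```

A symmetric reference-counted `stop` would implement the text's rule of stopping a component once no consumer needs it.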
For example, CASanDRA Mobile offers sensing modules to retrieve a) acceleration data from an embedded inertial system, b) the received signal strength indicator of a WiFi connection (in order to enable localization algorithms) or c) biometric data from an external BT oxymeter. 'Enablers' perform data fusion at different levels, from signal to situation and impact fusion. The middleware currently includes a P2P Context Sharing Enabler, which listens to other devices executing the CASanDRA middleware in order to share context parameters with them (P2P context sharing makes it possible to enhance the context image locally treated in each device), and a group of general Context Enablers, which includes e.g. a Location Broker – which fuses position information coming from different Location Enablers in order to provide seamless position estimation – or an Activity Enabler – which works on acceleration data to infer activity estimates. Two additional enablers offer horizontal functionalities. On the one hand, an Inference Engine based on the rule engine 3aplm offers an API to configure rules to be executed when needed, providing external components with reasoning capabilities. Applications and internal components may subscribe to receive events from the Inference Engine. On the other hand, the Performance Manager watches variables such as the available memory or the number of working threads, and generates context parameters describing the system state.
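As a toy illustration of the Location Broker's role, the sketch below fuses estimates from several location enablers by preferring the one with the smallest reported uncertainty; this selection rule is our own assumption, since the paper does not specify the fusion method.

```python
def fuse_location(estimates):
    """estimates: list of (source_name, (x, y) or None, accuracy_in_meters).

    Returns the estimate with the smallest uncertainty, or None if no
    enabler currently provides a position (e.g. GPS indoors).
    """
    available = [e for e in estimates if e[1] is not None]
    if not available:
        return None
    return min(available, key=lambda e: e[2])

outdoors = [("location.gps.internal", (40.44, -3.72), 5.0),
            ("location.wifi", (40.45, -3.71), 12.0)]
indoors = [("location.gps.internal", None, float("inf")),
           ("location.wifi", (40.45, -3.71), 12.0)]
print(fuse_location(outdoors)[0])  # location.gps.internal
print(fuse_location(indoors)[0])   # location.wifi
```

The broker thus provides one seamless `location` stream while individual enablers come and go with coverage.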
Every sensor or enabler registers, when initialized, the measure or context parameter it provides under a 'name'. These names are encouraged to follow a coherent taxonomy to improve the use of patterns in subscriptions. For instance, all the components publishing a location context parameter should name it 'location.X', so that any component can subscribe to 'location.*' and receive every location context parameter available (e.g. location.gps.internal, location.wifi, location.bluetooth).
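Resolving such wildcard subscriptions against the Registry amounts to glob-style matching over measure names; a minimal sketch (the registry contents and function name are illustrative):

```python
from fnmatch import fnmatch

# Measure names published in the Registry, following the taxonomy above
registry = ["location.gps.internal", "location.wifi",
            "location.bluetooth", "acceleration.internal"]

def match_subscription(pattern, names):
    """Return every registered measure name that matches the pattern."""
    return [n for n in names if fnmatch(n, pattern)]

print(match_subscription("location.*", registry))
# ['location.gps.internal', 'location.wifi', 'location.bluetooth']
```

A subscriber to `location.*` then automatically receives events from any location enabler added later, as long as the naming taxonomy is respected.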
5 Deployment of an Application on Top of CASanDRA Mobile

5.1 Service Scenario: A Wellness Application to Control Sedentary Behavior

The application developed on top of CASanDRA Mobile is a native context-aware wellness application which aims at persuading the user to increase his physical activity in order to avoid or minimize sedentary behavior. Each hour, the application evaluates the user's activity level and makes a positive or negative verdict. This balance is visually communicated to the user: the scenery serving as wallpaper for the application turns into a dark landscape each day at midnight, and progressively evolves into a greener scene if the user's activity level is adequate. Additionally, the application triggers context-aware notifications to help the user increase his activity when low levels are detected.

The application needs to be aware of the user's daily activity, assumed to be a combination of 'atomic activity estimates' (at rest, walking, running), covered walking distance and time. The user wears no external device other than his personal mobile phone, ideally located in a chest pocket of the user's shirt. This simplified scenario makes it possible to estimate the user's movement through the accelerometer in the mobile device. Activity information is combined with GPS data when outdoors and with WiFi-based location estimates when indoors.

5.2 Development Methodology and Application's Design

CASanDRA Mobile can be improved at the same time that applications are built, if some general guidelines are taken into account when designing and developing the application's building blocks. An important recommendation is to develop every enabler/sensor/application in a separate bundle: this allows specialization, division of labor, component reusability and parallel design and testing life cycles.
Therefore, when building an application, the developer has to define the bundles – sensor, enabler and application bundles – needed for a full-featured application, and match his needs with the services available in CASanDRA Mobile. For the proposed scenario, three sensors are needed: 'accelerometerSensor', 'wifirssSensor' and 'GPSSensor'. Then, it is necessary to process the sensor data in order to infer the context feature 'activity'. The 'activityEnabler' will be able to discriminate among a set of simple activities such as 'at rest', 'running' or 'walking' by computing the variance of the acceleration signal in each axis (x, y, z). The 'activityEnabler' will be in charge of publishing the activity estimation so that it can be used by consumer components. The 'wifirssSensor' retrieves signal-strength data from the WiFi network for the 'wifiLocationEnabler' to process. The 'locationBrokerEnabler' gathers both GPS data and WiFi estimates to provide seamless location information. The application will be able to use the Inference Engine's API to configure the checking of specific behavioral patterns in order to ease the activity evaluation. For example, the application may be interested in getting a notification from the Inference
Engine if no activity and no position change are detected for two hours on a working day. The logic for dealing with this situation remains on the application side, which may decide to send an email to the user with some health advice in order to a) check that the user is carrying his mobile phone with him and b) foster the user's physical activity. In the future, the application may employ learning methods to cope with uncertainty and minimize notifications.
Fig. 2. Bundle deployment in CASanDRA Mobile
The application will be finally deployed in a separate bundle. That makes eight independent bundles (Fig. 2) that can be developed in parallel, using mock components to simulate the others. Final integration is expected to be seamless if every single bundle is adequately tested.
6 Conclusions and Further Work

The development of light fusion strategies for context management remains a challenge: from stable, efficient and accurate context feature extraction to complex reasoning, uncertainty management and context sharing, there is still a way to go before such systems run on resource-constrained mobile nodes. In addition to the specific issues related to context management, common problems of mobile application development (such as portability, modularity or scalability) remain without a universal solution. CASanDRA Mobile is an attempt to address both aspects, delivering a full-featured but light fusion framework for context management in mobile devices. The platform is still in its infancy, but its first version demonstrates the feasibility and convenience of building the framework on the service oriented architecture implemented through mOSGi. We expect to deliver results on performance tests with an application that is an intensive consumer of context data. The framework, which aims at being transparent to the application developer, works on a general model of resources and context elements, defining interfaces which enable middleware services to use new resources as they appear in the environment.
Our current lines of work focus on improving: 1) a light strategy for 'quality of context' control throughout the fusion process; 2) a fusion module to manage position estimation in a seamless manner; 3) a stable activity inference system which uses Bayesian logic; 4) a model for sharing context among different devices with the objective of improving context estimation and 5) a reasoning service including ontology processing. Of course, these are still a small part of the fusion problems to solve in order to have a fully operational framework.

Acknowledgments. This work has been supported by the Government of Madrid under grant S-0505/TIC-0255 and by the Spanish Ministry of Science and Innovation under grant TIN2008-06742-C02-01.
References

1. Schilit, B.N., Theimer, M.M.: Disseminating active map information to mobile hosts. IEEE Network, 22–32 (September/October 1994)
2. Hong, J.-Y., Suh, E.-H., Kim, S.-J.: Context-aware systems: A literature review and classification. Expert Systems with Applications 36, 8509–8522 (2009)
3. Bernardos, A.M., Tarrío, P., Casar, J.R.: CASanDRA: A framework to provide Context Acquisition Services And Reasoning Algorithms for Ambient Intelligence Applications. In: Proc. Int. C. on Parallel and Distributed Computing, Apps. and Tech., Hiroshima (2009)
4. Bernardos, A.M., Tarrío, P., Casar, J.R.: A data fusion framework for context-aware mobile services. In: Proc. of the IEEE International Conf. in Multisensor Fusion and Integration for Intelligent Systems, Seoul, pp. 606–613 (2008)
5. OASIS Standard, Reference Model for Service Oriented Architecture 1.0 (2006)
6. Chan, A.T.S., Chuang, S.: MobiPADS: A Reflective Middleware for Context-Aware Mobile Computing. IEEE Transactions on Software Engineering 29(12) (2003)
7. Sobel, J.M., Friedman, D.P.: An introduction to reflection-oriented programming. In: Proceedings of Reflection 1996, San Francisco (1996)
8. Capra, L., Emmerich, W., Mascolo, C.: CARISMA: Context-Aware Reflective mIddleware System for Mobile Applications. IEEE T. on SW Engin. 29(10), 929–945 (2003)
9. Sørensen, C., Wu, M., Sivaharan, T., Blair, G.S., Okanda, P., Friday, A., Duran-Limón, H.: A Context-Aware Middleware for Applications in Mobile Ad Hoc Environments. In: Proc. 2nd W. on Middleware for Pervasive and Ad hoc Computing, pp. 107–110. ACM, NY (2004)
10. Grace, P., Blair, G.S., Samuel, S.: A Reflective Framework for Discovery and Interaction in Heterogeneous Mobile Environments. ACM SIGMOBILE Mobile Computing and Communications Review 9(1), 2–14 (2005)
11. Edwards, W.K., Newman, M.W., Sedivy, J.Z., Smith, T.F.: Bringing Network Effects to Pervasive Spaces. IEEE Pervasive Computing 4(3), 15–17 (2005)
12. Raento, M., Oulasvirta, A., Petit, R., Toivonen, H.: ContextPhone: A Prototyping Platform for Context-Aware Mobile Applications. IEEE Pervasive Computing, 51–59 (April–June 2005)
13. Yamabe, T., Takagi, A., Nakajima, T.: Citron: A Context Information Acquisition Framework for Personal Devices. In: Proc. 11th Int. Conf. on Embedded and Real-Time Computing Systems and Apps., pp. 489–495. IEEE Computer Society, Los Alamitos (2005)
14. Alia, M., Horn, G., Eliassen, F., Khan, M.U., Fricke, R., Reichle, R.: A Component-Based Planning Framework for Adaptive Systems. In: Meersman, R., Tari, Z. (eds.) OTM 2006. LNCS, vol. 4276, pp. 1686–1704. Springer, Heidelberg (2006)
15. Sharmin, M., Ahmed, S., Ahamed, S.I.: MARKS for Mobile Devices of Pervasive Computing Environments. In: Proc. 3rd Int. Conf. on Information Tech., pp. 306–313 (2006)
Embodied Moving-Target Seeking with Prediction and Planning

Noelia Oses¹, Matej Hoffmann², and Randal A. Koene¹

¹ Fundación FATRONIK-Tecnalia, Mikeletegi pasealekua 7, 20009 Donostia-San Sebastián, Spain
{noses,rkoene}@fatronik.com
² Artificial Intelligence Laboratory, Department of Informatics, University of Zurich, Andreasstrasse 15, 8050 Zurich, Switzerland
[email protected]
Abstract. We present a bio-inspired control method for moving-target seeking with a mobile robot, which resembles a predator-prey scenario. The motor repertoire of a simulated Khepera robot was restricted to a discrete number of ‘gaits’. After an exploration phase, the robot automatically synthesizes a model of its motor repertoire, acquiring a forward model. Two additional components were introduced for the task of catching a prey robot. First, an inverse model to the forward model, which is used to determine the action (gait) needed to reach a desired location. Second, while hunting the prey, a model of the prey’s behavior is learned online by the hunter robot. All the models are learned ab initio, without assumptions, work in egocentric coordinates, and are probabilistic in nature. Our architecture can be applied to robots with any physical constraints (or embodiment), such as legged robots. Keywords: bio-inspired control; forward model; inverse model; prediction; planning; egocentric coordinates.
1 Introduction
This paper deals with the problem of moving-target seeking by a mobile robot, a predator-prey scenario. This problem has long been solved in nature, hence we use bio-inspired control methods to approach it. In order to approximate real-world conditions we use a Khepera robot model with specific physical constraints. We define a set of 10 gaits, each gait being a pair of velocities for the left and right motors. This restricted repertoire of gaits helps us to approximate the context of animal behavior (our final goal is to address more complex platforms such as legged robots). We implement a forward model, which enables the robot to learn to predict how a set of motor commands from its repertoire will influence its state in the environment [1,2]. The robot needs to learn its own dynamics model for navigation, in accordance with its limited set of gaits. We achieve this through autonomous exploration inspired by the motor-babbling observed in infants [3].

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 478–485, 2010. © Springer-Verlag Berlin Heidelberg 2010
The inverse model of the forward model is used to determine the gait needed to reach a specific location in one time-step, such as the expected relative location of the prey. If a single time-step does not suffice, a sequence of gaits is planned. The number of possible combinations of gaits increases exponentially with the length of the sequence, so efficient heuristics are needed. Finally, the hunter learns a model of the prey's behavior online and without any prior knowledge or assumptions. This prey model is used to predict future prey locations. All models operate in egocentric (robot-centered) coordinates, make no assumptions about the action space, and incorporate uncertainty.

The combination of dynamics and uncertainty (in the robot's forward model and in the prediction of prey behavior) provides a useful approximation of real-world conditions. In these conditions extensive planning is infeasible, because algorithms need to operate in real time and a deep plan would need to be updated too rapidly to be useful (the frame problem [4]). Instead, we begin with a bottom-up approach: find a solution that is as reactive as possible; then add the look-ahead prediction and planning required to catch the prey. Planning is therefore added only to the extent that the combination outperforms a simple reactive architecture.
2 Learning a Forward Model in an Egocentric Coordinate System
We use a relative reference system in polar coordinates centered at the hunter robot's center of mass (Fig. 1a). The angle is measured clockwise from the robot's posteroanterior vector (PA), i.e. the hunter's heading is zero degrees. Location and heading constitute a robot's pose. For the forward model, the hunter's reference system at time t is used to express the next pose, one time-step later. The heading of the hunter one time-step after time t is given as the angle that the hunter's PA vector then subtends, measured clockwise, with respect to the hunter's PA vector at time t (Fig. 1a).
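Under these conventions (clockwise-positive angles measured from the PA vector), the transformation from two global poses to the egocentric (distance, angle, heading) tuple could be sketched as below. The choice of a global frame with heading measured clockwise from the +y axis is our own assumption for the illustration.

```python
import math

def egocentric(pose_t, pose_next):
    """Each pose is (x, y, heading); heading in radians, clockwise-positive,
    measured from the global +y axis (an illustrative convention)."""
    x0, y0, h0 = pose_t
    x1, y1, h1 = pose_next
    dx, dy = x1 - x0, y1 - y0
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dx, dy)            # clockwise from the +y axis
    angle = (bearing - h0) % (2 * math.pi)  # clockwise from the PA vector at t
    heading = (h1 - h0) % (2 * math.pi)     # change of heading over the step
    return distance, angle, heading

# Moving 1 m straight ahead while keeping the same heading:
d, a, h = egocentric((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(round(d, 3), round(a, 3), round(h, 3))  # 1.0 0.0 0.0
```

The same transformation, applied to consecutive prey poses, yields the egocentric transitions used for the prey model in Sect. 3.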
Fig. 1. (a) Egocentric coordinates. (b) Bayesian network for the forward model. (c) Plot showing the outcome of applying different gaits.
The robot needs to learn to predict the outcomes of its actions. The forward model enables this. We define the application of a specific gait for a specific amount of time as an action. The consequence of such an action is a new pose of the robot. We implement the forward model as a Bayesian network (BN), as in Demiris and Dearden [5], because BNs provide a powerful probabilistic framework in which to express the causal nature of a robot's control system. A motor command (Gait) and the observations Distance, Angle, and Heading are each represented as random variables in the BN (Fig. 1b). We use a naïve Bayes classifier, which is often quite effective even when the attribute values are not conditionally independent [6,7]. The BN parameters (the conditional probability distributions) are learned offline from data obtained during motor-babbling (randomly applied gaits, see Fig. 1c). The data is complete, the structure of the network is known and the prior probability distribution over the gaits is uniform (gaits were applied randomly with equal probability). Maximum a posteriori (MAP) learning therefore reduces to maximum-likelihood parameter learning.
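The counting view of this maximum-likelihood learning can be sketched as follows: with complete data, each conditional distribution P(Variable | Gait) is just a relative-frequency table, and the naïve-Bayes assumption factorizes the joint outcome probability. The discretization of outcomes into bins and the class name are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter, defaultdict

class ForwardModel:
    """Naive-Bayes forward model: one frequency table per outcome variable."""
    def __init__(self):
        self.dist = defaultdict(Counter)   # P(Distance | Gait) counts
        self.ang = defaultdict(Counter)    # P(Angle | Gait) counts
        self.head = defaultdict(Counter)   # P(Heading | Gait) counts
        self.n = Counter()                 # observations per gait

    def observe(self, gait, d, a, h):
        """Record one discretized motor-babbling outcome (d, a, h bins)."""
        self.dist[gait][d] += 1
        self.ang[gait][a] += 1
        self.head[gait][h] += 1
        self.n[gait] += 1

    def p(self, gait, d, a, h):
        """P(d, a, h | gait) under the conditional-independence assumption."""
        n = self.n[gait]
        if not n:
            return 0.0
        return (self.dist[gait][d] / n) * (self.ang[gait][a] / n) * (self.head[gait][h] / n)

fm = ForwardModel()
for _ in range(8):
    fm.observe("gait3", 2, 0, 0)   # gait3 usually advances 2 bins straight ahead
fm.observe("gait3", 1, 0, 0)
fm.observe("gait3", 2, 1, 1)
print(round(fm.p("gait3", 2, 0, 0), 3))  # 0.729
```

With 9 of 10 observations in each favored bin, the factored probability is 0.9³, showing how the naïve factorization trades exactness for very cheap estimation on a mobile device.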
3 Inverse Model and Prey Model
The inverse of the forward model describes which gait to take in order to reach a desired location (distance, angle) in one time-step. We can obtain this inverse model, P(Gait|Distance, Angle), through inference from P(Distance|Gait) and P(Angle|Gait), which are provided by the BN of the forward model. We approximate (distance, angle) tuples with the nearest polar coordinates encountered during learning.

The hunter learns a probabilistic transition model of the prey's movement online, independently of the models for its own movement. The hunter observes how the prey moves: the new prey pose as a function of the prey pose one time-step earlier (Fig. 2a). Currently, the hunter robot gets the GPS data corresponding to the location of the prey at each time-step; at time t+Δt (Δt being the time-step) it transforms these data into the prey's egocentric coordinates with respect to the prey's reference system at time t, and incorporates the egocentric coordinates into the prey model. This prey model is used to predict the prey's future positions. This approach resembles Thrun et al. [8], except that we make no a priori assumptions about the way in which the prey moves or about its possible actions (unlike [9]). We record the frequency of each observed pose transition in terms of distance, angle and heading. Fig. 2b shows an illustrative plot of a prey transition model.
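A minimal sketch of this inference: with a uniform prior over gaits, Bayes' rule makes the posterior proportional to P(Distance|Gait)·P(Angle|Gait). The probability tables below are made-up illustrative numbers, not learned values.

```python
def inverse_model(p_dist, p_ang, distance, angle):
    """p_dist, p_ang: {gait: {bin: probability}} tables from the forward model.
    Returns the MAP gait and the full posterior P(Gait | distance, angle)."""
    scores = {g: p_dist[g].get(distance, 0.0) * p_ang[g].get(angle, 0.0)
              for g in p_dist}
    total = sum(scores.values())
    posterior = {g: s / total for g, s in scores.items()} if total else {}
    return max(posterior, key=posterior.get), posterior

# Illustrative two-gait forward model, discretized into bins:
p_dist = {"turn_left": {0: 0.7, 1: 0.3}, "forward": {0: 0.1, 1: 0.9}}
p_ang = {"turn_left": {0: 0.2, 5: 0.8}, "forward": {0: 0.9, 5: 0.1}}
gait, post = inverse_model(p_dist, p_ang, 1, 0)
print(gait)  # forward
```

Here the target (distance bin 1, angle bin 0, i.e. roughly one unit straight ahead) gives 'forward' a score of 0.9·0.9 against 0.3·0.2 for 'turn_left', so the hunter selects 'forward'.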
4 Models and Experiments
We develop a reactive model, a prey prediction model, and a planning model, and we assess the performance of each with the same experiment, conducted both in a walled-in environment and in an open environment. The experiment has seven initial states. These consist of the prey being located at five bodies' distance from
Embodied Moving-Target Seeking
Fig. 2. (a) Illustration of one prey transition. (b) Example transition model of the prey.
Fig. 3. (a) Experiment set-ups. (b) Results of the reactive model.
the hunter and with the prey at angles θ = 0, 1, 2, 3, 4, 5 and 6 radians, with identical headings for hunter and prey (Fig. 3a). Cyberbotics' Webots™ [10] Braitenberg controller runs the prey, so that it moves straight ahead until it senses an obstacle and turns. The hunter performs no obstacle avoidance. The simulated time elapsed until the hunter catches the prey is recorded. Simulations end when the prey is caught or after one simulated minute. An experiment consists of 100 simulations for each initial state.
4.1 Reactive Model
The hunter applies a gait determined by the inverse model in accordance with the current prey position. The resulting reactive behavior enables the hunter to catch the prey only in very specific circumstances. In general, the hunter appears to follow the prey around (Fig. 3b). Out of 700 runs, only 102 were successful (a 14.57% success rate). When the prey started off at θ = 1 radians the hunter
N. Oses, M. Hoffmann, and R.A. Koene
was always successful. The hunter also caught the prey on 2 occasions when the prey started off at θ = 2 radians.
4.2 Prey Prediction Model
The hunter learns the prey model online and uses it to predict the prey's future position (Fig. 4a). At each time-step, the prey's predicted position is used as the target position for the inverse model, which determines the hunter's gait. The prey's position can be predicted ahead for a number of time-steps (T), and the optimal number depends on the distance between hunter and prey. We set T to the nearest integer to half of that distance.
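Setting T to the nearest integer to half the hunter-prey distance and rolling the learned transition model forward can be sketched as follows. Following the most frequent transition at each step is a simplification of my own (a point estimate rather than the full distribution), and the grid-world poses are toy assumptions:

```python
from collections import Counter

def predict_prey(transitions, pose, hunter_dist):
    """Roll the learned prey transition model forward T steps, where
    T is the nearest integer to half the hunter-prey distance.
    `transitions` maps a pose to a Counter of observed successor poses;
    here we follow the most frequent transition at each step."""
    T = round(hunter_dist / 2)
    for _ in range(T):
        successors = transitions.get(pose)
        if not successors:       # pose never observed: assume the prey stays put
            break
        pose = successors.most_common(1)[0][0]
    return pose

# toy model: the prey was always observed marching right one cell per time-step
trans = {(x, 0): Counter({(x + 1, 0): 5}) for x in range(10)}
```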
Fig. 4. (a) Prey prediction. (b) Results of the prey prediction model in the closed environment. (c) Results of the prey prediction model in the open environment.
Figure 4b shows the results in a walled-in environment. The prey was caught in 655 of 700 runs (a 92.8% success rate), with an average catch time (including misses) of 15.287 seconds. Figure 4c shows the results in an open environment. The prey was caught in 473 of 700 runs (a 67.6% success rate). The lower success rate was influenced by the prey controller, as the prey can continue to run straight when nothing forces it to turn.
4.3 Planning Model
For a better success rate in the open environment, the hunter needs to plan more than one gait ahead, composing gaits to catch the prey. We now predict the prey's position at successive time-steps and select the minimum number of steps at which a composition of gaits will minimize the distance between the hunter and the prey. Heuristic Solution with Best-First Search: The theoretical solution would involve calculating the probability distribution for the distance between the hunter and the prey. In doing so we would encounter the "curse of dimensionality" due to the exponential increase in the size of the state space with each level of the search tree (Fig. 5). We can avoid this by using sampling to predict hunter
Fig. 5. Search tree for planning a sequence of gaits (shown only for 3 gaits for the sake of clarity)
[Fig. 6 panel: predicted and actual hunter and prey trajectories, plotted as X (m) vs. Y (m).]
Fig. 6. (a) Example of a planning iteration. (b) Results of the planning model in an open environment.
position. We calculate samples for each time-step and each different sequence of gaits. Each node in the tree has associated information: T (the number of time-steps, or depth, that the node plans), Gait[t] (the sequence of gaits applied at time-steps t < T), the cost in terms of the number of gaits used in planning, the cost in terms of the number of gait transitions (transitions will be important in the legged-robot scenario), Hunter[t] (the predicted hunter coordinates at time-steps t < T, relative to the hunter pose during planning), and Value (the final predicted distance between hunter and prey). Choosing a sequence of gaits is a combinatorial optimization problem. A breadth-first search of the tree was too slow, so we proceeded to use a best-first search. The best-first search algorithm explores a graph by expanding the most promising node. In our case, the most promising node is the one that most reduces the distance to the predicted prey position. The search tree, with g^T nodes for g gaits, needs to be pruned further, for example by eliminating combinations with more than one gait transition. With the planning model (Fig. 6b), 591 of 700 runs were successful (an 84.4% success rate).
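The best-first expansion order can be sketched with a priority queue. This is a loose illustration under stated assumptions: a deterministic `apply_gait` stands in for the sampled forward model, and the gait-transition pruning described above is omitted:

```python
import heapq

def plan_gaits(apply_gait, hunter, prey_at, gaits, max_depth=3):
    """Best-first search over gait sequences.  `apply_gait(pose, gait)` is a
    hypothetical deterministic stand-in for the sampled forward model and
    `prey_at(t)` returns the predicted prey position t time-steps ahead."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    # each frontier entry: (predicted distance to prey, gait sequence, pose)
    frontier = [(dist(hunter, prey_at(0)), (), hunter)]
    best_seq, best_val = (), frontier[0][0]
    while frontier:
        val, seq, pose = heapq.heappop(frontier)   # most promising node first
        if val < best_val:
            best_seq, best_val = seq, val
        if len(seq) >= max_depth:
            continue
        for g in gaits:
            nxt = apply_gait(pose, g)
            t = len(seq) + 1
            heapq.heappush(frontier, (dist(nxt, prey_at(t)), seq + (g,), nxt))
    return best_seq

# toy setting: two translation "gaits" on a grid, stationary prey at (2, 1)
moves = {"E": (1, 0), "N": (0, 1)}
step = lambda pose, g: (pose[0] + moves[g][0], pose[1] + moves[g][1])
plan = plan_gaits(step, (0, 0), lambda t: (2, 1), ["E", "N"])
```

In practice the frontier is cut off by the pruning rules rather than exhausted, which is what keeps the g^T blow-up in check.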
5 Discussion and Conclusions
We have presented a bio-inspired control architecture that allows a mobile robot to: (1) learn a model of its own action repertoire (a forward model); (2) learn a model of an object (prey) it is seeking; (3) combine the forward model and the prey model to seek the prey. Braitenberg [11] and Brooks [12] showed that robots that rely on embodiment and purely reactive behaviors, and that exploit interaction with the environment, could address real-world dynamic problems that representations in classical A.I. could not adequately deal with. Such robots exhibit sophisticated behaviors and properties such as the adaptivity, robustness, versatility and agility found in biological organisms, yet without emphasizing cognitive capabilities such as planning, abstract reasoning or language. Following this inspiration, we took a bottom-up approach by developing a reactive model first and only adding cognitive capabilities as and when necessary. Our architecture has the following properties: (1) An egocentric coordinate system is used; (2) The model can deal with an arbitrary action repertoire of the hunter and the prey. There are no assumptions on the behavior of the hunter or prey; (3) The action space is discrete; (4) The models are learned ab initio. The hunter's forward model is learned as a result of a motor-babbling phase. The prey's model is learned online and incrementally updated; (5) Our model accounts for and plans with uncertainty. We see two possible uses for our architecture. First, it can be applied as a whole to moving-target seeking, by an autonomous vehicle for instance. Alternatively, individual components can be utilized on their own: the forward model implementation would allow an arbitrary robot to learn its motor repertoire and plan with it, and the prey model can be applied to any target object, such as in a person-following scenario [9]. Second, our scenario could serve to model biology.
By adding details about particular behaviors we may test hypotheses for the way in which animals achieve similar behaviors, for example the prey-catching behavior of the spider Portia [18] or hunting in vertebrates. At the same time, our scenario is a case for a minimalistic model of cognition which is firmly grounded in body dynamics [13,14,15,16,17]. Planned future work includes extending our model to a legged platform which uses real gaits, adding real sensing of the prey (through a camera on the hunter, for instance), and studying various cost functions for the trajectory planning of the hunter. These can include energy consumption, or computational complexity/reaction time. Acknowledgments. The authors would like to thank the AI Lab of the University of Zurich for welcoming the first author to the lab through an internship. We would also like to thank Juan Pablo Carbajal for many interesting and informative discussions about the project. This work was undertaken in the context of the "From locomotion to cognition" project funded by the Swiss National Science Foundation, Grant Nr. 200020-122279/1.
References
1. Webb, B.: Neural mechanisms for prediction: do insects have forward models? Trends in Neurosciences 27, 278–282 (2004)
2. Wolpert, D.M., Miall, R.C., Kawato, M.: Internal models in the cerebellum. Trends in Cognitive Sciences 2, 338–347 (1998)
3. Meltzoff, A.N., Moore, M.K.: Explaining facial imitation: a theoretical model. Early Development and Parenting 6(2), 157, 1–14 (1997)
4. Pfeifer, R., Scheier, C.: Understanding Intelligence. The MIT Press, Cambridge (2001)
5. Demiris, Y., Dearden, A.: From motor babbling to hierarchical learning by imitation: a robot developmental pathway. In: Proceedings of the Fifth International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, vol. 123, pp. 31–37 (2005)
6. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
7. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (1995)
8. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge (2005)
9. Vazquez, A.: Incremental learning for motion prediction of pedestrians and vehicles. PhD thesis, Institut National Polytechnique de Grenoble (2007)
10. Michel, O.: Webots: Professional Mobile Robot Simulation. International Journal of Advanced Robotic Systems 1(1), 39–42 (2004)
11. Braitenberg, V.: Vehicles: Experiments in Synthetic Psychology. The MIT Press, Cambridge (1986)
12. Brooks, R.A.: Intelligence Without Representation. Artificial Intelligence Journal 47, 139–159 (1991)
13. Pezzulo, G.: Anticipation and Future-Oriented Capabilities in Natural and Artificial Cognition. In: Lungarella, M., Iida, F., Bongard, J.C., Pfeifer, R. (eds.) 50 Years of Artificial Intelligence. LNCS (LNAI), vol. 4850, pp. 258–271. Springer, Heidelberg (2007)
14. Duro, R.J., Graña, M., de Lope, J.: On the potential contributions of hybrid intelligent approaches to multicomponent robotic system development. Information Sciences (in press, 2010)
15. Clark, A., Grush, R.: Towards a Cognitive Robotics. Adaptive Behavior 7(1), 5–16 (1999)
16. Grush, R.: The emulation theory of representation: motor control, imagery, and perception. Behavioral and Brain Sciences 27, 377–442 (2004)
17. Schomaker, L.: Anticipation in cybernetic systems: A case against mindless anti-representationalism. In: IEEE International Conference on Systems, Man and Cybernetics, The Hague, Netherlands (2004)
18. Tarsitano, M.: Route selection by a jumping spider (Portia labiata) during the locomotory phase of a detour. Animal Behaviour 72, 1437–1442 (2006)
Using Self-Organizing Maps for Intelligent Camera-Based User Interfaces

Zorana Banković, Elena Romero, Javier Blesa, José M. Moya, David Fraga, Juan Carlos Vallejo, Álvaro Araujo, Pedro Malagón, Juan-Mariano de Goyeneche, Daniel Villanueva, and Octavio Nieto-Taladriz

Dep. Ingeniería Electrónica, Universidad Politécnica de Madrid, Av. Complutense 30, 28040 Madrid, Spain
{zorana,elena,jblesa,josem,dfraga,jcvallejo,araujo,malagon,goyeneche,danielvg,nieto}@die.upm.es
Abstract. The area of Human-Machine Interface is growing fast due to its high importance in all technological systems. The basic idea behind designing Human-Machine interfaces is to enrich the communication with the technology in a natural and easy way. Gesture interfaces are a good example of transparent interfaces. Such interfaces must perform the action the user wants, so proper gesture recognition is of the highest importance. However, most of the systems based on gesture recognition use complex methods requiring high-resource devices. In this work we propose to model gestures by capturing their temporal properties, significantly reducing the storage requirements, and using self-organizing maps for their classification. The main advantage of the approach is its simplicity, which enables implementation using devices with limited resources, and therefore low cost. First testing results demonstrate its high potential. Keywords. Gesture recognition, intelligent environments, self-organizing maps.
1 Introduction
Human-machine interaction has been the subject of intense research over the past few decades. Human-computer interaction must be designed as naturally and as easily as possible, without resulting in the perception of an intrusive technology. User interaction should not require the user to adapt to special conventions or rules; it is the environment that should adapt to the natural way of user interaction. In the recent past, new natural and flexible interfaces, embedded in the objects people use on an everyday basis, have been developed. These new interfaces have been designed and adapted for the end users' needs. One of the most natural and comfortable ways to interact with a system is by hand gestures. Most of the systems based on gesture recognition use complex methods or algorithms, which require high-resource devices to be performed efficiently. For this
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 486–492, 2010. © Springer-Verlag Berlin Heidelberg 2010
reason, there are many interesting works on gesture recognition using general-purpose computers. However, we have to take into account that the embedded systems connected to the cameras usually have very limited resources. We need to do as much processing as possible in the camera processor to try to reduce the spectrum occupation, but we must be careful, since this increases the power consumption of the sensor. In this way, the main advantage of our proposal is its simplicity and its low resource consumption. This permits us to run our algorithm on a device with limited resources, and therefore low cost, as embedded systems usually are. In this article we present a low-cost gesture interface that can control different systems in an environment with simple and fast processing in the embedded systems, minimizing the need for communications. First we propose to model gestures by capturing their temporal properties, and after that we deploy a self-organizing map (SOM) algorithm for clustering the gestures. The paper is organized as follows. In Section 2 we present the previous work on the subject. Section 3 details the characterization of gestures. In Section 4 we give further details of the SOM implementation. Finally, results are presented in Section 5 and conclusions are drawn in Section 6.
2 Previous Work on Self-Organizing Maps for Gesture Recognition
There are a number of papers that deploy the SOM algorithm for gesture recognition [1, 2, 3, 4]. The common aspect of all of them is that they have two stages and that the SOM is either hierarchical or deployed together with another technique that captures the temporal properties of the gestures. Furthermore, all of them deploy a rather standard characterization that contains information such as the trajectory of the hand, the resultant direction of the movement, and the velocity of the movement. Extraction of these features introduces additional computational overhead, and combined with their complexity, i.e., the need to train two learning algorithms, the total computational overhead can be too high for implementation in devices with limited resources. On the other hand, our characterization implicitly contains the above information and its calculation is straightforward. Since it also captures the temporal properties of gestures, we do not need two learning algorithms. This makes our approach less complicated and more appropriate for implementation in devices with limited resources.
3 Gesture Characterization
Each gesture is captured as a set of frames of variable size, as presented in Fig. 1 for the gesture up-down.
Fig. 1. Gesture captured in 12 consecutive frames
We propose to divide each frame into n x n smaller parts and assign to each part a value that corresponds to its luminosity. After that, we characterize the temporal evolution of each part in the following way. If, for example, the values are 0 0 20 40 50 60 70 50 10, we characterize the sequence with time windows of a certain size, and the value of each feature is its frequency in the captured gesture. For a time window of size 3, it would be the following:

0 0 20      0.16
0 20 40     0.16
20 40 50    0.16
40 50 60    0.16
50 60 70    0.16
70 50 10    0.16
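The window-frequency characterization can be sketched as follows. Note that a plain sliding window over the nine-value example yields seven windows (each with frequency 1/7), while the paper's table lists six, so the exact windowing convention is an assumption here:

```python
from collections import Counter

def characterize(series, w=3):
    """Sliding-window characterization of one sub-part's luminosity series:
    each length-w window of successive values is a feature whose value is
    the window's relative frequency in the captured gesture."""
    windows = [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]
    return {win: c / len(windows) for win, c in Counter(windows).items()}

feats = characterize([0, 0, 20, 40, 50, 60, 70, 50, 10])
```

The resulting feature dict is sparse: only windows that actually occur are stored, which is what drives the storage reduction reported later.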
Having in mind that the number of features extracted in this way does not have to be fixed, in order to find the difference, i.e., the distance, between two such characterizations, we deploy the distance function proposed in [5], which calculates the distance between sequences.
Further, the distance between two captured gestures is simply the sum of the absolute distances of the corresponding sub-parts. In this way, the gesture characterization captures the temporal evolution of the gesture. The following step is clustering using the self-organizing map (SOM) algorithm.
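The two-level distance above can be sketched directly. The per-part L1 distance over the union of observed windows is a simple stand-in of my own for the sequence-similarity measure of [5], not that measure itself:

```python
def part_distance(f, g):
    """L1 distance between the window-frequency features of two
    corresponding sub-parts, taken over the union of observed windows."""
    return sum(abs(f.get(k, 0.0) - g.get(k, 0.0)) for k in set(f) | set(g))

def gesture_distance(a, b):
    """Distance between two gestures: the sum of the absolute distances of
    the corresponding sub-parts (each gesture is a list of feature dicts,
    one per n x n frame sub-part)."""
    return sum(part_distance(f, g) for f, g in zip(a, b))

# two single-part gestures sharing one window and differing in another
a = [{(0, 0, 1): 0.5, (1, 1, 1): 0.5}]
b = [{(0, 0, 1): 0.5, (2, 2, 2): 0.5}]
```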
4 Self-Organizing Maps Algorithm
Self-organizing maps (SOM), also known as Kohonen networks, are an unsupervised type of neural network [6]. As in neural networks, the basic idea of SOM has its origins in certain brain operations, specifically in the projection of multidimensional inputs onto one-dimensional or two-dimensional neuronal structures in the cortex. For example, perception of color depends on three different light receptors (red, green and blue), plus the eyes capture information about the position, size or texture of objects. It has been demonstrated that this multidimensional signal is processed by the planar cortex structure. Further, the areas of the brain responsible for different signals from the body preserve topology, e.g., the area responsible for the signals that come from the arms is close to the area responsible for the signals that come from the hands. These are precisely the basic ideas of SOM, which consist in the following: 1. Multidimensional data and their dependencies are presented and captured in a lower-dimension SOM network (usually 2D). 2. The proximity of the nodes in the lattice reflects the similarity of the data mapped to the nodes. For these reasons, SOMs have been widely deployed for clustering and for good visualization of clustering problems. If we project the resulting clusters to RGB space, we can visualize the similarities of the adjacent clusters. They have been successfully deployed in different fields such as image processing [7], robotics [8] (for both visual and motor functions), function approximation in mathematics [9], network security [10], detection of outliers in data [11], etc.
4.1 Implementation Details
The designed SOM algorithm mainly follows the steps of the standard SOM algorithm [12]. The only specific part is the update of the node. Namely, if the node does not contain a feature from a certain input, we add it to the node with the value 0.
However, having in mind that in this way nodes may end up having many features, which would introduce significant computational overhead, we discard all the features whose value is at least 100 times smaller than the maximal feature value of the node, as this does not significantly affect the final result. After the training is finished, in the current implementation we label each node with the label of the gesture, from the set of labeled gestures, that is closest to the node according to the distance function explained above. The process is depicted in Figure 2.
Fig. 2. The process of gesture recognition
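The node update described in Section 4.1 can be sketched as follows. The learning rate and the exact discarding rule (features roughly 100 times smaller than the node's maximum) are our reading of the text, not the paper's verbatim parameters:

```python
def update_node(node, x, lr, threshold_ratio=0.01):
    """One SOM node update toward an input gesture x (both are sparse
    feature dicts).  Features absent from the node are first added with
    value 0, as in the text; after the update, features far smaller than
    the node's maximum are discarded to bound the node size."""
    for k in x:                      # grow the node with unseen features
        node.setdefault(k, 0.0)
    for k in node:                   # standard SOM move toward the input
        node[k] += lr * (x.get(k, 0.0) - node[k])
    mx = max(node.values(), default=0.0)
    return {k: v for k, v in node.items() if v >= mx * threshold_ratio}

new = update_node({("a",): 1.0}, {("b",): 1.0}, lr=0.5)
```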
4.2 Advantages of the Approach
The main advantage of our proposal is its simplicity. Our characterization of gestures captures the temporal evolution of the gesture, and we distinguish gestures simply by clustering them. This is another advantage, as in essence we do not have to label all the gestures (only those used for cluster labeling). Furthermore, the characterization significantly reduces the memory needed to store a gesture. For example, a captured gesture that occupies 507 kB is reduced to 625 B (for a 5x5 division of the frame), around 1000 times less. This permits us to perform our algorithm on a device with limited resources, as embedded systems deployed in ambient intelligence usually are.
5 Empirical Evaluation
5.1 Training and Testing Dataset
In order to test the proposed algorithm, we have captured five types of gesture: left-right, right-left, up-down, down-up, and a fifth type of random gestures labeled as unknown. Twelve persons made the gestures, and in total 760 gestures were captured. In order to illustrate the memory reduction that our approach provides, we give the amounts of occupied memory in both cases. The captured gestures occupy 1.08 GB of storage space, while after the division of each frame into 5x5 blocks they occupy 3.11 MB, taking around 350 times less storage space.
5.2 Results and Discussion
We have tested our algorithm on different training and testing scenarios (by taking different portions of the data explained above). We have performed testing with both 3x3 and 5x5 frame partitions. Furthermore, we have experimented with different sizes of the time window mentioned in Sec. 3. The main conclusion is that, in general, smaller window sizes (3, for example) exhibit better performance than larger ones. In general, we have obtained very high detection rates for the gestures up-down (94%) and down-up (up to 100%), as well as the unknown gestures (up to 100%). However,
Table 1. Maximal detection rates for each gesture

Gesture      Detection Rate (%)
Unknown      88
Down-up      100
Up-down      92
Left-right   13
Right-left   13
Overall      80
the detection of the gestures left-right and right-left did not give satisfactory results, as these were mostly confused with each other (and sometimes with unknown gestures). In the future we plan to work further on this issue. These results are summarized in Table 1.
6 Conclusions
In this work we have presented a low-cost algorithm for gesture classification. We have proposed a characterization of gestures that captures their temporal properties. We have further clustered the gestures using the SOM algorithm, achieving detection rates of up to 100% for certain gestures and an overall detection rate of at most 80%. In the future we plan to add one more stage of SOM clustering in order to detect the users. Acknowledgments. This work was funded by the Spanish Ministry of Industry, Tourism and Trade, under Research Grant TSI-020301-2009-18 (eCID), the Spanish Ministry of Science and Innovation, under Research Grant TEC2009-14595-C02-01, and the CENIT Project Segur@.
References
1. Ishikawa, M., Sasaki, N.: Gesture Recognition based on SOM using Multiple Sensors. In: 9th International Conference on Neural Information Processing, pp. 1300–1304. IEEE Xplore (2002)
2. Shimada, A., Taniguchi, R.: Gesture Recognition Using Sparse Code of Hierarchical SOM. In: 19th International Conference on Pattern Recognition, pp. 1–4. IEEE Xplore (2008)
3. Caridakis, G., Karpouzis, K., Drosopoulos, A.I., Kollias, S.D.: SOMM: Self organizing Markov map for gesture recognition. Pattern Recognition Letters 31(1), 52–59 (2010)
4. Caridakis, G., Karpouzis, K., Pateritsas, C., Drosopoulos, A.I., Stafylopatis, A., Kollias, S.D.: Hand trajectory based gesture recognition using self-organizing feature maps and Markov models. In: ICME 2008, pp. 1105–1108 (2008)
5. Rieck, K., Laskov, P.: Linear Time Computation of Similarity for Sequential Data. Journal of Machine Learning Research 9, 23–48 (2008)
6. Rojas, R.: Neural Networks. Springer, Berlin (1996)
7. Littmann, E., Drees, A., Ritter, H.: Neural Recognition of Human Pointing Gestures in Real Images. In: Neural Processing Letters, pp. 61–71. Kluwer Academic Publishers, Dordrecht (1996)
8. Vleugels, J.M., Kok, J.N., Overmars, M.H.: A self-organizing neural network for robot motion planning. In: Gielen, S., Kappen, B. (eds.) ICANN 1993 Art. Neural Networks Conf. Proc., pp. 281–284. Springer, Heidelberg (1993)
9. Aupetit, M., Couturier, P., Massote, P.: Function Approximation with Continuous Self-Organizing Maps Using Neighboring Influence Interpolation. In: Proc. of Neural Computation (NC 2000), Berlin, Germany (May 2000)
10. Lane Thames, J., Abler, R., Saad, A.: Hybrid intelligent systems for network security. In: Proceedings of the 44th Annual ACM Southeast Regional Conference, pp. 286–289 (2006)
11. Muñoz, A., Muruzábal, J.: Self-Organizing Maps for Outlier Detection. Neurocomputing 18(1-3), 33–60 (1998)
12. SOM Algorithm, http://www.ai-junkie.com/ann/som/som2.html
A SVM and k-NN Restricted Stacking to Improve Land Use and Land Cover Classification

Jorge Garcia-Gutierrez, Daniel Mateos-Garcia, and Jose C. Riquelme-Santos

Department of Computer Science, E.T.S.I.I. - University of Seville, Spain
{jgarcia,mateos,riquelme}@lsi.us.es
Abstract. Land use and land cover (LULC) maps are remote sensing products that are used to classify areas into different landscapes. The newest techniques have been applied to improve the final LULC classification, and most of them are based on SVM classifiers. In this paper, a new method based on a multiple-classifier ensemble to improve LULC map accuracy is shown. The method builds a statistical raster from LIDAR and image fusion data following a pixel-oriented strategy. Then, the pixels from a training area are used to build an SVM and k-NN restricted stacking, taking into account the special characteristics of spatial data. A comparison between an SVM alone and the restricted stacking is carried out. The results of the tests show that our approach improves the results in the context of real data from a riparian area of Huelva (Spain).
1 Introduction
Remote sensing has become a very important tool to carry out many different tasks for the natural environment. In this way, remote sensing has been successfully applied to important activities like flood control, forest inventories, or invasive-species control in protected or especially interesting areas. Although remote sensing usually works with images exclusively, data fusion has been of high interest since the appearance of new active sensors (i.e., sensors whose data is produced as a response to a stimulus other than solar light). They complement images and overcome some of their limitations, e.g., the problems associated with shadows. These limitations make sensor fusion a particularly interesting technique to improve the results of classical remote sensing approaches. One of the most active research lines has been based on LIDAR (LIght Detection And Ranging) technology. This technology is able to register object heights, and it is especially recommended for complex landscapes like riparian zones. Thus, Verrelst et al. [1] use LIDAR to study vegetal species communities, and Antonorakis et al. [2] develop a new methodology to identify different types of commercial wood in riparian zones using only LIDAR. An automatic pixel classification, which is generally supervised, is usually the first step to extract knowledge from remote sensing data. Several techniques from machine learning have been used with satisfactory results, though support vector machines (SVM) are the predominant technique to obtain the best results
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 493–500, 2010. © Springer-Verlag Berlin Heidelberg 2010
in most cases [3]. Despite the SVM's high accuracy, improvement is needed to reach the standards for products like land use and land cover (LULC) maps [4]. LULC maps are remote sensing products that are used to classify areas into different landscapes subject to their own characteristics or functionality. The newest techniques have been applied to improve the final classification to develop LULC maps. Fauvel et al. [5] apply an SVM to classify the pixels depending on morphologic and hyperspectral data. In Mitrakis et al. [6], a neural network with weights determined by a genetic algorithm obtains the final classification using fusion operators and fuzzy logic. It is important to underline that ensembles are one of the most powerful tools in machine learning, and so they are in remote sensing, where they have also been applied profusely. A very clear example can be seen in [7], where a stacking of several SVMs and a random forest is used to carry out the pixel classification. This work explores the application of ensembles to remote sensing, taking advantage of contextual information [8] from multi-source (LIDAR and aerial images) data. Thus, a novel supervised method called R-STACK (based on a stacking of an SVM and multiple NN classifiers) is shown with two purposes:
– Show an easy way to improve the quality of models when intelligent techniques are applied to LIDAR and imagery fusion data.
– Improve the general accuracy of an automatically generated LULC map.
The rest of the paper is organized as follows. Section 2 provides a description of the data used in this work. Section 3 describes the methodology used, highlighting the feature set and the model extraction process. The results achieved are shown in Section 4 and, finally, Section 5 is devoted to summarizing the conclusions and discussing future lines of work.
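The restricted stacking itself is detailed in the methodology section; as a loose illustration of the general idea of letting the labels of spatially neighboring pixels refine a base SVM decision, here is a hedged sketch. The confidence threshold, the majority-vote rule, and the function name are illustrative assumptions, not the paper's R-STACK rule:

```python
from collections import Counter

def restricted_stack(svm_label, svm_conf, neighbor_labels, conf_threshold=0.7):
    """Keep the SVM label for a pixel when the SVM is confident; otherwise
    let a k-NN-style majority vote over the labels of spatially neighboring
    pixels override it."""
    if svm_conf >= conf_threshold:
        return svm_label
    return Counter(neighbor_labels + [svm_label]).most_common(1)[0][0]
```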
2 Data Description
A LIDAR system is an optical sensor technology that measures properties of scattered light (usually laser) to find the range and/or other information about a distant target. The whole process starts with the emission of polarized light, typically in the ultraviolet, visible, or near-infrared. Then, LIDAR catches the signal reflected from the topographic surface and measures the time taken by each return to establish the distance between the emitter and the object that produced the return. This process is helped by a global positioning system (GPS), giving rise to a point cloud database in which, for every point, it is possible to find: the spatial position (i.e., x, y and z coordinates), the intensity of the return, the number of the return in a sequence (if a pulse caused multiple impacts), etc. These features and the RGB values of an orthophoto are used in this work to obtain the statistical measures on which the method is based; they will be explained in Section 3. The LIDAR data was taken in coastal areas of the province of Huelva. The pulses were geo-referenced and correctly validated by the distributor of the data, yielding 1,384,875 records for an area of 1.5 km2. The reported precision indicates a maximum error of 0.5 m in the x-y positions, and of 0.15 m in the
z position. Along with the LIDAR flight, aerial photographs of the area were taken with a resolution of 0.5 m2. The study area is situated in the south of Spain, at the mouth of the Tinto and Odiel rivers. This area is near the city of Huelva and presents a mix of urban and natural areas. The natural areas can be classified into five subclasses: watered zones, marshland, and vegetation (low, middle and high). The high vegetation is formed by scarce trees of the genus Eucalyptus in the area. The middle vegetation is formed by different types of Mediterranean bushes that principally surround roads and urban areas. Pastures are classified as low vegetation and include bare-earth areas. In addition, the urban areas are also classified into five subclasses: roads and railways, buildings, coal deposits, dumps and mixed areas.
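The per-point attributes listed above (spatial position, intensity, return number) fit in a simple record type; a minimal sketch, with field names that are illustrative rather than the distributor's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LidarReturn:
    """One geo-referenced LIDAR return with the per-point attributes
    listed in the text (field names are illustrative)."""
    x: float            # easting (m)
    y: float            # northing (m)
    z: float            # elevation (m)
    intensity: float    # intensity of the return
    return_number: int  # position of the return in the pulse's sequence

p = LidarReturn(x=150.0, y=75.5, z=12.3, intensity=0.42, return_number=1)
```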
3 Method
Our LULC development method (see Fig. 1) follows a pixel-oriented strategy, which requires us to create a matrix or raster where each element is a pixel. Each pixel represents an area as a function of the resolution. The resolution value must be provided by the user as a method parameter to determine the area within each pixel; it depends on the LIDAR point density and the orthophoto resolution. In our case, the selected resolution is set at 3 m2. A coarser resolution could harm the classification of the smallest classes (roads), and a finer resolution is not possible due to the LIDAR point density (0.5 points/m2). Apart from the resolution, it is necessary to supply a digital elevation model (DEM) to extract the actual heights of the LIDAR returns. In our case, this process is carried out by an adaptive morphological filter [9]. In addition, expert knowledge was applied to manually classify about 2% of the total data (7,399 instances). The expert knowledge leaned on the photographs taken in the same flight as the LIDAR data, and previous information from the Regional Ministry of Andalusia (LULC map from 2003) was collected by an operator to build the training set. The next step (step 2 in Fig. 1) is to calculate a set of variables from the image RGB values, the LIDAR intensity, and the heights and distribution of the LIDAR returns for each pixel (a total of 500,000 pixels). In this manner, sixty-one different measures were calculated for every pixel. Most of the variables used have been extracted from the literature [10][2]. A summary of these features can be seen in Table 1. Especially interesting is the case of the normalized difference vegetation index (NDVI). The classical NDVI is generated from the near-infrared band (NIR) and the red band (R), as can be seen in Equation 1. In our case, it cannot be calculated since the NIR band is not available in LIDAR or orthophotography.
Thus, the new attribute SNDVI has been used to simulate the NDVI, using the intensity (I) from LIDAR as the near-infrared value (Equation 2), which approximates the real NIR value.

NDVI = (NIR - R) / (NIR + R)    (1)

SNDVI = (I - R) / (I + R)    (2)
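Equations 1 and 2 translate directly into vectorized code (a sketch with hypothetical per-pixel values; only Eq. 2 is computable here, since no NIR band is available in the data):

```python
import numpy as np

def sndvi(intensity, red):
    # Eq. 2: the LIDAR return intensity I stands in for the missing
    # near-infrared band of the classical NDVI (Eq. 1).
    return (intensity - red) / (intensity + red)

# Hypothetical pixels: vegetation reflects strongly in the NIR-like
# intensity channel, so its index is positive; the second pixel is
# closer to bare soil and comes out negative.
red = np.array([30.0, 120.0])
intensity = np.array([200.0, 90.0])
print(sndvi(intensity, red))
```

In the full pipeline this would be evaluated per pixel of the raster, alongside the other statistics of Table 1.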
J. Garcia-Gutierrez, D. Mateos-Garcia, and J.C. Riquelme-Santos
A new method called R-STACK, based on a modified stacking of two well-known classifiers (SVM and k-NN), has been developed for the model generation. The Weka [11] implementation of SVM and an ad-hoc k-NN implementation were used for each classifier, respectively. Moreover, the general stacking scheme has been modified to adapt it to geographic data. In this way, the first level (steps 5 and 6) consists of an SVM which takes every feature from the pixels in the training area to build an initial model that classifies every pixel of the study zone. Up to that point, the result is a classical SVM application on images.
Input
  l: LIDAR data
  o: Orthophotography data
Output
  m: LULC map
Begin
  1. Develop a matrix raster in which every cell involves a physical position
  2. Add the corresponding statistics from l and o to each pixel in raster
  3. Select a training set from raster, called train
  4. Label each pixel in train using expert knowledge
  5. Build a SVM model, svm, from train
  6. Use svm to classify every pixel in raster
  7. For each pixel p in raster
     7.1. Collect the neighbourhood of p in a set s
     7.2. Build a k-NN model, knn, from s
     7.3. Use knn to classify p
  8. Return a map m with every pixel spatial position and its label
End
Fig. 1. The LULC classification method based on the R-STACK algorithm (steps 6 to 8)
The novelty of the R-STACK method lies in the second level (step 7); particularly, in the application of several classifiers (k-NN) and the way they are trained. Thus, a k-NN is built for each pixel taking its neighbours in the raster as the training set, which involves a strong relation (physical dependence) between the training pixels and the current pixel to classify. For the study area, we work with k = 3 and 8-adjacency; that is, each 3-NN is developed with just the 8 instances surrounding the pixel. For this reason, the process remains tractable in terms of efficiency and complexity. In the end, the k-NN classifies the current pixel again, using the model built from its neighbours. In this way, possible inconsistencies and undesired effects can be removed. It is important to point out that it is necessary to make a copy of the raster before this last process: while the classes in the original raster are modified, every k-NN has to be built taking the neighbours from the raster copy in order to avoid collateral
Table 1. Sixty-one candidate variables. Variables with (*) are calculated for each band of a pixel: Height (H), Intensity (I), Red (R), Green (G) and Blue (B).

SNDVIMIN: SNDVI minimum
SNDVIMAX: SNDVI maximum
SNDVISTD: SNDVI standard deviation
SNDVIAVG: SNDVI average
MIN(*): Minimum
MAX(*): Maximum
STD(*): Standard deviation
AVG(*): Average
VAR(*): Variance
SKEW(*): Skewness
KURT(*): Kurtosis
RANGE(*): Range
NOTFIRST: Second or later return
EMP: Empty neighbours
ICV: Intensity coefficient of variation
HCV: Height coefficient of variation
SLP: Slope
CRR: Canopy relief ratio
PEC: Penetration coefficient
TOTALR: Total of returns
PCTN1: Unique return percentage
PCTN2: Double return percentage
PCTN3: Three or more returns percentage
PCTR1: First return percentage
PCTR2: Second return percentage
PCTR3: Third or later return percentage
PCTR31: PCTR3 over PCTR1
PCTR21: PCTR2 over PCTR1
PCTR32: PCTR3 over PCTR2
effects. Otherwise, the new classification sequence would affect the result of the remaining pixels.
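Step 7 of the algorithm can be sketched as follows (an illustrative implementation, not the authors' code; the feature layout and function name are ours). Note how neighbour labels are always read from the frozen copy of the raster, exactly to avoid the collateral effects discussed above:

```python
import numpy as np
from collections import Counter

def rstack_refine(features, labels, k=3):
    # Second level of R-STACK (step 7): for each pixel, build a k-NN from
    # its 8-adjacent neighbours and re-classify the pixel. Neighbour labels
    # are read from a frozen copy of the raster, so pixels that were already
    # re-classified cannot influence the remaining ones.
    rows, cols, _ = features.shape
    frozen = labels.copy()          # the raster copy made before this phase
    refined = labels.copy()
    for r in range(rows):
        for c in range(cols):
            neigh = [(r + dr, c + dc)
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)
                     and 0 <= r + dr < rows and 0 <= c + dc < cols]
            # Distance in feature space to each neighbour, paired with the
            # neighbour's (frozen) first-level label.
            ranked = sorted(
                (float(np.linalg.norm(features[r, c] - features[nr, nc])),
                 int(frozen[nr, nc]))
                for nr, nc in neigh)
            votes = Counter(lbl for _, lbl in ranked[:k])
            refined[r, c] = votes.most_common(1)[0][0]
    return refined
```

With k = 3 and 8-adjacency as in the paper, each per-pixel model is trained on at most 8 instances, which keeps the cost of this second level low.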
4 Results
Two kinds of testing have been carried out to compare the efficiency of our approach against a classical SVM. The first test is based on statistical techniques. Since remote sensing data is expensive to generate, the comparison has to rest on artificial data splits. In our case, 100 splits were created from the original data so that each split contains about 740 instances. Then, a 10-fold cross-validation process is run for every split and the results are registered for the subsequent comparison process. We have used the procedure suggested in several works [12] for robustly comparing classifiers across multiple datasets in order to evaluate the statistical significance of the measured differences in algorithm ranks. The chosen procedure involves the use of a statistical test to compare classifiers with each other. Our objective was to compare a classical SVM with our approach in terms of accuracy; thus, the Wilcoxon procedure was selected as the appropriate test. A fair comparison of the algorithms is obtained by average ranks and, in this case, after the previous 100 10-fold cross-validation runs, our approach ranks first. With the measured average ranks, the Wilcoxon test checks whether they are significantly different from the mean rank r = 1.5 expected under the null hypothesis. Using a statistical package (MATLAB), the p-value for the Wilcoxon test turned out to be less than 5.72e-06, so the null
Table 2. A summary of the hold-out test for the classical SVM approach: confusion matrix over the ten classes (water, marshland, roads or railways, low, middle and high vegetation, buildings, coal deposits, dumps and mixed areas), with per-class TP rate, FP rate, precision and KIA (overall accuracy reported in the text: 84.6%).
Table 3. A summary of the hold-out test for the SVM + k-NN restricted stacking: confusion matrix over the same ten classes, with per-class TP rate, FP rate, precision and KIA (overall accuracy reported in the text: 87.6%).
hypothesis is rejected. Having found that the measured average ranks are significantly different (at α = 0.05), our rank-based analysis reveals that the accuracy of the classical SVM is significantly worse than that of our approach for this kind of data. The second type of testing is a hold-out process with previously classified data, which is the common form of testing in remote sensing. The test data set (600 instances) was selected from the original data set because it is especially difficult to classify, and it is not part of the training set. Table 2 and Table 3 show the hold-out results for the classic SVM and for our approach, respectively. The overall improvement is about 3%, which is a very important advance.
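The rank-based comparison above can be sketched as follows. The accuracies here are simulated, NOT the paper's measurements (we only borrow the reported 84.6% / 87.6% as rough means); in practice one would then feed the paired scores to a Wilcoxon signed-rank test such as `scipy.stats.wilcoxon`:

```python
import numpy as np

# Simulated paired accuracies over 100 data splits (illustrative only).
rng = np.random.default_rng(0)
svm_acc = rng.normal(0.846, 0.010, size=100)
rstack_acc = svm_acc + rng.normal(0.030, 0.005, size=100)

# Rank the two classifiers on each split (rank 1 = more accurate).
# Under the null hypothesis both would average the mean rank r = 1.5.
rstack_rank = np.where(rstack_acc > svm_acc, 1, 2)
svm_rank = 3 - rstack_rank
print(rstack_rank.mean(), svm_rank.mean())
```

An average rank well below 1.5 for one method, as the paper measures for R-STACK, is what the Wilcoxon test then checks for significance.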
5 Conclusions
In this paper, a new method based on an ensemble of multiple classifiers was used to improve LULC map accuracy. The method builds a statistical raster from LIDAR and image fusion data following a pixel-oriented strategy. Then, the pixels from a training area are used to train an SVM and k-NN restricted stacking (called R-STACK) that takes into account the special characteristics of spatial data. A comparison between an SVM and the R-STACK method was carried out. The results in a riparian area of Huelva (Spain) showed a global accuracy of 84.6% for the classical SVM and 87.6% for the new approach, which represents a significant advance. Even though the results are satisfactory, there are still several problems to fix. Some of them are related to shadows in the images and their weight in the final classification, which has to be taken into account. Hence, a control of weights for each feature has to be implemented in order to avoid their misclassification effects; genetic algorithms could be a very suitable tool to solve this problem. In addition, dependence on the training set can be a more important problem: sometimes the training set can be incomplete, or not enough to describe the real space. These problems are harder to fix. Despite the fact that a semi-supervised approach seems more suitable to sort out this kind of problem, very few semi-supervised proposals can yet be found, and more research is needed in order to develop them with the required accuracy. Finally, some problems are inherent to pixel-oriented approaches, such as the detection of partial artificial structures. In the future, it would be very interesting to apply a prior phase in which, at a low additional computational cost, an object-oriented segmentation and classification could be carried out to extract the structures that are most difficult to classify, using visual recognition techniques from the computer vision world.

Acknowledgments.
We would like to thank the Regional Ministry of Andalusia for all the support received in the development of this work and, especially, to thank Irene Carpintero, Juan José Vales and Daniel Laguna for their much appreciated comments. We would also like to thank Francisco Martínez-Álvarez and Luis Gonçalves-Seco for all the time they invested that allowed this work to be completed.
References

1. Verrelst, J., Geerling, G., Sykora, K., Clevers, J.: Mapping of aggregated floodplain plant communities using image fusion of CASI and LIDAR data. International Journal of Applied Earth Observation and Geoinformation 11, 83-94 (2009)
2. Antonarakis, A., Richards, K., Brasington, J.: Object-based land cover classification using airborne LIDAR. Remote Sensing of Environment 112, 2988-2998 (2008)
3. Dalponte, M., Bruzzone, L., Vescovo, L., Damiano, G.: The role of spectral resolution and classifier complexity in the analysis of hyperspectral images of forest areas. Remote Sensing of Environment 113, 2345-2355 (2009)
4. Shao, G., Wu, J.: On the accuracy of landscape pattern analysis using remote sensing data. Landscape Ecology 23, 505-511 (2008)
5. Fauvel, M., Benediktsson, J., Chanussot, J., Sveinsson, J.: Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE Transactions on Geoscience and Remote Sensing 46(11), 3804-3814 (2008)
6. Mitrakis, N., Topaloglou, C., Alexandridis, T., Theocharis, J., Zalidis, G.: Decision fusion of GA self-organizing neuro-fuzzy multilayered classifiers for land cover classification using textural and spectral features. IEEE Transactions on Geoscience and Remote Sensing 46(7), 2137-2152 (2008)
7. Waske, B., van der Linden, S.: Classifying multilevel imagery from SAR and optical sensors by decision fusion. IEEE Transactions on Geoscience and Remote Sensing 46(5), 1457-1466 (2008)
8. Cortijo, F.J., Blanca, N.P.D.L.: Improving classical contextual classifications. International Journal of Remote Sensing 19(8) (1998)
9. Goncalves-Seco, L., Miranda, D., Crecente, R., Farto, J.: Digital terrain model generation using airborne LIDAR in a forested area of Galicia. In: Proceedings of the 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Spain, pp. 169-180 (2006)
10. Hudak, A.T., Crookston, N.L., Evans, J.S., Halls, D.E., Falkowski, M.J.: Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LIDAR data. Remote Sensing of Environment 112, 2232-2245 (2008)
11. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations 11(1) (2009)
12. Garcia, S., Herrera, F.: An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research 9, 2677-2694 (2008)
A Bio-inspired Fusion Method for Data Visualization

Bruno Baruque (1) and Emilio Corchado (2)

(1) Department of Civil Engineering, University of Burgos, C/ Francisco de Vitoria s/n, 09006 Burgos, Spain
(2) Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain
[email protected], [email protected]
Abstract. This research presents a novel bio-inspired fusion algorithm based on the application of a topology preserving map called Visualization Induced SOM (ViSOM) under the umbrella of an ensemble summarization algorithm, the Weighted Voting Superposition (WeVoS). The presented model aims to obtain more accurate and robust maps, also increasing the model's stability by means of an ensemble training schema and a posterior fusion algorithm, both of which are very suitable for visualization and also classification purposes. This model may be applied alone or within the frame of hybrid intelligent systems, for instance in the recovery phase of a case-based reasoning system. For the sake of completeness, a comparison of its performance with other topology preserving maps and previous fusion algorithms on several public data sets obtained from the UCI repository is also included.
1 Introduction
One of the main problems for data analysis nowadays is not the difficulty in obtaining data, but the extraction of useful information from the huge amount of data that almost every business management, industrial or scientific process generates. Also, the organization and classification of already existing data for posterior use is a primary concern when talking about knowledge management and applications. Among the variety of tools at our disposal for these kinds of tasks, some of the most useful are Artificial Neural Networks (ANNs) [1], and those making use of unsupervised learning in particular, as no prior knowledge about the data set is needed for their training. Among these models, the topology preserving map family has proven very useful in tasks such as visual data inspection [2], data clustering and organization -due to their pattern matching capabilities- [3] or image processing [4], among others. It is a well-known phenomenon that, due to the usual use of randomness in the ANN training process, training two networks with the same parameters can lead to somewhat different results, it being a difficult task to identify one as better than the other. Ensemble [5] and fusion theory aim to obtain more accurate and robust models, also increasing their stability.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 501-509, 2010. © Springer-Verlag Berlin Heidelberg 2010

This research is
based on the application of ensembles to a topology preserving map called Visualization Induced Self-Organizing Map (ViSOM) [6]. This type of algorithm can be used for classification and visualization purposes, and is therefore very suitable for performing these kinds of tasks within the frame of hybrid intelligent systems, for instance in the recovery phase of a case-based reasoning (CBR) system [7]. For the sake of completeness, a comparison of the performance with other topology preserving maps and previously presented fusion algorithms on several public data sets is also included.
2 Topology Preserving Mapping
The topology preserving maps comprise a family of techniques conceived as a visualization tool to enable the representation of high-dimensional data sets on 2-dimensional maps, thereby facilitating data interpretation tasks for human experts. The best known technique among them is the Self-Organizing Map (SOM) algorithm [8]. It is based on a type of unsupervised learning called competitive learning; an adaptive process in which the neurons in a neural network gradually become sensitive to different input categories, sets of samples in a specific domain of the input space [9]. One interesting extension of this algorithm is the Visualization Induced SOM (ViSOM) [6], proposed to directly preserve the local distance information on the map, along with the topology. The ViSOM constrains the lateral contraction forces between neurons and hence regularizes the inter-neuron distances, so that distances between neurons on the map are in proportion to those in the input space [10]. The difference between the SOM and the ViSOM hence lies in the update of the weights of the neighbours of the winner neuron, as can be seen from Eq. 1 and Eq. 2.

Update of neighbourhood neurons in the SOM:

w_k(t+1) = w_k(t) + \alpha(t)\,\eta(v,k,t)\,(x(t) - w_k(t))    (1)

where x denotes the network input; w_k the characteristics vector of each neuron; \alpha is the learning rate of the algorithm; and \eta(v,k,t) is the neighbourhood function, in which v represents the position of the winning neuron or Best Matching Unit (BMU) in the lattice, and k the positions of the neurons in its neighbourhood.

Update of neighbourhood neurons in the ViSOM, on the other hand, uses the following expression:

w_k(t+1) = w_k(t) + \alpha(t)\,\eta(v,k,t)\,\left[ (x(t) - w_v(t)) + \frac{d_{vk} - \Delta_{vk}\lambda}{\Delta_{vk}\lambda}\,(w_v(t) - w_k(t)) \right]    (2)

where d_{vk} and \Delta_{vk} are the distances between neurons v and k in the data space and on the unit grid (map), respectively, and \lambda is a positive pre-specified resolution parameter representing the desired inter-neuron distance -of two neighbouring nodes- reflected in the input space. The most common neighbourhood function used in this kind of model is the Gaussian function or, in particular cases, the difference of Gaussians.
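The two update rules differ only in the regularising term of Eq. 2. A direct transcription (a sketch; variable names are ours, and d_vk, delta_vk denote the data-space and grid distances between winner v and neighbour k):

```python
import numpy as np

def som_update(w_k, x, alpha, eta):
    # Eq. 1: move the neighbour w_k towards the input x.
    return w_k + alpha * eta * (x - w_k)

def visom_update(w_k, w_v, x, alpha, eta, d_vk, delta_vk, lam):
    # Eq. 2: the extra term regularises the inter-neuron distance d_vk
    # (data space) towards delta_vk * lambda (grid distance times the
    # resolution parameter).
    beta = (d_vk - delta_vk * lam) / (delta_vk * lam)
    return w_k + alpha * eta * ((x - w_v) + beta * (w_v - w_k))
```

When d_vk already equals delta_vk * lambda, the regularising term vanishes and the neighbour simply follows the winner, which is how the ViSOM keeps map distances proportional to input-space distances.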
3 Previous Work on Topology Preserving Algorithms Fusion

3.1 The Ensemble Meta-algorithm
In the field of AI, ensemble learning [5] is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. In the case of this study, one of the simplest ensemble-based algorithms is used: the Bagging algorithm [11]. By randomly selecting a sub-set of the original data set several times, this algorithm obtains several "replicated" data sets in which various entries from the original data set will appear -once or many times- and other entries will not appear at all. Training the same algorithm over each of the sub-sets results in a set of slightly different automated learning models, which are expected to overcome the problems that could arise with a single one. Closely related to ensemble learning, classifier fusion techniques have been the subject of study by many researchers [12,13]. The aim of these kinds of techniques is to obtain a single classifier able to improve on the performance of a single one, by training an array of several simpler but similar classifiers and finally "summarizing" them into a final one. A particular way of performing this fusion is at the model level. The advantage of this, apart from the improvement in classification performance and stability, is that a single model is obtained, which is easier to deal with. In the case of the present study the objective is the calculation of a single map since, when a visual inspection of a data set is required, simplicity is an essential characteristic.

3.2 Map Fusion by Euclidean Distance
The Map Fusion by Euclidean Distance algorithm [14] first searches for the neurons that are closest in the input space (selecting only one neuron in each network of the ensemble) and then fuses them to obtain the final neuron in the fused map. This process is repeated until all the neurons have been fused. The main characteristic of this approach is that a pair-wise match of the neurons of each network will always take place. When fusing two neurons, the neighbouring neurons are not taken into account. Fusing two neurons results in a neuron associated with a slightly different characteristics vector. In visual terms, this is the same as "shifting" the position of a neuron in a map. If this is done without taking account of the neighbouring neurons, two neurons considered neighbours will not necessarily be the two closest neurons of the network in the final fused map. The complete algorithm implementing this fusion method is detailed in the original publication [14].
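For two maps, the pairing step can be sketched as a greedy nearest-neighbour match (an illustration only; the complete algorithm, including the handling of more than two maps, is in [14]):

```python
import numpy as np

def fuse_by_distance(map_a, map_b):
    # Sketch of Fusion by Euclidean Distance: greedily pair each neuron
    # of map_a with its closest still-unmatched neuron in map_b and
    # average the two weight vectors. Note that neighbourhood relations
    # on the grid are ignored, which is the weakness discussed above.
    used = set()
    fused = []
    for wa in map_a:
        dists = [(float(np.linalg.norm(wa - wb)), j)
                 for j, wb in enumerate(map_b) if j not in used]
        _, j = min(dists)
        used.add(j)
        fused.append((wa + map_b[j]) / 2)
    return np.array(fused)
```

Because each neuron is shifted independently, two grid neighbours in the input maps need not end up close to each other in the fused map.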
4 A Novel Fusion Algorithm: Weighted Voting Superposition
The idea behind the novel fusion method presented in this study, Weighted Voting Superposition (WeVoS) [15], is to obtain the best position for a neuron, but also for its neighbours, unlike the previously explained method. As a consequence, the final map keeps one of the most important features of this type of algorithm: its topological ordering. In this study, WeVoS is applied for the first time to the ViSOM, using well-known data sets to perform a thorough study and comparison of its capabilities. The first step in this meta-algorithm is to calculate the "quality" -or rather, error measure [16,17]- of each of the neurons composing each map, in order to base the fusion of neurons on some kind of informed decision. The final map is also obtained on a neuron-by-neuron basis. First, the neurons of the final map are initialized by calculating the centroids of the neurons in the same position of the map grid in each of the trained maps. Then, a recalculation of the final position of each neuron uses the information associated with the neurons in that same position in each map of the ensemble. For each neuron, a sort of voting process is performed, as in Eq. 3:

V_{p,m} = \frac{b_{p,m}}{\sum_{i=1}^{M} b_{p,i}} \cdot \frac{q_{p,m}}{\sum_{i=1}^{M} q_{p,i}}    (3)

where V_{p,m} is the weight of the vote for the neuron included in map m of the ensemble, in position p; M is the total number of maps in the ensemble; b_{p,m} is the binary vector used for marking the data set entries recognized by the neuron in position p of map m; and q_{p,m} is the value of the desired quality measure for the neuron in position p of map m. The weights of the neurons are fed into the final network as the data inputs are during the training phase of a SOM, considering the "homologous" neuron in the final map as the BMU. The weights of the final neuron will be updated towards the weights of the composing neuron. The strength of the update performed for each "homologous" neuron in the composing maps depends on the quality measure calculated for each neuron: the higher the quality (or the lower the error) of the neuron of the composing map, the more strongly the neuron of the fused map is updated towards the weights of that neuron. The number of data inputs recognized by each neuron is also taken into account in this quantification of the "best suitability" of one neuron or another for the same position in the final map. So, in comparison with the previously presented method -Fusion by Euclidean Distance-, when updating the characteristics of a single neuron this approach takes into account not only the characteristics of that neuron, but also the topographic ordering of its neighbours. It is expected that this new approach will obtain maps that are more faithful to the inner structure of the data set from a visualization point of view.
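A compact sketch of Eq. 3 and the subsequent update follows (our own array layout and names; for simplicity we assume higher q means better quality, whereas the paper's measures are errors, where the relation is inverted):

```python
import numpy as np

def wevos_votes(b, q):
    # Eq. 3: voting weight V[p, m] for the neuron at position p in map m.
    # b[p, m] counts data entries recognised by that neuron, q[p, m] is
    # its quality measure (assumed here: higher = better).
    return (b / b.sum(axis=1, keepdims=True)) * (q / q.sum(axis=1, keepdims=True))

def wevos_fuse(weights, b, q, alpha=0.5):
    # weights: array of shape (P, M, D) holding the D-dimensional weight
    # vectors of the M ensemble maps at each of the P grid positions.
    # Initialise each fused neuron as the centroid of its homologues, then
    # update it towards each homologue in proportion to that neuron's vote,
    # treating the fused neuron as the BMU of a SOM-style update.
    V = wevos_votes(b, q)
    fused = weights.mean(axis=1)
    for m in range(weights.shape[1]):
        fused += alpha * V[:, m:m + 1] * (weights[:, m] - fused)
    return fused
```

The net effect is that, at every grid position, the fused neuron is pulled most strongly towards the homologue with the best quality and the largest share of recognised inputs.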
5 Experiments and Results
Several experiments have been performed to check the suitability of using the previously described fusion techniques within the frame of the mentioned topology preserving models. The data sets selected are Iris and Echocardiogram, both obtained from the UCI repository [18]. Many different measures can be found in the literature to assess the quality of a topology preserving map [17]. Since they deal with such usually subjective aspects as visual inspection, no single one of them is able to capture all aspects of the performance of this kind of algorithm. Instead, many researchers use different measures, considering them complementary. In the case of this work, two different quality measures have been used: Distortion [19] and "Goodness of Map" [20]. Both of them are error measures, so lower values are considered good results. For all the tests involving the fusion of networks, the procedure is the same. A simple n-fold cross-validation is used in order to employ all the data available for training and testing the model, and to have several executions over which to calculate an average of its performance. In each step of the cross-validation, first an ensemble of networks is obtained -by using the bagging algorithm-; then both fusion algorithms are computed; finally the quality measures are calculated employing the test fold, both for the single models and for the fusions obtained from the ensemble. In Fig. 1, four of the previously discussed maps are represented in the input space of the data set -its first 2 principal components- so that the way each of them adapts to the data set can be clearly compared. It can be seen how the WeVoS models (Fig. 1c and Fig. 1d) slightly modify the position of the grid of their corresponding single model (Fig. 1a and Fig. 1b) to better cover the input data space. The remaining model -Fusion by Euclidean Distance- was not included due to space constraints.
The second set of experiments performed (Fig. 2 and Fig. 3) consisted of progressively reducing the size of the data set used, in order to observe how this reduction, which progressively introduces instability into the data set, affects the performance of the models. Experimentally, five maps were selected as the most suitable number of components for the ensemble in this experiment, as it is important to have a certain variability in the ensemble to obtain significant results. From these analytical results it can clearly be inferred that Weighted Voting Superposition obtains consistently better results for the Distortion measure (Fig. 2a and Fig. 3a), which accounts for the topographic preservation of the models. As a consequence of that clear advantage, the Goodness of Map (Fig. 2b and Fig. 3b), which accounts for both the quantization and the topology error, is also worse for the Fusion by Euclidean Distance than for WeVoS in almost all tests. Another general observation that can be extracted from the results shown is that, as the Iris data set (Fig. 2) is quite simple and well defined, the effect of reducing the number of samples in the test does not significantly affect the results of the maps, with the exception of the Fusion by Euclidean Distance.
Fig. 1. Four of the models discussed -two single models and two ensemble fusion models- embedded in a 2D representation of the Iris data set. Both the data set and the grids are projected over the first two Principal Components of the data set. Panels: (a) SOM, (b) ViSOM, (c) WeVoS-SOM, (d) WeVoS-ViSOM.
Fig. 2. Results comparing single algorithms and the two ensemble fusion algorithms. Experiments performed varying the number of samples used from the Iris data set: (a) Distortion, (b) Goodness of Map.
Fig. 3. Results comparing single algorithms and the two ensemble fusion algorithms. Experiments performed varying the number of samples used from the Echocardiogram data set: (a) Distortion, (b) Goodness of Map.
This is due to the way the positions of the final units of this fusion are calculated which, as explained before, does not take their neighbourhood into account. On the other hand, decreasing the number of samples of the Echocardiogram data set (Fig. 3) clearly increases the instability of the results, although no clear tendency appears. Studying the analytical results in more detail, it can be concluded that when dealing with a rather simple data set such as Iris (150 entries, 4 dimensions), the use of the ensemble algorithm does not necessarily lead to better results. In Fig. 2a, it can be seen that Distortion is very similar for the single SOM and ViSOM models and slightly lower for both the WeVoS-SOM and the WeVoS-ViSOM. Although by a very small margin, the ViSOM proves to be a little better than the SOM, and the WeVoS-ViSOM a bit better than the WeVoS-SOM. On the contrary, the case is inverted for the Goodness of Map measure (Fig. 2b). In this case, although again with little difference, the single models obtain lower error than their WeVoS counterparts. These results mean that, although WeVoS helps improve the topographic preservation of the models, it can degrade their vector quantization performance. As can be seen from Fig. 3, when the data set is more complicated to interpret, such as the Echocardiogram data set (104 entries, 9 dimensions), the clear improvement of the results makes the extra effort of training an ensemble of maps worthwhile. The Distortion measure (Fig. 3a) is clearly improved by the fusion algorithms -both Fusion by Euclidean Distance and WeVoS-SOM-, with the WeVoS-SOM being the model obtaining the lowest error of the three. In this case, the ViSOM obtains a clearly lower error than the SOM. The Fusion by Distance of the ViSOM is not able to outperform the single model, but the WeVoS-ViSOM obtains generally better results than the single ViSOM. For the Goodness of Map (Fig. 3b), in this case the Fusion by Distance obtains far worse results than the rest. The single SOM and ViSOM obtain mixed results, although they are very close; especially when the size of the data set decreases to 63 entries, it could be concluded that in this case the SOM performs a bit better. In the case of the
WeVoS models, this situation is inverted, as the WeVoS-ViSOM obtains lower error than the WeVoS-SOM for a data set size of 63 entries or fewer.
6 Conclusions and Future Work
This work has presented a model for the fusion of topology preserving map algorithms. Its aim is to obtain a more truthful representation of the data set by enhancing one of its main features: topology preservation. Here it has been tested with the ViSOM model for the first time. The present work includes a comparison of the WeVoS-ViSOM model with previously devised fusion methods and its application to other topology preserving models such as the SOM.

Results seem to point to the fact that the use of the WeVoS algorithm improves the topology preservation of the final maps, as the decrease in the Distortion error shows. Due to the use of a subset of the whole training set, it is able to concentrate on interesting features found in each map that might not have been clearly registered when training a single map on the whole data set. By doing so, the final map will most likely perform worse on the quantization error, as it is not so focused on each of the samples of the data set. Therefore, this technique is more useful for obtaining a visual representation of the data set structure than as a vector quantization algorithm. It can also be concluded from the results that, as expected, the method is mainly useful when analyzing a complex data set with a high number of dimensions or a low number of entries. In these cases, the extra complexity of performing several training runs of the algorithm is compensated by a clear improvement of the results.

Future work includes a wider comparison of the WeVoS with other topology preserving maps and with more complex ensemble training algorithms, such as boosting, in order to clearly confirm the strengths and weaknesses of the algorithm. It also includes the use of the WeVoS in a real-life application that benefits from its particular characteristics, such as its inclusion in hybrid intelligent systems based, for instance, on the use of CBR or multi-agent system methodologies.

Acknowledgments.
This research has been partially supported through projects CIT-020000-2008-2 and CIT-020000-2009-12 of the Spanish Ministry of Education and Innovation and project BU006A08 of the Junta of Castilla and León (JCyL). The authors would also like to thank Grupo Antolín Ingeniería, S.A., a manufacturer of components for vehicle interiors, for its collaboration in the framework of the MAGNO 2008 – 1028 – CENIT project funded by the Spanish Ministry of Science and Innovation.
A Bio-inspired Fusion Method for Data Visualization
CBRid4SQL: A CBR Intrusion Detector for SQL Injection Attacks

Cristian Pinzón 1,2, Álvaro Herrero 3, Juan F. De Paz 1, Emilio Corchado 1, and Javier Bajo 1

1 Departamento de Informática y Automática, Universidad de Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain {cristian_ivanp,fcofds,escorchado,jbajope}@usal.es
2 Universidad Tecnológica de Panamá, A.P: 0819-07289, Panamá, Rep. de Panamá
3 Department of Civil Engineering, University of Burgos, C/ Francisco de Vitoria s/n, 09006 Burgos, Spain [email protected]
Abstract. One of the most serious security threats to recently deployed databases has been the SQL Injection attack. This paper presents an agent specialised in the detection of SQL injection attacks. The agent incorporates a Case-Based Reasoning engine equipped with learning and adaptation capabilities for the classification of malicious code. The agent also incorporates advanced algorithms in the stages of its reasoning cycle. The reuse phase uses an innovative classification model based on a mixture of a neural network and a Support Vector Machine in order to classify the received SQL queries in the most reliable way. Finally, a neural visualisation technique is incorporated, which notably eases the revision stage carried out by human experts in the case of suspicious queries. The Classifier Agent was tested in a real-traffic case study and its experimental results, which validate the performance of the proposed approach, are presented here.

Keywords: SQL Injection, Intrusion Detection, CBR, SVM, Neural Networks.
1 Introduction

Over recent years, one of the most serious security threats around databases has been the SQL Injection attack [1]. In spite of being a well-known type of attack, SQL injection remains at the top of published threat lists. The solutions proposed so far [2], [3], [4], [5], [6], [7], [8] seem insufficient to prevent and block this type of attack, because they lack the learning and adaptation capabilities needed to deal with attacks and their possible future variations. In addition, the vast majority of solutions are based on centralized mechanisms with little capacity to work in distributed and dynamic environments. This study presents the intelligent agent CBRid4SQL (a CBR Intrusion Detector), capable of detecting attacks based on SQL code injection. CBRid4SQL is an agent specially designed following the strategy of an Intrusion Detection System (IDS) and is defined as a Hybrid Artificial Intelligence System (HAIS).

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 510–519, 2010. © Springer-Verlag Berlin Heidelberg 2010

This agent is
the principal component of a distributed hierarchical multi-agent system aimed at detecting attacks in dynamic and distributed environments. The CBRid4SQL agent is a CBR agent [9], characterized by the integration of a CBR (Case-Based Reasoning) mechanism. This mechanism provides the agent with a greater level of adaptation and learning capability, since CBR systems make use of past experiences to solve new problems [9]. This is very effective for blocking SQL injection attacks, as the mechanism uses a strategy based on anomaly detection [10]. In addition to the CBR engine incorporated in the CBRid4SQL agent's internal structure, a mixture of an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) is used as the classification mechanism. Through this mixture, it is possible to exploit the advantages of both strategies in order to classify SQL queries more reliably. Finally, to assist the expert in making decisions about queries classified as suspicious, a visualization mechanism is proposed which combines clustering techniques and neural models for dimensionality reduction based on unsupervised learning.

The rest of the paper is structured as follows: Section 2 presents the problem that has prompted most of this research work. Section 3 explains the internal structure of the CBRid4SQL agent used as a classifier agent. Finally, the conclusions and experimental results of this work are presented in Section 4.
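The CBR reasoning cycle that this mechanism relies on can be sketched as a minimal skeleton. All names and stage bodies below are illustrative placeholders, not the agent's actual implementation:

```python
# Minimal CBR loop skeleton (illustrative only).
class CBRAgent:
    def __init__(self):
        self.case_memory = []          # list of (problem, solution, outcome)

    def retrieve(self, problem, similarity, k=5):
        # Rank past cases by a similarity function over problem descriptions.
        return sorted(self.case_memory,
                      key=lambda case: similarity(problem, case[0]),
                      reverse=True)[:k]

    def reuse(self, problem, similar_cases):
        # Adapt the best-matching solution to the new problem (here: copy it).
        return similar_cases[0][1] if similar_cases else None

    def revise(self, problem, solution):
        # Confirm or correct the proposed solution (e.g. expert review).
        return solution

    def retain(self, problem, solution, outcome):
        # Store the solved case for future reasoning.
        self.case_memory.append((problem, solution, outcome))
```

In CBRid4SQL these four stages are specialised as described in Sections 3.1.1 to 3.1.4, with the reuse stage delegating to trained ANN and SVM models rather than to a simple copy of a past solution.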
2 SQL Injection Attacks

An SQL injection attack takes place when a hacker changes the semantic or syntactic logic of an SQL text string by inserting SQL keywords or special symbols within the original SQL command, which is executed at the database layer of an application [1]. Different attack techniques exist, including SQL tautologies, logic errors / illegal queries, union queries and piggy-backed queries. Other more advanced techniques use injection based on inference and alternative encodings [1]. The cause of SQL injection attacks is relatively simple: inadequate input validation at the user interface. As a result of such an attack, a hacker can carry out unauthorized data handling, retrieve confidential information and, in the worst case, take over control of the application server [1].

Different strategies have been presented as solutions to the problem of SQL injection attacks [1], with special attention given to strategies based on IDSs [2], [3], [4], [5], [6], [7], [8]. One approach based on anomaly detection was proposed in [2], applying a clustering strategy to group similar queries and isolate queries considered malicious. Its main disadvantage is a high computational overhead, which affects real-time detection. Kemalis and Tzouramanis propose SQL-IDS (SQL Injection Detection System) [3], which uses security specifications to capture the syntactic structure of the SQL queries generated by the applications. The main limitation of this approach is the computational cost of comparing each new query against the predefined structure at runtime. In [4], two types of SQL injection attacks are addressed: tautology attacks and those based on the UNION operator. Through the syntactic analysis of SQL query strings, the data of the HTTP requests are extracted to later be used in the training phase and
to determine the threshold to use in the evaluation phase. Bertino, Kamra and Early [5] propose an anomaly detection mechanism applying data mining techniques. The main problem of this approach is finding an adequate threshold that maintains a low rate of both false positives and false negatives. Another anomaly-based approach is proposed by Robertson, Vigna, Kruegel and Kemmerer [6]. It uses generalisation techniques to convert suspicious requests into anomaly signatures. These signatures are later used to group malicious requests which present similar characteristics. Another technique used is characterization: deducing the type of attack associated with the malicious request. A low computational overhead is generated; however, the approach is susceptible to generating false positives. García, Monroy and Quintana [7] propose the detection of attacks targeted at web applications, using the ID3 algorithm to detect and filter malicious SQL strings. This approach presents a significant percentage of incorrect classifications. Valeur, Mutz and Vigna [8] propose the use of anomaly detection through the generation of a series of models starting from a set of collected queries. At execution time, they monitor the applications in order to identify requests which are not associated with the aforementioned models.
3 An Agent for Detecting SQL Injection Attacks

Agents are characterized by their autonomy, which gives them the ability to work independently and in real-time environments [11]. The CBRid4SQL agent presented in this study interacts with other agents within the architecture. These agents carry out tasks related to capturing messages, syntactic analysis, administration and user interaction. As opposed to those agents, the CBRid4SQL agent performs the classification of SQL queries, which we subsequently define in greater detail.

CBR is a paradigm based on the idea that similar problems have similar solutions. Thus, a new problem is resolved by consulting the case memory to find a similar case which was resolved in the past. When working with this type of system, the key concept is that of "case". A case is defined as a previous experience and is composed of three elements: a description of the problem that depicts the initial problem; a solution that describes the sequence of actions performed in order to solve the problem; and the final state, which describes the state achieved once the solution is applied.

As previously mentioned, the CBRid4SQL agent is a specialization of a CBR agent; it is the key component of a multi-agent architecture and is geared towards classifying SQL queries for the detection of SQL injection attacks. Below, the new classification mechanism incorporated in the internal structure of the CBRid4SQL agent is explained in detail.

3.1 CBRid4SQL Agent

In this section the CBRid4SQL agent is presented, with special attention paid to its internal structure and its mechanism for classifying SQL attacks. This mechanism combines the advantages of CBR systems, such as learning and adaptation, with the predictive capabilities of a combination of ANNs and SVMs. The use of
this combination of techniques is based on the possibility of using two classifiers together to detect suspicious queries in the most reliable way possible.

In terms of CBR, a case is composed of the following elements of an SQL query: (a) the Problem Description, which describes the initial information available for generating a plan; it consists of the case identification, the user session and the SQL query elements. (b) The Solution, which describes the action carried out to solve the problem description, in this case the prediction models. (c) The Final State, which describes the state achieved after the solution has been applied. The fields defining a case are as follows: IdCase, Session, User, IP_Address, Query_SQL, Affected_table, Affected_field, Command_type, Word_GroupBy, Word_Having, Word_OrderBy, Numer_And, Numer_Or, Number_literals, Number_LOL, Length_SQL_String, Start_Time_Execution, End_Time_Execution, and Query_Category. Additionally, the information related to the prediction models used is stored as well.
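The case structure just listed can be captured, for illustration, as a simple container; the field names come from the paper, while the types are assumptions:

```python
from dataclasses import dataclass

# Illustrative container for a case; types are assumed, not specified in the paper.
@dataclass
class SQLQueryCase:
    IdCase: int
    Session: str
    User: str
    IP_Address: str
    Query_SQL: str
    Affected_table: str
    Affected_field: str
    Command_type: str
    Word_GroupBy: int
    Word_Having: int
    Word_OrderBy: int
    Numer_And: int
    Numer_Or: int
    Number_literals: int
    Number_LOL: int
    Length_SQL_String: int
    Start_Time_Execution: float
    End_Time_Execution: float
    Query_Category: str
```

Note that the numeric fields from Word_GroupBy through Length_SQL_String are the ones later fed to the classifiers, while Query_Category drives case retrieval.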
Fig. 1. CBR cycle and classification mechanism of the CBRid4SQL agent
Figure 1 shows the different stages applied in the reasoning cycle. In summary, in the retrieval stage, queries are selected by type together with the classification models stored in memory. In the reuse phase, as seen in Figure 1, a Multilayer Perceptron (MLP) and an SVM are applied simultaneously to carry out the prediction for the new query. Subsequently, an inspection is performed, which can be done automatically or by a human expert. If the query turns out to be suspicious, further inspection is carried out manually by a human expert. At this stage the most similar cases are selected by means of a Growing Cell Structures (GCS) network [12] and visualized by a dimensionality reduction technique which employs the
neural model called Cooperative Maximum Likelihood Hebbian Learning (CMLHL). As a result, the human expert can graphically see the relationship between the suspicious query and the recovered queries. During learning, the memory information regarding the cases and models is updated. Below, the different stages of the CBR reasoning cycle associated with the system are described in more detail.

3.1.1 Retrieve
The retrieval phase is broken down into two steps: case retrieval and model retrieval. Case retrieval is performed using the Query_Category attribute, which retrieves queries from the case memory (Cr) that were used for a similar query, in accordance with the attributes of the new case cn. Subsequently, the multilayer perceptron model mlpr and the SVM model svmr associated with the recovered cases are retrieved. Recovering these models from memory improves the system's performance, since the time needed to create the models is considerably reduced, mainly in the case of ANN training.

3.1.2 Reuse
The reuse phase starts from the information of the retrieved cases Cr and the recovered models mlpr and svmr. The combination of both techniques is fundamental to reducing the rate of false negatives. The inputs of the MLP are: Query_SQL, Affected_table, Affected_field, Command_type, Word_GroupBy, Word_Having, Word_OrderBy, Numer_And, Numer_Or, Number_literals, Number_LOL, and Length_SQL_String. The number of neurons in the hidden layer is 2n+1, where n is the number of neurons in the input layer. Finally, the output layer has one neuron. The activation function selected for the different layers is the sigmoid. Taking into account the activation function $f_j$, the output values are given by the following expression:

$$y_j^p = f_j\left(\sum_{i=1}^{N} w_{ji}(t)\, x_i^p(t) + \theta_j\right) \qquad (1)$$

The outputs correspond to $x^r$. As the hidden layer of the neural network contains sigmoidal neurons with values in [0, 1], the input variables are rescaled so that their range falls within [0.2, 0.8]. This transformation is necessary because the network does not deal well with values outside this range. The output values are similarly limited to the range [0.2, 0.8], with the value 0.2 corresponding to a non-attack and the value 0.8 corresponding to an attack. The network training is carried out through the error Backpropagation algorithm [13].

At the same time as the estimation by the neural network, an estimation is also carried out by the SVM, a supervised learning technique applied to the classification and regression of elements. The algorithm represents an extension of nonlinear models [14]. SVM also allows the separation of element classes which are not linearly separable. For this, the space of initial coordinates is mapped into a space of high dimensionality through the use of functions. Since the dimensionality of the new space can be very high, it is not feasible to compute directly the hyperplanes that produce linear separability. For this, a series of non-linear functions called kernels is used.
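The input rescaling to [0.2, 0.8] and the sigmoid forward pass of Eq. (1) can be sketched as follows; the weights are assumed to be already trained, and the caller is responsible for giving the hidden layer 2n+1 units as stated above:

```python
import numpy as np

def rescale(x, lo, hi, a=0.2, b=0.8):
    """Linearly map values from [lo, hi] into [a, b], as described above."""
    return a + (b - a) * (x - lo) / (hi - lo)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Eq. (1) applied layer by layer: one sigmoid hidden layer
    (shape (2n+1, n)) and a single sigmoid output neuron."""
    h = sigmoid(W1 @ x + b1)           # hidden activations
    return sigmoid(W2 @ h + b2)        # output in (0, 1)
```

An output near 0.2 then reads as a non-attack and an output near 0.8 as an attack, matching the target encoding described in the text.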
Let us consider a set of patterns $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, where $x_i$ is a vector of dimension $n$. The idea is to map the elements $x_i$ into a space of high dimensionality through the application of a function, in such a way that the set of original patterns is converted into the set $\Phi(T) = \{(\Phi(x_1), y_1), (\Phi(x_2), y_2), \dots, (\Phi(x_m), y_m)\}$, which, depending on the selected function $\Phi(x)$, could be linearly separable. To carry out the classification, the sign of the following expression is studied [15]:

$$\mathrm{class}(x_k) = \mathrm{sign}\left(\sum_{i=1}^{m} \lambda_i\, y_i\, \Phi(x_i) \cdot \Phi(x_k) + b\right) \qquad (2)$$
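A minimal sketch of the decision rule of Eq. (2), using a polynomial kernel to evaluate $\Phi(x_i)\cdot\Phi(x_k)$ implicitly; the kernel degree and coefficient, as well as the trained multipliers $\lambda_i$ and bias $b$, are assumptions for illustration:

```python
import numpy as np

def poly_kernel(x, z, degree=2, c=1.0):
    # Polynomial kernel K(x, z) = (x.z + c)^degree; degree and c are assumed values.
    return (np.dot(x, z) + c) ** degree

def svm_class(xk, support_x, support_y, lambdas, b, kernel=poly_kernel):
    """Eq. (2): sign of the kernelized weighted sum over support vectors."""
    s = sum(lam * y * kernel(xi, xk)
            for lam, y, xi in zip(lambdas, support_y, support_x))
    return np.sign(s + b)
```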
The kernel function selected for this problem was polynomial. The values used for the estimation are the decision values, which are related to the distance from the points to the hyperplane. Once the output values of the ANN and the SVM are obtained, the mixture is computed as a weighted average as a function of the error rate of each technique. Before computing the average, the values are normalized to the interval [0, 1], since the SVM provides positive and negative values of larger magnitude, which could otherwise affect the final value disproportionately if not rescaled.

3.1.3 Revise
The revise phase can be manual or automatic depending on the output values. The automatic review applies to cases considered non-suspicious by the estimation obtained in the reuse phase. For cases detected as suspicious, with output values in an experimentally determined interval ([0.35, 0.6]), a review by a human expert is performed. As the CBR learns, the interval values are automatically adjusted according to the smallest of the false negatives, while the upper limit is kept constant throughout the iterations. The review consists of recovering queries similar to the current one together with their previous classifications. It combines a clustering technique for the selection of similar requests with a neural model for dimensionality reduction, which permits visualisation in 2D or 3D. The selection of similar cases is carried out by a GCS neural network: the different cases are distributed in meshes and the mesh in which the new case falls is selected. To visualize the cases in the selected mesh, the dimensionality of the data is reduced by means of the CMLHL neural model [16], which performs Exploratory Projection Pursuit by unsupervised learning. Considering an N-dimensional input vector $x$ and an M-dimensional output vector $y$, with $W_{ij}$ being the weight linking input $j$ to output $i$, CMLHL can be expressed as:

Feed-forward step:
$$y_i = \sum_{j=1}^{N} W_{ij}\, x_j, \quad \forall i \qquad (3)$$

Lateral activation passing:
$$y_i(t + 1) = \left[ y_i(t) + \tau (b - A y) \right]^{+} \qquad (4)$$

Feedback step:
$$e_j = x_j - \sum_{i=1}^{M} W_{ij}\, y_i, \quad \forall j \qquad (5)$$

Weight change:
$$\Delta W_{ij} = \eta \cdot y_i \cdot \mathrm{sign}(e_j)\, |e_j|^{p-1} \qquad (6)$$

where $\eta$ is the learning rate, $\tau$ the "strength" of the lateral connections, $b$ the bias parameter, $p$ a parameter related to the energy function [14], [15], and $A$ a symmetric matrix used to modify the response to the data [14]. The effect of this matrix is based on the relation between the distances separating the output neurons. Finally, the information is represented and the associated queries are recovered from the retrieved mesh, as can be seen in Fig. 2.

3.1.4 Retain
The learning phase updates the information of the newly classified case and rebuilds the classifiers offline to leave the system available for new classifications. The ANN classifier is rebuilt only when an erroneous classification is produced. In the case of an inspection of suspicious queries, the information and classifiers are updated when the expert updates the information.
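The CMLHL updates of Eqs. (3)-(6) described in the revise stage can be sketched as a single training pass over a batch; all parameter values here are illustrative, not those used in the experiments:

```python
import numpy as np

def cmlhl_step(X, W, A, eta=0.01, tau=0.1, b=0.1, p=1.5, n_lateral=5):
    """One CMLHL training pass over batch X (Eqs. 3-6).
    X: (n_samples, N); W: (M, N); A: (M, M) symmetric response matrix."""
    for x in X:
        y = W @ x                                        # (3) feed-forward
        for _ in range(n_lateral):                       # (4) lateral activation
            y = np.maximum(0.0, y + tau * (b - A @ y))   #     [.]+ rectification
        e = x - W.T @ y                                  # (5) feedback
        # (6) weight change
        W += eta * np.outer(y, np.sign(e) * np.abs(e) ** (p - 1))
    return W
```

After several such passes the rows of W define the low-dimensional projection used to display the suspicious query together with its neighbours.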
4 Experimental Results and Conclusions

A sample web application with access to a MySQL 5.0 database was developed to check the proposed approach. Once the database had been created, legal queries were sent from the designed user interfaces. In the case of malicious queries, the dispatch was automated using the agent SQLMAP0.5 [17]. This tool is able to fingerprint an extensive set of DBMS back-ends, retrieve remote DBMS databases and so on. To analyze the success rates, a query classification test was conducted with the following classifiers: Bayesian Network, Naive Bayes, AdaBoost M1, Bagging, DecisionStump, J48, JRIP, LMT, Logistic, LogitBoost, MultiBoosting AdaBoost, OneR, SMO and Stacking. The different classifiers were applied to 705 previously classified queries (437 legal, 268 attacks). The test was carried out as follows: selecting one of the cases, extracting it from the set, building the model from the remaining cases and classifying the extracted case. This process is repeated for each of the cases and techniques, so that each query is analyzed without having been used to build the model. The final result of the classification can be seen in Table 1.

Table 1. Total number of hits for the different classifiers
Method          Hits    Method          Hits    Method        Hits
BayesNet        638     Naive Bayes     666     AdaBoostM1    665
Bagging         684     DecisionStump   598     J48           689
JRIP            692     LMT             693     Logistic      688
LogitBoost      680     MultiBoostAB    666     OneR          622
SMO             685     Stacking        437     CBRid4SQL     698
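For convenience, the hit counts of Table 1 can be turned into success rates over the 705 test queries; a small sketch:

```python
# Hit counts from Table 1 over the 705 classified queries (437 legal, 268 attacks).
hits = {
    "BayesNet": 638, "Naive Bayes": 666, "AdaBoostM1": 665,
    "Bagging": 684, "DecisionStump": 598, "J48": 689,
    "JRIP": 692, "LMT": 693, "Logistic": 688,
    "LogitBoost": 680, "MultiBoostAB": 666, "OneR": 622,
    "SMO": 685, "Stacking": 437, "CBRid4SQL": 698,
}
TOTAL = 705

accuracy = {method: h / TOTAL for method, h in hits.items()}
best = max(accuracy, key=accuracy.get)   # CBRid4SQL, about 99.0%
```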
As can be seen in Table 1, the highest-performing system is CBRid4SQL, with a success rate of 698/705. The number of queries detected as suspicious was limited to 7; one of them is shown below:

select pedido_cliente.id_pedido, linea, codigo, nombre, precio from pedido_lineas, pedido_cliente, producto where pedido_cliente.id_pedido = pedido_lineas.id_pedido and producto.codigo = pedido_lineas.codigo and pedido_cliente.id_pedido = 1 OR 1 = 1 order by fecha desc
This query represents an attack on the database, since the presence of OR 1 = 1 implies the retrieval of a number of records not associated with the client's request. The value obtained by the ANN for this query was 0.28, whereas the SVM gave an output value of 0.66. The mixture produced an output value of 0.47, which is in the range of suspicious queries. If the ANN had been applied alone, it would have considered this query valid, whereas the SVM alone would have considered it an attack. The mixture deemed it suspicious, so a manual review would be carried out.
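For this example, the mixture can be illustrated numerically; equal weights are an assumption made here for simplicity (the paper weights each technique by its error rate, which it does not quantify):

```python
def mixture_score(ann_out, svm_decision, svm_lo, svm_hi, w_ann=0.5, w_svm=0.5):
    """Normalize the SVM decision value from [svm_lo, svm_hi] to [0, 1]
    and average it with the ANN output. Equal weights are an assumption."""
    svm_norm = (svm_decision - svm_lo) / (svm_hi - svm_lo)
    return w_ann * ann_out + w_svm * svm_norm

# ANN = 0.28, normalized SVM = 0.66 as in the example above.
score = mixture_score(0.28, 0.66, 0.0, 1.0)   # 0.47, inside [0.35, 0.6]
```

With these values the mixture lands in the suspicious interval, triggering the manual review described above.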
Fig. 2. SQL queries recovered in the revise stage
During the manual review, similar queries are recovered and the dimensionality is reduced. Figure 2 shows the results presented to the human expert. The most similar queries are coloured: legal queries are shown in green, attacks in red and the current query in blue. Non-recovered queries are shown in black. A set of queries different from the rest is recovered, both normal and abnormal. An example of each is shown below.

select pedido_cliente.id_pedido, linea, codigo, nombre, precio from pedido_lineas, pedido_cliente, producto where pedido_cliente.id_pedido = pedido_lineas.id_pedido and producto.codigo = pedido_lineas.codigo and pedido_cliente.id_pedido = 1 AND ORD(MID((CONCAT(CHAR(55), CHAR(55))), 1, 1)) > 63 order by fecha desc

select pedido_cliente.id_pedido, linea, codigo, nombre, precio from pedido_lineas, pedido_cliente, producto where pedido_cliente.id_pedido = pedido_lineas.id_pedido and producto.codigo = pedido_lineas.codigo and pedido_cliente.id_pedido = 1 AND 1 = 1 order by fecha desc
The first of the queries is a clear attack, while the second could also raise uncertainty due to the presence of the literal 1=1. Being a query more restrictive than the original, it would retrieve the same records or fewer, which would not be a very
intelligent strategy for an attack. In any case, the system considers it as such, provides an output value of 0.66 and thus also filters it a priori, but this should not be worrying: it is one of the 7 false positives that the system presents among the 705 queries.

The combination of different AI paradigms allows the development of a HAIS with characteristics such as the capacity for learning and reasoning, flexibility and robustness, which make the detection of SQL injection attacks possible. The proposed CBRid4SQL agent is capable of detecting these abnormal situations with low error rates compared with other existing techniques, as demonstrated in Table 1. It also provides a decision mechanism which eases the review of suspicious queries through the selection of similar queries and their visualization using neural models.

Acknowledgments. This development has been partially supported by the Spanish Ministry of Science and Technology project OVAMAH: TIN 2009-13839-C03-03, the Junta of Castilla and León (JCyL) project BU006A08, the Spanish Ministry of Education and Innovation projects CIT-020000-2008-2 and CIT-020000-2009-12, Grupo Antolin Ingenieria, S.A., within the framework of project MAGNO2008 - 1028.- CENIT, also funded by the same Government Ministry, and The Professional Excellence Program 2006-2010 IFARHU-SENACYT-Panama.
References

1. Halfond, W.G.J., Viegas, J., Orso, A.: A Classification of SQL-Injection Attacks and Countermeasures. In: Proceedings of the IEEE International Symposium on Secure Software Engineering, Arlington, VA, USA (2006)
2. Bockermann, C., Apel, M., Meier, M.: Learning SQL for Database Intrusion Detection Using Context-Sensitive Modelling (Extended Abstract). In: Flegel, U., Bruschi, D. (eds.) DIMVA 2009. LNCS, vol. 5587, pp. 196–205. Springer, Heidelberg (2009)
3. Kemalis, K., Tzouramanis, T.: SQL-IDS: A Specification-Based Approach for SQL-Injection Detection. In: Proceedings of the 2008 ACM Symposium on Applied Computing (SAC 2008). ACM, New York (2008)
4. Kiani, M., Clark, A., Mohay, G.: Evaluation of Anomaly Based Character Distribution Models in the Detection of SQL Injection Attacks. In: Third International Conference on Availability, Reliability and Security (ARES 2008). IEEE Computer Society, Washington (2008)
5. Bertino, E., Kamra, A., Early, J.: Profiling Database Applications to Detect SQL Injection Attacks. In: Proceedings of the Performance, Computing, and Communications Conference, IPCCC 2007 (2007)
6. Robertson, W., Vigna, G., Kruegel, C., Kemmerer, R.A.: Using Generalization and Characterization Techniques in the Anomaly-Based Detection of Web Attacks. In: 13th Annual Network and Distributed System Security Symposium, NDSS 2006 (2006)
7. García, V.H., Monroy, R., Quintana, M.: Web Attack Detection Using ID3. In: International Federation for Information Processing (2006)
8. Valeur, F., Mutz, D., Vigna, G.: A Learning-Based Approach to the Detection of SQL Attacks. In: Julisch, K., Krügel, C. (eds.) DIMVA 2005. LNCS, vol. 3548, pp. 123–140. Springer, Heidelberg (2005)
9. Corchado, J.M., Laza, R.: Constructing deliberative agents with case-based reasoning technology. International Journal of Intelligent Systems 18, 1227–1241 (2003)
10. Mukkamala, S., Sung, A.H., Abraham, A.: Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications 28(2), 167–182 (2005)
11. Carrascosa, C., Bajo, J., Julian, V., Corchado, J.M., Botti, V.: Hybrid multi-agent architecture as a real-time problem-solving model. Expert Systems with Applications 34(1), 2–17 (2008)
12. Fritzke, B.: A Growing Neural Gas Network Learns Topologies. In: Advances in Neural Information Processing Systems, vol. 7. MIT Press, Cambridge (1995)
13. LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient BackProp. In: Orr, G.B., Müller, K.-R. (eds.) NIPS-WS 1996. LNCS, vol. 1524, p. 9. Springer, Heidelberg (1998)
14. Corchado, E., Fyfe, C.: Connectionist Techniques for the Identification and Suppression of Interfering Underlying Factors. International Journal of Pattern Recognition and Artificial Intelligence 17(8), 1447–1466 (2003)
15. Corchado, E., MacDonald, D., Fyfe, C.: Maximum and Minimum Likelihood Hebbian Learning for Exploratory Projection Pursuit. Data Mining and Knowledge Discovery 8(3), 203–225 (2004)
16. Herrero, Á., Corchado, E., Sáiz, L., Abraham, A.: DIPKIP: A Connectionist Knowledge Management System to Identify Knowledge Deficits in Practical Cases. Computational Intelligence 26(1), 26–56 (2010)
17. Damele, B.: SQLMAP 0.5 – Automated SQL Injection Tool (2007)
Author Index
Abuín, Javier Sanchez I-524
Aguilar, Ramiro II-53
Alberdi, Amaia I-327
Alonso, J.B. I-302
Alonso, Luis II-53
Alonso, Ricardo S. II-111
Alonso-Ríos, D. II-217
Álvarez, A. I-302
Álvarez, Ignacio I-468, I-476
Álvarez-Sánchez, José Ramón I-245
Amanatiadis, A. II-391
Arana, R. I-270
Aranda-Corral, Gonzalo A. II-383
Araujo, Álvaro II-486
Araújo, Ricardo de A. II-351
Arbelaitz, Olatz II-151
Argente, Estefanía II-159, II-193
Armano, Giuliano I-548
Artaza, F. I-343
Artaza, Fernando I-368
Asla, N. I-286
Assadipour, Ghazal I-359
Ávila-Jiménez, José Luis II-9
Bahig, Hatem M. II-209
Baig, Abdul Rauf I-56
Bajo, Javier I-96, II-444, II-510
Baldassarri, Paola II-296
Banković, Zorana II-486
Barroso, N. I-196
Barroso, O. I-196
Baruque, Bruno II-501
Batista, Vivian F. López I-104
Batouche, Mohammed I-48
Bellas, Francisco I-88
Berlanga, Antonio II-436
Bernardo, Jon Alzola I-319
Bernardos, Ana M. II-468
Blankertz, Benjamin I-413
Blesa, Javier II-486
Borrego-Díaz, Joaquín II-383
Botía, Juan A. I-64
Botia, Juan A. I-80
Boto, Fernando I-500, I-524
Botti, Vicente II-177
Bragaglia, Stefano I-438
Burduk, Robert I-532
Buza, Krisztian I-557
Caamaño, Pilar I-88
Campos, Jordi II-168
Cano, Alberto II-17
Carbonero-Ruz, M. II-280
Carrascal, Alberto I-327
Casar, José R. II-468
Casillas, Jorge II-1
Castaños, David Lecumberri I-492
Castelo, Francisco Javier Perez I-385
Castro, Paula M. II-248
Cavazos, Alberto I-429
Chaves, Rosa I-446, I-452, I-468, I-476, I-516
Chen, Jungan II-201
Chen, Wenxin II-201
Chesani, Federico I-438
Chinga-Carrasco, Gary I-144
Chira, Camelia I-405, II-119
Chmielnicki, Wieslaw I-162
Chyzyk, Darya II-429
Ciampolini, Anna I-438
Cilla, Rodrigo II-436
Cimenbicer, Cem I-178
Claver, José M. II-233
Corchado, Emilio II-101, II-501, II-510
Corchado, Juan M. II-53, II-85, II-93, II-444
Corrales-García, Alberto II-233
Couso, Inés II-45
Crişan, Gloria-Cerasela I-405
Cruz-Ramírez, M. II-280, II-288
Cuadra, J.M. I-245
Cuenca, Pedro II-225
Cuevas, F.J. I-40
Cyganek, Boguslaw I-254
d'Anjou, Alicia II-241
Dapena, Adriana II-248
de Blas, Mariano I-500
de Gauna, Óscar Berasategui Ruiz I-319
de Goyeneche, Juan-Mariano II-486
de Ipiña, K. López I-196, I-286, I-508
de la Cal, E.A. II-143
de la Cal, Enrique I-421
de la Hoz Rastrollo, Ana Belén I-492
delaPaz, F. I-245
de la Prieta, Fernando II-61, II-93
de Lope, Javier II-77
del Río, B. Baldonedo II-217
de Luis, Ana I-96
de Mendivil, Rafael Yuguero González I-319
De Miguel Catoira, Alberto I-395
De Paz, Juan F. II-510
De Paz, Juan F. I-229, II-85, II-111
Díaz, I. I-237
Djaghloul, Haroun I-48
Domínguez, Raúl II-77
Dong, Jun I-136
Dragoni, Aldo Franco II-296
Duro, Richard J. I-88
Ercan, M. Fikret I-24
Esgin, Eren I-178
Esmi, Estevão II-343
Esparcia, Sergio II-159, II-193
Esseghir, M.A. I-351
Esteva, Marc II-168
Ezeiza, A. I-196
Fenner, Trevor I-152
Fernández, E. I-245
Fernández, E.M. II-143
Fernández-Escribano, Gerardo II-233
Fernandez-Gauna, Borja I-73, II-312, II-335
Fernández-Navarro, F. II-280
Ferrández, J.M. I-245
Ferrer, M.A. I-302
Foresti, Gian Luca II-452
Fraga, David II-486
Fraile, Juan A. I-96
Fuangkhon, Piyabute I-128
Fuerte, Mercedes Villa II-135
Fuertes, Juan José I-302
Gajić, Vladeta I-205
Galdós, Andoni I-213
Garay, Naiara Telleria I-492
Garcia, Ander II-151
García-Fornes, Ana II-193
García, Guillermo I-500, I-524
García-Gutiérrez, Jorge II-272
Garcia-Gutierrez, Jorge II-493
García, Jesús II-460
García, María N. Moreno I-104
García-Naya, José A. II-248
García, Óscar II-111
Garcia, Ramon Ferreiro I-385, I-395
García-Sedano, Javier A. I-319
García-Tamargo, Marco I-421
Garcia-Valverde, Teresa I-80
García, Yazmani II-127
Garrido, Antonio II-225
Garrido-Cantos, Rosario II-225
Garro, Beatriz A. I-376
Gascón-Moreno, J. II-304
Ghorbani, Ali A. I-1
Gibaja, Eva II-9
Gil, Ana II-61, II-85
Gil, Óscar II-111
Goenetxea, Jon I-213
Gómez-Garay, Vicente I-368
Gomez, L.E. I-40
Gómez, V. I-343
Goncalves, Gilles I-351
González, Angélica II-111
González, Asier González I-319
González, Michel II-1
Gorawski, Marcin I-187
Górriz, J.M. I-452, I-460, I-476, I-484, I-516
Górriz, Juan-Manuel I-446, I-468
Graczyk, Magdalena I-581
Graña, Manuel I-500, I-524
Gutiérrez, P.A. II-280
Hatami, Nima I-548
Heras, Stella II-177
Hernández, Angeles I-429
Hernández, Carmen II-69
Hernández, M.C. I-508
Hernández, Paula Hernández II-135
Herrero, Álvaro II-101, II-510
Hervás-Martínez, C. II-280, II-288
He, Xingxing II-320
Hillairet, Guillaume I-311
Hoffmann, Matej II-478
Hogan, Emilie II-399
Ibarguren, A. I-270
Iglesia, Daniel II-248
Illán, I.A. I-446, I-452, I-516
Iragorri, Eider Egilegor I-492
Irigoyen, E. I-286, I-343
Irigoyen, Eloy I-368
Jabeen, Hajira I-56
Jackowski, Konrad I-540
Jeon, Sungchae I-278
Jessel, Jean-Pierre I-48
Jiang, Jianmin I-120
Jimenez, J.F. I-40
Jolai, Fariborz I-359
Joslyn, Cliff II-399
Julián, Vicente II-101, II-177, II-193
Jureczek, Pawel I-187
Kaburlasos, Vassilis G. II-391, II-410
Kajdanowicz, Tomasz I-573
Kazienko, Przemyslaw I-573
Keck, I.R. I-460
Kim, Eunyoung I-278
Kim, Minkyung I-278
Kocić-Tanackov, Sunčica I-32
Kodewitz, A. I-460
Koene, Randal A. II-478
Kotb, Yasser II-209
Kramer, Oliver I-221, I-262
Kraszewski, Jan I-573
Lang, Elmar I-468
Lang, Elmar W. I-460
Larrea, M. I-343
Lasota, Tadeusz I-581
Lässig, Jörg I-262
Legarreta, Jon Haitz I-500, I-524
Liang, Feng II-201
Liang, Ximing I-24
Linaza, Maria Teresa II-151
Liu, Jun II-320, II-328
Liu, Xia I-136
Li, Xiang I-24
Li, Yingfang II-320
Llinas, James I-14
López-Guede, José Manuel I-73, I-492, II-241, II-312
López, Miriam I-446, I-452, I-468, I-476, I-516
López, Otoniel II-256
López-Sánchez, Maite II-168
López, Vivian F. II-53, II-61
Lorente, V. I-245
Lucas, Joel Pinho I-104
Luna, J.M. II-27
Lu, Xiaofen I-335
Macía, Iván I-500, I-524
Madrazo, Eva II-468
Maiora, Josu I-500, I-524
Malagón, Pedro II-486
Malumbres, Manuel P. II-256
Maravall, Darío II-77
Martí, Enrique II-460
Martínez, E. I-508
Martínez-Estudillo, F.J. II-288
Martínez, José Luis II-225, II-233
Martinez, Luis II-320, II-328
Martínez-Otzeta, J.M. I-270
Martínez, R. I-286
Martínez-Rach, Miguel II-256
Martin, Marcel I-221
Martín, M. José Polo I-104
Mata-Jiménez, Marco-Tulio I-429
Matei, O. II-119
Mateos-García, Daniel II-272
Mateos-Garcia, Daniel II-493
Maycock, Jonathan I-221
Mazzieri, Mauro II-296
Mello, Paola I-438
Mendez, Gerardo M. I-429
Mirkin, Boris I-152
Molina, Jose M. II-436
Molina, Jose Manuel II-460
Montali, Marco I-438
Montañés, E. I-237
Morell, Carlos II-1
Moreno, Aitor I-213
Moreno, María N. II-53
Moreno, Ramón II-241
Mosqueira-Rey, E. II-217
Mosquera, Antonio II-264
Moya, José M. II-486
Müller, Klaus-Robert I-413
Muñoz, Andrés I-64
Nanopoulos, Alexandros I-557
Nascimento, Susana I-152
Navarro, Martí II-101
Nieto-Taladriz, Octavio II-486
Novo, Jorge II-264
Ochoa, Alberto II-127
Olivares, Alberto I-484
Olivares, Gonzalo I-484
Oliver, José II-256
Onaindia, Eva II-185
Onut, Iosif-Viorel I-1
Ortiz-García, E.G. II-304
Oses, Noelia II-478
Otero, José II-45
Padilla, Pablo I-446, I-452, I-468, I-476, I-516
Paloc, Céline I-500, I-524
Pan, Xiaodong II-328
Papadakis, S.E. II-391
Patricio, Miguel A. II-436
Pauplin, Olivier I-120
Pechenizkiy, Mykola II-35
Peláez-Moreno, Carmen II-375
Peña, Carlos Pertusa I-492
Penedo, Manuel G. II-264
Pereira, Luís Moniz I-152
Pérez-Bellido, A.M. II-304
Pérez, Javier I-229
Pérez-Lancho, Belén II-444
Piñol, Pablo II-256
Pintea, Camelia-M. I-405
Pinzón, Cristian II-510
Pinzón, Cristian I-229
Pop, P.C. II-119
Portilla-Figueras, A. II-304
Prado-Gesto, D. II-217
Prieto, Abraham I-88
Puntonet, C.G. I-476, I-516
Quevedo, J.R. I-237
Quiroga, R. II-143
Ramírez, Javier I-452, I-468, I-476, I-484, I-516
Ramírez, J. I-446
Ramírez Moreno, M.C. II-288
Ramos, Lucía II-264
Ranilla, J. I-237
Raveaux, Romain I-311
Reyes, Laura Cruz II-135
Riquelme-Santos, José C. II-272, II-493
Ritter, Gerhard X. II-359, II-367
Rodríguez-Poch, E. II-217
Rodríguez-Sánchez, Rafael II-233
Rodríguez, Sara II-85, II-93, II-444
Rolle, Jose Luis Calvo I-385
Romero, Elena II-486
Romero, J.R. II-27
Rouco, José II-264
Ruan, Da II-320, II-328
Ruz, Mariano Carbonero II-288
Salamat, Nadeem I-294
Salas-Gonzalez, D. I-446, I-452, I-476, I-516
Salas-González, Diego I-468
Salcedo-Sanz, S. II-304
Sánchez-Anguix, Víctor II-193
Sánchez, José Luis II-233
Sánchez, Luciano II-45
Sánchez-Monedero, Javier II-288
Sannelli, Claudia I-413
Santillán, Claudia Gómez II-135
Sanz, Beatriz Ferreiro I-395
Sapena, Oscar II-185
Satzger, Benjamin I-262
Savio, Alexandre II-429
Schmidt, Florian Paul I-221
Schmidt-Thieme, Lars I-557
Sedano, J. II-143
Sedano, Javier I-421
Segovia, Fermin I-446, I-452, I-468, I-476, I-516
Segura, Álvaro I-213
Senkul, Pinar I-178
Seo, Changwoo I-278
Serrano, Emilio I-80
Simić, Dragan I-32, I-205
Simić, Svetlana I-205
Şimşek, İrfan I-170
Sitar, C. Pop II-119
Slimani, Yahya I-351
Snidaro, Lauro II-452
Sossa, Humberto I-40, I-376, II-418
Sottara, Davide I-438
Souffriau, Wouter II-151
Sremac, Siniša I-32
Stańczyk, Urszula I-565
Stąpor, Katarzyna I-162
Susperregi, U. I-196
Sussner, Peter II-343, II-351
Tanackov, Ilija I-32
Tang, Ke I-335
Tanprasert, Thitipong I-128
Tapia, Dante I. I-96, II-93
Telec, Zbigniew I-581
Tellaeche, A. I-270
Tepić, Jovan I-32
Termenon, Maite II-429
Teymanoglu, Yaddik II-127
Tomé, A.M. I-460
Tong, Jia-fei I-136
Topuz, Vedat I-112, I-170
Torreño, Alejandro II-185
Travieso, Carlos M. I-302
Trawiński, Bogdan I-581
Unzueta, Luis I-213
Urcid, Gonzalo II-359, II-367
Valdiviezo-N., Juan Carlos II-359
Valera, J. I-343
Vallejo, Juan Carlos II-486
Vallesi, Germano II-296
Valverde-Albacete, Francisco J. II-375
Vansteenwegen, Pieter II-151
Vaquero, C. I-508
Vázquez, Roberto A. I-376, II-418
Veganzones, Miguel A. II-69
Vega, Pastora II-85
Ventura, Sebastián II-9, II-17, II-27, II-35
Vidaurre, Carmen I-413
Vilches, Borja Ayerdi I-492
Villanueva, Daniel II-486
Villar, Jose R. I-421
Villar, J.R. II-143
Villaverde, Ivan II-335
Visentini, Ingrid II-452
Wozniak, Michal I-590
Xu, Yang II-320, II-328
Yamakawa, Asuka I-144
Yáñez, Javier II-127
Yao, Xin I-335
Yavuz, Erdem I-112
Zafra, Amelia II-17, II-35
Zahzah, El-hadi I-294
Zato, Carolina I-229, II-444
Zezzatti, Carlos Alberto Ochoa Ortíz II-135
Zmyslony, Marcin I-590
Zulueta Guerrero, Ekaitz I-492
Zulueta, Ekaitz I-73, II-312, II-335