Advances in Intelligent and Soft Computing Editor-in-Chief: J. Kacprzyk
99
Advances in Intelligent and Soft Computing Editor-in-Chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw Poland E-mail:
[email protected] Further volumes of this series can be found on our homepage: springer.com

Vol. 87. E. Corchado, V. Snášel, J. Sedano, A.E. Hassanien, J.L. Calvo, and D. Ślęzak (Eds.)
Soft Computing Models in Industrial and Environmental Applications, 6th International Workshop SOCO 2011
ISBN 978-3-642-19643-0

Vol. 88. Y. Demazeau, M. Pěchouček, J.M. Corchado, and J.B. Pérez (Eds.)
Advances on Practical Applications of Agents and Multiagent Systems, 2011
ISBN 978-3-642-19874-8

Vol. 89. J.B. Pérez, J.M. Corchado, M.N. Moreno, V. Julián, P. Mathieu, J. Canada-Bago, A. Ortega, and A.F. Caballero (Eds.)
Highlights in Practical Applications of Agents and Multiagent Systems, 2011
ISBN 978-3-642-19916-5

Vol. 90. J.M. Corchado, J.B. Pérez, K. Hallenborg, P. Golinska, and R. Corchuelo (Eds.)
Trends in Practical Applications of Agents and Multiagent Systems, 2011
ISBN 978-3-642-19930-1

Vol. 91. A. Abraham, J.M. Corchado, S.R. González, and J.F. de Paz Santana (Eds.)
International Symposium on Distributed Computing and Artificial Intelligence, 2011
ISBN 978-3-642-19933-2

Vol. 92. P. Novais, D. Preuveneers, and J.M. Corchado (Eds.)
Ambient Intelligence – Software and Applications, 2011
ISBN 978-3-642-19936-3

Vol. 93. M.P. Rocha, J.M. Corchado, F. Fernández-Riverola, and A. Valencia (Eds.)
5th International Conference on Practical Applications of Computational Biology & Bioinformatics 6-8th, 2011
ISBN 978-3-642-19913-4

Vol. 94. J.M. Molina, J.R. Casar Corredera, M.F. Cátedra Pérez, J. Ortega-García, and A.M. Bernardos Barbolla (Eds.)
User-Centric Technologies and Applications, 2011
ISBN 978-3-642-19907-3

Vol. 95. Robert Burduk, Marek Kurzyński, Michał Woźniak, and Andrzej Żołnierek (Eds.)
Computer Recognition Systems 4, 2011
ISBN 978-3-642-20319-0

Vol. 96. A. Gaspar-Cunha, R. Takahashi, G. Schaefer, and L. Costa (Eds.)
Soft Computing in Industrial Applications, 2011
ISBN 978-3-642-20504-0

Vol. 97. W. Zamojski, J. Kacprzyk, J. Mazurkiewicz, J. Sugier, and T. Walkowiak (Eds.)
Dependable Computer Systems, 2011
ISBN 978-3-642-21392-2

Vol. 98. Z.S. Hippe, J.L. Kulikowski, and T. Mroczek (Eds.)
Human – Computer Systems Interaction: Backgrounds and Applications 2, 2012
ISBN 978-3-642-23186-5

Vol. 99. Z.S. Hippe, J.L. Kulikowski, and T. Mroczek (Eds.)
Human – Computer Systems Interaction: Backgrounds and Applications 2, 2012
ISBN 978-3-642-23171-1
Zdzisław S. Hippe, Juliusz L. Kulikowski, and Teresa Mroczek (Eds.)
Human – Computer Systems Interaction: Backgrounds and Applications 2 Part 2
Editors Dr. Teresa Mroczek Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszów, Poland E-mail:
[email protected]
Dr. Zdzisław S. Hippe Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszów, Poland E-mail:
[email protected] Dr. Juliusz L. Kulikowski Polish Academy of Sciences, M. Nalecz Institute of Biocybernetics and Biomedical Engineering, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland E-mail:
[email protected]
ISBN 978-3-642-23171-1
e-ISBN 978-3-642-23172-8
DOI 10.1007/978-3-642-23172-8 Advances in Intelligent and Soft Computing
ISSN 1867-5662
Library of Congress Control Number: 2011936642

© 2012 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India

Printed on acid-free paper

springer.com
From the Editors
The history of human-system interaction is as long as that of human civilization. By natural evolution, human beings have been adapted to live in groups and to fight together for food and shelter against other groups or against natural forces. The outcome of this fight depended on two basic factors: the ability to communicate within collaborating groups or between persons, and the capability to understand and predict the principles and behavior of the opposing groups or forces. This, in fact, is also the main contemporary human-system interaction (H-SI) problem. A system is in this case considered – in a narrow sense – as one created on the basis of electronic, optoelectronic and/or computer technology in order to aid humans in reaching some of their vital goals. A system so defined is not only a passive tool in human hands; it is rather an active partner, equipped with a sort of artificial intelligence, having access to large information resources, and able to adapt its behavior to human requirements and to collaborate with human users in order to reach their goals. The area of such systems' applications covers most domains of human activity and is still expanding. Accordingly, scientific and practical H-SI problems call for a large variety of sophisticated solution methods. This is why H-SI problems have in recent decades become an important and extensively growing area of investigation. This book presents some examples of H-SI problems and solution methods. They can be roughly divided into the following groups: a) human decision supporting systems, b) distributed knowledge bases and WEB systems, c) systems aiding disabled persons, d) environment monitoring and robotic systems, e) diagnostic systems, f) educational systems, and g) general H-SI problems. As usual, some papers can be assigned to more than one class, so the classification serves only as a rough characterization of the book's contents.
Human decision supporting systems are presented in papers concerning various application areas, e.g. enterprise management (A. Burda and Z.S. Hippe; T. Żabiński and T. Mączka; S. Cavalieri), healthcare (E. Zaitseva), agricultural products storage (W. Sieklicki, M. Kościuk and S. Sieklicki), visual design (E.J. Grabska), and sport training planning (J. Vales-Alonso, P. López-Matencio, J.J. Alcaraz, et al.). The papers by I. Rejer; J.L. Kulikowski; K. Harężlak and A. Werner; E. Nawarecki, S. Kluska-Nawarecka and K. Regulski; A. Grzech, A. Prusiewicz and M. Zięba; and A. Andrushevich, M. Fercu, J. Hopf, E. Portmann and A. Klapproth are devoted to various problems of data and knowledge base exploration in computer decision-aiding systems.
The WEB-based systems, including those built on distributed knowledge bases, are presented in the papers by N. Pham, B.M. Wilamowski and A. Malinowski, and by M. Hajder and T. Bartczak. K. Skabek, R. Winiarczyk and A. Sochan present a concept of a distributed virtual museum. An interesting concept of managing the process of intellectual capital creation is presented by A. Lewicki and R. Tadeusiewicz. A document-centric, instead of data-centric, distributed information processing paradigm is presented in a paper by B. Wiszniewski. New computer network technologies are described by K. Krzemiński and I. Jóźwiak and by P. Rożycki, J. Korniak and J. Kolbusz; the last two authors also present a model of malicious network traffic. Selected problems of distributed network resources organization and tagging are presented by A. Dattolo, F. Ferrara and C. Tasso, as well as by A. Chandramouli, S. Gauch and J. Eno.

The next group of papers deals with aiding disabled persons by improving their communication with external systems. The papers by M. Porta and A. Ravarelli and by D. Chugo, H. Ozaki, S. Yokota and K. Takase are devoted to systems aiding physically disabled persons. Spatial orientation and navigation aiding problems are described by P. Strumillo; by A. Śluzek and M. Paradowski; and by M. Popa. A proposal of a ubiquitous health supervising system is presented by P. Augustyniak. Problems of hand posture or motion recognition for aiding disabled persons are described by R.S. Choraś and by T. Luhandjula, K. Djouani, Y. Hamam, B.J. van Wyk and Q. Williams, while similar problems supporting the therapy of children are presented by J. Marnik, S. Samolej, T. Kapuściński, M. Oszust and M. Wysocki. A paper by Mertens, C. Wacharamanotham, J. Hurtmanns, M. Kronenbuerger, P.H. Kraus, A. Hoffmann, C. Schlick and J. Borchers is devoted to the problem of improving communication through a touch screen. Some other problems of tactile communication are considered by L.M. Muñoz, P. Ponsa and A. Casals. J. Ruminski, M. Bajorek, J. Ruminska, J. Wtorek and A. Bujnowski present a method of computer-based aiding of dichromats in correct color vision. In the papers by A. Roman-Gonzalez and by J.P. Rodrigues and A. Rosa, some concepts of using EEG signals directly to aid persons with lost motor abilities are presented. Similarly, some basic problems and experimental results of direct brain-computer interaction are described by M. Byczuk, P. Poryzała and A. Materka. A group of papers by Y. Ota; P. Nauth; M. Kitani, T. Hara, H. Hanada and H. Sawada; D. Erol Barkana; and T. Sato, S. Sakaino and T. Yakoh contains descriptions of several new robotic system constructions. The group concerning diagnostic systems consists of papers devoted mainly to medical applications (K. Przystalski, L. Nowak, M. Ogorzałek and G. Surówka; P. Cudek, J.W. Grzymała-Busse and Z.S. Hippe; A. Świtoński, R. Bieda and K. Wojciechowski; T. Mroczek, J.W. Grzymała-Busse, Z.S. Hippe and P. Jurczak; R. Pazzaglia, A. Ravarelli, A. Balestra, S. Orio and M.A. Zanetti; M. Jaszuk, G. Szostek and A. Walczak; J. Gomuła, W. Paja, K. Pancerz and J. Szkoła). Besides, an industrial diagnostic system is presented in a paper by R.E. Precup, S.V. Spătaru, M.B. Rădac, E.M. Petriu, S. Preitl, C.A. Dragoş and R.C. David. K. Adamczyk and A. Walczak present an algorithm for edge detection in images which can be used in various applications.
In the papers by L. Pyzik; by C.A. Dragoş, S. Preitl, R.E. Precup and E.M. Petriu; and by E. Noyes and L. Deligiannidis, examples of computer-aided educational systems are presented. K. Kaszuba and B. Kostek describe a neurophysiological approach to aiding learning processes. The group concerning general H-SI problems consists of the papers by T.T. Xie, H. Yu and B.M. Wilamowski; H. Yu and B.M. Wilamowski; and G. Drałus. General problems of rule formulation for automatic reasoning are described by A.P. Rotshtein and H.B. Rakytyanska, as well as by M. Pałasiński, B. Fryc and Z. Machnicka. Close to the former, S. Chojnacki and M.A. Kłopotek consider a problem of evaluating Boolean recommenders in decision systems. Various aspects of computer-aided decision making methods are presented in the papers by M.P. Dwulit and Z. Szymański; L. Bobrowski; and A. Pułka and A. Milik. A problem of ontology creation is described by A. Di Iorio, A. Musetti, S. Peroni and F. Vitali. Finally, B. Małysiak-Mrozek, S. Kozielski and D. Mrozek present a concept of a language for describing the structural similarity of proteins. This panorama of works conducted by a large number of scientists in numerous countries shows that H-SI is a wide and progressive area of investigation aimed at improving human life conditions. It also shows that at the borders between different scientific disciplines new and interesting problems arise and stimulate development on both sides.

Editors
Zdzisław S. Hippe
Juliusz L. Kulikowski
Teresa Mroczek
Contents
Part IV: Environment Monitoring and Robotic Systems

SSVEP-Based Brain-Computer Interface: On the Effect of Stimulus Parameters on VEPs Spectral Characteristics . . . . . . . . . . 3
M. Byczuk, P. Poryzała, A. Materka
Design and Development of a Guideline for Ergonomic Haptic Interaction . . . . . . . . . . 15
L.M. Muñoz, P. Ponsa, A. Casals
Partner Robots – From Development to Business Implementation . . . . . . . . . . 31
Y. Ota
Goal Understanding and Self-generating Will for Autonomous Humanoid Robots . . . . . . . . . . 41
P. Nauth
A Talking Robot and Its Singing Performance by the Mimicry of Human Vocalization . . . . . . . . . . 57
M. Kitani, T. Hara, H. Hanada, H. Sawada
An Orthopedic Surgical Robotic System-OrthoRoby . . . . . . . . . . 75
D. Erol Barkana
Methods for Reducing Operational Forces in Force-Sensorless Bilateral Control with Thrust Wires for Two-Degree-of-Freedom Remote Robots . . . . . . . . . . 91
T. Sato, S. Sakaino, T. Yakoh
Part V: Diagnostic Systems

Applications of Neural Networks in Semantic Analysis of Skin Cancer Images . . . . . . . . . . 111
K. Przystalski, L. Nowak, M. Ogorzałek, G. Surówka
Further Research on Automatic Estimation of Asymmetry of Melanocytic Skin Lesions . . . . . . . . . . 125
P. Cudek, J.W. Grzymała-Busse, Z.S. Hippe
Multispectral Imaging for Supporting Colonoscopy and Gastroscopy Diagnoses . . . . . . . . . . 131
A. Świtoński, R. Bieda, K. Wojciechowski
A Machine Learning Approach to Mining Brain Stroke Data . . . . . . . . . . 147
T. Mroczek, J.W. Grzymała-Busse, Z.S. Hippe, P. Jurczak
Using Eye-Tracking to Study Reading Patterns and Processes in Autism with Hyperlexia Profile . . . . . . . . . . 159
R. Pazzaglia, A. Ravarelli, A. Balestra, S. Orio, M.A. Zanetti
Ontology Design for Medical Diagnostic Knowledge . . . . . . . . . . 175
M. Jaszuk, G. Szostek, A. Walczak
Rule-Based Analysis of MMPI Data Using the Copernicus System . . . . . . . . . . 191
J. Gomuła, W. Paja, K. Pancerz, J. Szkoła
Application of 2D Anisotropic Wavelet Edge Extractors for Image Interpolation . . . . . . . . . . 205
K. Adamczyk, A. Walczak
Experimental Results of Model-Based Fuzzy Control Solutions for a Laboratory Antilock Braking System . . . . . . . . . . 223
R.E. Precup, S.V. Spătaru, M.B. Rădac, E.M. Petriu, S. Preitl, C.A. Dragoş, R.C. David

Part VI: Educational Systems

Remote Teaching and New Testing Method Applied in Higher Education . . . . . . . . . . 237
L. Pyzik
Points of View on Magnetic Levitation System Laboratory-Based Control Education . . . . . . . . . . 261
C.A. Dragoş, S. Preitl, R.E. Precup, E.M. Petriu
2D and 3D Visualizations of Creative Destruction for Entrepreneurship Education . . . . . . . . . . 277
E. Noyes, L. Deligiannidis
Employing a Biofeedback Method Based on Hemispheric Synchronization in Effective Learning . . . . . . . . . . 295
K. Kaszuba, B. Kostek
Part VII: General Problems

Comparison of Fuzzy and Neural Systems for Implementation of Nonlinear Control Surfaces . . . . . . . . . . 313
T.T. Xie, H. Yu, B.M. Wilamowski
Hardware Implementation of Fuzzy Default Logic . . . . . . . . . . 325
A. Pułka, A. Milik
Dwulit's Hull as Means of Optimization of kNN Algorithm . . . . . . . . . . 345
M.P. Dwulit, Z. Szymański
OWiki: Enabling an Ontology-Led Creation of Semantic Data . . . . . . . . . . 359
A. Di Iorio, A. Musetti, S. Peroni, F. Vitali
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case . . . . . . . . . . 375
A.P. Rotshtein, H.B. Rakytyanska
Server-Side Query Language for Protein Structure Similarity Searching . . . . . . . . . . 395
B. Małysiak-Mrozek, S. Kozielski, D. Mrozek
A New Kinds of Rules for Approximate Reasoning Modeling . . . . . . . . . . 417
M. Pałasiński, B. Fryc, Z. Machnicka
Technical Evaluation of Boolean Recommenders . . . . . . . . . . 429
S. Chojnacki, M.A. Kłopotek
Interval Uncertainty in CPL Models for Computer Aided Prognosis . . . . . . . . . . 443
L. Bobrowski
Neural Network Training with Second Order Algorithms . . . . . . . . . . 463
H. Yu, B.M. Wilamowski
Complex Neural Models of Dynamic Complex Systems: Study of the Global Quality Criterion and Results . . . . . . . . . . 477
G. Drałus
Author Index . . . . . . . . . . 497
Subject Index . . . . . . . . . . 499
SSVEP-Based Brain-Computer Interface: On the Effect of Stimulus Parameters on VEPs Spectral Characteristics M. Byczuk, P. Poryzała, and A. Materka Institute of Electronics, Technical University of Lodz, Łódź, Poland {byczuk,poryzala,materka}@p.lodz.pl
Abstract. It is demonstrated that the spectral characteristics of steady-state visual evoked potentials (SSVEPs) in a brain-computer interface (an SSVEP-based BCI) depend significantly on stimulus parameters, such as the color and frequency of its flashing light. We postulate that these dependencies can be used to improve BCI performance by proper design, configuration and adjustment of the visual stimulator. Preliminary results of the conducted experiments also show that SSVEP characteristics vary strongly from subject to subject.
1 Introduction

A brain-computer interface (BCI) is an alternative solution for communication between human and machine. In the case of traditional interfaces, the user is expected to make voluntary movements to control a machine (e.g. movements of hands and fingers are required to operate a keyboard). In contrast to commonly used human-machine interfaces, a BCI device allows commands to be sent from the brain to a computer directly, without using any of the brain's normal output pathways of peripheral nerves and muscles [Wolpaw et al. 2000]. This unique feature has generated great interest in neural engineering, rehabilitation and brain science over the last 30-40 years. Currently available systems can be used to reestablish a communication channel for persons with severe motor disabilities, patients in a "locked-in" state, or even completely paralyzed people. It is predicted that practically applicable BCI systems will be implemented within the next few years. A BCI device measures the subject's ongoing brain activity, usually electroencephalographic (EEG) signals, and tries to recognize mental states or voluntarily induced changes in the brain activity. Extracted and correctly classified EEG signal features are translated into appropriate commands which can be used for controlling a computer or a wheelchair, operating a virtual keyboard, etc. The various systems differ in the way the intention of the BCI user is extracted from her/his brain electrical activity. Among the approaches, two groups of techniques are most popular, based on:
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 3–14. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
• identifying changes of brain activity which are not externally triggered,
• detecting characteristic waveforms in EEG, so-called visual evoked potentials (VEPs), which are externally evoked by a visual stimulus.

The class of VEP-based BCI systems offers many advantages: easy system configuration, high speed, a large number of available commands, high reliability and little user training. Visually evoked potentials can be recorded in the primary visual cortex, which is located at the back part of the human brain. VEPs reflect the user's attention to a visual stimulus, which may be in the form of short flashes or of light flickering at a certain frequency. VEPs elicited by brief stimuli are usually transient responses of the visual system and are analyzed in the time domain. VEPs elicited by a flickering stimulus are quasi-periodic signals, called steady-state VEPs (SSVEPs), and are analyzed in the frequency domain. Fig. 1 shows the simplified block diagram of a typical VEP-based BCI system. Each target (a letter, direction of a cursor movement, etc.) in a VEP-based BCI is encoded by a unique stimulus sequence which in turn evokes a unique VEP pattern. A fixation target can thus be identified by analyzing the characteristics of the VEP: its time of appearance (for flash VEP detection) or its fundamental frequency (for SSVEP detection).
Fig. 1 A simplified block diagram of a typical VEP-based BCI system
2 SSVEP-Based BCI Systems

In the majority of VEP-based BCIs, frequency encoding is used (interface operation is based on SSVEP detection). The energy of SSVEP signals is concentrated in very narrow bands around the stimulation frequency and its harmonics, whereas the spontaneous EEG signal may be modeled as Gaussian noise whose energy is spread over the whole spectrum. Thus SSVEPs can be easily detected using feature extraction based on spectral analysis and classification algorithms. Moreover, neither the system nor the user requires any training, since the EEG response to the stimulus is known in advance. This approach results in a minimal number of electrodes required for proper operation of the BCI, the ability of real-time operation, and low hardware cost. Therefore, steady-state visual evoked potentials give rise to a very promising paradigm in brain-computer interface design. Currently, the development of BCI systems for real-life applications is emphasized. Research teams still encounter many problems in turning demonstrations of SSVEP-based BCIs into practically applicable systems [Wang et al. 2008]. Two major constraints are system capacity (the number of available targets or commands) and detection time. They are directly related to the speed and reliability of a BCI. The overall performance of a BCI system can be expressed numerically by the information transfer rate (ITR), which describes the amount of information transferred per unit time (usually a minute). The ITR is defined as [Wolpaw et al. 2000]:

\[
\mathrm{ITR} = s \cdot \left[ \log_2 N + P \log_2 P + (1 - P)\,\log_2 \frac{1 - P}{N - 1} \right],
\qquad (1)
\]
where s is the number of detections per minute, N is the number of possible selections, and P is the probability that the desired selection will actually be detected. It is assumed that each selection has the same probability of being the one that the user desires, and that each of the other selections has the same probability of being selected. The ITR of currently available systems usually varies from 10 up to 50 bits/minute. System capacity is limited by the stimulation frequency band (the number of available stimulation frequencies), which is directly related to brain electrophysiology and visual information processing mechanisms [Regan 1989]. Detection speed is limited by the signal-to-noise ratio (SNR), which may be decreased in subjects with strong spontaneous activity of the visual cortex. The limitations described above can be addressed with different approaches:

• Research on stimulation methods that will increase interface capacity when using a limited number of stimulation frequencies: time, frequency or pseudorandom code modulated VEP stimulations [Bing et al. 2009], phase coding, multiple frequency stimulation methods [Mukesh et al. 2006], etc. Advanced methods of stimulation can be used to design an interface with more commands available without the need to extend the stimulation frequency band.
• Research on lead selection for the purpose of SNR enhancement – the performance or even applicability of an SSVEP-based system is limited by biological differences between users [Wang et al. 2004]. For subjects with different SSVEP source locations, optimized electrode positions can help achieve a high signal-to-noise ratio and overcome SSVEP detection problems.
• Research on stimulation methods for the purpose of SNR enhancement – for example, the half-field alternate stimulation method described in [Materka and Byczuk 2006].
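The ITR formula (1) is straightforward to evaluate numerically. The following sketch (the function name is ours) handles the error-free case P = 1 separately, where the P log2 P and (1 − P) terms vanish:

```python
import math

def itr_bits_per_min(s, n, p):
    """Information transfer rate, Eq. (1) [Wolpaw et al. 2000].

    s -- number of detections (selections) per minute
    n -- number of possible selections (targets)
    p -- probability of correct detection
    """
    bits = math.log2(n)                      # log2 N
    if 0.0 < p < 1.0:
        bits += p * math.log2(p)             # P log2 P
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return s * bits

# 8 targets, 40 error-free detections per minute
print(itr_bits_per_min(40, 8, 1.0))  # 120.0 bits/minute
```

With s = 40, N = 8 and P = 1 this reproduces the 120 bits/minute figure quoted for the prototype system described in the next section.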
3 Prototype BCI System

In our previous research we focused on SNR enhancement. The result of this work was a novel technique of alternate half-field stimulation. The method was practically implemented and tested in a prototype BCI system [Materka et al. 2007] designed in the Institute of Electronics at the Technical University of Lodz. The system can be classified as a noninvasive, SSVEP-based, frequency-encoded BCI. A simplified block diagram of the prototype interface is depicted in Fig. 2.
Fig. 2 A block diagram of the prototype SSVEP-based BCI system
The system was implemented as a virtual keypad. The visual stimulator consisted of 8 labeled targets (keys) flickering at different frequencies (Fig. 3). Each target contained three light-emitting diodes (LEDs): two LEDs for alternate stimulation (B) and an additional LED as a biofeedback indicator (A), which constantly provided real-time information about the amplitudes of the measured SSVEP signals.
Fig. 3 A view of stimulator targets: A – fixation point and biofeedback indicator, B – stimulating lights
Proper arrangement of the stimulation lights within a single symbol ensures that their images are positioned on the left and right halves of the visual field on the human retina. This leads to SSVEP responses (with opposite phases) in the right and left halves of the visual cortex, respectively. Differential measurement of the EEG signals from both halves of the visual cortex allows a significant SNR increase of the measured SSVEP signals. System operation and usability were tested with the contribution of 10 volunteers. The tests showed that the system is much faster than conventional BCI devices based on SSVEPs. For the user who achieved the best results, the detection time was 1.5s (40 detections per minute) with a 0% error rate. In this case the information transfer rate calculated according to formula (1) equals 120 bits/minute. The high transfer rate of the interface was obtained mainly due to short detection times (a direct result of SNR enhancement). The communication speed of the designed system would be sufficient for most applications, but its limited capacity makes its usage as a full-alphabet keyboard difficult. Thus new methods for increasing the number of available commands must be developed in order to design a fully keyboard-compatible computer interface. Preliminary observations showed that the amplitudes of the detected SSVEP signals, and the frequency band in which strong SSVEPs can be observed, depend on some parameters of the stimulation, e.g. color, size, intensity, layout of the stimulation lights, and their frequency. Further investigation of the influence of these parameters on the spectral properties of SSVEPs is the subject of our present research.
4 Experimental Setup

Two experiments were carried out using the alternate half-field stimulation technique. The EEG signal was measured differentially using two electrodes located on the left and right sides of the occipital part of the scalp (positions O1 and O2 of the international 10-20 system of EEG electrode placement), with a reference electrode placed between them (position Oz). The amplified EEG signal was sampled at 200Hz. The user was sitting on a comfortable ergonomic chair to minimize the activity of neck muscles, which might produce EMG artifacts.
Fig. 4 A side view of the stimulator (A), a view of stimulating lights SL, SR and a fixation light F on the screen of stimulator (B)
The visual stimulator used in the experiments consisted of three LEDs which projected the stimulus onto a screen (Fig. 4), in order to diffuse (blur) the image of the contrastive shape of the light-emitting semiconductor region of the LED devices. The stimulus was in the form of two lights (left – SL, and right – SR) that flashed with the same frequency, alternately in time. An extra light source (F) was placed between the two stimulating lights, slightly above them. This light was used as a fixation point. Additionally, the intensity of the light F was changed according to the calculated SSVEP amplitude, to provide feedback between the user and the system. This helped the user to concentrate his/her attention on the fixation light F.

Table 1 Stimulator parameters

Parameter                        Experiment 1   Experiment 2
Diameter (D)                     4mm            6mm
Color of lights SL and SR        Green          Red
Color of light F                 Red            Green
Intensity of lights SL and SR    Low            High
SSVEP-Based Brain-Computer Interface
9
The distance between the screen of the stimulator and the user's eyes was about 50cm. The two experiments were carried out using different sets of lights, described in Table 1. All stimulation parameters were intentionally changed in experiment 2 compared to experiment 1, simply to demonstrate that these parameters have a measurable influence on the SSVEP characteristics. A more comprehensive examination of the effect of systematic parameter changes on SSVEP BCI performance is currently under way in our laboratory. In both experiments, the diameter of the fixation light F was 3mm and the modulation depth of the stimulating lights was 100% (sinusoidal modulation). The stimulation frequency was changed every 5-10 seconds within the range 20-50Hz with a fixed step of 0.78Hz. Each experiment lasted about 5-7 minutes.
5 Results

For a rough comparison of SSVEP amplitudes in both experiments, the power spectral density (PSD) of the EEG signals was computed in a sliding window of 1.28s duration (256 samples). This window corresponds to a frequency resolution of about 0.78Hz, which was the frequency step of the stimulus. Prior to FFT calculation, the measured signals were filtered using comb filters to reduce the spectral leakage of the Fourier analysis [Materka and Byczuk 2006]. The computed spectrograms are shown in Fig. 5 and Fig. 6 for experiments 1 and 2 respectively, carried out by the same user (subject 1).
Fig. 5 A spectrogram of measured EEG signal in experiment 1
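A sliding-window PSD of this kind can be sketched as follows; this is a plain Hann-windowed periodogram (parameter and function names are ours), omitting the authors' comb-filter preprocessing:

```python
import numpy as np

FS = 200          # sampling rate [Hz], as in the experiments
NFFT = 256        # 1.28 s window -> FS/NFFT = 0.78125 Hz resolution

def sliding_psd(eeg, step=20):
    """PSD in a sliding 256-sample window (rows = frequency, cols = time)."""
    windows = [eeg[i:i + NFFT] for i in range(0, len(eeg) - NFFT + 1, step)]
    spec = []
    for w in windows:
        spectrum = np.fft.rfft(w * np.hanning(NFFT))
        spec.append(np.abs(spectrum) ** 2 / NFFT)   # periodogram estimate
    return np.array(spec).T

freqs = np.fft.rfftfreq(NFFT, d=1.0 / FS)   # bin centers: 0, 0.78, 1.56, ... Hz
```

Because the stimulation frequencies were stepped by exactly one bin width (0.78Hz), each stimulus frequency falls on a discrete FFT bin of this analysis, which is what makes the narrow-band SSVEP peaks stand out in the spectrograms.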
A comparison of the spectra illustrated in Fig. 5 and Fig. 6 reveals different frequency ranges with strong SSVEP components. In experiment 1, strong SSVEPs are visible in the range 20-40Hz, whereas in experiment 2 SSVEP components may be observed at higher frequencies, in the range 30-50Hz. It may seem that the evoked potentials are easier to detect in Fig. 5 (because they have higher amplitudes than the SSVEPs in Fig. 6), and hence that the stimulation settings used in experiment 1 are better. However, the responses in experiment 2 are not necessarily weaker in terms of the distance of the signal power from the noise power floor, as will be discussed below.
Fig. 6 A spectrogram of measured EEG signal in experiment 2
To compare the two experiments more objectively, a signal-to-background ratio (SBR) was computed for each SSVEP component. The SBR coefficient at frequency f is defined here as the ratio of the PSD at frequency f to the mean PSD of the signal components at the N = 10 adjacent discrete frequencies:

    SBR(f) = N · PSD(f) / Σ_{k=1}^{N/2} [ PSD(f − k·Δf) + PSD(f + k·Δf) ],    (2)
where Δf = 0.78 Hz is the frequency resolution of the Fourier analysis applied for the PSD calculation. The maximum values of the SBR coefficients for each SSVEP frequency were collected, and the frequency characteristics for each experiment were estimated using polynomial approximation. A comparison of the SBR characteristics for the two experiments, carried out by the same subject, is shown in Fig. 7.
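Formula (2) can be implemented directly. A minimal sketch (our function name, assuming the PSD is given as an array indexed by discrete frequency bin):

```python
import numpy as np

def sbr(psd, idx, n=10):
    """SBR at frequency bin `idx`, per formula (2): N times the PSD at the
    bin, divided by the summed PSD of the N/2 neighbouring bins on each side."""
    k = np.arange(1, n // 2 + 1)
    background = psd[idx - k].sum() + psd[idx + k].sum()
    return n * psd[idx] / background
```

On a flat background spectrum this ratio is close to 1, while a strong SSVEP line yields a large value, which is what makes it usable for threshold detection.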
SSVEP-Based Brain-Computer Interface
11
Fig. 7 A comparison of SBR frequency characteristics measured in experiment 1 (dashed line) and in experiment 2 (solid line)
The SBR coefficients obtained in experiment 2 have a higher peak value than in experiment 1, although Fig. 6 shows smaller SSVEP amplitudes than Fig. 5. This means that in experiment 2 the EEG signal components other than the SSVEP (the so-called EEG noise) have a much smaller amplitude than the SSVEPs, which results in a higher signal-to-background ratio. The characteristics presented in Fig. 7 confirm the different frequency ranges of strong SSVEPs in the two experiments, shown in the spectrograms (Fig. 5 and Fig. 6, respectively). If SSVEP detection were performed by comparing the SBR with a threshold value T = 30, the frequency range of detected SSVEPs would be about 27-40 Hz (13 Hz wide) in experiment 1 and 37-48 Hz (11 Hz wide) in experiment 2. By using different stimulation settings (e.g. a different color of the stimulus light) for frequencies below and above 38 Hz (the crossover point of the two characteristics in Fig. 7), it is possible to extend the stimulation frequency range to 27-48 Hz (21 Hz wide). This may lead to an increased number of available BCI commands. Both experiments were repeated with another subject (Subject 2). Fig. 8 and Fig. 9 present the SBR characteristics of the EEG signals measured from Subject 2 in experiments 1 and 2 respectively, calculated according to formula (2).
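The threshold-based detection and the combination of the two stimulation settings described above can be sketched as follows (function names are ours; the illustrative numbers are those given in the text):

```python
import numpy as np

T = 30.0  # SBR detection threshold used in the text

def detected_band(freqs, sbr_vals, threshold=T):
    """Frequencies at which an SSVEP would be detected (SBR above threshold)."""
    return freqs[np.asarray(sbr_vals) > threshold]

def combined_band(freqs, sbr_exp1, sbr_exp2, threshold=T):
    """Per-frequency best of two stimulation settings: below/above the
    crossover, the setting with the higher SBR is assumed to be used."""
    best = np.maximum(sbr_exp1, sbr_exp2)
    return freqs[best > threshold]
```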
Fig. 8 SBR characteristics of the measured EEG signal for Subject 2 as a function of time in experiment 1
Fig. 9 SBR characteristics of the measured EEG signal for Subject 2 as a function of time in experiment 2
The frequency range was extended to 3-50 Hz in experiment 1 because this subject showed strong SSVEP responses for stimulation frequencies below 20 Hz (strong responses over 7-45 Hz, Fig. 8). Moreover, experiment 1 for Subject 2 revealed a different nature of the SSVEPs: for stimulation frequencies of 18-24 Hz the responses contain a strong second harmonic and a very weak component at the fundamental frequency, whereas for Subject 1 the SSVEP signals contained only the fundamental harmonic at all frequencies. Fig. 9 shows a different stimulation frequency band with strong SSVEPs in experiment 2 (14-35 Hz, including second-harmonic responses for stimulation frequencies of 18-24 Hz), when compared to Subject 1 (Fig. 6).
6 Conclusions

Steady-state visual evoked potentials give rise to a very promising paradigm in brain-computer interface design. SSVEP-based BCI systems are among the most effective solutions, in terms of speed and accuracy, when compared to other BCI devices. The experiments presented in this paper show that the characteristics of the steady-state visual evoked potentials depend on the parameters of the visual stimulus. We therefore postulate that the performance of SSVEP-based BCI systems can be improved by proper construction, configuration and adjustment of the visual stimulator. Moreover, the SSVEP characteristics depend on individual features of the subject's visual system. This suggests that the stimulation parameters and the SSVEP detection algorithm (tuned to the stimulus fundamental frequency or its second harmonic) should be adjusted individually for each subject. Further research will focus on determining which parameters of the stimulus (color, size, shape, etc.) have the strongest influence on the SSVEP characteristics.
Acknowledgment

This work is supported by the Polish Ministry of Science and Higher Education, grant NN515 520838.
References

[Bin et al. 2009] Bin, G., Gao, X., et al.: VEP-based brain-computer interfaces: Time, frequency and code modulations. IEEE Comput. Intell. Mag. 4(4), 22–26 (2009)
[Materka and Byczuk 2006] Materka, A., Byczuk, M.: Alternate half-field stimulation technique for SSVEP-based brain–computer interfaces. Electron. Lett. 42(6), 321–322 (2006)
[Materka and Byczuk 2006] Materka, A., Byczuk, M.: Using comb filter to enhance SSVEP for BCI application. In: 3rd International Conference on Advances in Medical, Signal and Information Processing (MEDSIP 2006), IET Proceedings Series CP520Z (CD-ROM), Glasgow, United Kingdom, 4 pp. (2006)
[Materka et al. 2007] Materka, A., Byczuk, M., Poryzała, P.: A virtual keypad based on alternate half-field stimulated visual evoked potentials. In: Int. Symp. on Information Technology Convergence, Jeonju, Republic of Korea, pp. 296–300 (2007)
[Mukesh et al. 2006] Mukesh, T.M.S., Jaganathan, V., Reddy, M.R.: A novel multiple frequency stimulation method for steady state VEP based brain computer interfaces. Physiol. Meas. 27(1), 61–71 (2006)
[Regan 1989] Regan, D.: Human Brain Electrophysiology – Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. Elsevier, New York (1989)
[Wang et al. 2004] Wang, Y., Zhang, Z., Gao, X., et al.: Lead selection for SSVEP-based brain-computer interface. In: Proc. 26th Int. IEEE EMBS Conf., pp. 4507–4510 (2004)
[Wang et al. 2008] Wang, Y., Gao, X., Hong, B., et al.: Brain–computer interfaces based on visual evoked potentials: feasibility of practical system design. IEEE Eng. Med. Biol. Mag. 27(5), 64–71 (2008)
[Wolpaw et al. 2000] Wolpaw, J.R., et al.: Brain-computer interface technology: A review of the first international meeting. IEEE Trans. Rehab. Eng. 8, 164–173 (2000)
Design and Development of a Guideline for Ergonomic Haptic Interaction

L.M. Muñoz1, P. Ponsa1, and A. Casals1,2

1 Department of Automatic Control, Universitat Politècnica de Catalunya, Barcelona Tech, Spain
{luis.miguel.munoz,pedro.ponsa}@upc.edu
2 Institute for Bioengineering of Catalonia, Barcelona, Spain
[email protected]
Abstract. The main goal of this chapter is to propose a guideline for human-robot systems focused on ergonomic haptic interaction. To this end, the model comprises several main parts: a set of heuristic indicators that identify the attributes of haptic interaction; the relationships between the indicators, the human task and the haptic interface requirements; and finally an experimental task procedure with qualitative performance evaluation metrics for the use of haptic interfaces. The ultimate goal of this work is the study of possible applications of haptics under regular laboratory conditions, in order to improve the analysis, design and evaluation of human tasks performed through haptic interfaces in telerobotic applications.
1 Introduction

Traditional human-machine interfaces are usually provided with visual displays and sometimes with auditory information (humans process most information coming through the visual channel). Compared to vision and audition, our understanding of human haptics, which includes the sensory and motor systems of the human hand, is very limited. One reason for this is the experimental difficulty of presenting controlled stimuli, due to the fact that haptic systems are bidirectional: they can simultaneously perceive and act upon their environment. The interface is the element that permits users to perform a task efficiently and establishes a dialog between the human and the system. Interfaces with haptic feedback can enhance the realism of interactive systems through more intuitive interactions (involving other variables such as force, distance or speed). In such situations the interaction is often bidirectional, providing some derived measures (mechanical impedance, the ratio of force to speed, and transparency). With the development of interfacing technology, and due to the strong trend to include haptics in multimodal environments, a working group called TC159/SC4/WG9 has been created with the aim of developing specific guidelines in this domain. For instance, in complex systems such as telesurgery applications, haptics is a topic of research. In such systems, the aim is to determine the feedback to be applied to
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 15–29. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
improve surgeons' performance in tasks such as suturing and knot-tying, which are very time-consuming. Thus, the availability of guidelines and recommendations is highly appreciated.

There are other situations in which the human visual and auditory channels are heavily loaded or, on the contrary, visual and auditory information is limited (e.g. undersea navigation in zones with a high density of plankton, or teleoperation with a large amount of information being sent through the visual channel, as in a pilot training cockpit). In such cases the availability of haptic channels can be very important, and the development of haptic interfaces can contribute to the progress of human-robot interaction.

In order to use haptic and tactile interfaces and study human-robot interaction, the following items are necessary: a) evaluate the levels of automation, b) evaluate the relationship between experts, c) create and use an interface object reference model, d) solve the standardization problem and, finally, e) study the context of their use. The levels of automation comprise a set of operational modes (manual control, automatic control or shared control) between human and robot in each domain. Another complex aspect is the coordination between experts from different fields (robotics technicians, interface designers, experts on human factors, end users) and the relationship between humans and robots (task allocation).

Lynch and Mead advocated that user interface reference models should "provide a generic, abstract structure which describes the flow of data between the user and the application, its conversion into information, and the auxiliary support which is needed for an interactive dialogue" [Lynch and Mead 1986]. This model provides an understanding of the many facets involved in individual and grouped tactile/haptic interaction objects. However, psychophysics studies and ergonomics considerations of the interaction in human-computer systems are not taken into consideration.
Later on, Carter used a reference model that can help to standardize the design and construction of tactile or haptic interaction objects (identify, description attributes, representation attributes, and operations), by ensuring that all relevant aspects of such interactions are taken into consideration. Recently, reference models have been used to define the major components of accessible icons, organizing ergonomic and user interface standardization. The engineering community's interest in haptic and tactile interactions and their standardization has grown considerably on the basis of recent research [van Erp et al., 2010]. One of the main difficulties to be solved in this domain is related to adopting a human-centred design approach and studying effective human-robot tasks in complex environments. An ergonomic assessment can ensure that systems are designed with sufficient attention to interoperability, improving task effectiveness, avoiding human error and enhancing the comfort and well-being of users. A guideline on ergonomic haptic interaction, included inside a generic framework (analysis, design, evaluation), can be useful in the study of several human-robot systems, for example in assisted surgical applications. The main proposal of this work is the preparation of a guideline for ergonomic haptic interaction design
(GEHID), which provides an approach that relates human factors to robotics technology. The guideline is based on measures that characterize the haptic interfaces, the users' capabilities and the objects to be manipulated. In a human-robot scenario the most important part is the human sensory-motor activity, but it is also necessary to analyze the typology of the tasks to be performed and the context of use of the haptic interface. In the next section we describe previous work on assessing the quality of the haptic interaction framework. Section 3 explains the functional description of the GEHID indicators. In Section 4 the characteristics of the task in haptic applications and the relationship between tasks and indicators are described. In Section 5, the performance evaluation method for human-robot systems and a development life cycle are presented, in order to show the ergonomic validation of the proposed guideline. Finally, some conclusions and future work are presented.
2 Haptic Interaction Framework

To define a haptic interaction framework it is necessary to understand that many researchers and developers use two concepts: haptic and tactile. There is no difference between haptic and tactile in most dictionary definitions; however, many researchers and developers use haptic to include all haptic sensations, while the use of tactile is related to the stimulation of the skin (mechanical, thermal, chemical, and/or electrical stimulation). A haptic/tactile interaction framework based on a human-centred design approach needs a standard methodology based on the study of human-robot tasks, a process model approach (analysis of requirements, guidance, performance evaluation), an ergonomic validation and a clear layout of the objects to be manipulated. Table 1 shows the efforts of the International Organization for Standardization (ISO) in this domain. ISO 9241-920 Ergonomics of human-system interaction provides guidance for the design of tactile and haptic interactions [ISO 9241-920 2009]. Table 2 shows diverse guidelines in human-computer interaction, usability engineering and haptic/tactile interaction. The guideline for ergonomic haptic interaction design, the GEHID guide, is a method that seeks to cover aspects of haptic interface design and the human-robot task in order to improve the performance of haptic teleoperation applications.
Table 1 ISO's work on tactile/haptic interaction. An adaptation of van Erp's work [van Erp et al., 2010]

ISO Number                                                                      State
ISO 9241-900  Introduction to tactile and haptic interaction                    Not started
ISO 9241-910  Framework, terms and definitions                                  Work in progress
ISO 9241-920  Ergonomics of human-system interaction                            Finished in 2009
ISO 9241-930  Haptic/tactile interactions in multimodal environments            Not started
ISO 9241-940  Evaluation of tactile/haptic interactions                         Work in progress
ISO 9241-971  Accessibility and use of haptic/tactile interfaces in public      Not started
              environments
Table 2 Some guidelines in human-computer interaction and haptic/tactile interaction

Guideline                                                                   Domain
Colwell et al., 1998: "guidelines for the design of haptic interfaces       Haptic interface; blind people
and virtual environments"
Miller and Zeleznik, 1999: "3D haptic interface widgets"                    3D interaction; X Window System
Challis and Edwards, 2000: "design principles for tactile interaction"      Static tactile interaction; touchpad, tactile overlay
Sjöström, 2002: "guidelines for haptic and tactile interfaces"              Non-visual haptic interaction design; blind people
The GEHID guide can offer recommendations and define requirements for the use of a newly created haptic interface, or can help to improve the technical features of commercial haptic interfaces. The guideline is structured into two parts. The first details a set of selected attributes, following the heuristic methods proposed by experts in the human-computer interaction and haptic interaction domains. The second part is a task allocation: a clear relationship between attributes and basic haptic tasks. The next sections of the chapter explain the proposed haptic guideline in more detail.
3 GEHID Indicators

When an operator performs a task directly on an object, for example to move it (Fig. 1), a reaction force is generated by the object and perceived by the hand through different receptors. When the same task is performed by a teleoperated system (Fig. 2), firstly the interface device should be able to sense the actions of the operator's hand; secondly, the teleoperated device must be able to reproduce the actions of the operator; and thirdly, the reaction forces and movements at the object should be faithfully measured in order to be reproduced on the interface device.
Fig. 1 Reaction force perceived by the operator in his interaction with the objects of the environment
Fig. 2 Reaction force perceived by the operator in his teleoperated interaction through a haptic interface (left: remote area; right: local area)
The aim of the indicators is to provide a quantitative and/or qualitative measure of the information perceived from the teleoperated environment, through a teleoperation interface, in order to characterize a task and assess the degree to which that task can be perceived by an operator. Depending on the nature of the task, one or more indicators should be taken into account in order to make the assessment. The indicators represent properties, characteristics or energies that the operator perceives during manual exploration or manipulation. Some of these indicators act mainly on the cutaneous receptors, others on the kinesthetic ones, and others on a combination of both.
Perception indicators are classified into groups in accordance with their physical properties or behavioral similarities. Although these indicators are magnitudes or physical properties, they are defined taking into account the operator's sensing and perception. The indicators considered here are:

Texture
Texture produces a variation in the perceived movement while exploring an object, as a consequence of a displacement on its surface. Superficial texture is characterized by the size, distance and slope of the elements belonging to the surface, and becomes a variation of the movement of the tip or object in contact. This variation can cause a vibration with a specific frequency, amplitude and waveform, or a change in the acting speed as a function of the exerted force, normally as a consequence of a variation in the coefficient of friction of the surface. Some of the properties that can be extracted by observing a superficial texture are:
• Rugosity: presence of irregularities on the surface. It can be characterized by the depth or height of the irregularities with respect to the average surface; the order of magnitude is normally under a millimeter. In general, the variation of the movement takes place in the direction normal to the surface.
• Patterns: presence of repetitive shapes (such as channels, grooves, undulations, etc.) or symbolic representations (hieroglyphics, Braille, etc.).
• Friction: force opposed to the movement of the surfaces in contact. In general, the variation of the movement is in the direction tangential to the surface.

Reaction Force/Moment
A reaction produces a variation in the force or moment perceived when contacting an object or exerting a force on it. Forces and moments have a vector nature, in which the module is constrained by the range of force values required by the task and the direction of each degree of freedom.
The range of forces, resolution and number of degrees of freedom of the interface device must be in accordance with those required by the task.

Pressure
Pressure produces a variation in the force perceived per unit of contact surface. The feeling of pressure is perceived through the cutaneous receptors. Pressure is thus always perceived from the interface device, and the perceived value depends directly on the perceived contact force, the surface in contact being generally constant. In order to perceive pressure as a distinct magnitude, the interface device should be able to change the surface in contact with the operator's hand or fingers.

Compliance
The variation in the perceived position as a consequence of an exerted force, which is restored when the force disappears, constitutes the compliance concept.
The behavior of compliance is governed by Hooke's law. Some of the magnitudes related to compliance are:
• Elasticity: magnitude related directly to compliance. The perception of elasticity depends on the variation of the position x with respect to the force F exerted (F = k·x, k being the constant of elasticity). The resolution and range of forces and displacements of the interface device should be in accordance with the task.
• Rigidity: absence of perceived displacement when a force is exerted. This happens when the constant of elasticity is very large. A short reaction time is required from the interface device in order to perceive the feeling of rigidity when an effort is applied at a given time.

Weight/Inertia
A resistance is perceived when an object is held statically or displaced freely. The cutaneous and kinesthetic receptors are involved in the production of this effect. The resolution and number of degrees of freedom of the interface device must be in accordance with those required by the task (the direction of movement of the object, whether the object needs to be oriented, etc.).

Impulse/Collision
Perception of the change of momentum (mass × velocity). It is perceived as a significant variation of the interface speed. This variation happens when colliding with objects in the environment or when there is a loss of mass in the objects (breaking or disassembling).

Vibration
Variation in the perceived position in a cyclic way. Vibration differs from texture in that the variation of the movement produced by vibrations does not appear during exploration, but is a movement generated by the manipulated object. The cutaneous receptors are mainly involved, especially when the amplitude is small (<1 mm) and the frequency exceeds 5 Hz. For larger amplitudes the kinesthetic receptors can also be involved. The reaction time of the interface device is fundamental and should be in accordance with the frequencies involved.
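As a minimal illustration of the compliance indicator described above (our own sketch, not part of the guideline), the force a device would render for a compliant contact follows Hooke's law, with rigidity approximated by a very large stiffness bounded by the device's force limit:

```python
def compliant_reaction(k, x, f_max=None):
    """Reaction force (N) for a penetration x (m) into a surface of
    stiffness k (N/m), per Hooke's law F = k*x; f_max models the haptic
    device's maximum renderable force."""
    f = k * x
    if f_max is not None:
        f = min(f, f_max)
    return f
```

A soft object (small k) yields a gradually increasing force, while a rigid one (very large k) saturates at the device limit almost immediately, which is why a short device reaction time matters for rigidity perception.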
Geometric Properties
Perception of the size and shape of an object by performing an exploration over its surface. Depending on the size of the object, two scales characterize the behavior of perception. When the object's size is under the size of the operator's fingertip, the cutaneous receptors are involved, and the perception of size and shape depends on the pressure exerted on the different points of the skin. For larger sizes, the perception of size and shape needs an exploration over the object's surface; the changes in the position of the interface device during an exploration movement allow the operator to perceive the object's shape. The number of DOF
and the range of movement of the interface device must be in accordance with the task. Some of the properties are:
• Size: volume perceived from the object. Rather than the absolute value, the ability to perceive the relative size of the objects in the context of the task is of greatest interest.
• Shape: ability to identify a regular characteristic of the global volume of an object, such as the presence of edges, flat surfaces or curves.
• Curvature: ability to differentiate among different curvature radii.

Disposition
Perception of the position and orientation of objects. The ranges of movement and resolution of each DOF of the interface device must be in accordance with the task.
• Position: perception of the relative position of the objects among themselves in the context of the ongoing task, obtained by exploration. Position is an indicator that provides information only after a global exploration process to determine the relative position of the objects, unless independent interfaces are available for each hand or finger.
• Orientation: perception of the degree of inclination or orientation with respect to the vertical or horizontal planes.
4 Allocation between Tasks and GEHID Indicators

A basic haptic task is one in which the operator performs a motor action and, as a consequence, perceives information that can be characterized by one or several indicators. A haptic application can be composed of several basic tasks.

Presence
The presence task is the action that allows the operator to determine the presence or absence of an object. It is the simplest basic task and gives a binary result (presence or absence). Compliance (rigidity) is its characteristic indicator.

Classification
Classification allows the operator to differentiate among different objects. This differentiation can be based on shape, size, weight, elasticity or texture, depending on the characteristics of the manipulated objects. Thus, depending on the nature of the objects, several indicators should be considered, such as superficial texture, weight, geometric properties and/or compliance.
Push/Pull
This is the task that allows the operator to slide an object by applying a force on it, in the direction of the force (push) or in the opposite direction (pull). The indicators involved are mainly the reaction force, as a consequence of friction, and inertia.

Compression/Expansion
This is the task that produces a dimensional reduction (compression) or increase (expansion) in some directions of the object when a force is exerted on it. Compliance (elasticity) and pressure are the characteristic indicators for this task.

Grasping
This is the task that allows the operator to grab an object. Grasping tasks require independent mobility of at least two fingers, allowing the action of "pinching" an object. Compliance (rigidity) and weight are the main indicators to be considered in order to assess such a task.

Translation/Rotation
The action of displacing and/or orienting an object differs from push or pull because translation or rotation is performed by grasping the object, without sliding it over a surface. The main indicators are weight and inertia.

Assembling
Assembling is the task of fitting two objects closely together. It occurs when an object is placed on a surface or when an object is inserted into another, like a peg in a hole. Texture, collision, reaction force and rigidity are the main indicators for this task.
5 Example of an Application

To visualize these concepts with an example, a typical pick-and-place robotic application is considered, in which a rigid object is moved from one place to another (Fig. 3). The sequence of basic tasks involved is:
1. Presence, in order to determine whether the object is near the grasping tool.
2. Grasping, with the aim of taking the object.
3. Translating, in order to move the object towards the target place.
4. Assembling, in order to put the object in contact with the target place.
5. Grasping, in this case to release the object.
6. Presence, in order to confirm that the object has been released.
In a teleoperated task with visual feedback, the behavior in the remote area is mainly obtained through vision. In this application the main contribution of vision
is the localization of the objects and the robot, which allows the operator to determine the necessary directions and trajectories to perform the desired tasks. The contribution of the haptic feedback appears when the robot and the objects are in contact and in movement. Prior to the first task and after the last one, the movement of the teleoperated device does not require haptic feedback. Table 3 summarizes the indicators and basic tasks involved in the pick-and-place application. The first column shows the basic tasks involved, the specific indicators are placed in the second column, and the third column lists the requirements that the haptic interface has to satisfy in relation to each indicator. Different tasks have different requirements; the most restrictive must be considered. Presence and translation require 3DOF force feedback, rigidity perception requires 1DOF with fast response, and inertia perception requires 3DOF with high resolution; the haptic device should therefore provide 3DOF force feedback with fast response and high resolution. This will also satisfy the requirements of the assembling task (z force feedback with fast response). On the other hand, the teleoperated device should be able to provide the required force and position signals from the task. The x-y-z reaction force is normally provided by a force/torque sensor attached to the robot wrist, and the gripper should allow the control of aperture and grasping force.
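The "most restrictive requirement" rule described above can be sketched as a simple merge over per-task requirements. The task values are transcribed from the pick-and-place example; the names and structure are ours:

```python
# Hypothetical per-task haptic interface requirements for the pick-and-place example.
TASK_REQS = {
    "presence":    {"dof": 3, "fast_response": True,  "high_resolution": False},
    "grasping":    {"dof": 1, "fast_response": False, "high_resolution": False},
    "translating": {"dof": 3, "fast_response": False, "high_resolution": True},
    "assembling":  {"dof": 1, "fast_response": True,  "high_resolution": False},
}

def merge_requirements(tasks):
    """Keep the most restrictive requirement across all basic tasks of an application."""
    reqs = [TASK_REQS[t] for t in tasks]
    return {
        "dof": max(r["dof"] for r in reqs),
        "fast_response": any(r["fast_response"] for r in reqs),
        "high_resolution": any(r["high_resolution"] for r in reqs),
    }
```

For the full pick-and-place sequence this merge yields a 3DOF device with fast response and high resolution, matching the conclusion drawn above.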
Table 3 Indicators, basic tasks involved and haptic interface requirements in a pick-and-place application

Basic task    Indicators                          Haptic interface requirements
Presence      Rigidity on 3D movement             3DOF x-y-z force feedback & fast response
Grasping      Rigidity on the grasping tool       1DOF force feedback gripper
Translating   Weight or inertia on 3D movement    3DOF x-y-z force feedback, high resolution
Assembling    Rigidity and collision on z         1DOF z force feedback & fast response

6 Performance Evaluation

In human-system interaction studies it is necessary to define some qualitative and quantitative performance measures. Different approaches can be followed: the individual differences approach, the case study approach and the system characteristics approach. Studies of users' differences have diverse goals:
• To find ways of predicting performance
• To find and characterize individual variability.
Fig. 3 Pick & place teleoperation arrangement and the characteristic elements in the remote space (a: 3D force sensor on the wrist, position/force controlled gripper) and the local space (b: 3D haptic device, position/force controlled gripper)
From the point of view of usability engineering, the proposed performance evaluation can be summarized in three steps: an effectiveness measure, an efficiency measure and a users' satisfaction measure. In order to evaluate how usable our system is, we will work with the definition of usability provided by the international standard ISO 9241-11 on Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability [ISO 1998]. Usability in this standard is defined as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use". The problem with this classical model approach is how to define generic usability metrics and how to include the environmental conditions. In this new extended framework, experts want to incorporate areas beyond usability, such as values, emotions, privacy, trust and social aspects of computing [Scholtz and Consolvo 2004]. We remark some aspects:
• Interaction between a user and n interfaces, or simultaneous interaction between multiple users and the system
• Direct and indirect stakeholders: direct refers to individuals who interact directly with the system; indirect refers to all other parties who are affected by the use of the system
• Human-automation interaction: a new paradigm in which the relationship between humans and new machines (for example human-robot interaction) is important
• Creation of a development life cycle in order to specify the interaction steps
In some research areas, for example human-autonomous vehicle interaction, different frameworks of metric classes have been developed to facilitate metric selection and the understanding and comparison of research results [Donmez et al. 2008]. The aim of this approach is to find generic metric classes and metric evaluation criteria, and finally a methodology based on a cost/benefit analysis that objectively identifies the best set of metrics for classifying research studies. Table 4 is an adaptation of this previous work by Donmez et al. within a human-robot system framework. In order to include the GEHID guideline inside a well-defined model it is necessary to create a development life cycle with five steps: haptic interaction framework (following the ISO standards), identification of the haptic application requirements, allocation of tasks to GEHID indicators, experimental tests in the laboratory and, finally, performance evaluation (see Fig. 4). We can then obtain feedback and offer improvement recommendations for these human-robot systems.

Table 4 Human-robot metric classes

Metric classes                        Description
1 Task effectiveness                  Human-robot system performance parameters
2 Automation behaviour efficiency     Robot behaviour, interface behaviour
3 Human behaviour efficiency          Information processing, decision making, action
4 Human behaviour precursors
  4.1 Cognitive precursors            Mental workload
  4.2 Physiological precursors        Fatigue
5 Collaborative metrics
  5.1 Human-Automation                Human-robot interaction, human-haptic interaction
  5.2 Human-Human                     Robotic technician-surgeon, robotic technician-haptic interface designer
Design and Development of a Guideline for Ergonomic Haptic Interaction
In order to improve the use of a haptic interface from the point of view of the usability engineering framework, a group of expert evaluators (3-5 people, for example) is necessary. Diverse possibilities exist: different haptic devices can be tested by the same expert, or the same haptic device can be tested by a set of experts. A user experience test can be prepared in order to measure human-robot performance metrics (task effectiveness, efficiency, satisfaction, etc.). In this scenario it is necessary to establish a relationship between the human-robot team and a human-computer interaction laboratory in order to apply useful evaluation methods.
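Once such a user experience test has been run, the three ISO 9241-11 measures can be tabulated mechanically. The following sketch is only an illustration (the session field names and the 1-5 satisfaction scale are assumptions, not part of the GEHID guideline): effectiveness as task completion rate, efficiency as mean completion time of successful sessions, and satisfaction as a mean questionnaire score.

```python
from statistics import mean

def usability_metrics(sessions):
    """Summarize ISO 9241-11 style measures from a list of test sessions.

    Each session is a dict with (hypothetical field names):
      'completed'    - bool, task finished successfully (effectiveness)
      'time_s'       - float, task completion time in seconds (efficiency)
      'satisfaction' - float, questionnaire score on a 1-5 scale
    """
    return {
        "effectiveness": sum(s["completed"] for s in sessions) / len(sessions),
        "efficiency_s": mean(s["time_s"] for s in sessions if s["completed"]),
        "satisfaction": mean(s["satisfaction"] for s in sessions),
    }

# Three hypothetical evaluators testing the same haptic interface
sessions = [
    {"completed": True, "time_s": 42.0, "satisfaction": 4.0},
    {"completed": True, "time_s": 55.0, "satisfaction": 3.5},
    {"completed": False, "time_s": 90.0, "satisfaction": 2.0},
]
m = usability_metrics(sessions)
print(m["efficiency_s"])  # → 48.5
```

The same tabulation works for the reverse design (one evaluator, several devices) by grouping the sessions per device instead of per evaluator.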
Fig. 4 Development life cycle in a human-haptic interaction framework
A general framework for evaluating haptic applications that facilitates collaboration between researchers is a complex problem (diverse technologies, lack of standards, many applications) and will need to be solved in the future.
7 Discussion and Conclusions The GEHID guide is an approach that attempts to fill a methodological gap, combining the efforts of systems engineering and human factors to improve the effectiveness of human-robot interaction in teleoperated systems.
This guideline integrates physiological studies (sensation, perception) and haptic interface features in order to present a set of haptic indicators. Then, a task allocation between these indicators and a set of basic tasks in haptic interaction is presented. In order to reduce subjective assessment in the performance evaluation it is necessary to have several evaluators (for example three evaluators if the use of the same haptic interface is being tested) and to put the GEHID guideline in the context of the usability engineering approach. On the other hand, one evaluator can apply the GEHID guideline to the study of a haptic application by testing diverse interfaces. In the opinion of the authors, the framework is systematic, although it is necessary to assess the number of indicators and how to measure each indicator in order to obtain objective metrics useful for the engineering community. Experimental sessions in the laboratory are therefore the next step in our research. Future work will apply this guideline to specific tasks and study human performance in terms of execution time, precision, manipulation trajectory, comfort, fatigue and satisfaction of the human operator.
Acknowledgment This work is supported by a 2009 research program of the Universitat Politècnica de Catalunya, Barcelona Tech, Spain. Project: Human-centered design in supervisory control systems.
References [Challis and Edwards 2000] Challis, B.P., Edwards, A.D.N.: Design principles for tactile interaction. In: Proc. of the Haptic Human-Computer Interaction Workshop, University of Glasgow, UK, pp. 98–101 (2000) [Colwell et al. 1998] Colwell, C., Petrie, H., Kornbrot, D., Hardwick, A., Furner, S.: Haptic virtual reality for blind computer users. In: ASSETS (1998) [Donmez et al. 2008] Donmez, B., Pina, P.E., Cummings, M.L.: Evaluation criteria for human-automation performance metrics. In: Proc. of the Performance Metrics for Intelligent Systems Workshop, Gaithersburg (2008) [ISO 1998] ISO, ISO 9241-11:1998 Ergonomic requirements for office work with visual display terminals (VDTs) – part 11: Guidance on usability (1998) [ISO 9241-920 2009] ISO, Ergonomics of human-system interaction – Part 920: Guidance on tactile and haptic interactions. ISO 9241-920:2009. ISO, Geneva (2009) [Lynch and Meads 1986] Lynch, G., Meads, J.: In search of a user interface reference model. Report of the SIGCHI Workshop on User Interface Reference Models, vol. 18(2), pp. 25–33 (1986) [Miller and Zeleznik 1999] Miller, T., Zeleznik, R.: The design of 3D haptic widgets. In: Proc. of the ACM Symp. on Interactive 3D Graphics, Atlanta, GA, USA, pp. 97–102 (1999)
[Scholtz and Consolvo 2004] Scholtz, J., Consolvo, S.: Towards a discipline for evaluating ubiquitous computing applications. INTEL Research, IRS-TR-04-004 (2004) [Sjöström 2002] Sjöström, C.: Non-visual haptic interaction design. Guidelines and applications, Doctoral Dissertation CERTEC, Lth, Number 2 (2002) [van Erp et al. 2010] van Erp, J.B.F., Kyung, K.-U., Kassner, S., Carter, J., Brewster, S., Weber, G., Andrew, I.: Setting the standards for haptic and tactile interactions: ISO’s work. In: Kappers, A.M.L., van Erp, J.B.F., Bergmann Tiest, W.M., van der Helm, F.C.T. (eds.) EuroHaptics 2010. LNCS, vol. 6192, pp. 353–358. Springer, Heidelberg (2010)
Partner Robots – From Development to Business Implementation Y. Ota Partner Robot / Advanced Engineering Group, Production Engineering Division Toyota Motor Engineering & Manufacturing North America, Inc., USA
[email protected]
Abstract. Toyota has been developing industrial robots since the 1980s. In recent years, man-machine cooperative robots that assist people’s skills have gradually been put into practical use. Now, Toyota is further evolving a number of robot technologies born at production sites to develop Partner Robots that work in harmony with people. Toyota is specifically considering four development areas: (1) manufacturing support, (2) short-distance personal mobility, (3) nursing and healthcare support, and (4) support for work around the home. Some scenes of Partner Robot operation in each of these four fields, as well as the essential technologies embedded in Toyota Partner Robots, are introduced in this paper. Lastly, issues that need to be resolved to make these robots truly useful and practical in our society are described.
1 Introduction Toyota is developing partner robots into a core business enterprise, as a new venture to the world. In particular, the Toyota Partner Robot development vision and the four essential technologies for developing Partner Robots will be introduced in this paper. In addition to the challenges ahead, there are several issues which must be resolved before Partner Robots can be practically implemented; this paper will also highlight the activities put in place to address them. Looking back over the history of robot development, Toyota and its group companies have been developing industrial robots since the 1980s, from welding robots, which helped create semi-automatic plants, to general-purpose robots in the 1990s. As a result, Toyota achieved robots capable of handling multiple vehicle models as well as various paint colors. As depicted in Fig. 1, Partner Robots do not operate within safety zones, unlike conventional industrial robots. Instead, they can share spaces with people and perform useful functions. Today, Japan as well as many countries in Europe are confronted with a major challenge: a declining birth rate combined with the aging of the population and a work force shortage. The U.S. is also expected to face a similar challenge soon.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 31–39. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
Fig. 1 Robots from factory to home environments: from industrial robots (operation within a safety zone) to Partner Robots (free work zone) in “symbiosis” with people
Decades ago a pyramid could easily be drawn to represent the population data in Japan, based on statistics. However, the ‘pyramid shape’ can no longer be recognized in the population data from 2005; in fact, by 2055 it will look like a ‘reversed pyramid’. As such, the working population in Japan will decrease by half and economic demand will shrink; as a result, the economy may cease functioning. Also, with one out of 2.5 people being over the age of 65, the question is then: how will an aging society live without its customary quality of life? To create a safer and more comfortable living environment, Toyota is hoping to develop Partner Robots into a core business. Partner Robots will essentially assist people and improve their quality of life. Based upon the social issues mentioned previously, Toyota is specifically considering the following four target fields:
1. Housekeeping,
2. Nursing & healthcare,
3. Short-range personal mobility, and
4. Manufacturing assistance
Partner Robots require “intelligence” and “gentleness”, as two key technologies. In other words, they must have high communication skills, move in coordination with people, and operate with safe and reassuring gentle movements. Some scenes of Partner Robot operation in each of those four fields mentioned may be described as follows: In manufacturing applications, Partner Robots may be used to assist the skill or power of operators. The physical strength of even highly skilled workers fades with age. Robot assistance gradually reduces physical load, enabling the worker to continue making the most of his or her skills. Robots may also be used as substitutes for people in hazardous environments such as heat treatment work.
Fig. 2 Four target fields of Toyota Partner Robots (domestic duties, nursing/healthcare, welfare, personal mobility, manufacturing), arranged along the axes of intelligence and gentleness
Toyota has identified the most important requirements in the nursing and healthcare field after discussions with affiliated hospitals and nursing facilities. Although nurses and caregivers play irreplaceable roles, Toyota found ways that robots can provide essential help. For instance, picking up objects that are out of reach and walking assistance are a few of the most highly demanded tasks in healthcare applications. In addition, nursing requirements also include lifting patients from or into bed, rehabilitation, bathing, and the like. Toyota has also extended its automobile technology and applications into short-distance personal mobility. An automobile is convenient because it can go from door to door. In the same way, a personal mobility robot may be able to take a user from bed to bed. In other words, once the mobility robot picks up the user from bed in the morning, it may board public transportation and take the user seamlessly into buildings or to outdoor destinations. This type of robot has uninhibited mobility. In this way, Toyota has been developing essential technologies in these four basic fields, illustrated in Fig. 2, to accomplish the following:
1. Reduce workload
2. Provide adaptable and autonomous movement
3. Provide tool manipulation
In addition, Toyota is researching awareness and control technologies that are related to each of these fields.
2 Essential Technologies in Partner Robots Partner Robots that support people’s activities are positioned as a key evolutionary field derived from vehicle technologies, alongside next-generation batteries and biotechnologies. 2.1 Assistance and Man-Machine Cooperation Technology Toyota is developing assistance technology to help human operators perform physically demanding work or movements. At first, Toyota focused on implementing assistance technology in the factory just for heavy lifting and transportation tasks, but it soon realized that there is also high demand for this technology in positioning or trajectory assistance tasks that require high accuracy. For the near future, Toyota plans to further develop and implement this technology in wearable assistance devices, not only for the factory, but also to expand into walking assistance and patient movement in the healthcare field. 2.2 Personal Mobility Technology The goal of mobility technology is to be able to autonomously navigate most, if not all, types of terrain that one might encounter in the course of a day. At present, robots are typically confined to flat indoor surfaces, but our work on posture stability is progressing to handle variations in elevation, such as uneven floors, curbs, and steps, and even to take the mobility technology outside. Toyota is also developing the localization, path planning, and map generation technology necessary to provide the autonomy that will allow Partner Robots to navigate in three dimensions, for example across multiple floors of an office building. Furthermore, to be able to weave in and out of congested areas, both indoors and outdoors, Toyota is improving the reliability of the obstacle avoidance abilities of Partner Robots. 2.3 Full-Body Coordination Technology Whether running, exercising, or doing work around the house or in the factory, coordination between all parts of the body is an important skill for a robot to be useful to society.
Consequently, Toyota is devoting a great deal of its resources to the advancement of posture stabilization and limb coordination [Tsusaka and Ota 2006; Yamamoto and Ota 2007]. As one can easily imagine, this technology may be applied to such activities as patient transportation (nursing), household chores, and naturally, running.
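As a minimal illustration of posture stabilization with virtual elements (the idea behind the vehicle-derived control described below), consider a one-degree-of-freedom torso on compliant legs standing on a vibrating base, with a virtual damper anchored in inertial space. All masses, stiffnesses and damping coefficients here are invented for the sketch and are not Toyota’s values.

```python
import math

m, k, c = 40.0, 8000.0, 50.0   # hypothetical torso mass, leg stiffness, leg damping
c_sky = 600.0                  # virtual damping coefficient (inertial-frame damper)
dt, T = 0.001, 3.0             # integration step and duration in seconds

def simulate(skyhook):
    """Return the peak torso displacement after the initial transient."""
    x, v = 0.0, 0.0            # torso displacement and absolute velocity
    peak = 0.0
    for i in range(int(T / dt)):
        t = i * dt
        # base (ground) vibration at 2 Hz, 1 cm amplitude
        xb = 0.01 * math.sin(2 * math.pi * 2.0 * t)
        vb = 0.01 * 2.0 * 2 * math.pi * math.cos(2 * math.pi * 2.0 * t)
        f = -k * (x - xb) - c * (v - vb)   # real forces through the legs
        if skyhook:
            f -= c_sky * v                 # virtual damper fixed in inertial space
        v += f / m * dt                    # semi-implicit Euler integration
        x += v * dt
        if t > 1.0:                        # ignore the start-up transient
            peak = max(peak, abs(x))
    return peak

# The virtual inertial damper suppresses torso sway near resonance:
print(simulate(False) > simulate(True))  # → True
```

The point of the virtual damper is that it opposes the torso’s absolute velocity, not the velocity relative to the moving base, which is what makes it effective near resonance.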
Fig. 3 Toyota’s four essential robot technologies: mobility (adaptable autonomous personal movement, dynamic stability), full-body coordination, man-machine cooperation (skill assistance) and tool manipulation (advanced dexterity)
Toyota’s bipedal technology is actually based on its original “skyhook” automobile control method. Virtual springs and virtual dampers are used to suppress the vibration of the robot body. Incidentally, many of the sensors incorporated in Partner Robot technology, such as high-precision gyro sensors and yaw-rate sensors, are the same as those used in Toyota vehicles. Toyota’s bipedal ‘running’ technology applies controls used in vehicles as well. Motion of the robot’s torso is stabilized by virtual springs and shock absorbers fixed in space, as mentioned above. When running, which is basically just repeating the process of jumping and landing, the robot’s posture tends to drift and can become unstable; this phenomenon requires real-time correction to achieve stability. As with Toyota’s original 2-wheeled robots, the running robot also responds to external perturbations, such as being pushed by a human, by dynamically adjusting its balance. 2.4 Tool Manipulation Technology Furthermore, Toyota is aiming to accurately reproduce the fine motor skills of humans, such as grasping and coordination tasks. Toyota wants Partner Robots to have multi-fingered hands that can quickly and skillfully manipulate objects. Also, Toyota has improved the hand and arm coordination of the trumpet robot [Goto 2003] from the World Expo 2005 held in Aichi, Japan, to the level necessary for playing the violin. Moreover, Toyota has begun to incorporate behavior (skill) learning with this technology for applications in the healthcare and manufacturing industries. The ability to manipulate tools, and to do so as delicately and gently as a human being, is extremely important for robots which are designed to assist and
serve humans. If we can get robots to work together and play musical instruments, such as the trumpet and/or violin, with virtuoso-like ability, then it is not unreasonable to assume we can develop robots that are capable of cooperating and manipulating tools to contribute to society. In the beginning, there was only the trumpet robot. However, the next phase of development saw the evolution of robots which could play the tuba, trombone, and drums. As the number of instruments that the robots can master increases, so too does the scope of potentially usable tools.
Fig. 4 Toyota’s violin playing robot
Toyota has since applied its tool manipulation technology to performances with various musical instruments. Indeed, the new violin-playing robots illustrated in Fig. 4 were successfully showcased at the Shanghai World Expo 2010. Skillful coordination between the hands and arms is required to play the violin. The robot must also be capable of fine finger movements to reproduce subtle operations such as vibrato. In addition, the robot must control the bow with appropriate speed and force to produce music with enough emotion.
3 Practical Implementation To make robots truly useful to society, it is vital to establish a robot industry. In the same way as the motorization trend in the first half of the 20th century and the current computerization trend, efforts are required to popularize robots so that there is one in every house. Cooperation among industry, government, and academia is extremely important to achieve this goal, as depicted in Fig. 5. The establishment of robot safety standards must be promoted through the cooperation of industry and government. ISO Standards for industrial robots cannot be applied without modification to Partner Robots, which are used around ordinary people. It will be very difficult to introduce Partner Robots into our daily lives unless this issue is resolved. In fact, Toyota has been taking a leading role in
the world to formulate the ISO Standard for “Robots in personal care (including healthcare) – Safety” [Ota and Yamamoto 2007; Yamada and Ota 2009], and it is predicted that the ISO Standard will be issued in early 2012 as a result of this great endeavor towards standardization.
Fig. 5 Required efforts towards an emerging robot industry: as with the automobile industry (1900s: mass production; road maintenance; insurance and car rentals) and the computer industry (1950s: CPU, OS, packages; Internet and broadband; Internet ads and shopping), the robot industry (2000s) requires business models and partnerships, safety standardization and social climates, and technical innovation (body, power, intelligence)
Collaboration across industries is also vital. One company alone cannot undertake this task. As is the case with the automotive industry, the collaboration of several companies is required to move from research and development to production and sale, as depicted in Fig. 6. There is great potential for the development of new business models, and Toyota hopes to see the participation of a large number of business partners in the near future.
Fig. 6 Value chains for robot commercialization: production collaboration among robot producers, software providers and content providers leads to robot sales, leases, insurance, and operation and maintenance
4 Conclusions A typical scene in the not-too-distant future could look something like the illustration seen in Fig. 7: robots helping out in many aspects of daily life, including household chores, walking assistance for the disabled and child supervision, among others. Also, imagine a network of communally owned personal mobility devices, available any time and anywhere, outfitted with navigation systems, which would come to your door on demand and could be left at your destination. Think how painless and fun mundane tasks will become!
Fig. 7 Robot implementation image in the not-too-distant future: ease of movement and daily life management among houses, offices, stores, schools, hospitals, preschools, city hall and stations
References [Goto 2003] Goto, A.: Musical instrument playing robot: Toyota Partner Robots for the 2005 World Exposition, Aichi, Japan. In: Proc. IEEE Int. Conf. on Intelligent Robots and Systems, San Diego, USA (2003) [Ota and Yamamoto 2007] Ota, Y., Yamamoto, T.: Standardization activities of service robots. In: Proc. The 5th Int. Work on Technical Challenges for Dependable Robots in Human Environments, Rome, Italy (2007)
[Tsusaka and Ota 2006] Tsusaka, Y., Ota, Y.: Wire-driven bipedal robot. In: Proc. IEEE Int. Conf. on Intelligent Robots and Systems, Beijing, China (2006) [Yamada and Ota 2009] Yamada, Y., Ota, Y.: Novel Activity on international safety standardization for personal care robots. In: Proc. ICROS-SICE Int. Joint Conference, Fukuoka, Japan (2009) [Yamamoto and Ota 2007] Yamamoto, T., Ota, Y.: System design of Toyota partner robots for the 2005 world exposition, Aichi, Japan – reliability and safety. In: Proc. Int. Work on Technical Challenges for Dependable Robots in Human Environments, Rome, Italy (2007)
Goal Understanding and Self-generating Will for Autonomous Humanoid Robots P. Nauth Department of Engineering and Computer Sciences, University of Applied Sciences, Frankfurt, Germany
[email protected]
Abstract. An intelligent robot has been developed which understands the goal a user wants to be met, recognizes its environment, develops strategies to achieve the goal and operates autonomously. By means of a speech recognition sensor the robot listens to the command spoken by a user and derives the goal, i.e. the task the user wants the robot to perform, such as bringing a specific object. Next, the robot uses its smart camera and other sensors to scan the environment and to search for the demanded object. After it has found and identified the object, it grabs it and brings it to the user. Additionally, a method for generating a will is proposed. This enables the robot to operate optimally even under conflicting requirements.
1 Introduction Nowadays robots perform repetitive tasks by executing control algorithms. This approach is sufficient for industrial applications or to hoover a room. Robots which assist human beings directly, such as helping elderly or handicapped people, need to act autonomously in a natural environment and to communicate in a natural way with those people they are supposed to support. Autonomous robots require intelligence, i.e. they must execute tasks depending on a goal in a complex environment, learn words representing new goals and adapt to changing requirements such as learning to differentiate objects in unknown environments. However, these kinds of environment-learning algorithms do not utilize experiences made by the robot in order to optimize its behaviour. A higher level of intelligence can be achieved by behaviour learning, i.e. by intelligent robots with a self-generating will. Algorithms for robot control and navigation have been proposed by several research groups [Jin et al. 2003]. This paper focuses on the understanding of goals, strategies to achieve these goals, learning methods, and sensing and navigating in a natural environment with respect to intelligent humanoid robots. The robot we have developed is equipped with visual, auditive and proximity sensors. It understands spoken instructions and acts accordingly by sensing the environment for the respective object. After detection the object is grabbed and delivered. Key technologies for these autonomous robots are Embedded Intelligent Systems [Nauth 2005], which analyze and fuse comprehensive sensor data and derive execution strategies in order to accomplish a goal. Another focus is on applying the algorithms to small robots. The advantages of small robots over other systems [Hirai et al. 1998] are reasonable deployment costs and scalability. However, small robots alone cannot carry heavy or big objects or reach those lying at higher levels. A solution to this problem is the swarm robot approach, where several robots co-operate as a team in order to solve a heavy task together. If a robot cannot achieve a task by itself, it communicates with other robots in order to get help. Effective co-operation requires a certain level of intelligence as well. In order to cope with situations where the robots are confronted with conflicting requirements and drives, such as a goal to be immediately achieved in the presence of “fear” when entering a dangerous area, an approach for a self-generating will is introduced. The purpose is to equip the robots with cognitive intelligence which allows autonomous behaviour even under difficult and non-predictable conditions by adapting their behaviour according to experiences made previously. Architecture, sensing approaches and application results for the intelligent robot based on algorithms adapting to the environment are explained in chapters 2–6, whereas the proposal for the self-generating will is described in chapter 7.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 41–55. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
2 System Architecture An autonomous robot needs to know the goal to be accomplished, to be aware of the situation, and to be able to plan and perform actions depending on the situation. This requires the following functions [Stubbs and Wettergreen 2007]:
• Sensing by means of multiple intelligent sensors in order to acquire all necessary data about the environment. This includes getting to know the goal to be met, e.g. by understanding a spoken instruction.
• Fusion of the data acquired from the intelligent sensors in order to assess the situation.
• Planning how to achieve the goal.
• Execution of the necessary steps by controlling the robot motors.
Each of these functions relies on reference data which are stored in a distributed database, e.g. reference data for pattern recognition algorithms in intelligent sensors, for strategies to fuse data or for setting up an optimal execution plan. As mentioned above, it is important for autonomous robots that the database can be adapted to new situations by methods such as learning algorithms.
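The four functions above can be sketched as a loop. Everything in the following sketch (the sensor names, the word table standing in for the distributed reference database, and the action strings) is an illustrative stand-in, not the actual implementation described in this chapter.

```python
# Minimal sketch of the sensing / fusion / planning / execution cycle.
def sense(sensors):
    """Read every intelligent sensor once."""
    return {name: read() for name, read in sensors.items()}

def fuse(readings, reference):
    """Assess the situation: match the spoken word number to a seen object."""
    goal = reference["words"].get(readings["speech"])
    return {"goal": goal, "found": goal in readings["camera"]}

def plan(situation):
    """Choose the steps needed to achieve the goal."""
    return ["approach", "grab", "deliver"] if situation["found"] else ["search"]

def execute(actions):
    """Drive the motors for the first planned step (stubbed out here)."""
    return actions[0]

# Hypothetical sensors: the speech sensor reports word number 3,
# the camera reports the object classes currently in view.
sensors = {"speech": lambda: 3, "camera": lambda: {"bottle", "cup"}}
reference = {"words": {3: "cup"}}       # stand-in for the reference database
step = execute(plan(fuse(sense(sensors), reference)))
print(step)  # → approach
```

Because the reference data are passed in explicitly, swapping the word table or retraining a classifier corresponds to updating the database, as the text requires.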
The mechanical structure is based on the Robonova 1 (Fig. 1). It embeds an 8-bit Atmel ATmega 128L microcontroller for data processing and the control of 16 servo motors which drive the arms and legs. Additional servo motors can be attached, e.g. for the movement of the head.
Fig. 1 Autonomous Humanoid Robot
Depending on the area of activity the robot is equipped with some or all of the following sensors:
• Speech sensor
• Proximity sensor
• Smart camera
The speech sensor recognizes the spoken instruction and sends a corresponding word number to the robot microcontroller (Fig. 2), which also receives the class and position of all objects recognized by the vision system as well as distance information. This allows the robot to walk to the object, to grab it and to deliver it. The signal processing for speech recognition is implemented on the DSP-based module Voice Direct 364 (Sensory Inc.). The image processing algorithms run on a 32-bit ARM7TDMI controller embedded in the smart colour camera POB-Eye.
It is mounted on the neck of the humanoid robot and can be turned left and right as well as up and down. In order to cope with new situations, the speech sensor can learn new words and the vision sensor can learn shapes and colours of new objects. Proximity sensors (laser beam triangulation method) measure the distance to an object. In order to cover distances from 4 to 80 cm we used a near-range (4 to 24 cm) and a far-range (15 to 80 cm) sensor together. They can be mounted on the moveable head of the robot in order to scan the environment in two dimensions.
Fig. 2 System architecture; thick lines indicate parts mounted on the robot. Speech recognition (speech acquisition, word recognition) converts the spoken instruction into a word number sent via IR; the intelligent camera (image acquisition, object identification) delivers object identifiers and coordinates for the scenery of objects; together with distance and colour data, the robot microcontroller performs sensor fusion, execution planning and execution by controlling the actuators, i.e. the robot motors of the legs, arms, and head with camera and/or proximity sensors
3 Speech Recognition The intelligent speech recognition enables the robot to understand spoken instructions. These are either single words or a sequence of words which are spoken without breaks in between. After data acquisition, the algorithm divides the signal into segments and calculates the frequency spectrum of each segment. Next, frequency parameters are calculated and classified by means of a neural network. In the training phase, the user speaks a word and repeats it. If the frequency parameters from the first and the repeated word are similar, the word is accepted. The weighting factors of the classifier are adapted to the word’s frequency parameters and assigned to a word number. Then the training can be continued with the next word. During the recognition phase the speech sensor prompts the user to speak the instruction by saying “Say a word”. If the classifier detects a high similarity to one of the previously learned words, it sets a respective pin to high. An additional SAB 80535 controller converts the spoken word into a word number by reading the state of the pins and transmitting a bit sequence via an infrared (IR) LED to the robot. The sequence corresponds to the pin number set to high and therefore to the word number recognized by the speech module. This enables the user to command the robot remotely. The robot microcontroller receives the bit sequence via an infrared detector and decodes the word number. Since each word number is assigned to an instruction, the robot now knows its goal, i.e. which object to search for.
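The word-number round trip over the IR link can be sketched as follows. The bit width and the instruction table are hypothetical, chosen only to illustrate how a pin index becomes a bit sequence and then a goal on the robot side.

```python
# Hedged sketch of the word-number transmission described above: the
# recognized word's index is sent as a bit sequence over the IR link
# and decoded back into an instruction by the robot microcontroller.
def encode_word_number(n, bits=4):
    """Encode word number n as a list of bits, MSB first (bit width assumed)."""
    return [(n >> i) & 1 for i in range(bits - 1, -1, -1)]

def decode_word_number(seq):
    """Rebuild the word number from the received bit sequence."""
    n = 0
    for b in seq:
        n = (n << 1) | b
    return n

# Example mapping from word numbers to goals (invented for illustration)
INSTRUCTIONS = {1: "bring bottle", 2: "bring cup", 3: "stop"}

word_number = 2
received = decode_word_number(encode_word_number(word_number))
print(INSTRUCTIONS[received])  # → bring cup
```

In the real system the timing and framing of the IR pulses matter as well; the sketch only captures the logical mapping from pin number to goal.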
4 Object Recognition For the recognition of the demanded object, and for obstacle avoidance during the search phase, the intelligent camera and the proximity sensors are used. The algorithms we have developed for the smart camera convert the acquired RGB image into the HSL space, segment the image [Gonzalez and Woods 2008] by means of an adaptive threshold algorithm (histogram analysis of hue) and extract the form factor F,

F = U² / A

(1)

from the area A and the circumference U, as well as the average hue Ha, of each object detected. By means of a box classifier [Toth 2009] each object is assigned to an object identifier which represents the class (Fig. 3). Given the extracted parameters, objects and obstacles can be differentiated regarding shape and colour. Additionally, the coordinates of each object are calculated. The object identifier and the respective coordinates of all objects found are transmitted to the robot’s microcontroller. New objects can be learned by a supervised learning algorithm: typical examples of each object class are shown to the camera and the learning algorithm assigns the mean values of each parameter to the class these objects belong to. The tolerances which define the size of the classification box of each class equal 1.5 times the standard deviation calculated during the teach-in procedure.
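A quick check of the form factor in Eq. (1): it is scale-invariant and takes its minimum value 4π for a circle, which is what makes it useful for shape discrimination. The radius and side length below are arbitrary.

```python
import math

def form_factor(circumference, area):
    """F = U^2 / A from Eq. (1), a scale-invariant shape descriptor."""
    return circumference ** 2 / area

# A circle gives the minimum value 4*pi ~ 12.57; a square gives 16.
r = 5.0
circle = form_factor(2 * math.pi * r, math.pi * r ** 2)
square = form_factor(4 * 4.0, 4.0 ** 2)
print(round(circle, 2), round(square, 2))  # → 12.57 16.0
```

Elongated or ragged contours push F higher still, so round objects, bricks and irregular obstacles separate along this single axis before colour is even considered.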
Fig. 3 Box classifier in the feature space of average hue Ha and form factor F. The mean values of Ha and F and the tolerances (box sizes) of each class result from the learning phase; the feature vector of an unknown object falling within the box of class 1 is assigned to class 1
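A hedged sketch of the box classifier with the 1.5-standard-deviation tolerances described above. The two object classes and their feature values are invented for illustration.

```python
# Each class keeps per-feature means and tolerances (1.5 standard
# deviations from the teach-in phase); an unknown object is assigned
# to the class whose box contains its (Ha, F) feature vector.
class BoxClassifier:
    def __init__(self):
        self.classes = {}   # name -> (means, tolerances)

    def teach(self, name, samples):
        """samples: list of (Ha, F) feature vectors of one object class."""
        n = len(samples)
        means = [sum(s[i] for s in samples) / n for i in (0, 1)]
        stds = [(sum((s[i] - means[i]) ** 2 for s in samples) / n) ** 0.5
                for i in (0, 1)]
        self.classes[name] = (means, [1.5 * sd for sd in stds])

    def classify(self, feature):
        for name, (means, tol) in self.classes.items():
            if all(abs(feature[i] - means[i]) <= tol[i] for i in (0, 1)):
                return name
        return None          # outside every box: unknown object

clf = BoxClassifier()
clf.teach("ball", [(0.30, 13.0), (0.34, 12.5), (0.32, 13.5)])
clf.teach("brick", [(0.60, 16.5), (0.64, 15.5), (0.62, 16.0)])
print(clf.classify((0.33, 13.2)))  # → ball
```

A box classifier is cheap enough for the camera’s embedded controller: classification is just a handful of comparisons per class, with no multiplications at run time.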
The proximity sensors supplement the camera information by measuring the distance between the robot and the object. Additionally, proximity data acquired by scanning in the horizontal and vertical direction can be used to provide shape parameters for object differentiation, especially for obstacle detection [Eres 2009]. Given that the proximity sensors scan the environment in two dimensions, the distance z(α, β) is a function of the horizontal angle α and the vertical angle β. Appropriate parameters can be selected by modeling the distance function for typical object shapes. E.g., the distance function z(α, β) of a round object with radius r positioned at a lateral distance d can be modeled as

z(α, β) = ((d + r) cos α − √((d + r)²(cos²α − 1) + r²)) / cos β   (2)
whereas a wall has the distance function

z(α, β) = d / (cos α cos β)   (3)
At the edges of the steps of stairs the distance changes rapidly (Fig. 4). Therefore, stairs can be differentiated from objects and walls if discontinuities show up in the distance function z(α, β) when the proximity sensors are turned in the vertical direction β.
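A minimal sketch of the two distance models (Eqs. 2 and 3) and of the discontinuity test used to flag stairs; the function names and the jump threshold are illustrative assumptions:

```python
import math

def z_round(alpha, beta, d, r):
    """Distance to a round object of radius r at lateral distance d (Eq. 2)."""
    a = (d + r) * math.cos(alpha)
    root = math.sqrt((d + r) ** 2 * (math.cos(alpha) ** 2 - 1) + r ** 2)
    return (a - root) / math.cos(beta)

def z_wall(alpha, beta, d):
    """Distance to a flat wall at distance d (Eq. 3)."""
    return d / (math.cos(alpha) * math.cos(beta))

def has_steps(z_profile, jump=0.10):
    """Stairs produce discontinuities in z over the vertical scan: flag any
    neighbouring samples whose distance jumps by more than `jump` metres."""
    return any(abs(b - a) > jump for a, b in zip(z_profile, z_profile[1:]))
```

At alpha = beta = 0 both models reduce to the lateral distance d, as expected.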
Goal Understanding and Self-generating Will for Autonomous Humanoid Robots
Fig. 4 Distance function of stairs. The vertical and horizontal angles of the proximity sensor's position are shown on the x- and z-axes, respectively. The distance is shown on the y-axis and in colours
5 Sensor Fusion, Planning and Motion Control

By fusing the auditive, visual and proximity data, the robot knows all objects within its reach and their positions, as well as the goal it is advised to attain. The fusion algorithm used is hierarchical (Fig. 5) and works as follows: First, auditive and visual data are fused by matching the word number (derived from the speech sensor data) with one of the object identifiers (derived from the camera data) by means of a table. We overcome the binding problem by not dealing with the sensor data themselves but by fusing classification results. The algorithm generates one of the following hypotheses:

• A negative match result (i.e. no object or the wrong object found) leads to the hypothesis "object not found". This requires no additional fusion of visual and proximity data and causes the robot to repeat the search.
• A match of one of the object identifiers with the word number results in the hypothesis "object found".
• The hypothesis "obstacle" is derived if a wall or a stair has been classified regardless of the spoken command and if no other object that matches the goal is present.

The robot moves towards the object or obstacle until it is within the scanning range of the proximity sensor.
In the second fusion step the hypothesis generated by the visual sensor is verified by the data acquired from the proximity sensor. If the class derived from the distance function z(α, β) matches the hypothesis, the hypothesis is regarded as true. Otherwise the hypothesis is rejected. Additionally, obstacles are differentiated into walls and stairs.
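The two fusion steps can be sketched as follows; the table, identifiers and class names are illustrative stand-ins for the word-number/object-identifier matching described above:

```python
def fuse(word_number, object_ids, word_to_id, proximity_class=None):
    """Two-step hierarchical fusion (sketch).
    Step 1: match the spoken word number against the camera's object
    identifiers via a table.  Step 2: verify the visual hypothesis against
    the class derived from the proximity distance function."""
    target = word_to_id.get(word_number)
    if target is not None and target in object_ids:
        hypothesis = ("object found", target)
    elif any(obj in ("wall", "stairs") for obj in object_ids):
        obstacle = next(obj for obj in object_ids if obj in ("wall", "stairs"))
        hypothesis = ("obstacle", obstacle)
    else:
        return ("object not found", None)   # repeat the search; no step 2
    if proximity_class is None:
        return hypothesis                   # not yet in scanning range
    if proximity_class == hypothesis[1]:
        return hypothesis                   # hypothesis verified
    return ("rejected", None)               # conflicting results: re-search
```

Rejecting conflicting results is what yields the high specificity (and low sensitivity) discussed below.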
Fig. 5 Hierarchical sensor fusion architecture
This hierarchical approach results in a high specificity and a low sensitivity, because conflicting visual and proximity results are rejected. We overcome this problem by repeating the search in this case: the robot moves to a different position and starts a renewed search as described above. Currently we apply the fusion approach to differentiate 3 kinds of objects (a water bottle and bottles of 2 different kinds of soft drinks) as well as 2 kinds of obstacles (wall and stairs). At each phase of the execution of the goal the robot has to plan the next actions necessary to meet the goal. First, the robot assesses its state s_j(x), which is a function of the sensory input x. Next, the action

a_i(s_j(x), g, a_{i−k}, x)   (4)

is derived depending on the actual state s_j(x), the goal g to be met, the previous actions a_{i−k} and the sensory input x. After having executed the action a_i, the robot reaches the next state s_{j+1}(x).
Suppose the robot is in the state "second sensor fusion step"; three different actions are possible from this state. The robot develops the respective execution plan and controls the robot motors accordingly: • If the demanded object has been identified, the robot approaches it and grabs it in order to bring it to the user. During the movement towards the object, its position relative to the robot is tracked continuously. In order to grab it, the robot presses its arms against the object from the left and right. It stops the arm movements if the feedback signals of the arm's motors indicate a resistance. • If no object or the wrong object has been spotted, or in case of conflicting fusion results, the robot repeats the search by turning the camera head and the proximity sensors or by moving around in order to change its position. • If an obstacle has been detected, the robot develops an approach to overcome it, i.e. it climbs stairs or avoids colliding with walls.
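The three alternatives can be written as a simple dispatch; this is a sketch of the planning step, with Eq. 4 reduced to the state and the fusion hypothesis (goal and action history omitted), and with illustrative state and action names:

```python
def next_action(state, fusion_result):
    """Select the next action a_i from the current state and the fusion
    hypothesis (simplified form of Eq. 4)."""
    if state != "second sensor fusion step":
        raise ValueError("this sketch only covers the second fusion state")
    hypothesis, detail = fusion_result
    if hypothesis == "object found":
        # track the object's position, then press the arms until resistance
        return "approach and grab"
    if hypothesis == "obstacle":
        return "climb stairs" if detail == "stairs" else "avoid wall"
    return "repeat search"  # object not found or conflicting fusion results
```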
6 Application Examples

One typical example of the robot's performance is to search for an object and bring it to the user (Fig. 6). If the user says "Water Bottle", the robot understands its task and searches for the bottle. After detection, it grabs the bottle and brings it to the user. Stairs can be climbed by a coordinated arm and leg movement and the stabilization of the robot with a tilt sensor. In the scenario mentioned above (1 water and 2 soft drink bottles, a wall and stairs), the right bottle has been found in 37 out of 40 cases. In 14 cases the robot had to repeat the search at least once due to a mismatch or conflicting data during the fusion process. The number of search repetitions has been limited to 10 in each case. In 1 case stairs have been wrongly classified as a wall and the robot went around them. It turned out that the precision during the final approaching phase is decisive in order to avoid a collision between robot and object. Due to measurement errors in the proximity data and variations in the step length of the robot caused by poor grip on the floor covering, in 4 of the 37 cases mentioned above the robot went so close to the object that it fell down. The error margin of the proximity sensor and the step length depend on the surface condition of the object and the floor covering, respectively. Sensory errors regarding object recognition resulted mainly from the low resolution of the camera (120 × 80 pixels) and the limitation of the image processing features to the 2 parameters average hue and form factor. Therefore, we plan to replace the camera either by its successor POB-Eye II or by the SRV-1 camera module (Surveyor) and to implement new algorithms. In order to support the ATMega controller, a PSOC CY8C29X66 will be integrated. A modified version of the robot uses the feedback signals of the motor positions to indicate whether an object is too heavy to take. This sets the robot to the state
"need help" and triggers the communication with other robots via Bluetooth, which come and assist in grabbing and carrying the object together (Fig. 7) as swarm robots. However, although the robots synchronize their movements, differences in their mechanical adjustments sometimes cause the object to fall down if it is carried over a long distance.

Fig. 6 Object Search and Fetch (listen for commands – "Water Bottle!" – search – grab the bottle – bring the bottle)
Fig. 7 Co-operating Robots
In case a robot has found an object it is not instructed to grab, it informs the other robots about the object type and position. A robot which has received the goal to fetch this particular object can go directly to it and grab it without the need to search for the object by itself. Other application examples are soccer robots, which must search for the ball and kick it towards the goal or to another robot of their team. In order to co-ordinate these actions they exchange their own and the ball's positions as well as information about the action they plan to execute next. As for rescue robots, we focus on the scenario that an injured person (in this case a doll) lies on a stretcher. Two robots evacuate the injured person by jointly carrying the stretcher into a safe area. This task can be solved efficiently if they communicate with each other in order to synchronize their movements, to exchange information about the direction to go, and to warn each other about obstacles.
7 Self-generating Will

So far the robot obeys the user and executes the task straightforwardly. Although it is intelligent, because it can adapt by learning new object classes and words, it cannot cope with conflicting requirements and cannot use behavioural experiences it has made previously, such as avoiding dangerous situations (Fig. 8). In order to enable the robot to act intelligently when boundary conditions have changed and requirements contradict, we propose to equip the robot with a self-generating will based on artificial feelings. Because the related algorithms must run on the limited resources of the microcontroller embedded in the robot, our approach focuses on the development of compact algorithms rather than on modeling human behaviour or the functions of the brain [Dietrich et al. 2009, Doeben-Henisch 2009].
Fig. 8 Behaviour scenarios of the robot. Without feelings it would run into dangerous situations, whereas feelings would generate the will to go back in case of danger
The limitation of the robot presented above is the dependence of the state s_j(x) on the task-related sensory vector x, which contains only features derived from the visual and proximity data. By adding a physical condition sensory vector d for acquiring features related to the internal state of the robot, such as temperature, as well as for drives such as hunger (low battery status) and the desire for praise by the user for having achieved a goal and for "social contact", the robot can gather information about its well-being or the dangerousness of situations. Combining x and d and appending a constant of 1, we get the sensory vector y

y = (x^T, d^T, 1)^T   (5)
Regarding the linear algorithm described subsequently for the assessment of the feelings, it is important that the features used in the sensory vector y can be separated linearly in feature space. In order to implement a sort of feelings with respect to a given state, we introduce a weighting vector w which represents the experiences made during previously performed tasks. Hence, the states

s_j(y, w)   (6)
depend on the sensory input y and the experiences w. The experience gathering process, i.e. the training of w by forced feedback, is triggered if the physical condition sensors (or a subset of them, if applicable) exceed a threshold which indicates either the physical condition "pain" or "comfortable". These can be compared with the receptors of the human body, e.g. the pain receptors of the skin, which give feedback about the body's situation.
In our current research we use the inertial sensor as the physical condition sensor which triggers the adaption of w. The sensory input y is currently limited to 2 features: the inertial angle and the proximity to the nearest object. An inertial angle with a magnitude of more than 31° against the vertical axis is regarded as "pain", because the robot has fallen down, and it triggers the experience gathering process. An inertial angle with a magnitude of less than or equal to 5° is regarded as "comfortable". We have chosen this narrow range in order to avoid that the normal condition "comfortable" dominates the learning process. Other trigger events could be: recognizing that an action has led to an increasing battery voltage ("comfortable"), a word of acknowledgement spoken by the user ("comfortable") or a too high temperature ("pain"). Experiences are made by assigning the feedback "pain" to the feeling "fear" as well as "comfortable" to "well being" and adapting w accordingly. A possible adaption approach is minimizing the number of false classifications by the first-order gradient method and the Robbins-Monro method. This implies correcting w by means of y scaled by a factor k (0 < k ≤ 1) in case of a false classification, i.e. a false assessment of the feeling:

w_{i+1} = w_i + k y   if the feeling "fear" has been classified but the physical condition feedback is "comfortable"

w_{i+1} = w_i − k y   if the feeling "well being" has been classified but the physical condition feedback is "pain".   (7)
w is not modified in case of a correct classification. It is important for the adaption to use the sensory data y which have been acquired a short time before the adaption was triggered. Under this condition the calculated feelings will allow the robot to act appropriately before a situation has deteriorated dangerously. In order to assess the artificial feelings about the current state, a quality of state function [Sun 2007]

Q(s_j(y, w)) = y^T w   (8)

is introduced, which classifies the state either to the feeling "well being" or to the feeling "fear". A quality of state larger than or equal to 0 corresponds to "well being". This will cause the robot to continue the actions necessary to achieve its goal. A quality of state value less than 0 means the robot has "fear", e.g. of another "aggressive" robot which has hit it in the past and caused it to fall down. Rather than executing the actions to meet the goal, our robot would return to its (safe) start position and stay there, or retry to fulfill its mission after a while.
A more sophisticated approach would avoid going back and would decide about the appropriate action in order to reach a state without fear and to continue the goal achievement. It requires predicting the quality of all states which can be reached from the current state s_j(y, w) after the actions a_i have been executed. The prediction is calculated by a modified quality of state function

Q(s_j(y, w), a_i)   (9)
Hence, the generation of the robot's will to perform an action a_w in order to achieve the goal g

a_w(s_j(y, w), g, a_{i−k}, y)   (10)

is not rule based but the result of optimizing the quality of state function

max{Q(s_j(y, w), a_i)} → a_w   (11)
This operator predicts the quality of all states s_j(y, w) which can be attained from the current state with respect to the possible actions a_i, and selects the maximum. As a result, the robot generates the will to perform the action a_w which maximizes the quality of state. The algorithm is currently being implemented on our humanoid robot. If requirements contradict, such as a goal to be met and the danger caused by another "aggressive" robot, our robot could cope with this contradiction by making a decision based on the maximal quality of state. We expect the robot to have a kind of independence and to refuse to meet the goal directly if it expects that an alternative state has a higher quality than the state "goal achieved". In this scenario the negative feeling "fear" dominates over the expected praise of the user for having delivered the object. Having reached the state chosen by its self-generating will, the robot can decide on the next steps necessary to attain the goal. Or it might seek co-operation more frequently if previous team work has been successful, such as asking another robot to grab the bottle and deliver it.
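The optimization of Eq. 11 is a maximum over the predicted qualities; in this sketch, predict_q and the table of values are purely illustrative stand-ins for the modified quality-of-state function of Eq. 9:

```python
def generate_will(y, w, actions, predict_q):
    """Self-generated will (Eq. 11): choose the action a_w that maximizes
    the predicted quality Q(s_j(y, w), a_i) of the reachable states."""
    return max(actions, key=lambda a: predict_q(y, w, a))

# Illustrative prediction table: "return to start" outscores "deliver object"
# when the path is blocked by an "aggressive" robot (values are made up).
q_table = {"deliver object": -0.4, "return to start": 0.3, "wait and retry": 0.1}
chosen = generate_will(None, None, list(q_table), lambda y, w, a: q_table[a])
```

Here the negative feeling "fear" ("deliver object" has Q < 0) makes the robot choose the safe alternative.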
8 Conclusion

A robot has been developed which understands spoken instructions and can act accordingly. If the user advises the robot to bring a particular object, the robot uses its smart camera and other sensors to search for the object and carries it to the user. New instructions and new objects can be learned by a supervised teach-in procedure. Important features are reasonable deployment costs due to the relatively small size, and scalability through co-operation as swarm robots. Additionally, applications such as rescue robots and soccer robots have been developed.
Furthermore an algorithm for implementing a self-generating will has been proposed. It enables the robot to change its behaviour based on positive and negative experiences it has made in the past.
Acknowledgements The research was funded by the Fachhochschule Frankfurt a.M. and the Gesellschaft für technische Weiterbildung e.V.
References

[Dietrich et al. 2009] Dietrich, D., Bruckner, D., et al.: Psychoanalytical model for automation and robotics. In: IEEE Africon, Nairobi, Kenya (2009)
[Doeben-Henisch 2009] Doeben-Henisch, G.: Humanlike computational learning theory. A computational semiotics perspective. In: IEEE Africon, Nairobi, Kenya (2009)
[Eres 2009] Eres, D.: Object recognition for humanoid robots using intelligent sensors. Diploma Thesis, Fachhochschule Frankfurt a.M., Germany (2009) (in German)
[Gonzalez and Woods 2008] Gonzalez, R., Woods, R.: Digital image processing. Prentice Hall, Englewood Cliffs (2008)
[Hirai et al. 1998] Hirai, K., Hirose, M.Y., et al.: The development of Honda humanoid robot. In: IEEE International Conference on Robotics and Automation, pp. 1321–1326 (1998)
[Jin et al. 2003] Jin, T., Lee, B., et al.: AGV navigation using a space and time sensor fusion of an active camera. Int. J. of Navigation and Port Research 27(3), 273–282 (2003)
[Nauth 2005] Nauth, P.: Embedded intelligent systems. Oldenbourg Verlag, München/Wien (2005)
[Sun 2007] Sun, R.: Cognitive social simulation incorporating cognitive architectures. In: IEEE Intelligent Systems, September 2007, pp. 33–39 (2007)
[Stubbs and Wettergreen 2007] Stubbs, K., Wettergreen, D.: Anatomy and common ground in human-robot interaction: a field study. In: IEEE Intelligent Systems, pp. 42–50 (2007)
[Toth 2009] Toth, D.: Object recognition of humanoid robots using visual sensors. Diploma Thesis, Fachhochschule Frankfurt a.M., Germany (2009) (in German)
A Talking Robot and Its Singing Performance by the Mimicry of Human Vocalization M. Kitani, T. Hara, H. Hanada, and H. Sawada Department of Intelligent Mechanical Systems Engineering, Faculty of Engineering, Kagawa University, Japan
[email protected],
[email protected]
Abstract. A talking and singing robot which adaptively learns the vocalization skill by an auditory feedback learning is being developed. The fundamental frequency and the spectrum envelope determine the principal characteristics of a sound. The former is the characteristics of a source sound generated by a vibrating object, and the latter is operated by the work of the resonance effects. In vocalization, the vibration of vocal cords generates a source sound, and then the sound wave is led to a vocal tract, which works as a resonance filter to determine the spectrum envelope. The paper describes the construction of vocal cords and a vocal tract for the realization of a talking and singing robot, together with the control algorithm for the acquisition of singing performance by mimicking human vocalization and singing voices. Generated voices were evaluated by listening experiments.
1 Introduction

Humans employ voices not only for simple daily communication, but also for the transmission of complex contexts in logical discussions. Different vocal sounds are generated by the complex movements of the vocal organs under the feedback control mechanisms using the auditory system. Vocal sounds and human vocalization mechanisms have been attractive research subjects for many researchers, and computerized voice production and recognition have become essential technologies in the recent development of flexible human-machine interfaces. Various techniques have been reported in research on voice production. Algorithmic syntheses have taken the place of analogue circuit syntheses and have become widely used techniques. Sampling methods and physical-model-based syntheses are typical techniques, which are expected to provide realistic vocal sounds. In addition to these algorithmic synthesis techniques, a mechanical approach using a phonetic or vocal model imitating the human vocal system is a valuable and notable objective.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 57–73. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
M. Kitani et al.
Mechanical constructions of the human vocal system to realize human-like speech have been reported so far. In most of these studies, however, the mechanical reproduction of the human vocal system was mainly guided by X-ray images and FEM analysis, and the adaptive acquisition of control methods for natural vocalization has not been considered. In fact, since the behaviour of the vocal organs has not been sufficiently investigated, due to nonlinear factors of fluid dynamics yet to be overcome, the control of a mechanical system is often difficult to establish. We are developing a talking robot by reproducing a human vocal system mechanically. An adaptive learning using an auditory feedback control for the acquisition of the vocalizing skill is introduced. The fundamental frequency and the spectrum envelope determine the principal characteristics of a sound. The former is the characteristic of a source sound generated by a vibrating object, and the latter is shaped by resonance effects. In vocalization, the vibration of the vocal cords generates a source sound, and then the sound wave is led to the vocal tract, which works as a resonance filter to determine the spectrum envelope. The robot is being constructed based on a motor-controlled mechanical model with vocal cords, a vocal tract and a nasal cavity to generate a human-like voice imitating human vocalization. By introducing an auditory feedback learning with an adaptive control algorithm of pitch and phoneme, the robot is able to autonomously acquire the control method of the mechanical system to produce stable vocal sounds imitating the human vocalization skill. In the first part of the paper, the adaptive control method of the mechanical vocal cords and vocal tract is described, and then the evaluation of the generated voices by human listening experiments is presented.
2 Construction of a Talking Robot

The talking robot mainly consists of an air pump, artificial vocal cords, a resonance tube, a nasal cavity, and a microphone connected to a sound analyzer, which respectively correspond to the lung, the vocal cords, the vocal tract, the nasal cavity and the auditory system of a human. The construction and overview of the talking robot are shown in Figures 1 and 2. Air from the pump is led to the vocal cords via an airflow control valve, which works for the control of the voice volume. The resonance tube as a vocal tract is attached to the vocal cords for the manipulation of the resonance characteristics. The nasal cavity is connected to the resonance tube with a rotary valve between them. The sound analyzer plays the role of the auditory system. It realizes the pitch extraction and the analysis of the resonance characteristics of the generated sounds in real time, which are necessary for the auditory feedback control. The system controller manages the whole system by listening to the vocalized sounds and
calculating motor control commands, based on the auditory feedback control mechanism employing a neural network learning. The relations between the voice characteristics and the motor control commands are stored in the system controller and are referred to in the generation of the speech articulatory motion.
Fig. 1 Construction of a talking robot (labelled parts: air compressor, control and pressure valves, vocal cords with pitch control motor, resonance tube with tongue and resonance control motors, nasal cavity, microphone with low pass filter and AD board, and the learning controller)

Fig. 2 Overview of a talking robot

Fig. 3 Structure of artificial vocal cord (a rubber band over a plastic body under a plastic cover; airflow excites the vibration that forms the vocal source)
60
M. Kitani et al.
2.1 Artificial Vocal Cords and Its Pitch Control

The characteristics of the glottal wave, which determines the pitch and the volume of the human voice, are governed by the complex behavior of the vocal cords. It is due to the oscillatory mechanism of human organs consisting of the mucous membrane and muscles excited by the airflow from the lung. Although several studies on computer simulations of these movements exist, we are trying to generate the wave with a mechanical model. We employed an artificial vocal cord used by people who had to have their vocal cords removed because of a glottal disease. Figure 3 shows the structure of the artificial vocal cord. The vibration of a rubber band with a width of 5 mm attached over a plastic body produces the vocal sound source. We measured the relationship between the tensile force and the fundamental frequency of a vocal sound generated by the artificial vocal cord. The fundamental frequency varies from 110 Hz to 350 Hz by manipulating the force applied to the rubber. However, the relation between the produced frequency and the applied force is not stable and tends to change with the repetition of experiments due to the fluid dynamics. The artificial vocal cord is nevertheless considered suitable for our system, not only because of its simple structure, but also because its frequency characteristics are easily controlled by the tension of the rubber and the amount of airflow.

2.2 Construction of Resonance Tube and Nasal Cavity

The human vocal tract is a non-uniform tube about 170 mm long in man. Its cross-sectional area varies from 0 to 20 cm² under the control for vocalization. A nasal cavity with a total volume of 60 cm³ is coupled to the vocal tract. In the mechanical system, a resonance tube as a vocal tract is attached at the sound outlet of the artificial vocal cords. It works as a resonator of the source sound generated by the vocal cords.
It is made of a silicone rubber with a length of 180 mm and a diameter of 36 mm, which corresponds to a cross-sectional area of 10.2 cm², as shown in Figure 2. A nasal cavity is coupled with the resonance tube as a vocal tract to vocalize human-like nasal sounds by the control of mechanical parts. A rotational valve in the role of the soft palate is placed at the connection of the resonance tube and the nasal cavity for the selection of nasal and normal sounds. For the generation of the nasal sounds /n/ and /m/, the rotational valve is opened to lead the air into the nasal cavity. By closing the middle position of the vocal tract and then releasing the air to speak vowel sounds, the /n/ consonant is generated. For the /m/ consonant, the outlet part is closed first to stop the air, and then opened to vocalize vowels. The difference between the /n/ and /m/ consonant generations is basically the narrowing position of the vocal tract. In generating plosive sounds such as /p/, /b/ and /t/, the mechanical system closes the rotational valve so as not to release the air into the nasal cavity. By closing one point of the vocal tract, the air provided from the lung is stopped and compressed in the tract. Then the released air generates plosive
consonant sounds like /p/ and /t/. The robot also has a silicone-moulded tongue, made by referring to the shape and size of a human tongue. A string is attached to the tongue, and at the other end of the string a servo motor is connected for the manipulation of the up-down motion, to articulate the vocalization of /l/ sounds. By applying displacement forces with stainless bars from the outside of the vocal tract, the cross-sectional area of the tube is manipulated so that the resonance characteristics change according to the transformations of the inner areas of the resonator. Compact servo motors are placed at 8 positions m_j (j = 1–8) from the lip side of the tube to the intake side, and the displacement forces P_j(m_j) are applied according to the control commands from the motor-phoneme controller. In this study an LPC cepstrum is employed as the phonetic characteristic. Figure 4 shows the change of the phonetic characteristics when the 8 sliding bars settled under the vocal tract are manipulated randomly. We found that the resonance tube and its manipulation mechanism sufficiently reproduce the various resonance characteristics for generating human-like vocal sounds. A problem to be solved is how the robot learns the articulatory motions for vocalizing particular vocal sounds, like a human baby who learns vocalization skills.
3 Learning of Vocalization Skill

An adaptive learning algorithm for the achievement of a talking and singing performance is introduced. The algorithm consists of two phases. First, in the learning phase, the system acquires two maps in which the relations between the motor control values and the characteristics of the generated voices are described. One is a motor-pitch map, which associates motor control values with fundamental frequencies. It is acquired by comparing the pitches of the generated sounds with the desired pitches included in the melody lines of a music score. The other is a motor-phoneme map, which associates motor values with the phonetic characteristics of the generated voices appearing as lyrics in a score. Then, in the performance phase, the robot gives a singing performance by referring to the obtained maps, while the pitches and phonemes of the produced voices are adaptively maintained by hearing its own output voices.

3.1 Adaptive Pitch Learning

The pitch learning algorithm simulates the pitch learning process of a human practicing singing. The system starts its action by sending arbitrary values to the pitch controller to let the vocal cord motor and the air-control motor move. The pitch of a generated sound is calculated by the sound analyzer of the auditory system, which executes FFT calculations in real time. The difference between the
target pitch and the current pitch is calculated, and the next motor commands are determined to reduce the pitch difference. As the auditory feedback process is repeated, the pitch difference between the target pitch and the produced pitch decreases. When the pitch difference becomes smaller than a predetermined threshold value, which is currently set to 0.6 Hz, the motor control commands are associated with the target pitch and stored in the motor-pitch map. An example of the pitch learning result is shown in Figure 5. The ordinate shows the pitch, and the abscissa shows the time step of the learning. As time proceeds in the learning, the robot successfully obtained the proper pitches given as targets. The results show that the robot successfully learned the correspondence of the air-control motor values with the pitches.

Fig. 4 Characteristics obtained by randomly moved vocal tract shape: a) 9th order LPC cepstra, b) vocal tract shape
Fig. 5 Result of the pitch tuning (fundamental frequency in Hz over the tuning steps; tuning frequency vs. target frequency for the notes A to E)
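The auditory feedback loop of the pitch learning can be sketched as follows; measure_pitch stands in for vocalizing a sound and analyzing it by FFT, the gain and the linear test response are assumptions, and only the 0.6 Hz threshold is taken from the text:

```python
def tune_pitch(target_hz, measure_pitch, command=0.0, gain=0.005,
               threshold=0.6, max_steps=100):
    """Auditory feedback pitch learning: adjust the motor command until the
    produced pitch is within `threshold` (0.6 Hz in the paper) of the target.
    measure_pitch(command) models vocalizing and real-time pitch analysis."""
    for _ in range(max_steps):
        pitch = measure_pitch(command)
        error = target_hz - pitch
        if abs(error) < threshold:
            return command, pitch        # store (target, command) in the map
        command += gain * error          # move to reduce the pitch difference
    return command, measure_pitch(command)
```

With a roughly linear motor-to-pitch response inside the 110–350 Hz range of the artificial vocal cord, the error shrinks geometrically over the tuning steps, as in Figure 5.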
3.2 Learning of Vowel and Consonant Vocalization

A neural network (NN) is employed to autonomously associate vocal tract shapes with the generated vocal sounds. The associated relations enable the robot to estimate the articulation of the vocal tract for generating particular vocal sounds, even unknown ones, owing to the inference ability of the NN. In the learning process, the network learns the motor control commands with the resonance characteristics given as teaching signals. The network acquires the relations between sounds and the cross-sectional areas of the vocal tract. After the learning, the NN is connected in series to the vocal tract model. By inputting the sound parameters of desired sounds to the NN, the corresponding shape of the vocal tract is associatively obtained. In our previous studies, a Self-Organizing Neural Network (SONN), combining a Self-Organizing Map (SOM) with a Neural Network (NN), was employed. The SONN had a 2-dimensional mapping space, in which the phonetic characteristics of the voices generated by the robot were located. By choosing cells on the map, voice articulations were autonomously recreated. However, due to the spatial restriction of the map, the voice transition from one vocal sound to another was not always recreated properly. To solve this problem, a three-dimensional Self-Organizing Map (3D-SOM) is employed for locating the phonetic characteristics. The 3D-SOM has a three-dimensional mapping space in which the characteristics can be located, so the probability of mislocation is decreased. In this study we employ two 3D-SOMs, one for constructing the topological relations among the control commands and the other for establishing the relations of the phonetic characteristics. After the learning, the two 3D-SOMs are associated based on the topological relations of the motor control commands with the phonetic characteristics. We call this algorithm a dual-SOM.
a) Learning Method of 3D-SOM

On the 3D-SOM, the inputs are defined as a vector x = {x_i} [i = 1, ..., 9], and the weighting vectors assigned to the cells of the 3-dimensional feature map are defined as m = {m_i}. A Gaussian function is employed for the learning of the three-dimensional SOM. The learning is executed as follows.

(I) The weighting vectors {m_i} are initialized with small random values, and the variance of the Gaussian function is initialized with a large value.
(II) Sound characteristics consisting of 9th-order cepstra are extracted from vocal sounds and are inputted as {x_i}.
(III) The cell with the minimum Euclidean distance to x_i is selected on the 3D feature map by formula (1):

c = arg min_i ‖ m_i − x_i(t) ‖    (1)
M. Kitani et al.
(IV) The neighborhoods of the selected cell are updated by formulas (2) and (3):

m_i(t + 1) = m_i(t) + h_ci [ x_i(t) − m_i(t) ]    (2)

h_ci = α(t) · exp( − ‖ r_c − r_i ‖² / (2σ(t)²) ),   0 < α(t) < 1    (3)
In the equations, α is a learning parameter which indicates the weight of the learning, and its value decreases as the learning proceeds. h_ci expresses the range of the neighborhood learning. In this paper the Gaussian function is employed, and σ, which decreases as the learning proceeds, defines the neighboring cells. (V) The learning proceeds by repeating procedures (II) to (IV) until all the phonetic characteristics are distributed properly on the feature map, and the topological relations among different features are autonomously established.
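Steps (I)–(V) can be sketched as follows. This is a minimal illustrative implementation: the grid size, epoch count, and the linear decay schedules for α and σ are assumptions, not values from the paper.

```python
import numpy as np

def train_3d_som(inputs, grid=(6, 6, 6), dim=9, epochs=30, seed=0):
    """Train a 3D-SOM following steps (I)-(V) of the text.

    Grid size, epoch count and the decay schedules are illustrative
    assumptions; the paper does not state its actual values.
    """
    rng = np.random.default_rng(seed)
    # (I) weighting vectors initialized with small random values
    weights = rng.uniform(-0.1, 0.1, size=grid + (dim,))
    # cell coordinates r_i on the 3-dimensional feature map
    coords = np.stack(
        np.meshgrid(*[np.arange(g) for g in grid], indexing="ij"),
        axis=-1).astype(float)
    alpha0, sigma0 = 0.5, max(grid) / 2.0
    total = epochs * len(inputs)
    t = 0
    for _ in range(epochs):
        for x in inputs:  # (II) 9th-order cepstra as the input vector
            decay = 1.0 - t / total
            alpha = alpha0 * decay           # 0 < alpha(t) < 1, eq. (3)
            sigma = sigma0 * decay + 1e-2    # sigma(t) also decreases
            # (III) winner cell c = argmin_i ||m_i - x||, eq. (1)
            c = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=-1)), grid)
            # (IV) Gaussian neighborhood update, eqs. (2)-(3)
            r2 = np.sum((coords - coords[c]) ** 2, axis=-1)
            h = alpha * np.exp(-r2 / (2.0 * sigma ** 2))
            weights += h[..., None] * (x - weights)
            t += 1
    # (V) procedures (II)-(IV) are repeated until features distribute
    return weights
```

With two well-separated input clusters, the trained map assigns them distinct winner cells, which is the clustering behavior reported for the five vowels in Figure 6.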
Fig. 6 Learning result of 3D-SOM
Figure 6 shows the result of the mapping in the SOM. Each marker corresponds to one of the five Japanese vowels in the 3D-SOM. The phonetic characteristics are well mapped three-dimensionally, and the five vowels are categorized separately by the learning of the SOM. Phonetic characteristics are embedded in each cell, and close phonetic characteristics are located near one another to form a cluster for each vowel. Figure 7 shows the sound characteristics extracted from the Japanese vowel /a/ generated by the robot, together with human voices for comparison. As a result of the learning, the clusters of each vowel are properly constructed in the three-dimensional mapping space, and similar phonetic characteristics are located near each other. These results show that the talking robot successfully learns the topological relations of phonetic characteristics.
b) Dual-SOM
In this study a dual-SOM is employed to associate the motor control commands of the robot with the phonetic characteristics of the generated voices. The structure of the dual-SOM is shown in Figure 8; it consists of two self-organizing maps. One is the 3D-Motor_SOM, which describes the topological relations of the various shapes of the vocal tract, in which similar shapes are arranged in nearby locations; the other is the 3D-Phonetic_SOM, which learns the relations among the phonetic characteristics of the generated voices. The talking robot generates various voices by changing its own vocal tract shape. Generated voices and vocal tract shapes have a physical correspondence, since different voices are produced by the resonance phenomenon of the articulated vocal tract. This means that similar phonetic characteristics are generated by similar vocal tract shapes. By adaptively associating the 3D-Phonetic_SOM with the corresponding 3D-Motor_SOM, we can expect the talking robot to autonomously learn vocalization by articulating its vocal tract.
Fig. 7 Phonetic characteristics of the /a/ vowel on 3D-SOM: a) human, b) talking robot (cepstral value vs. order, 1-9)
In the learning phase, the motor control commands and the corresponding phonetic characteristics, consisting of 9th-order LPC cepstra, are obtained by random articulations of the talking robot and are inputted to the 3D-Motor_SOM and the 3D-Phonetic_SOM, respectively. The topological structures are autonomously established by the neighborhood learning on each SOM, so that similar patterns are located close to each other and different patterns are located apart. The differences among patterns appear as norm information in the three-dimensional space of the SOM, so we associate the two maps with each other by referring to the norms between a target cell and winner cells, as shown in Figure 9. First, in the 3D-Phonetic_SOM, the distances from a target cell to the 3 selected winner cells are calculated, and the topological relations among the 4 cells are obtained. Then, by applying the topological relations to the 3D-Motor_SOM, the
location of a cell relative to the corresponding 3 winner cells is estimated. The estimated location in the 3D-Motor_SOM generates the vocal tract shape corresponding to the phonetic characteristics of the inputted sound.
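One way to realize this association is sketched below. The paper states only that the topological relations (norms) among the target cell and the 3 winner cells are transferred between maps; expressing the target as an inverse-distance weighted combination of the winners is one plausible reading, used here as an assumption.

```python
import numpy as np

def estimate_motor_location(target_p, winners_p, winners_m):
    """Estimate a cell location on the 3D-Motor_SOM from the target
    cell and its 3 winner cells on the 3D-Phonetic_SOM (cf. Fig. 9).

    The inverse-distance weighting is an assumption standing in for
    the paper's unspecified use of the norms among the 4 cells.
    """
    target_p = np.asarray(target_p, dtype=float)
    winners_p = np.asarray(winners_p, dtype=float)  # (3, 3) positions
    winners_m = np.asarray(winners_m, dtype=float)  # (3, 3) positions
    d = np.linalg.norm(winners_p - target_p, axis=1)
    w = 1.0 / (d + 1e-9)   # closer winners get larger weight
    w /= w.sum()
    return w @ winners_m   # estimated location on the Motor_SOM
```

When the target coincides with a winner cell on the phonetic map, the estimate collapses onto the corresponding winner cell on the motor map, which matches the intuition behind Figure 9.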
Fig. 8 Structure of dual-SOM: phonetic characteristics are input to the 3D-Phonetic_SOM and motor control commands to the 3D-Motor_SOM, linked by their topological structure
Fig. 9 Association of 3D-Phonetic_SOM and 3D-Motor_SOM: the topological relations among the target cell and winner cells 1-3 on the 3D-Phonetic_SOM are applied to the corresponding winner cells on the 3D-Motor_SOM to locate the estimated cell
3.3 Experiments for Vocalization Acquisition

After the learning of the relationship between the 3D-Phonetic_SOM and the 3D-Motor_SOM, we inputted human voices from a microphone to confirm whether the robot could speak by mimicking human vocalization. Figure 10 shows the comparison of spectra between human /u/ and /e/ vowel vocalizations and the robot's speech. The first and second formants, which present the characteristics of the vowels, were formed so as to approximate the human vowels, and the sounds were well distinguishable by listeners. The robot also successfully acquired the other Japanese vowels /a/, /i/ and /o/, and the first and second formants were formed so as to appear as in vowels vocalized by a human.
In /a/ vocalization, for example, the glottal side was narrowed while the lip side was open, which is the same way a human utters the /a/ sound. In the same manner, the characteristics for the /i/ pronunciation were acquired by narrowing the outlet side and opening the glottal side. The experiment also showed the smooth motion of the vocalization. The connection between the /a/ and /i/ sounds was well acquired based on the dual-SOM learning, and the /a/ vocalization transited smoothly to the /i/ vocalization by the mechanical system. Nasal sounds such as /m/ and /n/ are generated with nasal resonance under the control of the valve between the nasal cavity and the resonance tube. The voiced sound /mo/ was generated by closing the lips and leading the sound wave to the nasal cavity; then, by opening the lips and closing the nasal valve, the air was released to the mouth to vocalize the /o/ sound.

3.4 Transition from One Vowel to Another

For the validation of the voice learning executed by the dual-SOM, the transition motion is recreated. Figure 11 shows the result of the transition from /o/ to /a/. Panel (a-1) indicates the change of the phonetic characteristics embedded in the 3D-Phonetic_SOM, and (b-1) shows the motor control values embedded in the 3D-Motor_SOM. In each figure, the abscissa shows the time step, and the ordinate shows the change of the phonetic characteristics and the motor control values, respectively. As the phonetic characteristics changed their values from /o/ to /a/, the corresponding motor control values changed gradually, as expected. Panels (a-2) and (b-2) show the selected cells on the 3D-Phonetic_SOM and the 3D-Motor_SOM through the transition from /o/ to /a/. Through the transition, the selected cells gradually changed their positions from /o/ to /a/. This result confirms that the cells on the two SOMs were properly chosen.
By the use of the three-dimensional space for the mapping, the transitions were properly generated, and the results showed that the associations between the two SOMs were well achieved.

3.5 Singing Performance with Adaptive Control

The singing performance is executed by referring to the two acquired maps, the motor-pitch map and the motor-phoneme map. The motor-pitch map describes the relation between pitch information and the control value of motor number 9, which controls the air flow rate. The motor-phoneme map describes the relation between target phonemes and the motor control values which determine the vocal tract shape. Table 1 shows the motor-pitch and motor-phoneme maps. Score information is inputted through the interface dialog shown in Figure 12 before the performance. A user selects pitch, duration and lyrics from the lists of musical notations to compose a score, which is stored as score information.
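The lookup from a score note to motor commands can be sketched as follows. The motor 1-8 values for /a/ and /o/ are taken from Table 1; the dictionary layout and the per-pitch flow value for motor 9 are illustrative assumptions, not the paper's data structures.

```python
# Sketch of translating a score note into motor-control commands via
# the motor-phoneme and motor-pitch maps. The /a/ and /o/ rows
# (motors 1-8) come from Table 1; treating the motor-9 air-flow value
# as a per-(note, scale) entry is an assumption of this sketch.
MOTOR_PHONEME_MAP = {
    "a": [-100, -134, 301, -50, -167, 452, -769, 920],
    "o": [-185, 117, 0, 184, -468, 703, -753, 753],
}
MOTOR_PITCH_MAP = {("F", 3): 200, ("G", 4): 250}  # illustrative values

def note_to_commands(phoneme, note, scale):
    """Vocal-tract motor values (motors 1-8) plus the motor-9 value."""
    return MOTOR_PHONEME_MAP[phoneme] + [MOTOR_PITCH_MAP[(note, scale)]]
```

The performance-control manager described below would issue such a command list at each note onset, timed by the duration column of the score.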
Fig. 10 Comparison of spectra: a-1) human vowel /u/, a-2) talking robot /u/, b-1) human vowel /e/, b-2) talking robot /e/ (amplitude [dB] vs. frequency 0-4000 [Hz], with formants F1 and F2 marked)
The singing performance is conducted according to the performance signals generated by the performance-control manager. The manager has an internal clock and makes a temporal plan of the note outputs with the help of the duration information in the score. The score information is translated into motor-control commands by referring to the maps. During the performance, unexpected changes of air pressure and tensile force cause fluctuations of the sound outputs. Adaptive control with auditory feedback is therefore introduced: by hearing its own output voices, the auditory system observes errors in the pitch and the phoneme, and the system starts fine tuning of the produced sounds. The system is thus able to realize a stable singing performance under adaptive mechanical control using auditory feedback.
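One correction step of this auditory feedback can be sketched as follows. The paper says only that the valve control value is "suitably adjusted" by comparing the score pitch with the extracted pitch; the proportional law and the gain used here are assumptions.

```python
def adjust_flow_valve(valve_value, score_pitch, heard_pitch, gain=0.05):
    """One step of the auditory-feedback pitch correction: compare the
    pitch in the score with the pitch heard from the robot's own voice
    and nudge the air-pump valve command by the error. The proportional
    law and the gain are illustrative assumptions, not the paper's
    control law."""
    return valve_value + gain * (score_pitch - heard_pitch)
```

Calling this once per analysis frame drives the pitch error toward zero, which is the stabilizing behavior visible in Figure 14-b with adjustment enabled.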
Fig. 11 Transition from /o/ to /a/ vowel: a-1) phonetic features (cepstral orders 1-9) on 3D-Phonetic_SOM and b-1) control commands (motors 1-8, [mm]) on 3D-Motor_SOM over time steps 1-5; a-2) selected cells on 3D-Phonetic_SOM and b-2) selected cells on 3D-Motor_SOM, from start to end
To realize real-time feedback control for the vocalization of stable pitches, real-time pitch calculation from vocal sounds is required. If the calculation consumes too much processing power, it disturbs the singing performance of the talking robot. We examined several methods for pitch extraction and employed the fastest algorithm, based on the peak-to-peak extraction method. Figure 13 shows the algorithm. At first, the sound wave generated by the talking robot is obtained, and smoothing is executed by applying a moving average. The locations of the peaks on the sound wave are extracted by finding the upper peaks. The cycle T is calculated by selecting the distance between the two nearest peaks, and the pitch is calculated as the inverse of T. Finally, the control value of the air pump valve is suitably adjusted by comparing the pitch in the score with the extracted pitch.

Figure 14 introduces the singing performance of the Japanese song "Kagome Kagome". In Figure 14-a, the ordinate shows the pitch of the singing voice, and the abscissa shows the time step in the singing performance. The words at the bottom of the figure show the lyrics. Through the singing performance, the talking robot follows the pitches described in the score, and with the auditory feedback, it autonomously adjusts its motors for a stable singing performance. Figure 14-b shows the pitch difference, obtained by comparing the ideal pitches with the generated pitches. Without the auditory feedback, the obtained pitches were not stable because of unexpected changes of the air flow and the tension of the artificial vocal cords. With the adaptive control, the talking robot autonomously adjusted the pitch to follow the ideal pitch of the score data. The system autonomously performs uttering and singing using the vocalization skill acquired by the adaptive learning. The robot also gives a mimicking performance by listening to and following a human speaker or singer.
The auditory system listens to a human singer and extracts the pitch and phonemes from his voice in real time. The pitch and phonemes are then translated into motor-control values to let the robot follow the human singer.
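The peak-to-peak pitch extraction of Figure 13 can be sketched as follows. The moving-average window length and the use of the median peak interval are illustrative assumptions; the paper specifies only smoothing, upper-peak detection, and pitch = 1/T.

```python
import numpy as np

def extract_pitch(wave, fs, window=5):
    """Peak-to-peak pitch extraction as in Fig. 13: smooth the sound
    wave with a moving average, locate the upper peaks, take the cycle
    T from the peak-to-peak distance, and return 1/T as the pitch.
    Window length and the median interval are assumptions."""
    kernel = np.ones(window) / window
    s = np.convolve(wave, kernel, mode="same")
    # upper peaks: positive samples larger than both neighbours
    idx = np.arange(1, len(s) - 1)
    peaks = idx[(s[idx] > s[idx - 1]) & (s[idx] > s[idx + 1]) & (s[idx] > 0)]
    if len(peaks) < 2:
        return None            # not enough peaks to measure a cycle
    T = np.median(np.diff(peaks)) / fs   # cycle length in seconds
    return 1.0 / T
```

Because it needs only one pass of smoothing and a linear peak scan, this method is cheap enough to run inside the real-time control loop, which is why the authors chose it over costlier spectral methods.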
Fig. 12 User interface for the singing performance
Table 1 Score map of the Japanese song "Kagome Kagome"

Time step | Words | Motor 1 | Motor 2 | Motor 3 | Motor 4 | Motor 5 | Motor 6 | Motor 7 | Motor 8 | Motor 11 | Motor 12 | Note | Scale | Intonation | Duration
 1 | K    |    0 |    0 |    0 |   0 |    0 |   0 |    0 |    0 | -350 | 0 | F    | 3 | 2 |   0
 2 | a    | -100 | -134 |  301 | -50 | -167 | 452 | -769 |  920 |  200 | 0 | F    | 3 | 0 | 750
 3 | G    | -585 |  401 | -602 | 920 | -636 | 117 | -201 |  318 |  200 | 0 | F    | 3 | 4 |   0
 4 | o    | -185 |  117 |    0 | 184 | -468 | 703 | -753 |  753 |  200 | 0 | F    | 3 | 0 | 375
 5 | M    | -920 |  268 | -469 | 302 | -284 | 100 |  301 | -435 |  200 | 1 | G    | 4 | 4 |   0
 6 | e    |  117 |  268 | -469 | 302 | -284 | 100 |  301 | -435 |  200 | 0 | G    | 4 | 0 | 375
 7 | K    |    0 |    0 |    0 |   0 |    0 |   0 |    0 |    0 |  200 | 0 | F    | 3 | 4 |   0
 8 | a    | -100 | -134 |  301 | -50 | -167 | 452 | -769 |  920 |  200 | 0 | F    | 3 | 0 | 375
 9 | G    | -585 |  401 | -602 | 920 | -636 | 117 | -201 |  318 |  200 | 0 | F    | 3 | 4 |   0
10 | o    | -185 |  117 |    0 | 184 | -468 | 703 | -753 |  753 |  200 | 0 | F    | 3 | 0 | 375
11 | M    | -920 |  268 | -469 | 302 | -284 | 100 |  301 | -435 |  200 | 1 | F    | 3 | 4 |   0
12 | e    |  117 |  268 | -469 | 302 | -284 | 100 |  301 | -435 |  200 | 0 | F    | 3 | 0 | 375
13 | NULL |    0 |    0 |    0 |   0 |    0 |   0 |    0 |    0 |  200 | 0 | NULL | 9 | 4 | 375

(Word control covers the vocal tract shape via motors 1-8 and the tongue and nose via motors 11 and 12, forming the motor-phoneme map; pitch control covers note, scale and flow, and intonation control covers intonation and duration, forming the motor-pitch map.)
Fig. 13 Method for the pitch extraction: a) sound wave of the generated sound, b) sound wave after smoothing (amplitude vs. time, 0-30 ms; the cycle T is measured between adjacent upper peaks within a frame)
Fig. 14 Pitch variation on singing performance: a) singing performance (pitch [Hz], 100-180, vs. time step, 0-50) comparing the score data with performances without and with adjustment; lyrics: Ka Go Me Ka Go Me / Ka Go No Na Ka No To Ri Wa / I Tsu I Tsu De Ya Ru / Yo A Ke
Fig. 14 (continued) b) Pitch difference in singing performance (pitch difference [Hz], -20 to 20, vs. time step, 0-50, without and with adjustment)
4 Conclusions

In this paper a talking and singing robot, constructed mechanically with human-like vocal cords and a vocal tract, was introduced. By introducing adaptive learning and control of the mechanical model with auditory feedback, the robot was able to acquire vocalization skills as a human baby does while growing up, and it generated vocal sounds whose pitches and phonemes were uniquely specified. The voices were assessed by experiments, and the robot's speech was verified to be clear enough to be recognized by humans. The actual motion of the mouth helped listeners imagine what the robot was saying. We are now working to develop a training system for auditorily impaired people to interactively train their vocalization. The mechanical system reproduces vocalization skills just by listening to actual voices. Such people will be able to learn how to move the vocal organs for clear vocalization by observing the motions of the talking robot. A mechanical construction of the human vocal system is considered not only to have advantages in producing natural vocalization over algorithmic synthesis methods, but also to provide simulations of the human acquisition of speaking and singing skills. Further analyses of the human learning mechanisms will contribute to the realization of a speaking robot which learns and sings like a human. The proposed approach to the understanding of human behavior will also open a new research area in the development of human-machine interfaces.
Acknowledgment This work was partly supported by the Grants-in-Aid for Scientific Research, the Japan Society for the Promotion of Science (No. 21500517) and Japan Society for the Promotion of Science Fellows (No. 2210539).
An Orthopedic Surgical Robotic System - OrthoRoby

D. Erol Barkana
Department of Electrical and Electronics Engineering, Yeditepe University, Istanbul, Turkey
[email protected],
[email protected]
Abstract. Recent research in orthopedic surgery indicates that computer-assisted robotic systems can improve the precision and accuracy of the surgery, which in turn leads to better long-term outcomes. Increasing demand for minimally invasive bone cutting operations has been encouraging surgical robot development in orthopedics. In this work, an orthopedic robotic system called OrthoRoby and an intelligent control architecture to be used in bone cutting operations are developed. Experiments are performed to demonstrate the performance of the intelligent control architecture.
Z.S. Hippe et al. (Eds.): Human - Computer Systems Interaction, AISC 99, Part II, pp. 75-90. springerlink.com © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Orthopedic surgery is one of the most common operations in hospitals. Most bone-related orthopedic surgeries are performed to straighten bone deformities, to extend bone length, and to remove bone regions inflicted by tumors and infections. Current manual surgical techniques often result in inaccurate placing and balancing of hip replacements, knee components, or soft tissues. In recent years, computer-assisted robotic systems have been developed for orthopedic surgeries which improve the precision and accuracy of the surgery. Some orthopedic surgery robotic systems use serial manipulators and some use parallel manipulators. Robodoc [Schulz et al. 2007], Caspar, and Acrobot [Jakopec et al. 2003] are well-known orthopedic surgical robots that belong to the class of serial manipulators with large workspaces; these are somewhat heavy, suffer from low stiffness and accuracy, and possess a low nominal load/weight ratio. Parallel manipulators are preferred for orthopedic surgeries because they provide advantages in medical robotics such as small accumulated positioning errors and high stiffness. Parallel manipulators are closed kinematic structures that hold the requisite rigidity to yield a high payload-to-self-weight ratio. MARS is one of the well-known patient-mounted parallel robots [Pechlivanis et al. 2009]. Similar to the MARS miniature orthopedic robot, the MBARS robot employs a parallel platform architecture, which has been used for machining the femur to allow a patella implant to be positioned [Wolf et al. 2005]. Orthdoc
[Kwon et al. 2002] uses parallel manipulators for orthopedic surgery. A hybrid bone-attached robot (HyBAR) has also been developed with a parallel and serial hybrid kinematic configuration for joint arthroplasty [Song et al. 2009]. A parallel robot with an automatic bone drilling carriage has been developed as well [Tsai and Hsu 2007]. Praxiteles, on the other hand, is another patient-mounted surgical robot, comprised of 2 motorized degrees of freedom (DoF) whose axes of rotation are arranged in parallel and are precisely aligned to the implant cutting planes with a 2-DoF adjustment mechanism [Plaskos et al. 2005]. In this work, an orthopedic surgery robot called OrthoRoby, which consists of a parallel robot and a cutting tool, has been developed [Erol Barkana 2010a; Erol Barkana 2010b]. An intelligent control architecture is developed for OrthoRoby to track a desired bone cutting trajectory in a desired and safe manner. The intelligent control architecture is responsible i) for supervising OrthoRoby to produce the coordinated motion necessary to complete the bone cutting operation, and ii) for monitoring the progress and safety of the cutting operation so that dynamic modifications can be made (if needed) to complete the bone cutting operation safely. This paper introduces the intelligent control architecture developed for OrthoRoby in Section 2. In Section 3, the experimental setup and experimental results are presented to demonstrate the feasibility of the proposed intelligent control architecture. Potential contributions of this work are given in Section 4.
2 Intelligent Control Architecture

The control architecture developed for the orthopedic surgical robotic system OrthoRoby is shown in Fig. 1. The intelligent control architecture is used to track a desired bone cutting trajectory in a desired and safe manner. It consists of a parallel robot and a cutting tool (OrthoRoby), a Cartesian system, a medical user interface, a camera system, a high-level controller and a low-level controller.
Fig. 1 Intelligent control architecture of OrthoRoby for orthopedic surgery
2.1 OrthoRoby and Cartesian System

OrthoRoby consists of a parallel robotic system and a cutting tool. Additionally, a Cartesian system that holds the parallel robotic system is integrated into the OrthoRoby system. The parallel robotic system is designed after the well-known Stewart platform parallel manipulator. The OrthoRoby parallel robot consists of two circular plates connected by six linear actuators (Fig. 2A). The actuators have encoders attached to them to determine the position of the parallel robot. In this work, 6 spherical joints are placed at 60° intervals along the circumference of the base platform, and 6 spherical joints are placed with 75° and 45° spacing on the moving plate (Fig. 2A). A cutting tool is placed in the middle of the moving platform of the parallel robot. The mechanical system of the parallel robot is given in Fig. 2B.
Fig. 2A Parallel robot
Fig. 2B Mechanical system of parallel robot
Vector loop closure equations are used to derive the inverse and forward kinematics equations of the parallel robot; the details of the kinematics are given in [Erol Barkana 2010a; Erol Barkana 2010b]. Additionally, Kane's method is selected, for its relative ease of computerization and its computational efficiency, to derive the dynamics equations of the parallel robot; the details of the dynamics are likewise given in [Erol Barkana 2010a; Erol Barkana 2010b]. The parallel robot is controlled via a 3.2 GHz Pentium 4 PC with 2 GB of RAM. The hardware is controlled through the MATLAB Real-Time Workshop Toolbox from MathWorks and WinCon from Quanser Consulting. All data I/O is handled by the Quanser Q8 board. The joint angles of the parallel robot are acquired using the encoders of the actuators with a sampling time of 0.001 seconds from the Quanser Q8 card, and the torque output to the parallel robot is given by the same card with the same sampling time. A control card is developed to drive the DC motors (actuators) of the parallel robot. The board includes 6 sets of pulse width
modulation (PWM) generators, H-bridge amplifiers and a decision-making circuit. For each set there are three inputs: i) one for speed, ii) one for direction, and iii) one for enabling the H-bridge. The direction input assigns the direction of the linear actuator, and the enable input activates the H-bridge IC (L6205N). These are digital inputs connected to the digital I/O of the Quanser Q8 board. The third input assigns the velocity of the actuator and is an analog input. The PWM generator IC (K3525) controls the velocity of the actuator, and its frequency is adjusted to 40 kHz. The signal from the PWM circuit is connected to the H-bridge to drive one motor with high current (7 A). Logic gates (74H08, 74H04) are used in the decision-making module to control the direction of the actuators. Position feedback of the actuators is received from the actuators' internal encoders and transmitted to the Quanser Q8 board via this control card. A power supply provides 5 V and 12 V to the control card.

A Cartesian system has been integrated into OrthoRoby to hold the parallel robot and to place the cutting tool of OrthoRoby close to the bone that will be cut. A touch-screen control unit has been added to the Cartesian system so that the surgeon can use the system easily. A Programmable Logic Controller (PLC) is used to control the touch-screen panel. Since the Cartesian system will be used during orthopedic surgery, all of its inputs and outputs have been secured with fuses.

2.2 Camera System

Two Logitech C600 HD webcams with fixed focus, labeled C1 and C2, are added to the intelligent control architecture to measure the depth of cutting and to detect whether the cutting tool of the parallel robot is close enough to the bone (Fig. 3). L1 and L2 represent the distances between the markers as seen from the positions of C1 and C2, respectively. α1 and α2 represent the slopes between the two markers as seen from the positions of C1 and C2.
h1 and h2 represent the vertical distances between the markers seen from C1 and C2 (h1 = L1 sin α1, h2 = L2 sin α2). The virtual dimensions L1,2 and α1,2 are measured, and h1,2, a and b are calculated using a = L1 cos α1, b = L2 cos α2 and h = (h1 + h2)/2. The values a, b and h are used to calculate the 3-dimensional (3D) distance between the two markers, L = √(a² + b² + h²), in pixel units. L is known to be 8.1 cm, thus it is possible to find the cm equivalent of each pixel in the images. If the pixel width is known in cm, it is possible to measure the distance of the cutting-tool movement inside the bone.
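The geometry above reduces to a few lines of code. This sketch assumes angles in radians and pixel-unit lengths; the function names are illustrative, not from the paper.

```python
import math

def marker_distance_3d(L1, alpha1, L2, alpha2):
    """3D distance between the two markers from the camera geometry:
    a = L1*cos(alpha1), b = L2*cos(alpha2), h = (h1 + h2)/2 with
    h_i = L_i*sin(alpha_i), and L = sqrt(a^2 + b^2 + h^2).
    Angles in radians; L1, L2 in pixels."""
    a = L1 * math.cos(alpha1)
    b = L2 * math.cos(alpha2)
    h = (L1 * math.sin(alpha1) + L2 * math.sin(alpha2)) / 2.0
    return math.sqrt(a * a + b * b + h * h)

def cm_per_pixel(L_pixels, marker_distance_cm=8.1):
    """The real marker separation is known to be 8.1 cm, which fixes
    the cm equivalent of one pixel."""
    return marker_distance_cm / L_pixels
```

Multiplying any pixel displacement of the cutting tool by `cm_per_pixel(L)` then gives its physical movement inside the bone.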
Fig. 3 Camera system interface
2.3 Medical User Interface

A medical user interface (MUI) is developed and integrated into the intelligent control architecture. The MUI is used by the surgeon to import images of the patient's bone as inputs and to provide a bone cutting trajectory as output. The need to obtain accurate results for posterior validation against experimental values requires adequate 3D modeling of the bone structure. The initial step concerning the bone anthropometrical definition is a Computed Tomography (CT) scan of the femur region of the patient on Philips® Brilliance CT equipment. The geometric models are obtained from 3D reconstruction of the CT images of the patients, which are taken at Yeditepe University Hospital. The CT images are taken at intervals of 1 mm in the neutral position. These images are transferred to the medical user interface. The surgeon decides the bone cutting trajectory after processing the patient's CT images using the functions of this user interface. The decision is given to OrthoRoby via a high-level controller, which is discussed in the next section.

2.4 Controllers

Low-level controllers and a high-level controller are used in the intelligent control architecture.

2.4.1 Low-Level Controllers

Computed-torque control is used for the low-level controller of the parallel robot of OrthoRoby. An activation/deactivation mechanism is used for the low-level controllers of both the cutting tool and the Cartesian system. The details of the low-level controllers are given in [Erol Barkana 2010a; Erol Barkana 2010b].

2.4.2 High-Level Controller

The high-level controller is the decision-making module, which makes intermittent decisions in a discrete manner. A hybrid system modelling technique is used to
design the high-level controller. A set of hypersurfaces that separate different discrete states is defined for the high-level controller. The hypersurfaces are not unique and are decided considering the capabilities of OrthoRoby, the Cartesian system, the camera system and the medical user interface (Table 1).

Table 1 Hypersurfaces

h1 = (sb == 1)
h2 = ‖x − xt‖ < ε
h3 = (xct ≥ xdb − εct) ∧ (cto == 1)
h4 = ‖x − xi‖ < ε1
h5 = ¬(ll < l < lu) ∨ (τctrl ≥ τrth) ∨ (δ = δlimit)
h6 = (eb == 1)
h7 = (pb == 1)
h8 = (pb == 0) ∧ (eb == 0)
sb is the start button of the overall system; it becomes active once OrthoRoby's cutting tool has been positioned close to the bone using the Cartesian system and the camera system has started monitoring. x and xt are OrthoRoby's cutting tool position and the bone's position, respectively. ε is a value used to determine whether OrthoRoby's cutting tool is close enough to the bone. cto (cutting tool on) is a binary value, which is 1 when it is pressed and 0 when it is released. xct and xdb, which are calculated using the camera system, are the cutting tool depth and the desired depth in the bone, respectively. εct is a value used to determine whether the cutting tool is close enough to the desired cutting depth. x and xi are the parallel robot position and the initial position of the operation, respectively. ε1 is a value used to determine whether the parallel robot is close enough to the initial position. ll and lu represent the sets of lower and upper limits of the legs of the parallel robot, respectively, and l is the set of actual leg lengths of the parallel robot. τctrl and τrth are the torque applied to the actuators of the parallel robot and the threshold value, respectively. δ and δlimit are the actual values of the parallel robot configuration in vector form and the limit values of the parallel robot configurations in vector form, respectively. The emergency button (eb) is 1 when it is pressed by the surgeon. The pause button (pb) is pressed when the surgeon wants to pause the cutting operation for a while; the surgeon can release both pb and eb to continue the cutting operation.
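A few of the hypersurface tests from Table 1 can be sketched directly as predicates. Variable names follow the text; the δ = δlimit check of h5 is omitted here for brevity, and the vectorized form is an assumption.

```python
import numpy as np

# Sketch of hypersurface tests from Table 1 (x: tool position, xt: bone
# position, xct: cutting depth, xdb: depth in the bone, cto: cutting-
# tool-on flag, l: leg lengths, tau_ctrl/tau_rth: torque and threshold).
def h2_tool_reached_bone(x, xt, eps):
    """h2: the cutting tool is within eps of the bone position."""
    return float(np.linalg.norm(np.subtract(x, xt))) < eps

def h3_depth_reached(xct, xdb, eps_ct, cto):
    """h3: desired cutting depth reached while the cutting tool is on."""
    return (xct >= xdb - eps_ct) and cto == 1

def h5_safety_violation(l, ll, lu, tau_ctrl, tau_rth):
    """h5 (partial): leg lengths out of limits or torque at threshold."""
    l = np.asarray(l, dtype=float)
    legs_ok = bool(np.all((np.asarray(ll) < l) & (l < np.asarray(lu))))
    return (not legs_ok) or (tau_ctrl >= tau_rth)
```

Crossing any of these predicates from False to True corresponds to a plant event, which generates the plant symbols described next.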
Each region in the state space of the plant, bounded by the hypersurfaces, is associated with a state of the plant. A plant event occurs when a hypersurface is crossed, and it generates a plant symbol to be used by the high-level controller. The next discrete state is activated based on the current state and the associated plant symbol (Table 2). To notify the low-level controllers of the next course of action in the new discrete state, the high-level controller generates a set of symbols, called control symbols. In this application, the purpose of the high-level controller is to activate/deactivate the Cartesian system, the parallel robotic device and the cutting tool device in a coordinated manner so that the bone cutting operation does not enter critical regions of the state space, to ensure safety. The control states are given in Table 3. The transition function uses the current control state and a plant symbol to determine the next control action. The high-level controller generates a control symbol which is unique for each state (Table 4). The
An Orthopedic Surgical Robotic System-OrthoRoby
81
low-level controller cannot interpret the control symbols directly. Thus the interface converts the control symbols into continuous outputs, which are called plant inputs. The high-level controller and low-level controllers coordination is shown in Fig. 4. Table 2 Plant Symbol ~ x1 ~ x2 ~ x
3
~ x4 ~ x5 ~ x 6
~ x 61 ~ x 62 ~ x
63
The cutting tool of OrthoRoby approaches towards the bone using the Cartesian system (when h1 is crossed). The cutting tool of OrthoRoby reaches the bone (when h2 is crossed).
The cutting tool reaches the desired cutting depth (when h3 is crossed). OrthoRoby goes back to starting position (when h4 is crossed), safety related issues happened such as the parallel robot leg lengths are out of limits, or the parallel robot applied force is above its threshold (when h5 is crossed), or emergency button is pressed (when h6 is crossed) The surgeon presses the pause button (when h7 is crossed). The surgeon releases the pause button (when h8 is crossed). If the surgeon presses pause button when the parallel robot is approaching towards the bone. If the surgeon presses pause button when the bone cutting tool is on. If the surgeon presses pause button when parallel robot is returning back to initial position.
Table 3 Control States
~ s1 f
~ s1b ~ s2 ~ s 3
~ s4 m ~ s5 m
The parallel robot device alone is active to move towards the cutting region on the bone. The parallel robot device alone is active to move towards the initial position. Both the parallel robot device and the cutting tool device are active. Both the parallel robot device and the cutting tool device are idle. Memory state after surgeon says “stop”. Continue state when the surgeon wants to continue with the operation while ~ s4 m (where m=1,2) is active.
Table 4 Control Symbols ~ r1 f ~ r
1b
~ r2 ~ r
3
Drive parallel robot device to move towards the cutting region on the bone. Drive parallel robot device to move back to the initial position. Drive cutting tool device to cut the bone. Make both parallel robot and cutting tool devices idle.
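The transition logic implied by Tables 2–4 can be sketched as a lookup table over (state, plant symbol) pairs. Only a few representative transitions are encoded below, and the control symbols emitted for the pause-related states are assumptions for illustration, not taken from the paper:

```python
# Minimal sketch of the discrete-event supervisor implied by Tables 2-4.
# Only representative transitions are encoded; the control symbols emitted
# for the pause-related states are assumptions, not taken from the paper.

TRANSITIONS = {
    # (current control state, plant symbol): next control state
    ("s1f", "x2"):  "s2",   # tool reached the bone -> start cutting
    ("s2",  "x3"):  "s1b",  # desired depth reached -> move back
    ("s1b", "x4"):  "s3",   # initial position reached -> idle
    ("s1f", "x5"):  "s41",  # pause pressed while approaching -> memory state
    ("s41", "x61"): "s51",  # pause released -> continue state
    ("s51", "x2"):  "s2",   # continued approach reaches the bone
}

CONTROL_SYMBOL = {  # one control symbol per state, Table 4 style
    "s1f": "r1f", "s1b": "r1b", "s2": "r2", "s3": "r3",
    "s41": "r3",  "s51": "r1f",  # assumed: pause idles, continue re-drives
}

def step(state, plant_symbol):
    """Return (next_state, control_symbol); unknown events leave the state unchanged."""
    nxt = TRANSITIONS.get((state, plant_symbol), state)
    return nxt, CONTROL_SYMBOL[nxt]

state, ctrl = "s1f", "r1f"
for sym in ["x5", "x61", "x2", "x3", "x4"]:  # pause, resume, then a full cycle
    state, ctrl = step(state, sym)
print(state, ctrl)  # -> s3 r3
```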
82
D. Erol Barkana
Fig. 4 Coordination of low-level controllers and high-level controller
3 Experiments

3.1 Experimental Setup

The experimental setup includes the parallel robot with its cutting tool (OrthoRoby), a Cartesian system, a camera system and a bone (Fig. 5). A bone cutting path is constructed in consultation with an orthopedist at Istanbul University Cerrahpaşa Medical Faculty. The surgeon defines a bone cutting trajectory, which is common in a surgical operation, using the medical user interface. This interface uses Digital Imaging and Communications in Medicine (DICOM) images of the patient’s bone as inputs to form a patient-specific bone structure in 3D. Note that it is possible for the surgeon to plan bone cutting trajectories based on the patient’s bone deformities and bone structure using the medical user interface. The cutting trajectory defined by the surgeon initially requires OrthoRoby to be positioned next to the bone cutting region using the Cartesian system (Fig. 6). Then the parallel robot becomes active from A to B to move towards the bone (s̃1f active) (Fig. 6). Next, the cutting tool device becomes active to complete the cutting operation on the bone from point B to point C (s̃2 active). When the bone has been cut (detected using the images of markers from the camera), the parallel robot moves back to the initial position from point C to point D (s̃1b active). Finally, both the parallel robot and the cutting tool are idle from D to E (s̃3 active) (Fig. 6). The position and Euler angles of OrthoRoby throughout the bone cutting operation are given in Fig. 7 and Fig. 8, respectively.
Fig. 5 Experimental setup

Fig. 6 Control states (1 = s̃1f, 2 = s̃1b, 4 = s̃2, 8 = s̃3; points A–E mark the trajectory segments; time in s)
Fig. 7 Desired position of OrthoRoby (X, Y, Z in m vs. time in s; points A–E marked)

Fig. 8 Desired Euler angles of OrthoRoby (Alpha, Beta, Gamma in rad vs. time in s; points A–E marked)
3.2 Experiments

In the first experiment the cutting operation was executed as given in Fig. 6. The corresponding actual trajectories of the parallel robot are shown in Fig. 9 (dashed lines). The errors in the legs of the parallel robot are shown in Fig. 10. Across all legs of the parallel robot, the maximum error was 9.4×10⁻⁴ m, the minimum error was 2.2×10⁻¹⁰ m, the mean error was 1.1×10⁻⁶ m, and the standard deviation of the error was 4.9×10⁻⁴ m.
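Summary statistics like these follow directly from the logged leg errors. The data below is synthetic, standing in for the experiment’s log, which is not reproduced here:

```python
import random
import statistics

# Synthetic leg-error log standing in for the experiment's data (metres).
random.seed(0)
errors = [random.gauss(0.0, 5e-4) for _ in range(3000)]

abs_err = [abs(e) for e in errors]
print(f"max  = {max(abs_err):.2e} m")
print(f"min  = {min(abs_err):.2e} m")
print(f"mean = {statistics.fmean(errors):.2e} m")
print(f"std  = {statistics.pstdev(errors):.2e} m")
```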
Fig. 9 Desired and actual leg length changes (Experiment 1; legs 1–6, length in m vs. time in s)

Fig. 10 Error in leg length changes (Experiment 1; legs 1–6, error in m vs. time in s)
In the second experiment, we demonstrated the ability of the intelligent control architecture to dynamically modify the desired bone cutting trajectory based on an event that might happen during the execution of the bone cutting operation in a surgery. In this case, the parallel robot started the execution of the cutting operation with the same desired cutting trajectory as in Experiment 1. During the execution of the operation, at time t’, the surgeon wanted to pause the cutting operation. This could happen when the surgeon does not feel comfortable about the planned trajectory. In this case, the desired trajectory that was originally given to the low-level controllers could be modified considering the surgeon’s intention to pause the operation. Additionally, it was desirable to resume the operation where it was left off when the surgeon decided to continue the operation later (at time tt’). The control state changes are given in Fig. 11. During the second experiment, the surgeon pressed the pause button when the parallel robot was moving towards the bone. The parallel robot device remained active until the surgeon pressed the pause
button at time t’ (Fig. 12, dashed trajectory). As the surgeon pressed the pause button at time t’, the plant symbol x̃5 was generated and the s̃41 state became active (Fig. 11). When the s̃41 state became active, both the parallel robot device and the cutting tool device became idle (from a’ to b’ in Fig. 11). When the surgeon released the pause button at time tt’ to continue the operation execution, x̃61 was generated and s̃51 became active again to activate the parallel robot device (from point b’ to B in Fig. 11). The rest of the desired trajectory was generated in the same way as described in the previous experiment (Fig. 12, solid lines). It can be noticed from the solid lines in Fig. 12 that at time t’, both the parallel robot and the cutting tool remained at their previous set points. Additionally, the parallel robot’s position at time tt’ was automatically detected and taken as the initial position to continue the operation where it had been paused, with zero initial velocity (Fig. 12, dashed lines). Euler angle changes of the parallel robot are given in Fig. 13.

Fig. 11 Control states (1 = (s̃1f, s̃51), 2 = s̃1b, 4 = s̃2, 8 = (s̃3, s̃41)) (Experiment 2)
Fig. 12 Position of OrthoRoby (Experiment 2; X, Y, Z in m vs. time in s; the modified desired position is shown together with the original desired position, with the pause interval t’–tt’ and points a’, b’ marked)
Fig. 13 Euler angles of OrthoRoby (Experiment 2; Alpha, Beta, Gamma in rad vs. time in s; points a’, b’ and A–E marked)
If the high-level controller did not modify the desired trajectories to register the surgeon’s intention to pause the operation, the parallel robot would start moving at time tt’ from a different starting position and with a non-zero velocity (Fig. 12, solid lines), which can create unsafe operating conditions. Note that if the intelligent control architecture did not modify the bone cutting trajectory, the cutting tool could start the cutting operation at undesirable times. The corresponding actual trajectories of the parallel robot are shown in Fig. 14 (dashed lines). The errors in the legs of the parallel robot are shown in Fig. 15. Across all legs of the parallel robot, the maximum error was 1×10⁻³ m, the minimum error was 0 m, the mean error was 6.4×10⁻⁵ m, and the standard deviation of the error was 4.3×10⁻⁴ m.
Fig. 14 Desired and actual leg length changes (Experiment 2; legs 1–6, length in m vs. time in s; pause interval t’–tt’ marked)
Fig. 15 Error in leg length changes (Experiment 2; legs 1–6, error in m vs. time in s; pause interval t’–tt’ marked)
4 Discussion and Conclusions

In this study, an orthopedic robot called OrthoRoby, which is planned to be used in bone-cutting operations, has been developed and implemented. OrthoRoby consists of a parallel robot and a cutting tool. A Cartesian system has been integrated into OrthoRoby to hold the parallel robot and to place the cutting tool of OrthoRoby close to the bone that will be cut. An intelligent control architecture has been developed that systematically combines a high-level controller with the low-level controllers of OrthoRoby to enable bone-cutting operations in a safe and desired manner. The intelligent control architecture has been integrated with a medical user interface to generate desired bone cutting trajectories based on the patient’s 3D bone model. The medical user interface is used by the surgeon to define the required cutting trajectories for each surgery. It can be observed that patient-specific pre-operation planning through the creation of 3D virtual environments using the medical user interface is possible, and that the surgeon can easily define and modify the cutting trajectory using this interface. The intelligent control mechanism developed in this paper can supervise the parallel robot, the cutting tool and the Cartesian system to produce the coordinated motion necessary to complete a given bone cutting operation in a desired manner. To our knowledge, such an intelligent control architecture has not been explored for orthopedic surgical robotic systems. Additionally, the proposed intelligent control architecture can monitor the progress and the safety of the cutting operation so that necessary dynamic modifications of the operation can be made (if needed) to complete the given operation safely. Medical robots are safety-critical systems, and safety should be considered from the very beginning of the design process and during the surgical operation.
For bone cutting surgical robots, structural rigidity and an automated decision mechanism are major issues. The accuracy tests of OrthoRoby and its intelligent control architecture yielded promising results regarding structural rigidity, provided by its design, and automated decision making, provided by its intelligent control mechanism. As future work, tests of OrthoRoby and its intelligent control architecture in cadaver studies will be performed.
Acknowledgment
I gratefully acknowledge the help of Dr. Muharrem Inan, an orthopedist in the Orthopedics and Traumatology Department of Istanbul University Cerrahpaşa Medical Faculty. The work is supported by TUBITAK, The Support Programme for Scientific and Technological Research Projects (1001), grant 108E092.
Methods for Reducing Operational Forces in Force-Sensorless Bilateral Control with Thrust Wires for Two-Degree-of-Freedom Remote Robots

T. Sato, S. Sakaino, and T. Yakoh

Keio University, Department of System Design Engineering, 3-14-1 Hiyoshi, Kouhoku-ku, Yokohama-shi, Kanagawa-ken 223-8522, Japan
[email protected],
[email protected],
[email protected]
Abstract. In this study, a bilateral control system for two-degree-of-freedom (two-DOF) remote robots that are capable of grasping and manipulating motion is considered. The purpose of this research is to achieve force-sensor-less bilateral control with thrust wires for two-DOF systems with small operational forces. Small operational forces in remote robot systems are desirable for several applications. In previous research, methods for reducing the operational forces in one-DOF systems with thrust wires were proposed. In this study, that method is applied to a two-DOF system. Furthermore, a method for further reduction of the operational forces is proposed, in which force transforms in both local and modal space are implemented. By considering modal space in the two-DOF system, the operational forces can be reduced further. The validity of the proposed method is confirmed by experiments.
1 Introduction

Bilateral control has recently been studied for advanced remote control of robots. Many control systems for remote robots use visual and position information. However, to perform precise remote work, the additional use of force information is required. Thus, to accomplish position and force transmission between remote robots, bilateral control has been considered [Iida and Ohnishi 2004]. A bilateral control system comprises master robots and slave robots. Human operators manipulate the master robots, and the slave robots establish contact with the environment. The slave robots track the positions of the master robots, and the operators feel the reaction forces of the environments. Moreover, some researchers have considered force-sensor-less bilateral control [Iida and Ohnishi 2004]. In this method, a reaction force observer (RFOB) is used instead of force sensors [Murakami et al. 1993]. Using position encoders and a model of the robots, the RFOB estimates the reaction forces. The estimated reaction forces are used for

Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 91–107. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
92
T. Sato, S. Sakaino, and T. Yakoh
bilateral control. Force-sensor-less bilateral control based on RFOBs can achieve low-cost bilateral applications without force sensors. Because an RFOB based on position encoders has a wide bandwidth as a sensor, the control performance can be improved. Furthermore, Tsuji et al. proposed a flexible actuator [Tsuji et al. 2006] for useful bilateral applications. It comprises an actuator and a thrust wire (Fig. 1). The ends of the thrust wire are connected to the actuator and the end-effector, respectively. Because the thrust wire comprises an inner wire and an outer tube, the location of the end-effector can be varied flexibly. In addition, the thrust wire can accomplish pushing and pulling motions. Thus, by using the flexible actuator, the actuator location can be set arbitrarily, and position and force transmission can be carried out. Hyodo et al. improved and experimentally validated thrust wires [Hyodo et al. 2009]. In addition, flexible actuators were used in a haptic endoscopic surgery robot [Tanaka et al. 2009]. Some applications require flexible locations of the end-effectors. Therefore, force-sensor-less bilateral control based on RFOBs with thrust wires has high utility and is required. However, a system with thrust wires has a problem. The RFOB requires a model of the robot system [Murakami et al. 1993], and modeling thrust wires is very difficult because they exhibit complex nonlinear friction. In addition, if the postures of the thrust wires change, their characteristics change as well. Hence, reaction forces estimated using RFOBs are not exact when thrust wires are used, because of the modeling errors of the thrust wires. As a result, human operators feel large operational forces due to the frictional forces of the thrust wires in free motion, which is not suitable for remote operation. In this study, free motion is defined as motion in which the slave robots do not have any contact with the environment. Forces felt by human operators in free motion are defined as operational forces. In addition, contact motion is defined as motion in which the slave robots are in contact with the environment. For useful low-cost bilateral applications, decoupling of estimated forces without precise modeling and force sensors is required. In recent years, some methods for force (torque) estimation without precise modeling have been proposed. Smith et al. proposed neural-network-based contact force observers for haptic applications [Smith and Hashtrudi-Zaad 2006]. This method requires time-consuming learning and force sensors, and its applicability to thrust wires is unknown. Oh et al. proposed intelligent force-sensor-less power-assisted control based on an external force signal space [Oh and Hori 2008], which allows the gravitational force and human forces to be decoupled. Moreover, methods for reducing the operational forces in force-sensor-less bilateral control with thrust wires for one-degree-of-freedom (one-DOF) remote robots were proposed [Sato et al. 2010]. To decouple the reaction force and disturbances on the slave side, a method using a force transform was presented. Additionally, a three-channel (3 ch) bilateral controller with a force transform was proposed and its validity was experimentally verified. This method involves simple procedures and does not require precise modeling or force sensors.
Methods for Reducing Operational Forces
93
In this paper, a system of two-DOF remote robots capable of grasping and manipulating motion is considered. To support more useful applications, multi-DOF systems must be considered. In this research, the method of [Sato et al. 2010] is applied, and then a method for further reducing the operational forces is proposed. In this system, the modes of grasping and manipulating motion can be considered [Kubo and Ohnishi 2006]. As a result, force transforms in both local and modal space can be implemented, yielding a further reduction of the operational forces. The validity of the proposed method is confirmed by experiments. This paper is organized as follows: in Chapters 2 and 3, bilateral control with a one-DOF system and with a two-DOF system, respectively, is explained. In Chapter 4, the experimental results are described. Finally, conclusions are presented in Chapter 5.
Fig. 1 Example of flexible actuator
2 Bilateral Control with One-DOF System

In this chapter, a four-channel (4 ch) bilateral controller [Iida and Ohnishi 2004] is explained, followed by a 3 ch bilateral controller with a force transform on the slave side [Sato et al. 2010]. A one-DOF bilateral system is considered.

2.1 4 ch Bilateral Controller

A bilateral system comprises master robots and slave robots. Human operators manipulate the master robots, and the slave robots contact the environment. The operators can feel the softness of the environment through the system. The purposes of bilateral control are as follows:

• Human operations with the master robots are reproduced by the slave robots (the slave robot positions track the master robot positions).
• Operators feel the reaction forces from the environment (the law of action and reaction is achieved).

Hence, the control system is designed to accomplish these purposes, which are expressed as (1) and (2).
$x_m - x_s = x_d = 0$   (1)

$f_m + f_s = f_c = 0$   (2)

Here, x and f denote the position and the force, respectively, of a robot. Subscripts m and s denote variables of the master and the slave, respectively. $x_d$ and $f_c$ denote the difference of the positions and the sum of the forces, respectively. To treat $x_d$ and $f_c$ in the same dimension, (1) is differentiated twice and (2) is divided by the masses of the robots. In this research, the masses of the robots are assumed to be the same. As a result, (3) and (4) are derived.

$\ddot{x}_m - \ddot{x}_s = \ddot{x}_d$   (3)

$\ddot{x}_m + \ddot{x}_s = \ddot{x}_c = \frac{f_m}{M} + \frac{f_s}{M} = \frac{f_c}{M}$   (4)

Here, M denotes the robot’s mass. In addition, (3) and (4) are rewritten as (5).

$\begin{pmatrix} \ddot{x}_c \\ \ddot{x}_d \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \ddot{x}_m \\ \ddot{x}_s \end{pmatrix}$   (5)

To achieve (1) and (2), the acceleration references are calculated using position and force controllers as in (6) and (7).

$\ddot{x}_c^{ref} = -\frac{2}{M_n} C_f(s)(f_m^{res} + f_s^{res})$   (6)

$\ddot{x}_d^{ref} = -2 C_p(s)(x_m^{res} - x_s^{res})$   (7)

The superscripts ref and res denote a reference value and a response value, respectively. $C_p(s)$ and $C_f(s)$ represent a position controller and a force controller, respectively. $M_n$ denotes the nominal mass of the robot. In this study, a PD controller and a P controller are used as $C_p(s)$ and $C_f(s)$, respectively. From the above equations, the acceleration references of the master and the slave are derived as follows. This is called a 4 ch bilateral controller.

$\ddot{x}_m^{ref} = C_p(s)(x_s^{res} - x_m^{res}) - \frac{1}{M_n} C_f(s)(f_s^{res} + f_m^{res})$   (8)

$\ddot{x}_s^{ref} = C_p(s)(x_m^{res} - x_s^{res}) - \frac{1}{M_n} C_f(s)(f_m^{res} + f_s^{res})$   (9)
Fig. 2 shows a block diagram of the 4 ch bilateral controller. The position responses are measured using position encoders. In this research, a disturbance observer (DOB) [Ohnishi et al. 1996] is used for robust acceleration control and an RFOB [Murakami et al. 1993] is used to estimate reaction forces. Force values estimated by the RFOB are used as force response values.
Fig. 2 Block diagram of 4 ch bilateral controller
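One update step of the controller in Eqs. (8) and (9) can be sketched as follows, with $C_p(s)$ realized as a PD law and $C_f(s)$ as a P law. The gains and the nominal mass are illustrative values, not the paper’s:

```python
# One update step of the 4 ch bilateral controller, following Eqs. (8)-(9),
# with Cp(s) realized as a PD law and Cf(s) as a P law.
# The gains Kp, Kd, Kf and nominal mass Mn are illustrative values.

Kp, Kd, Kf, Mn = 400.0, 40.0, 1.0, 0.5

def four_ch_step(xm, xs, vm, vs, fm, fs):
    """Return the acceleration references (ddx_m_ref, ddx_s_ref)."""
    pos_term_m = Kp * (xs - xm) + Kd * (vs - vm)   # Cp(s)(xs^res - xm^res)
    pos_term_s = Kp * (xm - xs) + Kd * (vm - vs)   # Cp(s)(xm^res - xs^res)
    force_term = (Kf / Mn) * (fm + fs)             # (1/Mn) Cf(s)(fm^res + fs^res)
    return pos_term_m - force_term, pos_term_s - force_term

# Free motion with zero estimated forces: the references are equal and opposite,
# driving the position difference xd towards zero.
am, as_ = four_ch_step(xm=0.01, xs=0.0, vm=0.1, vs=0.0, fm=0.0, fs=0.0)
print(am, as_)  # -> -8.0 8.0
```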
2.2 3 ch Bilateral Controller with a Force Transform on the Slave Side

Here, the 3 ch bilateral controller with a force transform on the slave side [Sato et al. 2010] is explained. The controller is described by (10) and (11). The term $C_p(s)(x_s^{res} - x_m^{res})$ is removed from (8) to reduce the operational forces.

$\ddot{x}_m^{ref} = -\frac{1}{M_n} C_f(s)(f_s^{res} + f_m^{res})$   (10)

$\ddot{x}_s^{ref} = C_p(s)(x_m^{res} - x_s^{res}) - \frac{1}{M_n} C_f(s)(f_m^{res} + f_s^{res})$   (11)
To decouple the environmental reaction force and the disturbances of the thrust wire on the slave side, the force transform is introduced (Fig. 3). $f^{res}$ denotes the force response from the RFOB, and $\hat{f}^{res}$ denotes the transformed force response. $f_{T1}$ and $f_{T2}$ are setting parameters. $f_{T1}$ is set to the maximum reaction force during free motion, determined by a prior check. In addition, to prevent discontinuities in the force transform, $f_{T1}$ and $f_{T2}$ are set to different values. The use of this force transform decreases the operational force [Sato et al. 2010], because the transformed force response is set to zero in free motion. The characteristics of thrust wires are nonlinear; however, the maximum reaction forces in free motion are almost the same, and this method exploits that property. Fig. 4 shows a block diagram of the 3 ch bilateral controller with the force transform on the slave side.
Fig. 3 Force transform
Fig. 4 Block diagram of 3 ch bilateral controller with force transform on slave side
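The excerpt does not give the transform of Fig. 3 in closed form. The following is a plausible sketch, assuming a continuous dead-zone that returns zero below $f_{T1}$ and the identity above $f_{T2}$, with a linear ramp between the two so that the map has no discontinuity:

```python
def force_transform(f, fT1, fT2):
    """Continuous dead-zone: zero below fT1, identity above fT2,
    and a linear ramp in between (continuous at both ends).
    This shape is an assumption; the paper only constrains fT1 and fT2."""
    mag = abs(f)
    if mag <= fT1:
        return 0.0
    if mag >= fT2:
        return f
    # ramp from 0 at fT1 up to fT2 at fT2
    scaled = fT2 * (mag - fT1) / (fT2 - fT1)
    return scaled if f >= 0 else -scaled

print(force_transform(0.5, 1.0, 2.0))   # -> 0.0  (friction-level force suppressed)
print(force_transform(3.0, 1.0, 2.0))   # -> 3.0  (contact force passed through)
```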
3 Bilateral Control with Two-DOF System

In this chapter, a two-DOF bilateral system is considered. First, the grasping and manipulating modes [Kubo and Ohnishi 2006] in the system are elucidated; by considering these modes, the operational forces can be reduced further. Second, the methods of reducing the operational forces are explained.

3.1 Grasping and Manipulating Modes

Fig. 5 shows the configuration of the robots in the two-DOF system. Subscripts 1 and 2 denote variables for robot 1 and robot 2. In this setting, grasping and manipulating motion can be implemented. Here, the components of these motions are considered using modes [Kubo and Ohnishi 2006].

Fig. 5 Configuration of robots

In this system, the following grasping and manipulating modes can be considered. Subscripts G and M denote variables of the grasping mode and the manipulating mode, respectively, and the placeholder subscript ○ denotes either the master (m) or the slave (s) side. The grasping and manipulating motions are illustrated in Fig. 5.

$\begin{pmatrix} x_{○M} \\ x_{○G} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_{○1} \\ x_{○2} \end{pmatrix}$   (12)

$\begin{pmatrix} f_{○M} \\ f_{○G} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} f_{○1} \\ f_{○2} \end{pmatrix}$   (13)
Here, the two-DOF system is considered in the following four cases, and the forces $f_{s1}^{res}$, $f_{s2}^{res}$, $f_{sM}^{res}$, and $f_{sG}^{res}$ are examined for each case. $f^{fric}$ denotes the friction of the thrust wire and $f^{ext}$ denotes the reaction force from the object. Only $f^{fric}$ and $f^{ext}$ are considered here, because the other disturbances can be compensated.
Case 1: Free motion, in which the slave robots are not in contact with an object. In this case, $f^{ext}$ does not exist.

$f_{s1}^{res} = f_{s1}^{fric}$   (14)

$f_{s2}^{res} = f_{s2}^{fric}$   (15)

$f_{sM}^{res} = f_{s1}^{fric} + f_{s2}^{fric}$   (16)

$f_{sG}^{res} = f_{s1}^{fric} - f_{s2}^{fric}$   (17)

Case 2: Carrying motion, in which the slave robots grasp and carry an object. Here, it is assumed that the friction of the object and other environments is very small; for example, the object is in the air. In this case, $f_{s1}^{ext} \cong -f_{s2}^{ext}$ is assumed.

$f_{s1}^{res} = f_{s1}^{ext} + f_{s1}^{fric}$   (18)

$f_{s2}^{res} = f_{s2}^{ext} + f_{s2}^{fric}$   (19)

$f_{sM}^{res} = f_{s1}^{ext} + f_{s2}^{ext} + f_{s1}^{fric} + f_{s2}^{fric} \cong f_{s1}^{fric} + f_{s2}^{fric}$   (20)

$f_{sG}^{res} = f_{s1}^{ext} - f_{s2}^{ext} + f_{s1}^{fric} - f_{s2}^{fric} \cong 2 f_{s1}^{ext} + f_{s1}^{fric} - f_{s2}^{fric}$   (21)

Case 3: One-side contact motion, in which one slave robot contacts an object. Here, only slave 1 is assumed to contact the object.

$f_{s1}^{res} = f_{s1}^{ext} + f_{s1}^{fric}$   (22)

$f_{s2}^{res} = f_{s2}^{fric}$   (23)

$f_{sM}^{res} = f_{s1}^{ext} + f_{s1}^{fric} + f_{s2}^{fric}$   (24)

$f_{sG}^{res} = f_{s1}^{ext} + f_{s1}^{fric} - f_{s2}^{fric}$   (25)

Case 4: Otherwise. In this case, $f^{ext}$ and $f^{fric}$ appear in all of the following equations.

$f_{s1}^{res} = f_{s1}^{ext} + f_{s1}^{fric}$   (26)

$f_{s2}^{res} = f_{s2}^{ext} + f_{s2}^{fric}$   (27)

$f_{sM}^{res} = f_{s1}^{ext} + f_{s2}^{ext} + f_{s1}^{fric} + f_{s2}^{fric}$   (28)

$f_{sG}^{res} = f_{s1}^{ext} - f_{s2}^{ext} + f_{s1}^{fric} - f_{s2}^{fric}$   (29)

In all cases, $f_{s1}^{fric}$ and $f_{s2}^{fric}$ appear and disturb smooth motion. The purpose of this research is to reduce the operational forces in Cases 1, 2, and 3. Thus, methods of reducing $f_{s1}^{fric}$ and $f_{s2}^{fric}$ are considered in the next section.
3.2 Methods of Reducing Operational Forces

In this section, one basic method and three methods for reducing the operational forces are explained. First, Fig. 6 shows a two-DOF bilateral system employing two 4 ch bilateral controllers. This is the basic method, in which the friction forces of the thrust wires cannot be removed; this is the problem of systems with thrust wires. Since the forces are expressed as in (14)–(29), $f_{s1}^{fric}$ and $f_{s2}^{fric}$ disturb smooth motion.
Fig. 6 Two-DOF bilateral system employing two 4 ch bilateral controllers
Second, the application of the method of [Sato et al. 2010] is considered for reducing operational forces. Fig. 7 shows a two-DOF bilateral system employing two 3 ch bilateral controllers with a force transform on the slave side. This is the two-DOF system using the method of [Sato et al. 2010].
Fig. 7 Two-DOF bilateral system using two 3 ch bilateral controllers with a force transform on the slave side
In this method, the transformed forces are expressed using the following equations. The friction forces are removed in some cases; however, they do not decrease in Case 2. , fˆsres , fˆsMres = 0 , fˆsGres = 0 Case 1: fˆsres 1 =0 2 =0
(30)
ext fric Case 2: fˆsres 1 = f s1 + f s1
(31)
ext fric fˆsres 2 = fs2 + fs2
(32)
res ext fric f sM = f sext + f s 2fric ≅ f s1fric + f s 2fric 1 + f s 2 + f s1
(33)
ext fric fric ext fric fric f sGres = f sext 1 − f s 2 + f s1 − f s 2 ≅ 2 f s1 + f s1 − f s 2
(34)
Methods for Reducing Operational Forces ext fric Case 3: fˆsres 1 = f s1 + f s1
99
(35)
fˆsres 2 =0
(36)
res fric fˆsM = f sext 1 + f s1
(37)
fric fˆsGres = f sext 1 + f s1
(38)
Third, a method with a modal transform is considered for further reduction. Fig. 8 shows the two-DOF bilateral system employing two 3 ch bilateral controllers with a force transform on the slave side in modal space.
Fig. 8 Two-DOF bilateral system employing two 3 ch bilateral controllers with a force transform on the slave side in modal space
In this method, the transformed forces are expressed by the following equations. In Case 2, the friction forces in (40) and (41) can be decreased by reducing the friction forces in the manipulating mode: because $f^{fric}_{s1}$ and $f^{fric}_{s2}$ have the same sign in manipulating motion, the absolute value of $(f^{fric}_{s1} - f^{fric}_{s2})$ is small. However, the friction forces do not decrease in Case 3.

Case 1: $\hat{f}^{res}_{s1} = 0$, $\hat{f}^{res}_{s2} = 0$, $\hat{f}^{res}_{sM} = 0$, $\hat{f}^{res}_{sG} = 0$  (39)

Case 2:

$\hat{f}^{res}_{s1} = \frac{f^{ext}_{s1} - f^{ext}_{s2}}{2} + \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2} \cong f^{ext}_{s1} + \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2}$  (40)

$\hat{f}^{res}_{s2} = -\frac{f^{ext}_{s1} - f^{ext}_{s2}}{2} - \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2} \cong f^{ext}_{s2} - \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2}$  (41)

$\hat{f}^{res}_{sM} = 0$  (42)

$\hat{f}^{res}_{sG} = f^{ext}_{s1} - f^{ext}_{s2} + f^{fric}_{s1} - f^{fric}_{s2} \cong 2 f^{ext}_{s1} + f^{fric}_{s1} - f^{fric}_{s2}$  (43)

Case 3:

$\hat{f}^{res}_{s1} = f^{ext}_{s1} + f^{fric}_{s1}$  (44)

$\hat{f}^{res}_{s2} = f^{fric}_{s2}$  (45)

$\hat{f}^{res}_{sM} = f^{ext}_{s1} + f^{fric}_{s1} + f^{fric}_{s2}$  (46)

$\hat{f}^{res}_{sG} = f^{ext}_{s1} + f^{fric}_{s1} - f^{fric}_{s2}$  (47)
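Consistent with (40)-(47), the modal transform maps the two per-axis slave forces into a manipulating mode (the sum of the two forces) and a grasping mode (their difference). A sketch of the transform pair (the unnormalized scaling convention is an assumption):

```python
def to_modal(f1, f2):
    """Map per-axis forces into (manipulating, grasping) modes.

    Manipulating mode: f1 + f2  (common motion of the two fingers)
    Grasping mode:     f1 - f2  (internal force acting on the object)
    """
    return f1 + f2, f1 - f2

def from_modal(f_m, f_g):
    """Inverse transform, back to per-axis forces."""
    return (f_m + f_g) / 2.0, (f_m - f_g) / 2.0
```

In carrying motion the external forces are nearly equal and opposite, so the manipulating mode carries (almost) only friction, which is why reducing friction in that mode helps in Case 2.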
T. Sato, S. Sakaino, and T. Yakoh
Fourth, further reduction based on the above methods is considered. The methods of Figs. 7 and 8 each have merit. Therefore, we propose a method that combines them. Fig. 9 shows a two-DOF bilateral system using the proposed method, in which force transforms in both local and modal space are implemented. Because the proposed method inherits the merits of the methods of Figs. 7 and 8, further reduction of operational forces is expected.
Fig. 9 Two-DOF bilateral system using proposed method
In this method, the transformed forces are expressed by the following equations. The proposed method decreases the friction forces in Cases 1, 2, and 3. Therefore, its reduction effect on operational forces is better than that of the methods of Figs. 6, 7, and 8.

Case 1: $\hat{f}^{res}_{s1} = 0$, $\hat{f}^{res}_{s2} = 0$, $\hat{f}^{res}_{sM} = 0$, $\hat{f}^{res}_{sG} = 0$  (48)

Case 2:

$\hat{f}^{res}_{s1} = \frac{f^{ext}_{s1} - f^{ext}_{s2}}{2} + \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2} \cong f^{ext}_{s1} + \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2}$  (49)

$\hat{f}^{res}_{s2} = -\frac{f^{ext}_{s1} - f^{ext}_{s2}}{2} - \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2} \cong f^{ext}_{s2} - \frac{f^{fric}_{s1} - f^{fric}_{s2}}{2}$  (50)

$\hat{f}^{res}_{sM} = 0$  (51)

$\hat{f}^{res}_{sG} = f^{ext}_{s1} - f^{ext}_{s2} + f^{fric}_{s1} - f^{fric}_{s2} \cong 2 f^{ext}_{s1} + f^{fric}_{s1} - f^{fric}_{s2}$  (52)

Case 3:

$\hat{f}^{res}_{s1} = f^{ext}_{s1} + f^{fric}_{s1}$  (53)

$\hat{f}^{res}_{s2} = 0$  (54)

$\hat{f}^{res}_{sM} = f^{ext}_{s1} + f^{fric}_{s1}$  (55)

$\hat{f}^{res}_{sG} = f^{ext}_{s1} + f^{fric}_{s1}$  (56)
4 Experiments

To evaluate the performance of the proposed method, experiments were conducted. Four linear motors with two thrust wires were used as the master and slave robots; the thrust wires were connected to the slave robots (Fig. 5). In the experiments, the velocity response was calculated from the position response of the encoder using pseudo derivation. Table 1 shows the control and setting parameters. An aluminum block was used as the object on the slave side.

Table 1 Control and setting parameters

Parameter                                                               Value
Sampling time [ms]                                                      0.1
Position feedback gain [1/s^2]                                          1600.0
Velocity feedback gain [1/s]                                            80.0
Force feedback gain                                                     3.0
Cut-off angular frequency of DOB, RFOB, and pseudo derivation [rad/s]   500.0
Actuator mass and nominal mass [kg]                                     0.5
(fT1, fT2) in local space [N]                                           (3.5, 4.5)
(fT1, fT2) in modal space [N]                                           (10.0, 12.0)
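The pseudo derivation mentioned above band-limits differentiation with a first-order low-pass filter, i.e. the transfer function g·s/(s+g), using the 500 rad/s cutoff and 0.1 ms sampling time from Table 1. A discrete-time sketch (the backward-Euler discretization is an assumption):

```python
class PseudoDerivative:
    """Pseudo derivative g*s/(s+g): a finite difference followed by a
    first-order low-pass filter with cutoff g [rad/s]."""

    def __init__(self, g=500.0, ts=0.1e-3):  # cutoff and sampling time (Table 1)
        self.a = g * ts / (1.0 + g * ts)     # backward-Euler filter coefficient
        self.ts = ts
        self.y = 0.0                         # filtered velocity estimate
        self.x_prev = None                   # previous position sample

    def update(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return 0.0
        raw = (x - self.x_prev) / self.ts    # raw finite difference
        self.x_prev = x
        self.y += self.a * (raw - self.y)    # low-pass filter the derivative
        return self.y
```

Fed a position ramp of slope 1 m/s, the estimate converges to 1.0 while high-frequency encoder quantization noise is attenuated.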
Table 2 shows the methods used. Type 1 is the conventional method. Types 2–4 use the 3 ch bilateral controllers with a force transform on the slave side: in Type 2, the force transform is used only in local space; in Type 3, only in modal space. Type 4 is the proposed method.

Table 2 Settings of methods used

Type   Method
1      4 ch bilateral controllers
2      3 ch bilateral controllers with force transform
3      3 ch bilateral controllers with force transform in modal space
4      Proposed method
Three motions were executed in the experiment. Motion 1 is free motion, in which the slave robots do not contact the object. Motion 2 is carrying motion, in which the slave robots grasp and carry the object; the object was held in the air, so no friction existed between the object and other environments, and thus the absolute values of the reaction forces from the object to slave1 and slave2 were almost equal. Motion 3 is contact motion on one side, in which only slave1 contacts the fixed object. Figs. 10 and 11 show the position and force responses, respectively, in Motion 1. Figs. 12 and 13 show the position and force responses, respectively, in Motion 2. Fig. 14 shows the force responses of modal space in Motion 2. Figs. 15 and 16
show the position and force responses, respectively, in Motion 3. These figures show the responses of the systems of Types 1–4; the force responses are the values transformed by the force transform, and in each figure the master and slave responses are shown. If bilateral control is effective, the slave position responses track the master position responses. The position responses of Figs. 10, 12, and 15 show the master responses almost corresponding to the slave responses; therefore, bilateral control was achieved in these cases. In Fig. 11(a), the operational force was large, whereas the operational forces in Figs. 11(b), 11(c), and 11(d) were small. Thus, the force transform decreased the operational forces in Motion 1, and Types 2, 3, and 4 were effective for reducing operational forces there. In Fig. 14, the operational forces of the manipulating mode for Types 1 and 2 were larger than those for Types 3 and 4; small operational forces in the manipulating mode are desirable for practical applications. In addition, the absolute values of master1 and master2 in Types 1 and 2 were not the same (Fig. 13); however, those of Types 3 and 4 were almost the same. In this situation for Motion 2, Types 3 and 4 were effective, because equal absolute values of the reaction forces are ideal. In Fig. 16, the operational forces of master2 for Types 1 and 3 were larger than those for Types 2 and 4; thus, Types 2 and 4 were effective for reducing operational forces. Therefore, over Motions 1–3, Type 4 (the proposed method) was the most effective for reducing operational forces.
Fig. 10 Position responses in Motion 1 (x [m] vs. t [s]; master1, master2, slave1, slave2): (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4
Fig. 11 Force responses in Motion 1 (f [N] vs. t [s]; master1, master2, slave1, slave2): (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4
Fig. 12 Position responses in Motion 2 (x [m] vs. t [s]; master1, master2, slave1, slave2): (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4
Fig. 13 Force responses in Motion 2 (f [N] vs. t [s]; master1, master2, slave1, slave2): (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4
Fig. 14 Force responses of modal space in Motion 2 (f [N] vs. t [s]; master_grasping, master_manipulating, slave_grasping, slave_manipulating): (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4
Fig. 15 Position responses in Motion 3 (x [m] vs. t [s]; master1, master2, slave1, slave2): (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4
Fig. 16 Force responses in Motion 3 (f [N] vs. t [s]; master1, master2, slave1, slave2): (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4
5 Conclusions

In this paper, a bilateral control system comprising two-DOF remote robots capable of grasping and manipulating motion was considered. The method of [Sato et al. 2010] was applied to this system, and a method for further reducing operational forces was then proposed. Its validity was confirmed by experiments in which the four methods were compared; the proposed method was the most effective at reducing operational forces. In future multi-DOF systems with thrust wires, operational forces can be reduced by considering a force transform and modal space, and the procedures of this paper are expected to be effective in such systems.
Acknowledgment This work was supported in part by a Grant-in-Aid for the Global Center of Excellence for High-Level Global Cooperation for Leading-Edge Platform on Access Spaces from the Ministry of Education, Culture, Sports, Science and Technology, Japan.
References

[Hyodo et al. 2009] Hyodo, S., Soeda, Y., Ohnishi, K.: Verification of flexible actuator from position and force transfer characteristic and its application to bilateral teleoperation system. IEEE Trans. on Industrial Electronics 56(1), 36–42 (2009)

[Iida and Ohnishi 2004] Iida, W., Ohnishi, K.: Reproducibility and operationality in bilateral teleoperation. In: Proc. of the 8th IEEE Int. Workshop on Advanced Motion Control, pp. 217–222 (2004)

[Kubo and Ohnishi 2006] Kubo, R., Ohnishi, K.: Hybrid control for multiple robots in grasping and manipulation. In: Proc. of the 12th International Power Electronics and Motion Control Conference, Portoroz, Slovenia, pp. 367–372 (2006)

[Murakami et al. 1993] Murakami, T., Yu, F., Ohnishi, K.: Torque sensorless control in multidegree-of-freedom manipulator. IEEE Trans. on Industrial Electronics 40, 259–265 (1993)

[Oh and Hori 2008] Oh, S., Hori, Y.: Intelligent force sensor-less power assist control based on external force signal space. In: Proc. of the 2007 Annual Conf. of IEE of Japan, Industry Applications Society, Japan (2008) (in Japanese)

[Sato et al. 2010] Sato, T., Sakaino, S., Ohnishi, K.: Methods for reduction of operational force in force sensor-less bilateral control with thrust wire. In: Proc. of the 3rd International Conference on Human System Interaction, Rzeszow, Poland, pp. 412–418 (2010)

[Smith and Hashtrudi-Zaad 2006] Smith, A.C., Hashtrudi-Zaad, K.: Neural-network-based contact force observers for haptic applications. IEEE Trans. on Robotics 22(6), 1163–1175 (2006)

[Ohnishi et al. 1996] Ohnishi, K., Shibata, M., Murakami, T.: Motion control for advanced mechatronics. IEEE/ASME Transactions on Mechatronics 1(1), 56–67 (1996)
[Tanaka et al. 2009] Tanaka, H., Ohnishi, K., Nishi, H., Kawai, T., Morikawa, Y., Ozawa, S., Furukawa, T.: Implementation of bilateral control system based on acceleration control using FPGA for multi-DOF haptic endoscopic surgery robot. IEEE Trans. on Industrial Electronics 56(3), 618–627 (2009)

[Tsuji et al. 2006] Tsuji, K., Soeda, Y., Nagatomi, H., Kitajima, M., Morisawa, Y., Ozawa, S., Furukawa, T., Kawai, T., Ohnishi, K.: Free allocation of actuator against end-effector by using flexible actuator. In: Proc. of the 9th IEEE Workshop on Advanced Motion Control, pp. 74–79 (2006)
Applications of Neural Networks in Semantic Analysis of Skin Cancer Images K. Przystalski, L. Nowak, M. Ogorzałek, and G. Surówka Department of Information Technologies, Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Krakow, Poland
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. Computational intelligence is finding more and more applications in computer-aided diagnostics, helping doctors to process large quantities of various medical data [Buronni et al. 2004]. In dermatology it is extremely difficult to perform automatic diagnostic differentiation of malignant melanoma based only on dermatoscopic images. Applying artificial intelligence algorithms to explore and search large databases of dermatoscopic images allows doctors to semantically filter out images with specified characteristics. This paper presents a semantic approach to the classification of characteristic objects found in an image database of pigmented skin lesions, based on a radial basis function kernel for artificial neural networks. The presented approach is divided into a few parts: JSEG image segmentation [Deng et al. 2001], feature extraction, and classification. The prepared feature vector consists of components of several color models. For classification, Artificial Neural Networks and Support Vector Machines are used, and their performance is evaluated and compared. Success rates in both cases are greater than 90%.
1 Introduction

Skin cancer is a fast developing disease of modern society, with a 20% increase in diagnosed cases every year. Dermatoscopy has been the primary and commonly used diagnostic method for nearly thirty years. The method is non-invasive and requires a great deal of experience to make a correct diagnosis. As described in [Menzies et al. 2005], even experts reach only 90% sensitivity and 59% specificity in skin lesion diagnosis (Table 1). Sensitivity and specificity are indexes calculated as follows:

Sensitivity = True Positive / (True Positive + False Negative)   (1)

Specificity = True Negative / (True Negative + False Positive)   (2)

Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 111–124. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
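Equations (1) and (2) can be computed directly from the confusion-matrix counts; a minimal sketch:

```python
def sensitivity(tp, fn):
    # Fraction of actually abnormal cases diagnosed as abnormal, eq. (1)
    return tp / (tp + fn)

def specificity(tn, fp):
    # Fraction of actually normal cases diagnosed as normal, eq. (2)
    return tn / (tn + fp)
```

For example, the experts' row of Table 1 (90% / 59%) corresponds to 90 true positives out of 100 abnormal cases and 59 true negatives out of 100 normal cases.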
Table 1 Sensitivity and specificity

                        Sensitivity   Specificity
Experts                 90%           59%
Dermatologists          81%           60%
Trainees                85%           36%
General practitioners   62%           63%
The variables TP, TN, FN and FP are described in table 2. That is why computer-aided diagnostics is gaining more popularity every year and more complex algorithms are being developed. Some of these algorithms use colorimetric and geometric analysis, or statistical differentiation based on knowledge databases of diagnosed cases (supplied with most video and photo dermatoscopes).

Table 2 Diagnosing possibilities

                        Actually Abnormal     Actually Normal
Diagnosed as Abnormal   True Positive (TP)    False Positive (FP)
Diagnosed as Normal     False Negative (FN)   True Negative (TN)
So far doctors have a few diagnosis methods, all calculated manually. The best known and most used method is called ABCD [Stolz et al. 1994]. It is calculated from four recognized parameters: asymmetry, border, color and differential structures. The result is called the Total Dermoscopy Score (3) and is calculated in the following way:

TDS = A * 1.3 + B * 0.1 + C * 0.5 + D * 0.5   (3)

If the TDS is greater than 4.75, the lesion should be handled as suspicious. A TDS greater than 5.45 gives a strong indication that the lesion is cancerous. An alternative method is proposed by Menzies. Scoring is based on finding positive and negative features in the lesion image, where positive features indicate that the lesion is cancerous. The Menzies method describes only two negative features: the first is pattern symmetry, i.e. whether the whole structure or color of the lesion is symmetrical; the second is the color count, so that a lesion containing only one color should be recognized as benign. The Menzies scoring method describes eight positive features, most of which are based on melanoma-specific patterns like dots, veil or a broadened network. Evaluating the approaches, Argenziano [Argenziano et al. 1998] made a comparison of different methods, finding that the seven point checklist had a sensitivity in the range of 95% and a specificity of 75%, while for pattern analysis these numbers were 91% and 90% respectively, going down to 85% and 66% respectively for the
ABCD rule. This paper introduces a semantic analysis method for dermatoscopic images of malignant melanoma. In the process of analysis, all images are segmented into semantic objects containing various textures, shapes and colors. Calculations based on this method show an accuracy in object classification ranging from 92.05% to 97.44%, depending on the train/test ratio. Application of the knowledge gathered from the conducted analysis allows the creation of semantic search tools that can be used for automatic classification of dermatoscopic images. The paper is organized as follows. In section 2 the concept of the Artificial Neural Network Radial Basis Function is presented. Section 3 reviews the idea of semantic classification of malignant melanoma image objects. In section 4, we give the experimental data to show the effectiveness of our method. In section 5, the presented method is compared with a method based on an SVM classifier. Finally, we conclude this paper in sections 6 and 7 with some future directions.
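The TDS rule (3) and its decision thresholds described above can be sketched as follows (the thresholds 4.75 and 5.45 are from the text; the input scores are illustrative):

```python
def total_dermoscopy_score(a, b, c, d):
    # Eq. (3): weighted sum of the asymmetry, border, color and
    # differential-structure scores of the ABCD rule
    return a * 1.3 + b * 0.1 + c * 0.5 + d * 0.5

def interpret_tds(tds):
    # Decision thresholds quoted in the text
    if tds > 5.45:
        return "strong indication of melanoma"
    if tds > 4.75:
        return "suspicious"
    return "benign"
```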
2 Concept of ANN RBF Classifier

Neural networks are widely used in image recognition. NNs are composed of simple elements inspired by biological nervous systems, and these elements work in parallel. A network is trained by adjusting the connections between elements until the output matches the target; as a result, the elements' weights are set properly. In most cases a few values are sent as input and a single value is expected as the target. The radial basis function, also known as the Gaussian function, is often used because of its strong classification power. The RBF network consists of two layers, as shown in figure 1. The first layer is a hidden radial basis layer of S1 neurons.
Fig. 1 Neural Network Radial Basis Function
The second layer is a linear layer of S2 neurons. Figure 1 shows how the NN works with the RBF. First, the input values are forwarded to the radial basis transfer function, whose argument is the distance between the input vector p and the weight vectors w; this distance is multiplied by the bias b. The radial basis function has a maximum of 1 when its input is 0. As the distance between w and p decreases, the output increases, so a neuron produces 1 whenever p is identical to w. The bias b adjusts the sensitivity of the layer's neurons. In this paper, the Mathworks Matlab Neural Networks toolbox with the Radial Basis Function is used. The following commands are used:

net = newrbe(Train, Classes, Spread)
predicted = sim(net, Test)
(4)
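Matlab's newrbe designs an exact RBF network, placing one hidden neuron on every training sample and solving for the linear output weights. A rough Python equivalent of (4) (a simplified sketch under that assumption, not the toolbox implementation; the bias follows newrbe's convention radbas(spread·b) = 0.5):

```python
import numpy as np

def rbf_train(X, T, spread=1.0):
    """Exact RBF design in the spirit of newrbe: one hidden neuron is
    centered on every training sample, then the linear output weights
    are solved so the training targets are reproduced exactly."""
    b = np.sqrt(np.log(2.0)) / spread                  # radbas(spread*b) = 0.5
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    hidden = np.exp(-(dist * b) ** 2)                  # Gaussian hidden layer
    weights = np.linalg.solve(hidden, T)               # linear output layer
    return X, b, weights

def rbf_sim(model, Xq):
    """Evaluate the trained network on query points Xq (like sim)."""
    centers, b, weights = model
    dist = np.linalg.norm(Xq[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(dist * b) ** 2) @ weights
```

Because the design is exact, simulating the network on its own training inputs reproduces the training targets.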
3 Semantic Malignant Melanoma Image Features Classification

The proposed method is divided into five parts: image segmentation, object extraction, feature extraction, training, and classification. JSEG is used for image segmentation.

3.1 Melanoma Image Segmentation

The JSEG segmentation method [Chuan et al. 2009] works in two steps: color quantization and spatial segmentation. In the first step, the image is quantized into several region classes; quantization is done in color space only. A class-map is then created from the corresponding color classes. In the second step, spatial segmentation is performed on the created class-map; color similarity is not taken into consideration in this step. From the command line it is used as follows:

segwin -i filename.jpg -t 6 -o filename.jseg.jpg 1 -q 20
(5)
where parameters i and o are the input and output files, t is set to 6, the type of a JPEG image, and q is the quantization threshold, which can be set from 0 to 600. Images are not changed in size. The additional parameter m, the region merge threshold, is set to 0.4 by default and can be set from 0 to 1. The result of image segmentation using the JSEG algorithm is presented in figure 2: figure 2a shows a lesion before segmentation and figure 2b the same lesion after segmentation. In this example the quantization parameter is set to 20.

3.2 Object Feature Extraction

The next step is object extraction [Liu et al. 2005]. The described method extracts borders from a segmented image; using these borders, image objects can be extracted. Object extraction is done by a region growing algorithm. This is the most time-consuming part of the presented method: for 5198 objects it takes about 90 minutes on an Intel Core 2 Duo workstation without using any additional helpers like CUDA.
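The region growing step can be sketched as a flood fill over the JSEG class-map (a simplified illustration with 4-connectivity; the authors' exact algorithm is not given):

```python
from collections import deque

def region_grow(labels):
    """Extract connected objects from a segmented class-map: flood-fill
    each unvisited pixel into a region of identically labeled,
    4-connected pixels."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            cls, region, queue = labels[sy][sx], [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and labels[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((cls, region))
    return regions
```

Each returned region is one "object" whose pixels then feed the feature extraction step.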
Fig. 2 Image segmentation by using JSEG

Table 3 Features vector

Color space   Feature   Description
RGB           R         Red
              G         Green
              B         Blue
HSV           H         Hue
              S         Saturation
              V         Value
NTSC          Y1        Luminance
              I         Chrominance
              Q         Chrominance
YCbCr         Y2        Grayscale
              CB        Blue-difference
              CR        Red-difference
              SE        Object size
After each object is separated, feature extraction proceeds. All features are presented in table 3. Each object is represented in four colour spaces: RGB, HSV, NTSC and YCbCr. Every colour space is represented by a few variables, such as red, green and blue in RGB, giving 12 colour-space features; these features are not correlated. Additionally, the proposed feature vector includes some compositions of colour-space features with constants. The last feature is size, a simple pixel count of the object. Together there are 13 features in the proposed feature vector, as shown in table 3.
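A sketch of building the 13-element feature vector of Table 3 for one extracted object. Standard RGB to HSV/YIQ/YCbCr conversion formulas are assumed, since the paper does not list its exact constants; pixels are (r, g, b) tuples in [0, 1]:

```python
import colorsys

def feature_vector(pixels):
    """Build the 13-element feature vector of Table 3 for one object:
    mean R, G, B, then H, S, V, then Y, I, Q, then Y, Cb, Cr, plus
    object size (pixel count). Conversion constants are assumptions."""
    n = len(pixels)
    mr = sum(p[0] for p in pixels) / n
    mg = sum(p[1] for p in pixels) / n
    mb = sum(p[2] for p in pixels) / n
    h, s, v = colorsys.rgb_to_hsv(mr, mg, mb)
    y1, i, q = colorsys.rgb_to_yiq(mr, mg, mb)           # NTSC
    y2 = 0.299 * mr + 0.587 * mg + 0.114 * mb            # YCbCr (ITU-R BT.601)
    cb = 0.5 - 0.168736 * mr - 0.331264 * mg + 0.5 * mb
    cr = 0.5 + 0.5 * mr - 0.418688 * mg - 0.081312 * mb
    return [mr, mg, mb, h, s, v, y1, i, q, y2, cb, cr, n]
```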
Fig. 3 Feature extraction
In figure 3, feature extraction is presented. In figure 3b, objects are displayed as the average of the RGB values of their feature vectors. As shown in figure 3a, before features can be extracted, each object needs to be extracted from the whole lesion: a map is created and the region growing algorithm is used to extract the objects.

3.3 Training and Classification

The vector values are next normalized to values between -1 and 1. The prepared values are then divided into groups of train and test data, as shown in table 4. The last step is to use the prepared train data as input to the neural network algorithm described in figure 1. The Matlab ANN RBF algorithm [Muezzinoglu 2006] takes about 2-10 minutes to learn, depending on the training data set. Afterwards, the gathered knowledge can be used to classify the test data. Results for the NN are shown in tables 4-11. In figure 4, an exemplary feature classification result is presented: in figure 4a objects are shown by their RGB average, and in figure 4b the classified image is presented. The small black objects are objects that are not considered because of their too small size; their number can be reduced by using a quantization parameter greater than 20.
Fig. 4 Feature classification
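The normalization to values between -1 and 1 described in Sect. 3.3 can be sketched as a per-feature min-max scaling (the exact scaling used by the authors is an assumption):

```python
def normalize_features(vectors):
    """Scale every feature column linearly into [-1, 1] across the dataset."""
    cols = list(zip(*vectors))
    scaled = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0          # a constant column maps to all -1 + 0
        scaled.append([2.0 * (v - lo) / span - 1.0 for v in col])
    return [list(row) for row in zip(*scaled)]
```

Normalizing per column keeps large-range features (such as object size in pixels) from dominating the distance computations inside the RBF layer.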
4 Results

In this section, we present the results of our semantic analysis of malignant melanoma images using the classification process described above. Six semantic categories are used in this paper: black, dark brown, light brown, blue-grey, red, and natural skin color. Each malignant melanoma image possesses several objects, and as described above, each object is described by a feature vector.
Fig. 5 Melanoma color spaces
4.1 Color Decomposition

Skin lesions are difficult to classify because of their narrow color ranges, in contrast to real-world images. Malignant melanoma is a kind of skin cancer with some characteristic color groups: black, blue-grey, red, light brown, dark brown and skin color. These colors appear on images and can depend on the cancer progress stage, lesion depth and blood vessels; they are well visible on SIAscope images. These colors are also used in melanoma diagnosis scales like ABCD. Melanoma colors are shown in two color spaces in figure 5.

4.2 Classification Scenarios

The ANN RBF method was used to classify each object separately. Five scenarios were created for classification purposes. In the first scenario, 25% of the objects are randomly selected for the train data and 25% for the test data. The next four scenarios use randomly selected objects in the ratios 34/66, 50/50, 70/30 and 80/20. Objects do not repeat between the train and test groups: every object selected for a group is eliminated from the whole pool of objects in the next selection iteration. Our results are shown in table 4. The best success ratio is reached with 25/25. As shown in table 4, 25/25 classifies slightly more objects correctly than the 80/20 ratio. It is very interesting that 25/25 has better classification results than 80/20, where 80% of the objects are used for training and fewer objects than in 25/25 are classified. The success rates of the other ratios, 34/66, 50/50 and 70/30, are between 92 and 94%, which is a quite good result.

4.3 Features Classification Results

For practical purposes the most important result is the one where the least data is used for training and the most is predicted. In our case this is the 34/66 ratio. This result has the lowest classification rate of the tested scenarios; nevertheless, it is still quite high at 92.05%. Our best success rate of 96.61% is a good result for future consideration.

Table 4 Classification Success Rate for Different Train to Test Ratios

Train/Test [%]   Success rate   Classified objects   Trained   Tested
25 / 25          96.61 %        1253 / 1297          1297      1297
34 / 66          92.05 %        3346 / 3635          1553      3635
50 / 50          92.87 %        2410 / 2595          2591      2595
70 / 30          93.60 %        1463 / 1563          3626      1563
80 / 20          93.66 %        975 / 1041           4148      1041
Table 4 contains information about the number of objects belonging to the training and testing datasets, which were prepared with different ratios. Tables 5-11 present the classification success rates for six object groups: skin regions, red regions, black regions, light brown regions, dark brown regions, and gray-blue regions. As shown in table 5, the best results (96.2% on average) are reached for objects corresponding to the skin regions of the images. This is because of their characteristic high color values and region size: most images contain only a few regions that classify as skin, and those are usually quite big compared to other objects.

Table 5 Skin Region Classification Success Rate

Train/Test [%]   Success rate   Classified objects
25 / 25          98.86 %        173 / 175
34 / 66          93.62 %        499 / 533
50 / 50          97.31 %        326 / 335
70 / 30          96.08 %        196 / 204
80 / 20          95.39 %        145 / 152
Objects classified as black regions represent parts of the skin lesion that contain a very high concentration of melanin (skin pigmentation) in multiple skin layers. Most malignant melanoma images contain a few black regions. Table 6 shows the success rate in classification of image regions with low saturation and intensity values; due to the low saturation, those regions can sometimes be misclassified as blue-gray regions, reaching only 79.6% success rate for the 34/66 ratio.

Table 6 Black Region Classification Success Rate

Train/Test [%]   Success rate   Classified objects
25 / 25          86.96 %        100 / 115
34 / 66          79.61 %        207 / 260
50 / 50          70.67 %        106 / 150
70 / 30          70.67 %        55 / 78
80 / 20          77.55 %        38 / 49
Blood vessels visible on dermatoscopic images appear in pink-red color. The classification success rates for regions containing blood vessels are shown in table 7. The number of classified objects shows that only a few images contained blood vessels, and the low classification success rate suggests that the misclassified regions contained brown objects. Red object classification gives the worst results, so increasing the classification success rate here is an issue for future work.

Table 7 Red Objects Classification Success Rate

Train/Test [%]   Success rate   Classified objects
25 / 25          50.00 %        12 / 24
34 / 66          47.14 %        33 / 70
50 / 50          60.61 %        20 / 33
70 / 30          55.56 %        15 / 27
80 / 20          69.23 %        18 / 26

Table 8 Brown Regions Classification Success Rate

Train/Test [%]   Success rate   Classified objects
25 / 25          98.86 %        260 / 263
34 / 66          95.17 %        631 / 663
50 / 50          91.34 %        496 / 543
70 / 30          94.59 %        280 / 296
80 / 20          94.21 %        228 / 242
Dark brown color represents a high melanin concentration in the top layers of the skin, and it is found in most pigmented skin lesion images. Those regions are easily recognized and classified, although they can be misclassified as light brown regions because it is difficult to determine when a region is light brown and when it is already dark. The rates of correctly classified dark brown regions are shown in table 8. The second best classification rate is achieved for light brown regions, whose count is the biggest compared to the other types found in the images. These regions represent the parts of the images where it is possible to detect the pigmentation network. The classification results are shown in table 9; for the same reasons as mentioned earlier, some misclassification occurred.

Table 9 Light Brown Regions Classification Success Rate

Train/Test [%]   Success rate   Classified objects
25 / 25          98.42 %        498 / 506
34 / 66          95.02 %        1602 / 1686
50 / 50          96.63 %        1349 / 1396
70 / 30          97.66 %        836 / 856
80 / 20          98.20 %        490 / 499
The last classified region type comprises areas of the image covered by the blue-grey veil. A characteristic of these regions is that the areas covered by them could easily belong to other region types. Table 10 presents the classification success rates for blue-grey regions.

Table 10 Blue-Grey Regions Classification Success Rate

Train/Test [%]   Success rate   Classified objects
25 / 25          98.13 %        210 / 214
34 / 66          88.42 %        374 / 423
50 / 50          81.88 %        113 / 138
70 / 30          79.41 %        81 / 102
80 / 20          76.71 %        56 / 73
4.4 Training and Classification

Table 11 shows all 5189 objects mentioned before. The count of each object class depends on its occurrence on the skin lesion images and on the threshold of the JSEG segmentation algorithm. If the threshold is set too high, there are fewer objects; if it is set to a low value, the counts of skin and light brown objects in particular grow substantially.
Table 11 Object Classes Rate

Object class   Count   Rate
Black          378     7.28%
Red            96      1.85%
Skin           748     14.42%
Light Brown    2340    45.10%
Dark Brown     949     18.29%
Blue-Gray      678     13.07%
As shown in table 11, light brown objects have the biggest impact on the classification success rate, followed by dark brown, blue-gray and skin objects. As seen in the tables above, exactly these four object classes also have the best success rates.
5 Comparisons

The presented method is based on the radial basis function applied within neural networks, and it gives good results. To verify that the current method is good enough, we replaced the NN in the process with an SVM classifier [Wu et al. 2008]. The Support Vector Machines classifier was used as a plug-in for Matlab (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). Only two functions were used:

model = svmtrain(test, target, opt)
[pred, accur, decision] = svmpredict(test, target, model, opt)
(6)
In the first function, the testing and target values are needed. Options here mean parameters that are also used in the command line version; in the options, the classification method within SVM can be chosen, and the kernel type should be given (by default it is set to the linear kernel). The testing and target data should be vectors, as in the case of the NN. As the result, a model is returned. This model is then used as a parameter of the svmpredict() function, to which the testing data needs to be given; the target is given only to calculate the accuracy and is not taken into consideration while predicting. In this function, options can also be specified as in the command line equivalent. As the result, the predicted values are returned; these values are automatically compared to the target values and the accuracy is calculated. The last returned values are the decision values, internal SVM values that were used for prediction; they can be used for a deeper look into the SVM's internal prediction algorithm. Table 12 presents the results of SVM classification for each object type and for each of the four basic SVM kernels. The linear kernel gives the best results here, and the best recognized object type is dark brown objects.
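For readers without Matlab, the same train/predict comparison can be sketched with scikit-learn, whose SVC wraps libsvm (the data here is a hypothetical stand-in for the 13-feature object vectors; the kernel names correspond to libsvm's -t option):

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 13-feature object vectors and class labels.
X, y = make_classification(n_samples=400, n_features=13, n_informative=8,
                           n_classes=4, random_state=0)
# One of the paper's ratios, e.g. 70/30.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):   # libsvm -t 0..3
    clf = svm.SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, round(clf.score(X_test, y_test), 4))
```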
Table 12 SVM prediction results

              Linear    Polynomial   Radial    Sigmoid
Black         93.80%    92.84%       94.66%    94.28%
Red           95.87%    91.43%       95.05%    94.98%
Skin          97.49%    97.36%       98.01%    97.29%
Light Brown   97.83%    94.79%       97.58%    95.32%
Dark Brown    98.99%    80.07%       42.71%    77.32%
Blue-Gray     98.15%    98.80%       97.95%    97.64%
In the following table SVM and NN are compared. Several SVM variants are considered, depending on the kernel used.

Table 13 Classifiers comparison

Ratio     SVM Linear   SVM Polynomial   SVM Radial   SVM Sigmoid   NN Radial
34 / 66   96.75%       94.04%           96.45%       95.71%        92.05%
50 / 50   97.12%       94.72%           96.95%       95.90%        92.87%
70 / 30   97.44%       95.16%           97.29%       95.97%        93.60%
80 / 20   95.53%       90.54%           94.10%       92.08%        93.66%
In the table above, SVM comes out as a better solution than NN, but the differences between them are not big, about 2-4%, depending on the data ratio. Each prediction is repeated 100 times, with testing and training data drawn by random selection each time. The results presented in Table 13 are averages over all predictions for the given classifier, kernel and ratio.
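The evaluation protocol just described, averaging accuracy over 100 random train/test splits at a given ratio, can be sketched as follows; the nearest-centroid classifier here is only a stand-in so the sketch is self-contained, not one of the classifiers the authors used:

```python
import numpy as np

def averaged_accuracy(X, y, classify, train_frac=0.7, runs=100, seed=0):
    """Average accuracy over repeated random train/test splits.

    classify(Xtr, ytr, Xte) -> predicted labels for Xte.
    train_frac=0.7 corresponds to the 70/30 ratio in Table 13.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    accs = []
    for _ in range(runs):
        idx = rng.permutation(n)      # fresh random split each run
        tr, te = idx[:n_train], idx[n_train:]
        pred = classify(X[tr], y[tr], X[te])
        accs.append(np.mean(pred == y[te]))
    return float(np.mean(accs))

def nearest_centroid(Xtr, ytr, Xte):
    """Trivial stand-in classifier: assign each test point to the
    class whose training centroid is closest."""
    classes = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Xte[:, None, :] - cents[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```

Swapping `nearest_centroid` for any trained classifier reproduces the per-ratio averages reported in Table 13.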
6 Conclusions
The performed experiments indicate a high success rate in object classification, and most of the found objects are suitable for further analysis in terms of texture recognition and counting of smaller objects. To summarize the results, the Neural Network Radial Basis Function method based on semantic analysis provides very promising results in classifying malignant melanoma images. The results are good enough to help physicians with semantic image filtering.
7 Future Works
Further exploration of the data could be interesting and could yield more precise results. We also want to compare our results with those obtained by other methods, and to increase the success rate for red and skin objects so as to improve the overall success rate. Additionally, we want to use this tool in a future algorithm for early malignant melanoma detection; this will be the goal of our future work. We also want to compare the current results, based on dermatoscopic images, with results based on SIAscope images.
Acknowledgment This research has been supported by the Research Grants No. N518 419038 and N518 506439.
References
[Argenziano et al. 1998] Argenziano, G., Fabbrocini, G., Carli, P., De Giorgi, V., Sammarco, P.E., Delfino, M.: Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions. Arch. Dermatol. 134, 1563–1570 (1998)
[Buronni et al. 2004] Buronni, M., Corona, R., Dell'Eva, G., Sera, F., Bono, R., Puddu, P., Perotti, R., Nobile, F., Andreassi, L., Rubegni, P.: Melanoma computer-aided diagnosis: reliability and feasibility study. Clin. Cancer Res., 1881–1886 (2004)
[Chuan et al. 2009] Yu-Chang, C., Wang, H.J., Li, C.F.: Semantic analysis of real-world images using support vector machine. Expert Systems with Applications, 10560–10569 (2009)
[Deng et al. 2001] Deng, Y., Manjunath, B.S.: Unsupervised segmentation of color-texture regions in images and video. IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
[Liu et al. 2005] Liu, Y., Zhang, D., Lu, G., Ma, Y.: Region-based image retrieval with high-level semantic color names. In: IEEE Proc. of the Multimedia Modeling Conf., pp. 180–187 (2005)
[Menzies 1999] Menzies, S.W.: Automated epiluminescence microscopy: human vs machine in the diagnosis of melanoma. Arch. Dermatol. 135, 1538–1540 (1999)
[Menzies et al. 2005] Menzies, S.W., Bischof, L., Talbot, H., et al.: The performance of SolarScan: an automated dermoscopy image analysis instrument for the diagnosis of primary melanoma. Arch. Dermatol., 1388–1396 (2005)
[Muezzinoglu 2006] Muezzinoglu, M.K., Zurada, J.M.: RBF-based neurodynamic nearest neighbor classification in real pattern space. Pattern Recognition (2006)
[Stolz et al. 1994] Stolz, W., Riemann, A., Cognetta, A.B., Pillet, L., Abmayr, W., Hölzel, D., Bilek, P., Nachbar, F., Landthaler, M., Braun-Falco, O.: ABCD rule of dermatoscopy: a new practical method for early recognition of malignant melanoma. Eur. J. Dermatol. 7, 521–528 (1994)
[Wu et al. 2008] Wu, Y.C., Lee, Y.S., Yen, S.J., Yang, Y.C.: Robust and efficient multiclass SVM models for phrase pattern recognition. Pattern Recognition, 2874–2889 (2008)
Further Research on Automatic Estimation of Asymmetry of Melanocytic Skin Lesions

P. Cudek 1, J.W. Grzymała-Busse 1,2, and Z.S. Hippe 1

1 Institute of Biomedical Informatics, University of Information Technology and Management, Rzeszów, Poland
{zhippe,pcudek}@wsiz.rzeszow.pl
2 Department of Computer Science, University of Kansas, Lawrence, KS
[email protected]
Abstract. This paper presents a method for the automatic identification of asymmetry in digital images containing melanocytic skin lesions. Our method is part of a new system for classifying skin lesions using the Stolz strategy, based on the ABCD rule.
1 Introduction
In our introductory publication [Hippe et al. 2011] we briefly described the problem of melanoma diagnosis based on the application of the ABCD rule [Stolz et al. 2006]. Among the various types of melanocytic skin lesions readily recognized using the rule, we devote our attention to Junctional lesion, Junctional and dermal lesion, Atypical/dysplastic dermal lesion, Dermal lesion and Palmo-plantar lesion from the Lesion group, and Superficial melanoma and Nodular melanoma from the Melanoma group. We assume that every improvement of melanoma diagnosis has a significant impact on saving human lives. It should also be mentioned that for some time the research carried out in our group has been directed toward the dynamic generation of selected melanocytic skin lesion features [Hippe et al. 2006], namely the features applied in the ABCD rule.
2 General Characteristic of the Developed Module
As indicated in the previous section, the current task of our research is to develop and implement a new diagnostic module within an existing computer program system [Grzymała-Busse et al. 2005; WWW-1 2011] for the classification of melanocytic skin lesions. The new diagnostic module, described briefly in the next sections, is treated by us as a necessary step toward the basic goal of our research, namely the computer-aided synthesis of images of skin lesions. In other words, it seems necessary first to learn how we can
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 125–129. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
automatically discover all specific features (symptoms) of a lesion, and then how we can simulate them in the synthesized image. However, we would like to emphasize that the diagnostic module (partly described in this chapter) will eventually be used in the final step of our research, when automatic extraction of diagnostic features from synthetically created images of skin lesions must be executed. The specific features of a skin lesion recognized by the new module are directly related to the already cited ABCD rule. Here we focus our attention on the best possible extraction of the letter A required by the rule, namely A for Asymmetry.
Fig. 1 The general structure of the automatic image recognition system from the perspective of asymmetry assessment
It is assumed that the developed module should be able to analyze medical images in the JPG, BMP, PNG, and TIF graphic formats. After loading an investigated image (see Fig. 1), a preprocessing operation is performed, responsible for improving the quality of the picture and creating the next version of the image used in subsequent steps. First of all, the color of the analyzed image is algorithmically converted into grayscale according to (2):

Y = 0.299R + 0.587G + 0.114B

(2)
where Y is the pixel value in grayscale, and R, G, B are the respective components of the RGB color value. In the next preprocessing step, adaptive histogram equalization is used to improve the local contrast of the image. To achieve this, the Contrast Limited Adaptive Histogram Equalization (CLAHE) [Zuiderveld 1994] method is applied. This method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. At this stage a blurred version of the image is also created, which is used later in the segmentation process.
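Equation (2) is a plain weighted sum over the RGB channels; a minimal sketch in Python/NumPy (the chapter does not state an implementation language, so this is only an illustration):

```python
import numpy as np

def rgb_to_gray(img):
    """Convert an H x W x 3 RGB array to grayscale using
    Y = 0.299 R + 0.587 G + 0.114 B (Eq. 2)."""
    weights = np.array([0.299, 0.587, 0.114])
    return img @ weights  # weighted sum over the last (channel) axis

# Example: a pure-white pixel maps to full intensity,
# since the three weights sum to 1.0.
pixel = np.array([[[255.0, 255.0, 255.0]]])
print(round(float(rgb_to_gray(pixel)[0, 0])))  # 255
```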
3 Asymmetry Assessment
In the ABCD rule, asymmetry assessment involves the number of symmetry axes lying in the lesion area. The logical value of the letter A can be: symmetric lesion (there are two perpendicular axes of symmetry), 1-axial asymmetry (there is only one axis of symmetry), or 2-axial asymmetry (there is no axis of symmetry). The numerical values used in the calculation of the TDS parameter for these logical values are 0, 1 and 2 points, respectively. The developed algorithm for the evaluation of asymmetry (see Fig. 2) is based on the analysis of the black-and-white image created as a result of segmentation. In this image, white dots belong to the lesion area, whereas black dots represent healthy skin.
Fig. 2 Algorithm for the evaluation of lesion asymmetry
In the first step, the center of gravity (GC) is determined. Next, the algorithm creates an array containing the lengths of rays (radii) outgoing from the GC point at angles between 0 and 359 degrees. The next task is to find the rays that can be symmetry axes of the lesion. For this purpose, for each of the 180 potential axes of symmetry, a score SFAα (Score For Axis) is calculated as presented in the following procedure:

repeat for β = 1 to 179
    r1 = get_radius(β)
    r2 = get_radius(−β)
    difference = |r1 − r2|
    if (r1
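The printed procedure breaks off after the comparison of r1 and r2. A plausible completion in Python accumulates the relative radius differences of mirrored ray pairs for each candidate axis and counts axes whose score falls below a threshold; the exact scoring rule and threshold here are assumptions for illustration, not the authors' published procedure:

```python
import numpy as np

def axis_scores(radii):
    """radii: length-360 array, radii[beta] = ray length at angle beta
    (degrees) from the lesion's center of gravity (GC).
    For each of the 180 candidate axes, sum the relative differences
    |r1 - r2| / max(r1, r2) over ray pairs mirrored across the axis
    (hypothetical SFA definition; a perfectly symmetric axis scores 0).
    """
    scores = np.zeros(180)
    for alpha in range(180):
        s = 0.0
        for beta in range(1, 180):
            r1 = radii[(alpha + beta) % 360]
            r2 = radii[(alpha - beta) % 360]
            s += abs(r1 - r2) / max(r1, r2)
        scores[alpha] = s
    return scores

def asymmetry_value(radii, threshold=5.0):
    """Map the number of found symmetry axes to the ABCD 'A' score:
    0 (symmetric lesion), 1 (1-axial asymmetry), 2 (2-axial asymmetry).
    The threshold is an illustrative assumption."""
    n_axes = int(np.sum(axis_scores(radii) < threshold))
    if n_axes >= 2:
        return 0
    return 1 if n_axes == 1 else 2
```

For a circular lesion every axis scores 0, giving A = 0; for a highly irregular outline no axis passes the threshold, giving A = 2.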
Fig. 3 Comparison of radii (radius length vs. angle) for the potential axis of symmetry with an angle α equal to 0
Acknowledgment This research has been supported by the grant No. N N516 482640 from the National Research Center in Cracow, Poland.
References
[Grzymała-Busse et al. 2005] Grzymała-Busse, J.W., Hippe, Z.S., Knap, M., Paja, W.: Infoscience technology: the impact of internet accessible melanoid data on health issues. Data Science J. 4, 77–81 (2005)
[Hippe et al. 2006] Hippe, Z.S., Grzymała-Busse, J.W., Piątek, Ł.: Randomized dynamic generation of selected melanocytic skin lesion features. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and WEB Mining, pp. 21–29. Springer, Heidelberg (2006)
[Hippe et al. 2011] Cudek, P., Grzymała-Busse, J.W., Hippe, Z.S.: Asymmetry of digital images describing melanocytic skin lesions. In: Burduk, R., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds.) Computer Recognition Systems 4. AISC, vol. 95, pp. 605–611. Springer, Heidelberg (2011)
[Stolz et al. 2006] Stolz, W., Braun-Falco, O., Bilek, P., Landthaler, M., Burgdorf, W.H.C., Cognetta, A.B.: Atlas of Dermatoscopy (Polish edition). Ed. Office Czelej Sp. z o.o., Lublin, Poland, pp. 41, 210 (2006)
[WWW-1 2011] http://www.melanoma.pl/ (accessed March 29, 2011)
[Zuiderveld 1994] Zuiderveld, K.: Contrast limited adaptive histogram equalization. In: Graphics Gems IV. Academic Press Professional, Inc., New York (1994)
Multispectral Imaging for Supporting Colonoscopy and Gastroscopy Diagnoses

A. Świtoński 1,2, R. Bieda 2, and K. Wojciechowski 1,2

1 Polish-Japanese Institute of Information Technology, Aleja Legionów 2, 41-902 Bytom, Poland
{kwojciechowski,aswitonski}@pjwstk.edu.pl
2 Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
{konrad.wojciechowski,adam.switonski,robert.bieda}@polsl.pl
Abstract. We describe the advantages of multispectral imaging in different applications, outline its challenges and analyze approaches to multispectral object detection. We have proposed, built and deployed a multispectral acquisition device with a liquid crystal tunable filter for endoscopy diagnosis. We have applied the device in photodynamic diagnosis for cancer detection, based on spectral pixel signatures and supervised machine learning techniques. We have evaluated the introductory step of the spectrum matching approach, spectrum estimation, by linear transformations of the pixel spectral signatures, using an image dataset acquired of the GretagMacbeth colorchecker. We have also examined visualization methods for multispectral images, helpful for a gastroscopy and colonoscopy diagnostician, and have tried to calculate a color image by linear regression methods.
1 Introduction
1.1 Multispectral Imaging
A multispectral image contains spectral data for its every pixel and is represented by the function I(x,y,λ). It has the structure of a three-dimensional image cube, where two dimensions are spatial and the third belongs to the spectral domain. In fact, the image gives the intensities I of the radiance at the specified wavelengths λ, measured at the different image points (x,y). The accuracy of the spectrum estimation depends on the number of image channels, i.e. the wavelengths at which intensity is captured. There is no strict distinction between multispectral and hyperspectral images, but it is assumed that multispectral images contain tens of channels and hyperspectral ones hundreds. A typical color image aggregates some spectral data. The sight effect is caused by electromagnetic radiance of a specified range. The electromagnetic radiance reaching the eye passes through a special optical set, which focuses the light on the retina. Three types of photoreceptors are located on the surface of the retina
Z.S. Hippe et al. (Eds.): Human - Computer Systems Interaction, AISC 99, Part II, pp. 131–145. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
and each of them aggregates data of the captured radiance. Finally, the collected data is sent via the optic nerve to the human brain, which reconstructs the image. That is why a color image is represented by exactly three color channels, one per photoreceptor type. There is a number of different color spaces, but because of hardware constraints, in most cases color images are captured and displayed in the RGB color space. This means that we estimate a color by the amounts of three basic colors, red, green and blue, and display it by mixing them. That is not exactly what human receptors do, but it is very close and is considered a sufficient approximation. A multispectral image contains much more information than a typical color image: it gives detailed spectrum data for its every pixel, in contrast to the color image, which stores only some aggregation of the spectrum. What is more, because of hardware limitations and the RGB sensors used in capturing, color images are not able to store the complete color palette; there are some colors distinguishable by a human but undistinguishable in the color image. A multispectral image can be visualized. To avoid loss of information, each spectral channel should be displayed as a separate monochromatic image, but the analysis and interpretation of such a sequence of images by a human is very difficult, and in the case of a hyperspectral image almost impossible. In practical applications, multispectral imaging should be supported by automatic or semiautomatic pattern recognition systems. Spectral data can improve the efficiency of such systems because detected objects are better distinguished in the spectral space than in the color space: objects that look the same can have different spectra and, on the other hand, objects that look very different can share some spectral features. We can also build custom visualizations of spectral images with artificial colors.
This can be done by selecting the most informative spectral bands or by reducing the spectral space so as to keep the most important data. It can improve the contrast between image objects in comparison to the corresponding color images. Multispectral imaging is applied in numerous areas, for example biomedicine: many diseases and pathological areas are noticeable in the spectral space, tumor and cancer detection being the best examples [Woolfe et al. 1999; Rajpoot 2004; Masood et al. 2008]. Collecting spectral data in land information systems supports more detailed object detection and classification, for instance of rocks, soils and vegetation; an exemplary method can be found in [Carvalho et al. 2003]. It also makes it easier to evaluate the state of plant populations, ocean flora or forest stands and allows for a deeper analysis; [Hsu 2007] applied hyperspectral imagery for the detection of vegetation of ten different classes. In archeology and art history it is used for verifying the authenticity of works of art, for instance paintings or books, and for detecting hidden contents; [Lau et al. 2008] analyze 16th century paintings to characterize the paint layers within micro samples based on multispectral imaging.
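The image cube I(x,y,λ) described above can be held as a plain three-dimensional array; a small illustrative sketch in Python/NumPy (the sizes are arbitrary here, although 480 × 640 × 21 matches the VGA, 21-channel device described later in the chapter):

```python
import numpy as np

# A multispectral cube: 480 x 640 spatial pixels, 21 spectral channels.
H, W, C = 480, 640, 21
wavelengths = np.linspace(400, 720, C)          # nm, one per channel
cube = np.random.default_rng(0).random((H, W, C))  # stand-in intensities

# The spectral signature of a single pixel is a length-C vector:
signature = cube[100, 200, :]
print(signature.shape)  # (21,)

# A single spectral channel is an ordinary monochrome image:
channel = cube[:, :, 5]
print(channel.shape)  # (480, 640)
```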
1.2 Photodynamic Diagnosis
Photodynamic diagnosis is a method of tissue analysis based on fluorescent spectroscopy. Tissue illuminated by violet light emits a fluorescent spectrum which depends on the amount of fluorine substances absorbed by the tissue. There are two basic approaches to acquiring the fluorescent spectrum:
• photodynamic diagnosis: some time before the diagnosis, a special, selectively absorbed fluorescent colorant is locally applied to the examined tissue;
• autofluorescence: based on the natural absorption of fluorescent substances contained in the cells. It requires a much more detailed analysis of the spectrum, because the differences are much less remarkable.
Photodynamic diagnosis can be used in cancer detection. Cancerous tissues are characterized by a much faster absorption of fluorine substances than healthy ones, which can be noticed in the reflected tissue spectra.
2 Multispectral Capturing Device for Gastroscopy and Colonoscopy
Endoscopy, in other words looking inside, allows interior organs to be examined by inserting a special device called an endoscope directly into the organs. The basic endoscope components are: the front-part tube, transmitting the light from the inside; an optical fiber system to deliver light; and a lens system. In a typical case we obtain a color image of the interior organs, which can be observed directly by the medical expert through the objective or, in some cases, captured by an attached color camera. Endoscopy is mainly used in two types of medical tests: gastroscopy and colonoscopy. In gastroscopy we examine the front part of the alimentary canal; the endoscope is inserted through the mouth. In colonoscopy we examine the back part; the endoscope is inserted through the anus. To obtain a multispectral image instead of a color image from the endoscope, we have to attach a multispectral capturing device to it. The tube which transmits the light in fact transmits electromagnetic radiance, which is what is needed to capture multispectral images. We have proposed such a multispectral capturing device and applied it in endoscopy. Its main component is a liquid crystal tunable filter [Gat 2000], which can control spectral transmission electronically by keeping the band of a selected wavelength and removing the rest; this is made possible by the proper polarization of crystal plates. The filter is the front part of the device, mounted directly after the optical focusing system. The electromagnetic radiance which passes through the filter is registered by a high-sensitivity monochrome camera. The monochrome camera sums the radiance over the visible light range; thus, if radiance passes through the filter, the camera registers only the radiance of the specified wavelength. The camera and the filter are synchronized by the dedicated software described below.
Such a process of multispectral channel acquisition is iterated over all selected wavelengths of the multispectral image. The resolution of the spectral domain depends on the capability of the filter, i.e. the size of the filtered window. The structure of the proposed capturing device is presented in Fig. 1.
Fig. 1 The structure of the proposed capturing device
The model of the dedicated control software is shown in Fig. 2. The main component is the Main Control Module, responsible for synchronizing the camera and the filter, iterating the channel acquisition, and generating events with information on the acquisition progress. The GUI Module and the Database Module register listeners for these events; they respectively display the captured image for the user and store it in the database. The main module communicates with the filter and the camera through the proposed driver interfaces.
Fig. 2 Control software of the proposed multispectral device
In constructing the device, we have chosen a Varispec liquid crystal tunable filter [WWW-2 2010] and an Andor Luca monochrome camera [WWW-3 2010]. With the above-mentioned filter window size, we obtain 21 different disjoint spectral channels in the visible light range. The average switching time of the filter is about 50 ms.
The Andor camera has a low-noise CCD matrix, VGA resolution and a 16-bit grayscale. The acquisition time depends on the intensity of the light delivered to the observed objects. In the best case of the highest light-source intensity we have tested, it requires more than 30 ms, and usually not less than 50 ms, to take photos at an acceptable noise level. This seems long, especially in comparison to a typical color camera, which in good light conditions requires only about 10 ms of acquisition time for the whole color image. It is the result of the monochrome camera summing the intensity of the radiance over the whole visible light range: if the radiance passes through the filter, which removes radiance outside a specified narrow range, the total intensity is very low and a longer acquisition time is required. Because of technical hardware limitations, there is no possibility to use dedicated sensors for every captured spectral channel. To reduce the time we could use several filter-camera sets and capture different channels asynchronously. Summarizing, in the best lighting conditions, capturing an image at full spectral resolution takes more than one second. During this time the objects should be static, so that the channels show directly corresponding scenes. If they are not static, special correction techniques have to be applied; for instance, to find transformations between channels and match them, one can try optical flow methods [Fleet et al. 2006]. A prototype model of our multispectral capturing device is presented in Fig. 3. Exemplary images of the device are accessible at [WWW-1].
Fig. 3 Prototype multispectral capture device
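The per-channel acquisition loop and its timing can be sketched as follows; the ~50 ms filter-switch and exposure figures come from the text, while the driver function names are illustrative assumptions:

```python
# Sketch of the Main Control Module's acquisition loop: retune the
# filter, expose the monochrome camera, repeat for every wavelength.
SWITCH_MS = 50    # average Varispec filter switching time (from text)
EXPOSURE_MS = 50  # typical exposure at an acceptable noise level
CHANNELS = 21     # disjoint visible-light channels of the filter

def acquire_cube(tune_filter, expose, wavelengths):
    """tune_filter(wl) selects the band; expose() returns one
    monochrome frame. Both are hypothetical driver calls."""
    cube = []
    for wl in wavelengths:
        tune_filter(wl)        # ~SWITCH_MS per retune
        cube.append(expose())  # ~EXPOSURE_MS per frame
    return cube

# Rough lower bound for a full-resolution acquisition:
total_ms = CHANNELS * (SWITCH_MS + EXPOSURE_MS)
print(total_ms / 1000.0)  # 2.1
```

The resulting ~2.1 s lower bound is consistent with the "more than one second" stated above for full spectral resolution.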
In the following sections we present the results of the pixel based classification applied to the cancer diagnosis, regression for the spectrum correction and transformation of the spectral space into RGB space.
3 Processing and Recognition of the Multispectral Images
Because of the much more detailed description of image pixels, and the usually better distinction of objects in the spectral space than in the color space, the pixel-based classification approach is used very often in recognizing multispectral images and detecting their objects.
When we know the exact shapes of the object spectra, we can try to match the pattern and pixel spectra using some kind of distance metric. However, such a method requires spectrum correction techniques. The monochrome camera gives a light intensity which depends on the light delivered to the endoscope and on the acquisition time; there is no simple, predefined transformation between the spectral space of the multispectral image and the spectral space of the pattern spectra. A second problem is that the filter passes radiance non-uniformly across wavelengths: the applied Varispec filter attenuates the radiance of the blue light wavelengths (400-500 nm) much more than that of the red light (600-700 nm). If we do not know the pattern spectrum but have labeled images with the detected objects, we can apply supervised machine learning techniques: we extract pixel signatures and, on the basis of the labeled images, build a training set to teach a classifier. In the worst case, when we have neither pattern spectra nor labeled images, we can try unsupervised machine learning. As with 3-channel color images, we can use other classical approaches in the recognition process. Most often such a process consists of the following stages: preprocessing, which mainly removes noise and in some cases enhances the image contrast; segmentation, which divides the image into disjoint regions; and labeling of the regions based on extracted features. Another challenge in processing multispectral images is visualization. Separate visualization of only a few channels is acceptable, but in the case of tens or hundreds of them it is not. To obtain a color image we have to transform the spectral space into a three-dimensional one. We may use one of the well-known techniques for feature space reduction or apply attribute selection methods. Another solution is to display the original colors, the same as in a typical color image. Having multispectral data and the pattern spectra of the three basic colors, red, green and blue, we should be able to restore the amounts of these colors needed to describe the proper color. Unfortunately this is not as trivial as it looks, because of the problem, described above, of inaccurate spectrum representation by multispectral images. We can try to correct the spectrum or directly build transformers from the spectral space to the color space.
4 Cancer Detection
We have tested cancerous tissue detection based on the rules of photodynamic diagnosis, with multispectral imaging for the fluorescent spectrum acquisition. We have applied a pixel-based classification approach with supervised machine learning. We have prepared a training set based on a group of 20 patients with diagnosed cancerous tissues and have taken multispectral photographs of the patients' tissue. As in photodynamic diagnosis, we have delivered blue light by the endoscope and, in a second iteration, white light, which gives complete spectral information about the tissue. The pairs of photographs with the different light sources correspond directly; this has been achieved by switching the light during the acquisition with the endoscope and patient stationary. We
have photographed uniform regions, which means that each photograph contains only cancerous or only healthy tissue. Finally, we have obtained 40 different photographs: 10 of them represent cancerous tissues acquired with blue light, the second group of 10 healthy tissue with blue light, the third cancerous tissue with white light, and the last group healthy tissue with white light. We have used our multispectral acquiring device at the full available spatial and spectral resolution: VGA mode and 21 spectral channels. In the following step, we have extracted pixel signatures: for every pixel in the image set with tissue representation, we have collected the spectral data and its image tissue class in separate signatures. The detection of the region of interest was necessary to complete the task: the objective is circular and the camera matrix is rectangular. As in other medical devices, our device acquires the whole region accessible to the endoscope; pixels outside the endoscope region are dark, represent nothing and do not need to be considered. In our experiments we have used the Naive Bayes, k-Nearest Neighbor, Multilayer Perceptron, Kernel SVM, LibLinear, RBFNetwork, ConjunctiveRule, DecisionTable and PART classifiers [Witten and Frank 2005]. We have selected parameter ranges for the classifiers and iterated the tests for each unique set of classifier parameter values; for instance, for the Multilayer Perceptron we have examined different learning rates, numbers of epochs, network complexities, and the influence of attribute normalization. The results of the detection are shown in Fig. 4. The chart presents aggregated classifier efficiencies, understood as the percentage of correctly detected pixels, obtained by the tested classifiers. In the aggregation we have chosen the highest efficiency of every classifier from the set of experiments made for the selected ranges of its parameters. The results are satisfactory: the best classifier, the multilayer perceptron, has almost 98% efficiency. A bit worse are the kNN and RBFNetwork classifiers with 95% efficiency. In general, white light gives much better results; in the best case for blue light we have achieved only 72% efficiency.
Fig. 4 Cancer detection efficiencies (percentage of correctly classified pixels per classifier, for white and blue light)
For the Multilayer Perceptron, we have visualized the detection results for three exemplary images (Fig. 5). The images are labeled with three colors: black for the region outside the ROI, green for detected healthy tissue, and red for cancer detection. The first image represents healthy tissue and, as expected, is almost entirely green. The second is a photograph of cancerous tissue and, correctly, is almost entirely red. The third contains some parts of healthy tissue and some cancerous, which is why it is partially red and partially green. Some noise can be noticed in the images, probably the result of the acquisition by the monochrome camera in low-intensity conditions caused by the applied spectral filtering. However, it can easily be removed by simple postprocessing, for instance with morphological operators.
Fig. 5 Cancer detection
Further details of the experiments and the results can be found in [Świtoński et al. 2010].
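The morphological postprocessing mentioned for removing isolated misclassified pixels can be as simple as a 3 × 3 majority vote over a binary label image; this is a generic sketch, not the authors' operator:

```python
import numpy as np

def majority_denoise(labels):
    """Remove isolated misclassified pixels from a binary label image
    (1 = cancer, 0 = healthy) by 3x3 majority voting -- a simple
    stand-in for a morphological open/close postprocessing step."""
    h, w = labels.shape
    padded = np.pad(labels, 1, mode="edge")
    # Sum each pixel's 3x3 neighborhood via nine shifted views.
    votes = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return (votes >= 5).astype(labels.dtype)  # majority of the 9 votes

# A single noisy "cancer" pixel inside a healthy region is removed:
img = np.zeros((5, 5), dtype=int)
img[2, 2] = 1
print(majority_denoise(img).sum())  # 0
```

Solid regions survive the vote unchanged, so only speckle noise like that visible in Fig. 5 is affected.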
5 Spectrum Estimation
As described above, another way of detecting objects based on the spectral signatures of image pixels is the spectrum matching technique. In this case, knowledge of the pattern spectra of the detected objects is required and, what is more, to compare spectral data effectively, both spectra have to be represented in the same spectral space. Because there is usually no given transformation from the space of pixel signatures of the acquiring device to the space of the patterns, as an introductory step of this approach we have to find such a transformation. In some cases, if we know the detailed features of the acquiring device, we are able to find it analytically. For the proposed device we would need to know the attenuation properties of the filter, its window size at every spectral point and, for the camera, the relationship between the intensity of the light and the monochrome scale. Such information is usually inaccessible and, what is more, the reflected spectrum also depends on the circumstances of acquisition, for instance the intensity and spectrum of the delivered light and the exposure time of the camera. In our case it is even more complicated, because the light is delivered by the endoscope; thus the reflected intensity depends on the distance to the photographed surface: the greater the distance, the weaker the light intensity.
Multispectral Imaging for Supporting Colonoscopy and Gastroscopy Diagnoses
It is much easier to estimate spectrums based on a train set in which we have pixel signatures and their pattern spectrums. The transformation found will be valid for multispectral images taken in the same conditions. We have used the GretagMacbeth colorchecker (Fig. 6) to prepare the train set. The colorchecker has a datasheet that specifies the exact spectrum of every color. We have taken a separate image for each color, extracted the pixel signatures inside the ROI and associated them with the corresponding spectrum.
Fig. 6 GretagMacbeth colorchecker
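A minimal sketch of how such a train set could be assembled; the array shapes, the `collect_pairs` helper and the toy data are hypothetical, standing in for one acquired colorchecker patch.

```python
import numpy as np

def collect_pairs(cube, roi_mask, pattern_spectrum):
    """Return (pixel_signatures, pattern_spectra) for one patch.

    cube: (21, H, W) multispectral image, one plane per channel.
    roi_mask: (H, W) boolean mask of pixels inside the ROI.
    pattern_spectrum: 21-point spectrum from the colorchecker datasheet.
    """
    signatures = cube[:, roi_mask].T                       # (n_pixels, 21)
    patterns = np.tile(pattern_spectrum, (signatures.shape[0], 1))
    return signatures, patterns

# Toy data standing in for one acquired patch:
cube = np.random.rand(21, 4, 4)
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True                          # 4 pixels inside the ROI
datasheet = np.linspace(0.1, 0.9, 21)         # datasheet pattern spectrum

X, Y = collect_pairs(cube, roi, datasheet)
print(X.shape, Y.shape)                       # (4, 21) (4, 21)
```

Stacking such pairs over all 24 colorchecker patches yields the regression training set used below.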
Fig. 7 presents the spectrums of the colors green, blue, orange and white, calculated as average values over the spectral channels of the proper multispectral image, together with the corresponding pattern spectrums specified by the colorchecker datasheet. The differences are remarkable. The first is the scale, which should be normalized, for instance in a linear way. It is noticeable that the normalization factor is not constant and varies in the spectral dimension: it is much greater for the high frequencies than for the low ones.

Fig. 7 Multispectral image (left) and pattern (right) spectrums of the colors green, blue, orange and white (wavelengths 400-688 nm; raw monochrome scale on the left, 0-1 scale on the right)
We can explain that by studying the VariSpec filter properties, shown in Fig. 8. The charts present the window sizes and the intensity transmitted by the filter for the acquired spectral channels. The window sizes are almost the same, but the transmitted intensity is not uniform: the radiance of shorter wavelengths is attenuated more strongly than that of longer ones. Unfortunately, there is no such explanation for the differences at the wavelengths above 700 nm. For the last channel we can notice slightly weaker radiance transmission than for the previous one, but the differences in the spectrums are much more remarkable. This can be caused by the delivered white light source, which may not contain radiance above 700 nm, or by the properties of the Andor camera, which may not weight the intensities of all wavelengths equally in the process of acquiring a monochrome image.

A. Świtoński, R. Bieda, and K. Wojciechowski

Fig. 8 Spectral characteristic of the VariSpec filter (Filter 51384: polarized transmission PolarizedT(%) versus wavelength for the spectral channels 400-720 nm)
We have assumed a linear relationship between the spectral space of the multispectral image and the space of the patterns. To find the linear combinations for the successive wavelengths we have used the classical least squares method. We have tested two approaches, called 1vs1 and ALLvs1. In 1vs1 the spectrum value in the pattern space relies only on the corresponding single value of the multispectral image space. By contrast, in the ALLvs1 approach it is a linear combination of all the values.

Fig. 9 Linear regression errors: relative absolute error and mean absolute error of the ALLvs1 and 1vs1 approaches for the successive wavelengths (400-720 nm)
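The two regression variants can be sketched with ordinary least squares on synthetic data. The hidden mixing matrix, sample counts and noise level below are illustrative assumptions, not the device's actual response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data standing in for the colorchecker set: X holds pixel
# signatures (n_samples x 21 channels), Y the pattern spectra at the
# same 21 wavelength points.  A hidden linear mixing simulates the device.
true_map = np.eye(21) * 2.0 + 0.05 * rng.random((21, 21))
X = rng.random((200, 21))
Y = X @ true_map.T + 0.01 * rng.standard_normal((200, 21))

# 1vs1: each output wavelength is a scaled version of the single
# corresponding input channel (per-channel least squares, no intercept).
scale_1vs1 = (X * Y).sum(axis=0) / (X * X).sum(axis=0)
Y_1vs1 = X * scale_1vs1

# ALLvs1: each output wavelength is a least-squares linear combination
# of ALL 21 input channels.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_all = X @ W

mae_1vs1 = np.abs(Y - Y_1vs1).mean()
mae_all = np.abs(Y - Y_all).mean()
print(mae_1vs1, mae_all)   # ALLvs1 fits at least as well as 1vs1
```

Since 1vs1 is the special case of ALLvs1 with all off-diagonal coefficients forced to zero, ALLvs1 can never fit the training data worse, which matches the error ordering reported in Fig. 9.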
The results of the linear regression are presented in Fig. 9, which shows the relative absolute and mean absolute errors for the successive wavelengths. The total mean relative absolute error is 49% for 1vs1 and 42% for ALLvs1, and the total mean absolute error is 0.1 and 0.08, respectively. The substantial difference between the approaches for the shortest waves, in the range 400-430 nm, can be explained by the properties of the VariSpec filter: it attenuates those frequencies most strongly. This causes noise that makes those channels less informative, so neighbouring channel values are required to estimate the spectrum better. A relative error of 40% does not look satisfactory, but a mean absolute error of about 0.08, with the scale ranging from 0 to 1, means that the average error is less than 10% of the scale. This result is surely acceptable. Such differences between the relative and absolute errors are probably caused by the low-intensity spectrums of dark colors, for which low absolute errors produce great relative ones. In Fig. 10 we have visualized the estimation results for the four colors presented above: white, blue, green and orange. They confirm the prior observation of better estimation by the ALLvs1 approach, which seems to be unintuitive. The train dataset has been prepared in conditions analogous to a real endoscopy: not very intensive light and a short exposure time. Nonlinear properties of the VariSpec filter or the Andor camera could be another source of the errors: attenuating the radiance, or acquiring it with the monochrome camera, could depend on its intensity in a nonlinear way. The results of spectrum estimation by kernel methods can be found in [Michalak and Switonski. 2010].

Fig. 10 Spectrum estimation of the selected colors (pattern, ALLvs1 and 1vs1 spectrums of green, blue, orange and white, for wavelengths 400-720 nm)
To reduce the estimation errors, we can apply a classification approach. In that case we classify pixel signatures as one of the colors of the colorchecker and replace their spectrums by the spectrum of the identified color. It is possible to achieve classification efficiency over 95% with the SVM classifier [Michalak and Switonski. 2010].
However, such a solution is not scalable and in fact should be applied only to the spectrums available in the colorchecker.
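The classification-based estimation can be sketched as follows. This uses a nearest-mean classifier purely for illustration (the paper reports over 95% efficiency with an SVM); the toy signatures and spectra are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "colorchecker": 3 colors, each with a 21-channel mean signature
# and a 21-point pattern spectrum (illustrative values).
mean_signatures = rng.random((3, 21))
pattern_spectra = rng.random((3, 21))

def estimate_by_classification(signature):
    """Assign the signature to the nearest color, return its spectrum."""
    dists = np.linalg.norm(mean_signatures - signature, axis=1)
    color = int(np.argmin(dists))
    return pattern_spectra[color]      # spectrum of the identified color

# A slightly noisy pixel of color 2 is mapped to color 2's spectrum:
pixel = mean_signatures[2] + 0.01 * rng.standard_normal(21)
spectrum = estimate_by_classification(pixel)
print(np.allclose(spectrum, pattern_spectra[2]))  # True
```

The limitation noted above is visible in the sketch: the output can only ever be one of the stored colorchecker spectra.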
6 Multispectral Image Visualization in Endoscopy Diagnostics

During gastroscopy and colonoscopy inspection there is a strong requirement for a quick preview, so that a medical expert is able to focus on the most significant and suspicious-looking regions and take labeled photos for further analysis. It is necessary especially in the development phase of a multispectral device supporting medical diagnosis of any kind. If we have prepared, tested and successfully evaluated detection methods, or methods calculating properties helpful in diagnosis, we can visualize their results. To avoid loss of information we should display all channels of a multispectral image. This is impractical - a preview which contains 21 separate monochrome images is unclear for a human. Theoretically, we can reduce the number of separate images by grouping every three channels into a single color image, for instance as RGB components. However, in such a case we still have 7 different images with a much more difficult interpretation. If we know which spectral channels are most informative for a given diagnosis, we can select only them for the visualization. There is one more problem with a preview of raw multispectral channels. The proposed device gives a 16-bit monochrome scale and, as we can notice in Fig. 7, usually only a small range of the scale is used. In such a situation image objects would be indistinguishable and would appear to be filled with a single color. Human sight is not able to recognize more than 256 gray levels and, what is more, monitors with 32-bit color depth are limited to 256 gray levels too. That is why we have to apply some preprocessing of the images, for instance linear normalization of a given range to the complete visible grayscale. The crucial problem is the proper selection of the normalized range. If we normalize according to the maximum and minimum of the processed image, there would be no direct correspondence among the displayed images.
Probably the best quick preview visualization is a default color image. The endoscope can be extended with an external device called a teaching attachment, which gives the ability to observe a color image of the inspected region, but this solution has two disadvantages. It is uncomfortable for the medical expert, who looks through the teaching attachment and takes photos with dedicated software. What is more, with the teaching attachment mounted, the reflected light is split into two destinations - the attachment and the camera. This means the radiance reaching the camera is weaker, so it requires a greater exposure time and increases the already long acquisition of a single frame. Taking into consideration all the problems described above, we have decided to calculate a color image based on the multispectral data. If we know the accurate spectrum of the analyzed color and the spectrums of the three basic color components - red, green and blue - we can calculate the components directly. As has been presented in the previous section, estimating an accurate spectrum generates noticeable errors, so we have evaluated calculation of the color components directly from the pixel signatures.
We have used the train dataset of the GretagMacbeth colorchecker once again, but its spectrums have been replaced by the red, green and blue color component values. As in the previous case, we have used a linear transformation approximated by the least squares method and calculated the mean absolute error. The results are presented in Fig. 11. They are much better than for the spectrum estimation - about 6.5% of the global scale. Once again we could apply the classification approach described in the previous section, but we would obtain only a 24-color palette.

Fig. 11 Color regression results (mean absolute errors of the blue, green and red components)
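A toy stand-in for this color-regression step: learn a linear map from 21-channel signatures to (R, G, B) on colorchecker pixels, then apply it to a whole multispectral cube. The hidden response matrix and data sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

hidden = rng.random((3, 21))                   # simulated device response
X_train = rng.random((100, 21))                # training pixel signatures
rgb_train = X_train @ hidden.T                 # known RGB targets

# Least-squares (21, 3) map from signatures to color components:
M, *_ = np.linalg.lstsq(X_train, rgb_train, rcond=None)

cube = rng.random((21, 5, 5))                  # a multispectral image
pixels = cube.reshape(21, -1).T                # (25, 21) signatures
rgb = np.clip(pixels @ M, 0.0, 1.0).reshape(5, 5, 3)
print(rgb.shape)  # (5, 5, 3) displayable color image
```

Unlike the spectrum-estimation route, the 21-to-3 map is heavily over-determined, which is consistent with the lower errors the text reports for color regression.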
In Fig. 12 we have visualized the color regression results on an example multispectral image from skin tissue diagnosis. The right image is additionally postprocessed by Gaussian blurring to remove visible noise. The visualization seems to be correct - the region outside the ROI is black and the skin is represented by pink and red colors. We may suspect that the image has been captured in worse light conditions than the images of the train dataset.
Fig. 12 Color image calculated by linear regression
7 Discussion and Conclusions

We have focused on the applications of multispectral imaging in colonoscopy and gastroscopy diagnosis. We have proposed a multispectral acquisition device with the ability to connect to an endoscope and analyzed its properties. The device is flexible: it can also be used for taking normal multispectral photographs without the endoscope. In such a situation, in the place of the endoscope we have to mount a proper optical objective, which is responsible for the zoom and focuses the light on the camera matrix. The control software defines driver interfaces for the hardware, thus we can apply a different camera and a different filter. It only requires the implementation of those simple interfaces, which is usually based on the native drivers. We have examined multispectral imaging in tissue cancer detection with a pixel-based classification approach. Further, we have tried to estimate the accurate spectrum and color of the image pixels based on their spectral signatures. We have applied a linear transformation of the multispectral image spectral space. The results of the pixel-based classification approach in cancer detection are satisfactory. The best classifier efficiency is rated at the level of 98%. It is compliant with photodynamic diagnosis rules. The spectrum estimation produces noticeable errors. Both spectral spaces differ and we can form a hypothesis that they cannot be transformed in a linear way. We are going to examine nonlinear methods to improve the results and, on finishing that, apply the spectrum matching approach to object detection.
Acknowledgment This work was financed from the Polish Ministry of Science and Higher Education resources in the years 2009-2012 as a research project.
References [Carvalho et al. 2003] Carvalho, O.A., Carvalho, A.P.F., Guimaraes, R.F., Lopes, R.A.S., Guimaraes, P.A., Souza, M.E., Pedreno, J.: Classification of hyperspectral image using SCM methods for geobotanical analysis in the Brazilian savanna region. In: Proc. of the IEEE Intl. Symp. on Geoscience and Remote Sensing (2003) [Gat 2000] Gat, N.: Imaging spectroscopy using tunable filters: a review. In: Proc. SPIE Int. Soc. Opt. Eng. (2000) [Hsu 2007] Hsu, P.H.: Feature extraction of hyperspectral images using matching pursuit. ISPRS J. of Photogrammetry and Remote Sensing 62(2), 78–92 (2007) [Lau et al. 2008] Lau, D., Villis, C., Furman, S., Livett, M.: Multispectral and hyperspectral image analysis of elemental and micro-Raman maps of cross-sections from a 16th century painting. Analytica Chimica Acta 610(1), 15–24 (2008) [Masood et al. 2008] Masood, K., Rajpoot, N.: Spatial analysis for colon biopsy classification from hyperspectra. Annals of BMVA (4), 1–16 (2008) [Michalak and Switonski. 2010] Michalak, M., Świtoński, A.: Spectrum evaluation on multispectral images by machine learning techniques. In: Bolc, L., Tadeusiewicz, R., Chmielewski, L.J., Wojciechowski, K. (eds.) ICCVG 2010. LNCS, vol. 6375, pp. 325–333. Springer, Heidelberg (2010) [Rajpoot 2004] Rajpoot, K., Rajpoot, N.: SVM optimization for hyperspectral colon tissue cell classification. Medical Image Computing and Computer Assisted Intervention (2004) [Świtoński et al. 2010] Świtoński, A., Michalak, M., Josiński, H., Wojciechowski, K.: Detection of tumor tissue based on the multispectral imaging. In: Bolc, L., Tadeusiewicz, R., Chmielewski, L.J., Wojciechowski, K. (eds.) ICCVG 2010. LNCS, vol. 6375, pp. 325–333. Springer, Heidelberg (2010)
Multispectral Imaging for Supporting Colonoscopy and Gastroscopy Diagnoses
145
[Witten and Frank 2005] Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Elsevier, Amsterdam (2005) [Woolfe et al. 1999] Woolfe, F., Maggioni, M., Davis, G., Warner, F., Coifman, R., Zucker, S.: Hyperspectral microscopic discrimination between normal and cancerous colon biopsies. IEEE Trans. on Medical Imaging 99(99) (1999) [WWW-1 2010] Website of the PJWSTK Multispectral group, http://multispectral.pjwstk.edu.pl/ (accessed December 1, 2010) [WWW-2 2010] Andor Luca Website, http://www.andor.com/scientific_cameras/luca (accessed November 2, 2010) [WWW-3 2010] Varispec Filters, http://www.spectralcameras.com/varispec (accessed November 2, 2010)
A Machine Learning Approach to Mining Brain Stroke Data

T. Mroczek¹, J.W. Grzymała-Busse¹,², Z.S. Hippe¹, and P. Jurczak¹

¹ Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszów, Poland {zhippe,tmroczek,pjurczak}@wsiz.rzeszow.pl
² Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence (KS) 66045-1621, USA
[email protected]
Abstract. Learning models related to brain stroke data of 162 anonymous patients were generated in the form of sets of production rules, using the Glasgow Outcome Scale (GOS) and the Modified Rankin Scale (mRS) for selecting descriptive attributes. The developed sets of rules were then optimized, leading to high-accuracy rules with a distinctly decreased number of logic conditions.
1 Introduction

A brain stroke is one of the most serious and frequent neurological damages worldwide. Due to the rapidly developing loss of brain functions, a stroke is a very serious medical emergency, frequently leading to death. A proper evaluation of patients that survived a stroke leads to appropriate medical care and stroke rehabilitation. It is very important to undergo treatment that helps stroke patients return to normal life. This process is assisted by outcome scales that help estimate the everyday living skills of these patients and their dependence on the help of other people. Such an estimate is difficult since there is no agreement on the outcome scales. The document "International classification of impairments, disabilities and handicaps (ICIDH)" issued by the World Health Organization determines the following groups of stroke outcome scales: damage scales, functional scales and scales defining the quality of life [Stram et al. 2005]. In an assessment of the medical care, the following conditions should be taken into account: disability associated with the skills of everyday living, also called Activities of Daily Living (ADL), the dependence of a patient on other people and the status of consciousness. There exist a few such scales [Książkiewicz et al. 2007] that are frequently used in medical practice, such as the National Institutes of Health Stroke Scale (NIHSS), Stroke Impact Scale (SIS), Barthel Index (BI), Glasgow Outcome Scale (GOS) and Modified Rankin Scale (mRS).
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 147–158. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
An extended team conducted research on the analysis of real-life data describing patients affected by a brain stroke. Additionally, some selected tools of machine learning were used for an estimate of the properties of two distinct scales used for stroke patients: the Glasgow Outcome Scale and the modified Rankin Scale. An attempt was made to find a correlation between the standards applied to the patient status using both scales. The results of this research were published in [Grzymała-Busse et al. 2008], [Grzymała-Busse et al. 2009]. Recently, a brain stroke data set was processed [Mroczek et al. 2010] to find hidden regularities in the form of rule sets. The main idea was to use a data mining system called NGTS to find knowledge hidden in the data set. This paper extends the previous results and is aimed at finding an optimal model, in the form of a rule set, for all outcomes associated with brain stroke patients for two scales: GOS and mRS.
2 Experiments

Through an analysis of the rule sets induced from the stroke patient data set, conducted with the help of the NGTS system, the most relevant attributes (from the viewpoint of classification) were identified in the set of 42 attributes describing the data set¹. During our previous experiments with both scales, GOS and mRS, some relevant attributes were also selected [Mroczek et al. 2010]. Using these results, our current experiments were intended to induce rules of the type IF...THEN for an estimate of all degrees of the patient functional status for both scales, GOS and mRS. To accomplish that, some attributes were deleted and some operations were performed to optimize the rule set or, more precisely, to minimize the classification error. These operations were based on deleting redundant rules and subsumed rules as well as deleting some redundant conditions. A corresponding decision on rule deletion was based on some parameters associated with rules, such as the rule strength, accuracy, specificity and generality. As a result, the selected rules were characterized as being more efficient during classification and/or classifying more cases from a given concept. On the other hand, redundant conditions were removed from the rules taking into account the attribute frequency in the induced rules. Thus, the estimate of rule efficiency was more complex, with the error rate not being the only criterion.

¹ The database contains 162 cases of brain dysfunction. Each case stored in the database was diagnosed, confirmed and assigned to concepts of two scales: the Glasgow Outcome Scale (GOS) and the modified Rankin Scale. The main difference between the two scales lies in the method of classifying a patient's condition. The Glasgow Outcome Scale assigns five discrete values to the patient's status: 1 (death), 2 (persistent vegetative state), 3 (severe disability), 4 (moderate disability) and 5 (good recovery). The second scale (the modified Rankin Scale) has seven levels of stroke severity: 0 (healthy patient), 1 (self-reliant patient), 2 (small disability), 3 (distinct disability), 4 (outsized disability), 5 (total disability) and 6 (artificially introduced degree, indicating death). The cases in the database are described using 42 attributes, which can be formally assigned to five classes (concepts) corresponding to scores of the Glasgow Outcome Scale and seven classes (concepts) corresponding to scores of the modified Rankin Scale, respectively.
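The rule-pruning operations described above can be sketched as follows. The rule representation, attribute names and decision labels are illustrative, not the NGTS internals.

```python
# A rule is a pair (conditions, decision); conditions map each
# attribute to its required value.

def subsumes(general, specific):
    """`general` subsumes `specific` if it has the same decision and
    its conditions are a subset of the specific rule's conditions."""
    g_cond, g_dec = general
    s_cond, s_dec = specific
    return g_dec == s_dec and set(g_cond.items()) <= set(s_cond.items())

rules = [
    ({"A10": "above_3_days", "C7": "awareness_problem"}, "GOS3"),
    ({"A2": "m", "A10": "above_3_days", "C7": "awareness_problem"}, "GOS3"),
]

# Drop any rule subsumed by another (more general) rule in the set:
kept = [r for i, r in enumerate(rules)
        if not any(subsumes(o, r) for j, o in enumerate(rules) if j != i)]
print(len(kept))  # the more specific rule is removed -> 1
```

Removing a redundant condition works analogously: a condition can be dropped whenever the weakened rule still classifies no additional case incorrectly.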
3 Results

During the first stage of our research, the brain stroke data set, with cases classified according to both scales, Glasgow Outcome and modified Rankin, was used for rule induction. Then resubstitution was used for validation; the results are presented in Table 1.

Table 1 Resubstitution results

                                            Glasgow Outcome Scale   Modified Rankin Scale
Number of rules                                      84                      99
Number of cases classified correctly [%]            85.80                   89.51
Number of cases classified incorrectly [%]           0.00                    0.00
Number of cases unclassified at all [%]             14.20                   10.49
During rule induction for the scale GOS, 84 rules were generated. In this rule set there were 11 rules for the first degree GOS1, 4 rules for the second degree GOS2, 12 rules for the third degree GOS3, 27 rules for the fourth degree GOS4 and 30 rules for the fifth degree GOS5. For the scale mRS, 99 rules were induced. This set of 99 rules contains 16 rules for the degree mRS0, 24 rules for the degree mRS1, 22 rules for the degree mRS2, 6 rules for the degree mRS3, 12 rules for the degree mRS4, 9 rules for the degree mRS5 and 10 rules for the degree mRS6. It was observed that the number of induced rules may be reduced, since some rules did not participate in classification or classified only cases that were also classified correctly by other rules. Additionally, the frequency of attributes occurring in rules, for both scales, is presented in Table 2. Owing to the identification of the most relevant attributes (from a classification viewpoint), some conditions were removed from the rules. This operation did not affect the number of correctly classified cases. Therefore the optimal rule sets, for both scales, GOS and mRS, were obtained. Because of the space limit, we present only some of the reduced rules, for the GOS scale (Table 3) and for the mRS scale (Table 4).
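The resubstitution figures in Table 1 distinguish correctly classified, incorrectly classified and unclassified cases. A toy sketch of that counting, with rules and cases invented purely for illustration:

```python
# Each case is classified by the first matching rule; cases matched
# by no rule stay unclassified (as in Table 1's third row).

rules = [
    ({"B1": "present", "B9": "absent"}, "GOS5"),
    ({"C7": "awareness_problem"}, "GOS3"),
]

cases = [  # (attribute values, true decision)
    ({"B1": "present", "B9": "absent", "C7": "alert"}, "GOS5"),
    ({"B1": "absent", "C7": "awareness_problem"}, "GOS3"),
    ({"B1": "absent", "C7": "alert"}, "GOS4"),      # no rule fires
]

def classify(case):
    for cond, decision in rules:
        if all(case.get(a) == v for a, v in cond.items()):
            return decision
    return None  # unclassified

correct = sum(classify(c) == d for c, d in cases)
unclassified = sum(classify(c) is None for c, _ in cases)
print(correct, unclassified)  # 2 correct, 1 unclassified
```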
Table 2 Attributes in rule sets: the frequency quotient of each variable (A1, A2, A4, A8, A10, A11, A12, A14, B1, B3-B5, B7-B9, B11-B15, B17, B18, BH, C1, C3-C8, D1-D3, F1, F3, F5, F6, F8, F9, G1-G3) in the rules induced for the Glasgow Outcome Scale classes GOS1-GOS5 and the modified Rankin Scale classes mRS0-mRS6. All variables were described in [4].
Table 3 Optimized rule sets induced by NGTS for the scale GOS

Original rules
Optimized rules
RULE 16 (H =13.4962) IF A2 IS m AND A14 IS brain_infraction AND A1 IS 17..67 AND A4 IS city AND A10 IS above_3_days AND C7 IS awareness_problem THEN Glasgow_Outcome_Scale IS 3
RULE 16 (H =13.4962) IF A1 IS 17..67 AND A10 IS above_3_days AND C7 IS awareness_problem THEN Glasgow_Outcome_Scale IS 3
RULE 29 (H =13.3755) IF A2 IS f AND A12 IS brain_infraction AND A14 IS brain_infraction AND A4 IS city AND A10 IS below_1_hour AND C7 IS alert THEN Glasgow_Outcome_Scale IS 4
RULE 29 (H =13.3755) IF A14 IS brain_infraction AND A10 IS below_1_hour AND C7 IS alert THEN Glasgow_Outcome_Scale IS 4
RULE 37 (H =11.3080) IF A12 IS brain_infraction AND A14 IS brain_infraction AND A1 IS 67..90 AND A8 IS 3..37 AND B1 IS present AND B3 IS present AND B4 IS present AND C4 IS 55..100 THEN Glasgow_Outcome_Scale IS 4
RULE 37 (H =11.3080) IF A1 IS 67..90 AND A8 IS 3..37 AND B1 IS present AND B4 IS present AND C4 IS 55..100 THEN Glasgow_Outcome_Scale IS 4
RULE 43 (H =9.0931) IF A12 IS brain_infraction AND A14 IS brain_infraction AND A4 IS city AND A8 IS 3..37 AND A10 IS 3-6_hours AND B3 IS absent AND B4 IS absent AND B5 IS absent AND B8 IS absent AND B9 IS absent AND B11 IS absent AND D3 IS absent THEN Glasgow_Outcome_Scale IS 4
RULE 43 (H =9.0931) IF A12 IS brain_infraction AND A4 IS city AND A10 IS 3-6_hours AND B3 IS absent AND B5 IS absent AND B9 IS absent AND B11 IS absent AND D3 IS absent THEN Glasgow_Outcome_Scale IS 4
RULE 49 (H =7.9591) IF A2 IS m AND A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A8 IS 3..37 AND B1 IS present AND B4 IS absent AND B7 IS absent AND B9 IS present AND B17 IS absent THEN Glasgow_Outcome_Scale IS 4
RULE 49 (H =7.9591) IF A2 IS m AND A12 IS brain_infraction AND C1 IS 36..37.6 AND B1 IS present AND B4 IS absent AND B9 IS present AND B17 IS absent THEN Glasgow_Outcome_Scale IS 4
RULE 59 (H =12.7232) IF C1 IS 36..37.6 AND A1 IS 17..67 AND A4 IS city AND A8 IS 3..37 AND A10 IS above_3_days AND A11 IS neurological_department AND B1 IS present AND B3 IS absent AND B4 IS absent AND B7 IS absent AND B9 IS absent AND B11 IS absent AND B15 IS absent THEN Glasgow_Outcome_Scale IS 5
RULE 59 (H =12.7232) IF A1 IS 17..67 AND A10 IS above_3_days AND B1 IS present AND B7 IS absent AND B9 IS absent AND B11 IS absent AND B15 IS absent THEN Glasgow_Outcome_Scale IS 5
RULE 60 (H =12.6375) IF A2 IS f AND C1 IS 36..37.6 AND A4 IS city AND A8 IS 3..37 AND A10 IS above_3_days AND B1 IS present AND B3 IS absent AND B4 IS absent AND B17 IS absent THEN Glasgow_Outcome_Scale IS 5
RULE 60 (H =12.6375) IF A2 IS f AND C1 IS 36..37.6 AND A4 IS city AND A10 IS above_3_days AND B1 IS present AND B17 IS absent THEN Glasgow_Outcome_Scale IS 5
RULE 61 (H =12.5015) IF A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A4 IS city AND A8 IS 3..37 AND A10 IS 3-6_hours AND A11 IS neurological_department AND B1 IS present AND B5 IS absent AND B8 IS absent AND B9 IS absent
RULE 61 (H =12.5015) IF A4 IS city AND A10 IS 3-6_hours AND B5 IS absent AND B8 IS absent AND B9 IS absent AND B11 IS present AND B15 IS absent THEN Glasgow_Outcome_Scale IS 5
AND B11 IS present AND B15 IS absent THEN Glasgow_Outcome_Scale IS 5 RULE 65 (H =11.8352) IF A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A4 IS city AND A8 IS 3..37 AND A10 IS 2-3_days AND A11 IS neurological_department AND B3 IS absent AND B4 IS absent AND B5 IS absent AND B9 IS absent AND B15 IS absent THEN Glasgow_Outcome_Scale IS 5
RULE 65 (H =11.8352) IF A4 IS city AND A10 IS 2-3_days AND B4 IS absent AND B5 IS absent AND B9 IS absent AND B15 IS absent THEN Glasgow_Outcome_Scale IS 5
RULE 77 (H =7.5698) IF A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A4 IS city AND A8 IS 3..37 AND A10 IS above_3_days AND A11 IS neurological_department AND B3 IS absent AND B4 IS absent AND B5 IS absent AND B7 IS absent AND B9 IS absent AND B11 IS absent AND B14 IS absent AND B15 IS absent AND B17 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C4 IS 55..100 AND C5 IS RULEr_rhythm AND C6 IS stay_alert AND C7 IS alert AND C8 IS POCS THEN Glasgow_Outcome_Scale IS 5
RULE 77 (H =7.5698) IF B11 IS absent AND B14 IS absent AND B15 IS absent AND B17 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C4 IS 55..100 AND C5 IS RULEr_rhythm AND C6 IS stay_alert AND C7 IS alert AND C8 IS POCS THEN Glasgow_Outcome_Scale IS 5
Table 4 Optimized rule sets induced by NGTS for the scale mRS
Original rules
Optimized rules
RULE 5 (H =12.7294) IF A2 IS m AND C1 IS 36..37.6 AND A1 IS 17..67 AND A4 IS city AND A8 IS 3..37 AND A10 IS above_3_days AND A11 IS neurological_department AND B3 IS absent AND B4 IS absent AND B5 IS absent AND B8 IS absent AND B11 IS absent AND F8 IS absent THEN Rankine_Scale IS 0
RULE 5 (H =12.7294) IF A2 IS m AND A1 IS 17..67 AND A4 IS city AND A10 IS above_3_days AND B11 IS absent AND F8 IS absent THEN Rankine_Scale IS 0
RULE 9 (H =11.7300) IF A2 IS m AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A4 IS city AND A8 IS 3..37 AND A10 IS 2-3_days AND A11 IS neurological_department AND B1 IS present AND B3 IS absent AND B4 IS absent AND B8 IS absent AND B9 IS absent THEN Rankine_Scale IS 0
RULE 9 (H =11.7300) IF A4 IS city AND A10 IS 2-3_days AND B1 IS present AND B3 IS absent AND B4 IS absent AND B8 IS absent AND B9 IS absent THEN Rankine_Scale IS 0
RULE 27 (H =8.4409) IF A2 IS m AND A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A1 IS 17..67 AND A8 IS 3..37 AND B4 IS absent AND B5 IS absent AND B7 IS absent AND B9 IS present THEN Rankine_Scale IS 1
RULE 27 (H =8.4409) IF A2 IS m AND A12 IS brain_infraction AND C1 IS 36..37.6 AND A1 IS 17..67 AND B9 IS present THEN Rankine_Scale IS 1
RULE 28 (H =8.0772) IF A2 IS f AND A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6
RULE 28 (H =8.0772) IF B4 IS absent AND B11 IS absent AND B12 IS absent AND B13 IS absent
AND A1 IS 17..67 AND A4 IS city AND A8 IS 3..37 AND A10 IS above_3_days AND A11 IS neurological_department AND B1 IS absent AND B3 IS absent AND B4 IS absent AND B5 IS absent AND B7 IS absent AND B8 IS absent AND B9 IS absent AND B11 IS absent AND B12 IS absent AND B13 IS absent AND B15 IS absent AND B17 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C4 IS 55..100 AND C6 IS stay_alert AND C7 IS alert AND C8 IS POCS THEN Rankine_Scale IS 1
AND B15 IS absent AND B17 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C4 IS 55..100 AND C6 IS stay_alert AND C7 IS alert AND C8 IS POCS THEN Rankine_Scale IS 1
RULE 30 (H =7.9405) IF A2 IS f AND A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A4 IS city AND A8 IS 3..37 AND A11 IS neurological_department AND B1 IS present AND B3 IS absent AND B4 IS absent AND B5 IS absent AND B7 IS absent AND B8 IS absent AND B9 IS absent AND B11 IS absent AND B12 IS absent AND B13 IS absent AND B14 IS absent AND B15 IS absent AND B17 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C3 IS 60..95 AND C4 IS 55..100 AND C5 IS RULEr_rhythm AND C6 IS stay_alert AND C7 IS alert
RULE 30 (H =7.9405) IF B9 IS absent AND B11 IS absent AND B13 IS absent AND B14 IS absent AND B15 IS absent AND B17 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C3 IS 60..95 AND C4 IS 55..100 AND C5 IS RULEr_rhythm AND C6 IS stay_alert AND C7 IS alert AND C8 IS difficult_THEN_define AND D3 IS absent AND F3 IS absent AND F6 IS present THEN Rankine_Scale IS 1
AND C8 IS difficult_THEN_define AND D3 IS absent AND F3 IS absent AND F6 IS present THEN Rankine_Scale IS 1 RULE 34 (H =6.4896) IF A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A8 IS 3..37 AND A11 IS neurological_department AND B3 IS present AND B5 IS absent AND B7 IS absent AND B11 IS present AND B12 IS absent AND B15 IS absent AND B18 IS present THEN Rankine_Scale IS 1
RULE 34 (H =6.4896) IF A11 IS neurological_department AND B3 IS present AND B11 IS present AND B12 IS absent AND B15 IS absent AND B18 IS present THEN Rankine_Scale IS 1
RULE 54 (H =7.7205) IF A14 IS brain_infraction AND A4 IS city AND A8 IS 3..37 AND B1 IS present AND B3 IS present AND B5 IS absent AND B7 IS absent AND B8 IS present AND B15 IS absent
RULE 54 (H =7.7205) IF A4 IS city AND B1 IS present AND B8 IS present AND B15 IS absent
THEN Rankine_Scale IS 2
THEN Rankine_Scale IS 2 RULE 62 (H =4.0062) IF A2 IS f AND A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A1 IS 17..67 AND A4 IS city AND A8 IS 3..37 AND B1 IS absent AND B3 IS absent AND B7 IS absent AND B9 IS absent AND B12 IS absent AND B13 IS absent AND B14 IS absent AND B17 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C3 IS 60..95 AND C4 IS 55..100
RULE 62 (H =4.0062) IF A14 IS brain_infraction AND B1 IS absent AND B3 IS absent AND B18 IS absent AND Brain_Hemorrhage IS 90..200 AND C3 IS 60..95 AND C4 IS 55..100 AND C6 IS stay_alert AND C7 IS alert AND C8 IS difficult_THEN_define THEN Rankine_Scale IS 2
A Machine Learning Approach to Mining Brain Stroke Data
157
AND C6 IS stay_alert AND C7 IS alert AND C8 IS difficult_THEN_define THEN Rankine_Scale IS 2 RULE 59 (H =6.0123) IF A2 IS f AND A12 IS brain_infraction AND A14 IS brain_infraction AND C1 IS 36..37.6 AND A4 IS city AND A8 IS 3..37 AND A11 IS neurological_department AND B1 IS present AND B3 IS absent AND B4 IS absent AND B5 IS absent AND B7 IS absent AND B8 IS absent AND B9 IS absent AND B11 IS absent AND B12 IS absent AND B13 IS absent AND B15 IS absent AND B17 IS absent AND B18 IS absent AND C4 IS 55..100 AND C5 IS RULEr_rhythm AND C6 IS stay_alert AND C7 IS alert AND D1 IS absent AND D2 IS absent AND D3 IS absent AND F1 IS absent AND F3 IS absent AND F6 IS absent THEN Rankine_Scale IS 2
RULE 59 (H =6.0123) IF B1 IS present AND C5 IS RULEr_rhythm AND C6 IS stay_alert AND C7 IS alert AND D1 IS absent AND D2 IS absent AND D3 IS absent AND F1 IS absent AND F3 IS absent AND F6 IS absent THEN Rankine_Scale IS 2
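The rules above follow a fixed IF ... AND ... THEN syntax with two kinds of conditions: interval conditions such as A1 IS 17..67 and symbolic conditions such as B1 IS absent. As an illustration only (this is our own minimal sketch, not the authors' rule-induction tool; `parse_rule` and `matches` are hypothetical helper names), such a rule can be parsed and matched against a patient record:

```python
# Minimal sketch of parsing and matching rules in the IF ... AND ... THEN
# syntax shown above. Requires Python 3.9+ (str.removeprefix).
def parse_rule(text):
    head, _, tail = text.partition(" THEN ")
    conditions = dict(c.split(" IS ") for c in head.removeprefix("IF ").split(" AND "))
    attribute, _, value = tail.partition(" IS ")
    return conditions, (attribute, value)

def matches(case, conditions):
    for attribute, value in conditions.items():
        if ".." in value:                      # interval condition, e.g. A1 IS 17..67
            lo, hi = (float(v) for v in value.split(".."))
            if not (lo <= float(case[attribute]) <= hi):
                return False
        elif case.get(attribute) != value:     # symbolic condition, e.g. B1 IS absent
            return False
    return True

conditions, decision = parse_rule(
    "IF A1 IS 17..67 AND A4 IS city AND B1 IS absent THEN Rankine_Scale IS 1")
patient = {"A1": 45, "A4": "city", "B1": "absent"}
print(matches(patient, conditions), decision)  # True ('Rankine_Scale', '1')
```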
4 Conclusion

Our rule set optimization process proved that it is possible to simplify rule sets. The results of our experiments show that, in the set of 84 rules induced for the GOS scale and 99 rules induced for the mRS scale, two rules were subsumed by other rules and 200 conditions were redundant. Deletion of these redundant conditions did not affect the accuracy. Largely the same attributes occur in the rule sets induced for both scales, mRS and GOS: in 90% of the rules the attributes are shared, the exceptions being attribute A8, which was unique to the GOS rule set, and attribute B3, which was unique to the mRS rule set. Most likely, the same attributes occur in both rule sets because these attributes are truly relevant.
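The subsumption relation used in this optimization has a simple formulation: a rule subsumes another when both have the same decision and its conditions are a subset of the other's, so the longer rule covers no additional cases and can be dropped. A sketch under our own illustrative representation of a rule as a (conditions, decision) pair (not the authors' implementation):

```python
# Rule B is subsumed by rule A when both rules have the same decision and every
# condition of A also appears in B; B can then be deleted without changing the
# cases covered by the rule set.
def subsumes(rule_a, rule_b):
    conditions_a, decision_a = rule_a
    conditions_b, decision_b = rule_b
    return decision_a == decision_b and all(
        conditions_b.get(attr) == value for attr, value in conditions_a.items())

simplified = ({"A4": "city", "B1": "present"}, ("Rankine_Scale", "2"))
original = ({"A4": "city", "B1": "present", "B8": "present"}, ("Rankine_Scale", "2"))
print(subsumes(simplified, original))  # True: the longer rule is redundant
```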
Using Eye-Tracking to Study Reading Patterns and Processes in Autism with Hyperlexia Profile

R. Pazzaglia 1,2, A. Ravarelli 3, A. Balestra 1, S. Orio 1, and M.A. Zanetti 4

1 Center for the Study and Research on Autism (C.S.R.A.), Pavia, Italy
2 Lombardy School of Psychotherapy (S.L.O.P.), Pavia, Italy
[email protected]
3 Vision and Multimedia Lab, University of Pavia, Italy
[email protected]
4 Department of Psychology, University of Pavia, Pavia, Italy
Abstract. The aim of this study is to present the application of eye-tracking technology to research on autistic spectrum disorders (ASD), with a special interest in language impairments and in deficits of text comprehension and production. We discuss data and results obtained from a single-case study of an adult autistic male with a hyperlexia profile. Our results support the use of eye-tracking technology in research and diagnostic contexts where an intrusive human-machine interaction would be difficult.
1 Introduction

This chapter illustrates the results we obtained using eye-tracking technology in a single case study involving a young autistic adult (named S. in the text) with a hyperlexic profile. The multidisciplinary approach that we adopted makes it possible to combine the many core elements of this research and leads to a strong integration of different methodologies. First of all, we must consider the nature of autism from a psychological and psychiatric perspective, to understand the scientific and clinical relevance of a syndrome that remains one of the most obscure, inaccessible and fascinating in the psychiatric range. Secondly, we must turn to the psychology of language and communication, because one of the main issues in autism is the social impairment that undermines intersubjectivity, reciprocity and therefore communication. Our case study subject is not deeply impaired in his communication skills, and he has a good level of interaction with the people who care for him; however, his text comprehension skill is far below his reading and writing abilities. This behavior, together with S.'s clinical history, led us to consider hyperlexia as one of the possible explanations of his impairment in this area, and we
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 159–174. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
figured out a way to study this phenomenon that excludes direct interaction with a subject who is quite unable to give reliable answers due to his peculiar autistic condition. Last, but not least, we must deal with eye-tracking devices, developed for different disciplines from medicine to consumer behavior and, in this case, borrowed from artificial vision engineering. In the next sections, before the presentation and discussion of our data and results, we will describe the main theories and methodologies concerning autism, hyperlexia and eye-tracking technology, to better explain the interaction between the research branches that made this study possible.

1.1 Autistic Spectrum Disorders (ASD): History and Overview

The term "autism" was coined by the eminent European psychiatrist Eugen Bleuler at the beginning of the 20th century to describe a closure of the Self (autós is the Greek word for self) and the severe retreat from interactions with others and the world that is typical of some psychotic manifestations. In 1943 and 1944, independently of each other, the psychiatrists Leo Kanner in the USA [Kanner 1943] and Hans Asperger in Austria [Asperger 1944] described two groups of children who presented unusual behaviors such as lack of interest in human relationships, affective and language impairments, learning deficits and absence of developmental deterioration. The two scientists used the term "autism" to label this syndromic frame, and both published their research in their respective countries. The work of Kanner and Asperger represents the basis for the modern study of autism and is deeply marked by a strong clinical approach. Today, the disorders classified in the autistic spectrum are Autistic Disorder Syndrome, Asperger Syndrome, Childhood Disintegrative Disorder, Rett Syndrome and Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS).
Regarding the psychiatric diagnostic classification mentioned above, we only wish to recall that even within single syndromes, such as the Autistic and Asperger ones, there is incomplete agreement among researchers about the features and symptoms that make up the syndrome description, and within each diagnostic label we can find great individual variance, ranging from severe to mild states of impairment. The main criteria that make an autistic diagnosis possible are continuously under revision by the clinical world community. The most commonly accepted guidelines in the psychiatric community suggest evaluating three important issues:

1. A clinical deficit in social interaction that involves verbal and non-verbal communication, lack of reciprocity in social relationships and inability to maintain social interactions and relationships;
2. Restricted and repetitive behaviors, interests and activities;
3. Early childhood onset of the symptoms.
From these guidelines we can deduce that ASD diagnosis is mainly based on clinical evidence and that the core impairment of this syndrome is the social interaction deficit. In addition to the clinical observation of the symptoms, clinicians can use standardized tests suited to ASD evaluation, especially in the social interaction area or in the cognitive skills area. However, no computer-assisted device such as eye-tracking, and no brain imaging technique, is currently required to make an ASD diagnosis. As for the epidemiology of autism, a reliable evaluation is not easy because the diagnostic criteria are often under revision. In recent years the incidence has been estimated at 1:200 in North America and Europe (an incidence that makes autism not a rare disease), with a prevalence in the male population at a ratio of 4:1. The etiopathogenesis of autism remains unclear and there are no treatments to "heal" autism. An early diagnosis followed by special educational programs can improve positive skills and contain dysfunctional behaviors, allowing autistic individuals to reach better levels of integration.

1.2 Theories on Autism

During the last 50 years of research, many theories and models have been developed to explain autism. Some of these theories, like the psychoanalytic ones, are now largely dismissed; other perspectives are supported by experimental data from genetics or brain imaging techniques, but no theory is able to explain the complexity of autism completely. Contributions from genetics highlight a hereditary component, for example the role of a small number of genes involved in the regulation of affiliative behavior; at the same time, other studies suggest that the genes involved in autism are "activated" in the presence of external factors. Every year new genes are proposed as candidates in ASD development and, at the same time, other genes are disconfirmed as possible candidates.
The latest findings in genetics research establish a "genetic risk" for autism that results from a complex interaction between multiple genes and the environment. The genetic component is now estimated to be the highest risk factor in the development of autism, but current methodologies are able to identify genetic mutations only in 20% of cases [Geschwind 2009]. During the Nineties, after the discovery of the mirror neuron system in the monkey brain and strong evidence confirming the existence of this system in the human brain as well, some claims supported the implication of this system in the autism syndromes. The mirror neuron system consists of a specialized population of neurons, located in motor areas of the brain cortex, that responds to sensory stimuli by mapping action-oriented sequences. The simple view of a goal-oriented action (such as taking a glass of water or grasping an object) is able to activate the system; these findings led scientists to look at mirror neurons as a mechanism for automatic imitation and mimicry, and the results were extended to the study of empathy and the comprehension of others through a set of interesting experiments. The
well-known deficits in social interaction and in understanding others in autistic people support the controversial hypothesis that a mirror neuron malfunction ("broken mirrors") can be partly responsible for this syndrome [Oberman et al. 2005]. Unfortunately there is still not enough evidence on the presence and role of mirror neurons in the human brain cortex, although there are strong indications of their implication in empathy, the understanding of others and intersubjectivity, to mention only a few areas of investigation. Nevertheless, the complexity and heterogeneity of autism suggest that no single factor is able to explain all the autistic manifestations. Another well-known psychological theory that has tried to explain autism is the Theory of Mind (ToM) deficit. ToM is a cognitive-developmental theory that highlights the peculiar human capacity to think about others while taking into account the existence of their minds: minds equipped with desires, wishes and intentions, and able to produce behaviors. The idea that other people own a mind arises in children during the first years of life, and tests have been developed to evaluate whether the "other person perspective" is present or not (false belief tests). Autistic individuals do not perform well in these tests, so it has been supposed that autism is a consequence of a particular ToM deficit. In this case too, the explanation is unable to cover the whole phenomenon. Another conceptual model to explain autism was developed in the late 1980s and is called the weak central coherence (WCC) theory. This model suggests that a deficit in perceptual-cognitive organization can lead to a fragmentary and limited capacity to understand and organize sensory stimuli.
Autistic subjects often report a fragmentary vision of the world and a confusion of the perceptual senses; the use of one sensory modality, touch for instance, can produce the activation of another sensory function like vision or sound (this phenomenon is known as synesthesia and it is a cognitive trait associated with autism). Experimental evidence supports this model, like the other ones mentioned in this overview, but once again it cannot fully explain autism, which still remains a scientific mystery and a challenge for researchers and clinicians.

1.3 Hyperlexia and Autism

The subject of our study presents some reading and comprehension impairments that drove us to consider the phenomenon of hyperlexia as a possible explanation (for the clinical description of the case study see Section 3). Hyperlexia is the ability to master word reading spontaneously without an adequate level of comprehension. The level of word recognition and word decoding skills is normal or above normal; on the contrary, text comprehension skills are impaired. This phenomenon has been described in the literature since the last century and seems to be associated with ASD, but it also occurs in children with typical development. A review of the literature by Grigorenko, Klin and Volkmar [Grigorenko et al. 2003] highlights some important results obtained by recent research on this issue. Hyperlexia emerges during early child development, often before the child is
3 years old, and it is marked by the spontaneous ability to read single printed words without any formal instruction in reading. A central feature for the definition and identification of hyperlexia is the discrepancy between word recognition-decoding, which must be high relative to age and general cognitive skills, and word comprehension, which must be low relative to the same parameters. The word recognition skill can remain stable or can decrease during the life span; the comprehension impairment, in contrast, always remains stable during the life span. Very often hyperlexic children seem to enjoy reading, and in some cases an obsessive compulsion to read has been observed; these behaviors too can decrease or disappear in later developmental stages. There is no agreement or sufficient evidence to clarify other important points regarding hyperlexia, in particular whether it should be considered a disability or a precocious super-ability, and its relationship with ASD super-abilities such as calculus, mnemonic and calendar-calculation super-abilities. The most accredited approach to a possible explanation of hyperlexia suggests the existence of a dissociation between lexical reading and comprehension, with a direct lexical link from orthography to phonology not mediated by semantic skills. This theory is supported by experimental evidence of better performance in non-word reading by hyperlexic readers, even when affected by ASD.
2 Eye-Tracking Technology and Its Applications

2.1 Introduction

In recent years, eye-tracking systems have greatly improved, beginning to play an important role in the HCI field. Eye-tracking refers to the capability of some devices to detect and measure eye movements, with the aim of precisely identifying the user's gaze direction (usually on a screen). The acquired data are then recorded for subsequent use, or directly exploited to provide commands to the computer. In the past, eye-trackers were invasive systems, very often involving head-mounted devices. Fortunately, in recent years things have changed and computer-controlled cameras are now the standard [Duchowski 2004]. Eye-tracking has now evolved to the point where the user can move almost freely in front of the camera(s), within certain limits, and moreover a good accuracy (1 degree or better) is achieved throughout the whole working range. These devices, which operate without contact with the user, are called remote eye-tracking systems. Video-based eye-trackers use infrared lighting, which is invisible and therefore not disturbing, and follow the eyes by measuring how light is reflected by the cornea and by the retina through the pupil [Zambarbieri 2006].
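The raw output of such a device is a stream of gaze samples; grouping them into fixations and saccades is done in software. A minimal sketch of the common dispersion-threshold identification (I-DT) approach, with illustrative threshold values that are our own assumptions, not those of any particular commercial system:

```python
# Dispersion-threshold (I-DT) fixation detection: a window of gaze samples is a
# fixation when its horizontal plus vertical spread stays under a threshold.
def dispersion(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def idt_fixations(samples, max_dispersion=30, min_duration_ms=100, rate_hz=50):
    """samples: (x, y) gaze points in pixels at a fixed sampling rate."""
    window = int(min_duration_ms * rate_hz / 1000)   # samples per minimum fixation
    fixations, i = [], 0
    while i + window <= len(samples):
        if dispersion(samples[i:i + window]) <= max_dispersion:
            j = i + window                 # grow the window while it stays compact
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            points = samples[i:j]
            cx = sum(x for x, _ in points) / len(points)
            cy = sum(y for _, y in points) / len(points)
            fixations.append((cx, cy, len(points) * 1000 // rate_hz))  # centroid + ms
            i = j
        else:
            i += 1                         # slide past saccade samples
    return fixations

gaze = [(100, 100)] * 10 + [(400, 300)] * 10   # two steady gaze positions at 50 Hz
print(idt_fixations(gaze))  # [(100.0, 100.0, 200), (400.0, 300.0, 200)]
```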
The eye-tracker we use in our laboratory – the Tobii 1750, one of the most widespread eye-tracking devices – combines video-oculography with infrared light reflection. The system, which looks like a common LCD screen, is provided with five NIR LEDs and an infrared CCD camera, integrated in the monitor case. Infrared light generates corneal reflections whose locations are connected to gaze direction. Eye positions are recorded at a frequency of 50 Hz, with a precision of about 0.5° of the visual field. Eye-tracking can be profitably exploited for evaluation purposes, allowing a great amount of data about users' behavior to be obtained. Thanks to the acquired information, users' gaze paths can be reconstructed and possible focuses of attention identified. An eye-tracking device can provide a great amount of data useful in HCI studies, but also for diagnostic purposes and for the design of special interfaces dedicated to people with cognitive or physical impairments. Available data are, for example, the number and duration of fixations on a text or on a single word, or the visual search strategy used to explore screen content. Due to the non-invasive features of the system, this technology can be successfully used in test sessions with people with mental disorders or physical impairments without compromising their interaction with the PC or their comfort. In particular, the system is suitable for studies of reading behavior, which can serve research and diagnostic purposes, but also the design of interfaces adapted to people affected by linguistic disorders such as dyslexia and, as in our experiment, hyperlexia.

2.2 Human Reading Processes

Eye-tracking technology is able to produce a highly detailed picture of any reading session, with objective data collected on-line.
Since the middle of the 1970s the main interest has been the analysis of eye movements during reading; with the technological and computing improvements that followed, the main interest is now the study of the cognitive processes that lie behind the surface of reading behavior. As observed by Starr and Rayner [Starr and Rayner 2001], after more than 30 years of research we have a reasonable account of human behavior during reading, with consistent data to support it and some open questions. We know that reading activity is a rapid sequence of eye movements called saccades and fixations. Fixations are mainly forward oriented during reading (from left to right in our language, for instance), but in some cases the reader goes back across text that was already read; such fixations are called regressions. The percentage of regressions in a skilled reader is about 15% and the saccade size is 7-9 letter spaces. Poor readers or dyslexic readers present a different eye-movement behavior, with more fixations, more regressions, longer fixations and shorter saccadic movements. We also know that shorter words (2-3 letter words) are fixated for less time, about 25%, than longer words (words of 8 or more letters) and that content words are fixated much more (85% of the reading time) than non-content words.

2.3 Eye-Tracking Applications in ASD

Eye-tracking devices have been used in recent years to investigate some peculiar aspects of ASD, related not only to reading processes or language impairments in general, but also to another important aspect: the lack of social interaction skills. These experiments focused on the gaze behavior of autistic subjects during the processing of faces or social interaction stimuli. Pictures of human faces with different emotional patterns, or video clips of interaction scenes, were processed using eye-tracking devices. As highlighted in a literature review by Boraston and Blakemore [Boraston and Blakemore 2007], while normal subjects usually spend more time on the core features of the human face (eyes, nose and mouth), autistic subjects spend less time on these features; in particular, the eye region seems to be the least processed. The article also points out the great potential of eye-tracking technology for ASD research, hoping for its extension to other perceptual aspects and stimuli not yet investigated. Text should be, in our opinion, one of these emerging fields of research. The psychology of text comprehension views text comprehension as a complex process in which many cognitive skills are involved. The point of disagreement among researchers is not the typology of cognitive skills involved in text comprehension, but their role or hierarchical position in the whole process.
The main theories about text comprehension processes are the bottom-up, or memory-based, approach, which focuses on perception and memory skills as the main factors involved in comprehension, and the top-down, or knowledge-based, approach, which instead focuses on high-level cognitive skills such as general knowledge stored in long-term memory, or reasoning and thinking processes. As highlighted by Kintsch in his Construction-Integration Model [Kintsch 2005], both models are able to capture some crucial aspects but necessarily miss others; hence the need for a model that tries to unify the two approaches in a single theoretical perspective. If the result of text comprehension is a mental representation of the text, hyperlexia in ASD could represent a new research perspective for the psychology of text comprehension processes, precisely because in this case we have reading without comprehension, or with an insufficient level of comprehension restricted to orthographic and procedural processes.
3 Experimental Design

3.1 Clinical Description

The subject of our experiment is a 25-year-old male with a diagnosis of autism. His clinical history starts when he was 3 years and 6 months old; at that time he presented a range of behaviors compatible with autism. The onset was quite fast: in 4 months S. lost language and refused relationships with the other figures around him (parents and family members). After a long period of rehabilitation and special educational training, S. started to speak again when he was 11. He also learned to write and he recovered some social competences. He presents an I.Q. score of 55 points on the WAIS-R test battery, but a complete and reliable evaluation of his cognitive skills presents some difficulties due to his low responsiveness to direct examinations and to test sessions that require direct interaction in question/answer form. He also presents behavioral patterns typical of the autistic spectrum such as echolalia, stereotypies, lack of spontaneous social interactions and language impairments. Our interest focused on language, reading and text comprehension skills, and from his anamnesis we found that when he was about 2 years old he was able to read single words or short texts spontaneously. This ability disappeared with the onset of the autistic symptoms and with the loss of the language faculty. Nowadays S. shows interest in short texts, such as those found on posters or advertisements; often this interest is followed by a request for an explanation of the content and meaning of the text he has read. Due to his difficulty in answering our questions, it is sometimes impossible to know whether or not he completely understands the meaning of the text he reads; in non-spontaneous situations, on the other hand, he shows full comprehension of texts that require him to put action patterns into practice. He recognizes numbers and has some calculus ability, but he is unable to recognize numbers as indicators of years, dates or hours. We hypothesize that S.
had a hyperlexia profile in childhood and that, still now, some of his reading and comprehension impairments are the result of that original phenomenon.

3.2 Test Session

We expected to detect differences between our subject's reading behavior and normal reading behavior, as reported in the literature and as recorded in a normal control subject matched by age. Several types of texts were edited in order to verify our hypothesis. The text font was black Arial on a white background, and the texts had the following features:

A. Text with a list of words of the same semantic field;
B. Text with repeated well-known words;
C. Standard control text;
D. Text with a nursery rhyme.

The subjects were asked to read aloud at a starting signal. All the texts were also submitted to a control subject as a control condition.
4 Experiment 1: Results

4.1 General Overview

Fig. 1 reports the reading session performed by the control subject. The text says:
"There are two crocodiles and an orangutan
two small snakes, a golden eagle
the cat, rat, elephant
not missing any more:
just do not see the two unicorns"
The total reading time is 6,884 ms and there are 21 fixations in total.
Fig. 1 Control subject
Fig. 2 reports the reading session performed by S., the subject of our experiment. The total reading time is 20,228 ms and there are 46 fixations in total.
Fig. 2 Subject of our experiment
A surface comparison between the two sessions shows two different patterns of reading behavior. The control subject shows a number of fixations typical of a skilled reader and an average span of skipped letter spaces between fixations (15-20); the number of regressions is also in line with the standard parameters reported in the literature. The total reading time of the subject of the experiment is 3 times longer than the control subject's total reading time, and it is marked by a high number of fixations and regressions and also by a small span of skipped letter spaces (2 or less).

4.2 Comparison with Standard Measures of Reading Processes

The data obtained from the reading session reported in Fig. 2 were compared with the data reported in the literature for 3 main parameters. Fig. 3 illustrates the mean fixation duration of the subject of the experiment and the means of the normal population by age class. The subject of the experiment shows a mean of 187 ms per fixation; this result indicates faster fixation times than the adult population standard means.
Fig. 3 The subject of the experiment presents shorter fixations
Fig. 4 indicates the total number of fixations per 30 words. The subject of the experiment's mean matches the mean of a reader aged from 8 to 11.
Fig. 4 Number of fixations for 30 words
Fig. 5 indicates the percentage of regressions of the subject of the experiment. The subject performs 22% of regression movements, a percentage typical of an 11-year-old reader.
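The three comparison parameters used above (mean fixation duration, fixations per 30 words, percentage of regressions) can be computed directly from a fixation sequence. A sketch, assuming each fixation has been mapped to the index of the word it lands on (our own simplified representation, not the Tobii software's output format):

```python
# Each fixation is (duration_ms, word_index); a regression is a fixation that
# lands on an earlier word than the fixation before it.
def reading_measures(fixations, n_words):
    durations = [d for d, _ in fixations]
    mean_fixation_ms = sum(durations) / len(durations)
    fixations_per_30_words = len(fixations) * 30 / n_words
    moves = list(zip(fixations, fixations[1:]))
    regressions = sum(1 for (_, w1), (_, w2) in moves if w2 < w1)
    regression_pct = 100 * regressions / len(moves)
    return mean_fixation_ms, fixations_per_30_words, regression_pct

fixations = [(200, 0), (180, 1), (190, 0), (210, 2)]   # one backward move
mean_ms, per_30, reg_pct = reading_measures(fixations, n_words=4)
print(round(mean_ms), per_30, round(reg_pct))  # 195 30.0 33
```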
Fig. 5 Percentage of regression movements
4.3 Fixation Times and Word Features

We compared the fixation time means of the subject of the experiment in order to evaluate differences in fixation times in relation to formal word features such as word length. We considered the numbers of syllables and letters as independent variables and carried out an analysis of variance (ANOVA) to evaluate the differences. We found significant differences in both cases. Fig. 6 reports the comparison between fixation time means for words that are short or long in syllables (F = 6.55; df = 1, 43; p < .005).
Fig. 6 Comparison between fixation means for words of 1-2 syllables and words of 3 or more syllables
Fig. 7 reports the comparison between short and long words and their fixation time means (F = 8.54; df = 1, 43; p < .005).
Fig. 7 Fixation duration means and word length
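The F statistics reported above come from one-way ANOVAs with two groups of words (df = 1, 43, i.e. 45 observations in total). As a sketch of the underlying computation (a generic two-group one-way ANOVA with illustrative data, not the authors' statistical package or measurements):

```python
# One-way ANOVA for two groups: F = between-group variance / within-group variance.
def one_way_anova(group_a, group_b):
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    grand_mean = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    ss_between = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
    ss_within = sum((x - mean_a) ** 2 for x in group_a) \
              + sum((x - mean_b) ** 2 for x in group_b)
    df_between, df_within = 1, n_a + n_b - 2
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)

short_words = [150, 170, 160]   # illustrative fixation durations in ms
long_words = [120, 130, 110]
print(one_way_anova(short_words, long_words))  # (24.0, (1, 4))
```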
4.4 Conclusion

The whole frame of S.'s reading behavior is characterized by a fragmented reading style that does not completely match the frames of other impaired reading processes such as dyslexia. The high number of fixations (and regressions) is indeed typical of dyslexic readers, as reported by Starr and Rayner [Starr and Rayner 2001], but the short duration of fixations is not typical of dyslexia, in which fixations are longer than in non-dyslexic readers. While hyperlexia is a comprehension/semantic-related problem with no decoding impairments, dyslexia is a letter-level decoding deficit with no comprehension impairments. Our results are not strong enough to support the hypothesis that hyperlexia is the opposite of dyslexia, but the short duration of S.'s fixations suggests that letters and words are decoded faster, although with more intervals and fixations than in normal readers. Studies about hyperlexia are few compared with research on dyslexia. It is not correct to conceptualize hyperlexia as a subtype of dyslexia, but rather as a subgroup of the Developmental Language Disorders, which requires further research to better evaluate its evolution during the lifespan and its implication in other complex syndromes. Not yet explicable is the result regarding the longer time spent on shorter words than on longer words, which contradicts the research we found in the reading processes literature on normal and impaired readers.
5 Experiment 2: Results

5.1 General Overview

We mentioned S.'s lack of spontaneous behaviors and his difficulty in starting actions without an explicit invitation. This pattern is a common feature of ASD, and our subject is no exception. Nevertheless, also in reading behavior, we have noted in our clinical experience with S. that he sometimes starts to read spontaneously when certain conditions occur, for example when the text is short (like advertisements) or when he wants to stop his educational tasks and take a break. It is not yet possible to predict
Using Eye-Tracking to Study Reading Patterns and Processes in Autism
when and why exactly he starts to read texts spontaneously, or his level of text comprehension. At the same time we must highlight that his spontaneous reading is often followed by requests for clarification of the meaning of the text he has just read. This observation suggests both that his comprehension is not complete and that he is probably aware of it, which is why he asks for help in understanding the meaning of the text. Usually, after our explanations, he pays no further attention to the text and his interest seems to vanish; for this reason we still do not know whether he is able to complete and integrate his text comprehension with our explanations. In our experimental design, S.'s lack of spontaneous reading was circumvented by explicit instructions. S. was informed about the experimental procedures, and we carried out some simulation sessions before the main experimental sessions. In these simulations S. performed the same tasks that he later performed during the real experimental sessions, but he was assisted step by step to better understand the procedure. The main point of interest was S.'s understanding of the start time after the text was presented on the eye-tracking computer monitor. Verbal instruction on this issue was not enough to enable S. to start reading the text spontaneously, and some practical experience was required for him to learn the task procedure. In the following section we give an account of one of these preliminary sessions, in which we recorded a peculiar reading behavior.

5.2 Results

Fig. 8 shows S.'s reading behavior during a preliminary reading session. The surface reading pattern is similar to the one described in Experiment 1, with an average number of fixations and a long total reading time: 48,279 ms.
Fig. 8 Subject of experiment
The text (in translation) reads: "Andrew went to school this morning. There he did his homework with his teacher and played with his schoolmates; at the end of the day he went back home with his father and had a snack." This text was selected because it simply presents some words that we know are familiar to S., such as "teacher", "school", "father" and "snack". The whole text was read in 28,498 ms.
R. Pazzaglia et al.
Fig. 9 shows the reading behavior of the age-matched control subject on the same text. In this case the total reading time is 17,198 ms, and the number of fixations, the regression percentage, the fixation durations and the saccadic movements are normal for a skilled reader.
Fig. 9 Control subject
Fig. 10 shows S.'s reading behavior that occurred spontaneously, before the subject was asked to start the reading session. The reading process starts from the bottom of the text, there are 9 fixations in total, and the last one is again at the end of the text.
Fig. 10 Spontaneous reading by the experimental subject
Fig. 11 shows S.'s spontaneous reading process followed by the reading session he performed after our instruction to start reading. Fixation number 10 is the first fixation after S. stopped his spontaneous reading process; it lies outside the text and is linked to a few fixations at the bottom of the text (fixations 11 and 12), linked in turn with fixations on the first word of the text (Andrea). From this moment S. started to read in the same way as he did on the text presented in Experiment 1.
Fig. 11 Spontaneous followed by instructed reading by the experimental subject
5.3 Discussion

In Experiment 2 we report results from the only session in which S. performed a non-assisted reading task. In this session we can find two different reading modalities. If we subtract from the whole reading time of this text the reading
time S. used to read the text in his non-spontaneous reading modality, we find that S. "looked" at the text for 2,292 ms, performing fewer than 10 fixations. Once again we are dealing with data that are difficult to explain, because we cannot be sure whether S. "looked" at the text or "read" it. However, we set the fixation threshold at 100 ms, and S.'s first eight fixations fall in a range from 120 ms (the shortest) to 259 ms (the longest). This behavior suggests that S. attended to the words he fixated, but we cannot establish whether his attention was due to semantic comprehension of the word or was driven by a surface processing of the word linked to a non-semantic recognition process. The whole reading time of less than 3 seconds also suggests that S. approaches texts with at least two different reading modalities; the one described in Experiment 2 is spontaneous and may be the original hyperlexic reading modality that S. showed in his early childhood.
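The threshold-based fixation selection described above can be illustrated with a short sketch. The 100 ms threshold comes from the text; the duration values below are hypothetical, chosen only to fall within the reported 120–259 ms range.

```python
# Minimal sketch of the fixation analysis described above.
# Only the 100 ms threshold and the reported 120-259 ms range
# come from the text; the raw duration values are hypothetical.

FIXATION_THRESHOLD_MS = 100  # gaze events shorter than this do not count as fixations

def filter_fixations(durations_ms, threshold_ms=FIXATION_THRESHOLD_MS):
    """Keep only gaze events long enough to count as fixations."""
    return [d for d in durations_ms if d >= threshold_ms]

# Hypothetical gaze events from a short spontaneous reading episode.
raw_events_ms = [80, 120, 145, 259, 210, 95, 180, 130, 160, 175, 140]

fixations = filter_fixations(raw_events_ms)
print(len(fixations))                  # 9 fixations survive the threshold
print(min(fixations), max(fixations))  # 120 259
print(sum(fixations))                  # 1519 ms total fixation time
```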
6 General Discussion

In this final section we would like to emphasize both the points of interest and the limitations of our research. The main scientific limitation is that this is a single-case study, which makes generalization of the results difficult. Furthermore, the subject of the experiment was a young adult, whereas the majority of hyperlexia studies involve children; this can make it arduous to compare our data with data obtained with the same methodology but with younger subjects. However, abnormalities in eye movements recorded in children and adolescents with ASD also appear in adults with ASD. An aspect of this research that we do want to highlight is the importance of eye-tracking technology in ASD research, in particular in relation to perceptual and reading processes. The deficits in communication and social skills that distinguish autistic people make it difficult to obtain direct answers in experimental settings, due to the low level of responsiveness and, in many cases, the total absence of language. The low intrusiveness of modern eye-tracking devices can help researchers obtain valuable data without the full active participation of the experimental subjects and without additional devices such as head-mounted instrumentation or special glasses. Furthermore, the quick setup of the eye-tracking calibration and its low attentional demands are optimal when dealing with ASD subjects. In experiments with both reading and general perceptual stimuli, fixations are considered the cognitive moment of the process, the moment when the subject extracts and elaborates the perceptual content. Eye tracking can capture this activity while allowing the subject a "passive" behavior, avoiding direct interaction mediated by question-and-answer tasks during the experiments. To reach better results in this field we look forward to software improvements that better implement human-machine interaction.
The development of portable eye-tracking devices could also be interesting for autism research, allowing fully ecological experiments with highly reliable data. The recurrence of behavioral patterns during reading in normal and impaired subjects can also refine diagnostic frames mediated by eye-tracking technology. The use of eye-tracking technology to improve diagnostic accuracy is still a novel approach, but it is already able to obtain important
results in the field of language disorders. We must point out that in ASD an early diagnosis can be crucial for planning effective interventions. We hope that this approach will be extended consistently, not only to the study of language impairments such as hyperlexia, but also to the whole ASD perceptual structure, which is still largely unknown.
References

[Asperger 1944] Asperger, H.: Die autistischen Psychopathen im Kindesalter. Archiv für Psychiatrie und Nervenkrankheiten 117, 76–136 (1944)
[Boraston and Blakemore 2007] Boraston, Z., Blakemore, S.J.: The application of eye-tracking technology in the study of autism. Journal of Physiology 581(3), 893–898 (2007)
[Duchowski 2004] Duchowski, A.T.: Eye Tracking Methodology – Theory and Practice, 2nd edn. Springer, London (2004)
[Geschwind 2009] Geschwind, D.H.: Advances in autism. Annual Review of Medicine 60, 367–380 (2009)
[Grigorenko et al. 2003] Grigorenko, E.L., Klin, A., Volkmar, F.: Annotation: Hyperlexia: disability or superability? Journal of Child Psychology and Psychiatry 44(8), 1079–1091 (2003)
[Kanner 1943] Kanner, L.: Autistic disturbances of affective contact. The Nervous Child 2, 217–250 (1943); reprinted in Acta Paedopsychiatrica 35(4), 100–136 (1968)
[Kintsch 2005] Kintsch, W.: An overview of top-down and bottom-up effects in comprehension: the CI perspective. Discourse Processes 39(2&3), 125–128 (2005)
[Oberman et al. 2005] Oberman, L.M., Hubbard, E.M., McCleery, J.P., Altschuler, E.L., Ramachandran, V.S., Pineda, J.A.: EEG evidence for mirror neuron dysfunction in autism spectrum disorders. Cognitive Brain Research 24(2), 190–198 (2005)
[Starr and Rayner 2001] Starr, M.S., Rayner, K.: Eye movements during reading: some current controversies. Trends in Cognitive Sciences 5(4), 156–163 (2001)
[Zambarbieri 2006] Zambarbieri, D.: Movimenti oculari (eng. Eye movements). Collana Ingegneria Biomedica, Pàtron Editore (2006)
Ontology Design for Medical Diagnostic Knowledge

M. Jaszuk1,2, G. Szostek2, and A. Walczak2,1

1 Programming Department, University of Information Technology and Management, Rzeszów, Poland
{marek.jaszuk,grazyna.szostek}@gmail.com
2 Information Systems Institute, Military University of Technology, Warsaw, Poland
[email protected]
Abstract. The paper gives an overview of research devoted to developing a semiautomatic methodology of building a semantic model of medical diagnostic knowledge. The methodology is based on natural language processing methods which are applied to analyze medical texts. As a result of the process, the semantic model of symptoms is generated. This model is a foundation for building a model of diagnostic technologies. The described methodology and the resulting model are developed specifically for the Polish language.
1 Introduction

Semantic models are a very important tool for structuring knowledge from a specific domain. This is achieved by identifying and properly defining a set of relevant concepts that characterize a given application domain. In recent years, research related to the development of such models has produced tangible results concerning the definition of language standards [WWW-1 2010] and increasingly powerful model editing and management tools [WWW-2 2010]. Despite the availability of these tools, populating domain semantic models with a sufficiently large number of concepts is a tedious and time-consuming process, preventing wide-scale production and usage of the models by industrial institutions. The topic of automatic methods for ontology creation is a subject of growing interest in the scientific community (see e.g. [Navigli 2004]). These methods assume that the required knowledge from the domain of interest is contained in specialist texts. Creating an ontology requires identifying important terms in the text and discovering possible relations between them. This can be achieved by processing the text with numerous natural language processing (NLP) methods and tools. Among the most important tasks here are the creation of the domain vocabulary, word sense disambiguation, identification of phrases, computing semantic similarity, elimination of synonyms, and finally identification of the semantic relations between terms.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 175–189. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
Another important issue that should be mentioned is language specificity. The majority of automatic ontology creation methodologies are developed for English. As these methodologies are based on processing expressions in a natural language, they are language specific. In our work we focus on building semantic models for Polish. Polish belongs to the group of inflected languages, and consequently its structure differs significantly from English; methodologies developed for English therefore do not work, and we had to develop our own. A number of domain ontologies have already been created (see e.g. [WWW-3 2010]). These ontologies usually incorporate a huge number of specialist terms and were created with no particular application in mind. Models of this kind are not the best choice when specific applications are considered: they are too general, contain many unnecessary elements, and their structure is not always suitable for the application at hand. Our approach focuses on the creation of models designed for precisely defined applications. The choice of a semantic model to represent the data is not accidental. In medical diagnostics the different technologies generate results in various formats: a textual description, an image, a video record, a table, or a graph. This diversity of formats makes further data processing very complicated, and diagnosing a patient from such data is extremely difficult for a computer system. The problem becomes much easier if the data is represented uniformly. The semantic format allows the heterogeneous data to be transformed into a homogeneous representation. To represent diagnostic data in the semantic format it is necessary to construct a model of medical symptoms which will be a foundation for further processing of the data. The paper is organized as follows.
Section 2 contains the formal specification of the basic terms, such as ontology and semantic network. In Section 3 the fundamental assumptions about the structure of the model of symptoms are described. In Section 4 we describe a model of diagnostic technologies (DT) built on the foundation of the model of symptoms. Section 5 describes details of the methodology used for building semantic models from text corpora.
2 The Formal Specification

An ontology is defined as a directed graph:

O_G = \langle T_V, T_E \rangle,  (1)

where

T_V = \{t_1, t_2, \ldots, t_n\}  (2)

is a collection of types of vertices (concepts represented by the graph), and

T_E = \{(t_i, t_j) : t_i, t_j \in T_V\}  (3)

is a collection of types of edges connecting the vertices (semantic relations between concepts in the ontology). In our approach the vertices represent all the terms used to describe symptoms. The edges of the graph define all the possible relations between the terms.

A semantic network is a network relating individual realizations of concepts defined in the ontology. Formally, the semantic network can be written as:

S_{sem} = \langle V, E, vt, et \rangle,  (4)

where

V = \{v_1, v_2, \ldots, v_{n_V}\}  (5)

is a set of instances,

E = \{e_1, e_2, \ldots, e_{n_E}\}  (6)

is a set of edges,

vt(v_j) = t_i  (7)

is a function assigning the network vertex v_j its respective type t_i, and

et(e_j) = (t_i, t_j)  (8)

is a function assigning the edge e_j its respective type (t_i, t_j) \in T_E. In our approach the semantic network serves as a model of disease symptoms.
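The structures defined in (1)–(8) can be rendered as a small data-structure sketch. The concept and instance names below are illustrative placeholders, not part of the formal specification.

```python
# Sketch of Eqs. (1)-(8): an ontology graph O_G = <T_V, T_E> and a
# semantic network S_sem = <V, E, vt, et> instantiating it.
# Concept and instance names are illustrative only.

# Ontology (Eqs. 1-3): vertex types and the edge types they allow.
T_V = {"Symptom", "Adjective", "Place"}
T_E = {("Symptom", "Adjective"), ("Symptom", "Place")}

# Semantic network (Eqs. 4-6): concrete instances and edges between them.
V = {"cough", "productive", "chest"}
E = {("cough", "productive"), ("cough", "chest")}

# vt (Eq. 7): assigns each network vertex its concept type.
vt = {"cough": "Symptom", "productive": "Adjective", "chest": "Place"}

def et(edge):
    """et (Eq. 8): assigns an edge its type (t_i, t_j)."""
    v_i, v_j = edge
    return (vt[v_i], vt[v_j])

# Consistency check: every edge type used in the network is licensed
# by the ontology.
assert all(et(e) in T_E for e in E)
```

The check in the last line captures the constraint that a semantic network only instantiates relations that exist in the underlying ontology.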
3 The Semantic Model of Symptoms

The semantic models to be built have a specific purpose and incorporate only a part of the whole of medical knowledge. Namely, we are going to build semantic models of medical diagnostic knowledge, i.e. models describing disease symptoms that can be observed in patients. Before building any model we can utilize a priori knowledge about the structure of natural language symptom descriptions. This allows for making some assumptions about the structure of the models and choosing appropriate techniques to construct them. We can also optimize the models according to their future use. The primary purpose of the model is collecting diagnostic data about a patient. Building any semantic model requires developing an ontology of the terms used in
178
M. Jaszuk, G. Szostek, and A. Walczak
the domain of interest. This means that we need to start our work by developing an ontology of the diagnostic knowledge. As the basic element of this knowledge is a symptom, the starting point will be identifying the structure of the ontology of a symptom. This will be the basic building block leading further to constructing the DT model. Examples of commonly performed DTs are: interview, physical examination, blood test, X-ray imaging, bronchoscopy, skin allergy tests, etc. As a result of any DT we get a set of measurements, among which the most valuable are those informing about deviations from the normal state. The deviations are important because they are symptoms providing clues for diagnosing the possible diseases. Sample symptoms are: kaszel (eng. cough), wydzielina (eng. secretion), obrzęk (eng. swelling), etc. When analyzing descriptions of symptoms one can note that most symptoms are accompanied by a set of additional parameters making the description more precise. Let us take for example a typical sentence taken from a bronchoscopy record: W drzewie oskrzelowym zalegająca w dużej ilości wydzielina śluzowa (eng. "A huge amount of mucus residing in a bronchial tree"). The symptom in the sentence is wydzielina śluzowa (eng. mucus). As one can see, the symptom here is expressed not by a single word but by a phrase. It should be underlined that a single node in the model has to represent a particular meaning, which does not have to be a single word. The phrase wydzielina śluzowa can be replaced by its synonym śluz. This indicates that the phrase has the meaning of a single noun and thus should be represented by a single node. The symptom identified in the above sentence is specified by additional parameters. The first of them is the phrase specifying the exact location: w drzewie oskrzelowym (eng. in a bronchial tree). The additional expressions which specify the symptom are: zalegająca (eng. residing) and w dużej ilości (eng. in a huge amount). These phrases provide two additional parameters characterizing the symptom. The first of them is the adjective zalegająca; the second is the information about the amount – w dużej ilości. The above considerations lead to the semantic model of the wydzielina śluzowa symptom (see Fig. 1). The model consists of five nodes representing semantic classes (terms used to describe the symptom). Besides the symptom class (Wydzielina_śluzowa) we have three classes representing the features of the symptom: Zalegająca (the adjective), Drzewo_oskrzelowe (the place of appearance), and Ilość (the name of the parameter specifying the symptom). The parameter Ilość is further specified by the adjective Duża. To complete the model we need to link the nodes by defining several semantic relations. In the discussed case the relations are: charakteryzuje_się (eng. is characterized with) – the relation linking the symptom with the Zalegająca adjective node, and also the parameter Ilość with the Duża adjective; występuje_w_miejscu (eng. appears in a place) – the relation linking the symptom with the place of appearance; specyfikowany_przez (eng. specified by) – the relation linking the symptom with the name of the parameter which specifies it.
Fig. 1 The semantic model of the mucus symptom
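The Fig. 1 model can be written down as a set of labeled triples. The node and relation names come from the text; the triple encoding itself is our illustrative choice (Polish diacritics are dropped in the identifiers).

```python
# The mucus symptom model of Fig. 1 as (subject, relation, object)
# triples. Names follow the text; diacritics are dropped.

mucus_model = [
    ("Wydzielina_sluzowa", "charakteryzuje_sie", "Zalegajaca"),
    ("Wydzielina_sluzowa", "wystepuje_w_miejscu", "Drzewo_oskrzelowe"),
    ("Wydzielina_sluzowa", "specyfikowany_przez", "Ilosc"),
    ("Ilosc", "charakteryzuje_sie", "Duza"),
]

def features_of(node, triples):
    """All (relation, feature) pairs attached directly to a node."""
    return [(rel, obj) for subj, rel, obj in triples if subj == node]

# The core symptom has three direct features, as described above.
assert len(features_of("Wydzielina_sluzowa", mucus_model)) == 3
# The Ilosc parameter is itself further specified by an adjective.
assert features_of("Ilosc", mucus_model) == [("charakteryzuje_sie", "Duza")]
```

Note that the five nodes of Fig. 1 appear here as the distinct subject and object strings, and the four edges as the four triples.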
The model constructed in the above example is a part of the mucus symptom ontology. To make this particular symptom ontology complete it is necessary to collect all the terms which can be used to characterize the mucus symptom and define appropriate relations linking the central node with the features. Of course, in particular patients only selected features of the symptom will be observed. Thus the cases will be represented as semantic networks which are instantiations of selected nodes from the symptom ontology. The considerations presented for the mucus symptom suggest the general structure of the semantic model of a symptom. As can be seen, this model includes one central node which can be considered a core symptom. Sometimes such a single-node model is sufficient to represent a symptom, but usually we need to add a number of additional nodes further specifying the core symptom. Let us define a general class which incorporates all the possible core symptoms. This class will be called Objaw (eng. symptom). The symptom node has a number of connections to other nodes representing all the possible features which should be added to make the model complete. The model shown in Fig. 1 introduced three particular features which can be generalized to a more abstract level by defining three classes: Przymiotnik (eng. adjective) – represents all the possible adjectives which can be used in the model; Miejsce (eng. place) – represents all possible places in the body in which the symptom can be localized; Parametr (eng. parameter) – represents all the possible parameters that could specify a symptom (like
size, color, frequency, etc.). As the Ilość parameter shows, some features of the core symptom are more complex and consist of more than a single node. Besides the groups of features mentioned above, there are also other groups of features which can be used to describe symptoms. Some of the most typical examples can be represented by the following classes: Pora_dnia (eng. time of day) – the time of appearance or intensification; Czynnik_wywołujący (eng. causing factor) – the factor which causes the appearance of the symptom (like air temperature, sun radiation, air moisture, etc.); Substancja_wywołująca (eng. causing substance) – the substance causing it (like cosmetics or other chemicals), etc. Introducing all of the above classes leads to the general semantic model of symptoms. To make the model complete we need to define all the necessary relations linking the Objaw class with its features. Besides the relations mentioned earlier, we define the following ones: występuje_w_porze_dnia (eng. appears at a time of day) – links the symptom with the time of its appearance; wywoływany_przez_czynnik (eng. caused by a factor) – links the symptom with the factor causing it; wywoływany_przez_substancję (eng. caused by a substance) – links the symptom with a substance causing its appearance. The above considerations lead to the formulation of the general model of symptoms. Unfortunately the model properly describes only a part of all the possible diagnostic data. To incorporate the remaining data we need to define a more general structure. Let us consider the central node of the model again. This node does not always have to be a symptom. In many cases it is just the name of some parameter which, combined with additional information, becomes a symptom. Let us take for example the following statement: Ciśnienie krwi podwyższone (eng. blood pressure elevated). The central node of the model is the Ciśnienie (eng. pressure) concept.
This concept has two features: Krew (eng. blood) and the adjective Podwyższony (eng. elevated). The adjective is linked with the previously mentioned relation charakteryzuje_się, and the Krew node is linked with the relation dotyczy (eng. refers to). The Ciśnienie concept is not a symptom by itself, but together with the two features it constitutes a symptom. Moreover, the diagnostic data about a patient contain numerous pieces of information that are not symptoms at all but still provide valuable diagnostic evidence. Some examples of such data are: profession, age, sex, addictions, etc. Semantic classes representing this kind of information cannot be represented by the Objaw class at the general level. This indicates that some other class should be defined as the most general representative of all the data. Thus we define a class named Atrybut_medyczny (eng. medical attribute), which is the most general representation of the central node of any diagnostic data model (Fig. 2). The Objaw class, together with the classes representing other categories of data, is a subclass of the Atrybut_medyczny class. As a result we get a hierarchical structure in which the lowest level consists of classes representing the actual data registered in the diagnostic process, while classes at higher levels represent groups of particular kinds of data.
Fig. 2 The general structure of diagnostic data semantic model
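The hierarchy of Fig. 2 can be sketched as a small class tree. Atrybut_medyczny, Objaw, Kaszel and Ciśnienie are taken from the text; the non-symptom subclass name DaneOgolne is a hypothetical placeholder.

```python
# Sketch of the Fig. 2 hierarchy: Atrybut_medyczny (medical attribute)
# is the most general class; Objaw (symptom) is one of its subclasses;
# concrete attributes sit at the lowest level. "DaneOgolne" is a
# hypothetical placeholder for a non-symptom data category such as
# profession or age.

class AtrybutMedyczny:              # medical attribute, the most general class
    pass

class Objaw(AtrybutMedyczny):       # symptom
    pass

class DaneOgolne(AtrybutMedyczny):  # non-symptom patient data (placeholder name)
    pass

class Kaszel(Objaw):                # cough, a concrete core symptom
    pass

class Cisnienie(AtrybutMedyczny):   # pressure: an attribute that is not
    pass                            # a symptom on its own

# Every lowest-level class is reachable from the general class,
# but not everything is a symptom.
assert issubclass(Kaszel, AtrybutMedyczny)
assert not issubclass(Cisnienie, Objaw)
```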
To build the semantic model of diagnostic data, we need to identify the nodes representing the medical attributes and combine them with appropriate features. All the terms necessary to build the model, together with all the possible relations between them, define the complete ontology of symptoms. This ontology is a foundation for building semantic networks describing symptoms observed in particular patients. The standard methodology of developing semantic models is based on using ontology editors like Protégé [WWW-2 2010] and manual creation of the models. For technologies generating results in a very simple tabular form such a methodology would be easy to apply, leading to construction of the models in a relatively short time. Unfortunately many DTs deliver their results in a form that is more difficult to analyze. The results of the interview and the physical examination are described in natural language. All kinds of medical imaging technologies, like X-ray imaging or computer tomography, should also be mentioned; the results obtained with these technologies are usually accompanied by textual descriptions. Such descriptions are a source of terms and semantic relations which can be used to construct semantic models of symptoms. Unfortunately the manual construction of such models is not as easy as in the case of simple tables. In fact it is an exhausting and time-consuming task, due to the variety of possible words and phrases used to describe the results, and also to the large number of semantic relations that occur in such descriptions. To speed up the process we are going
to build the models of symptoms in a semi-automatic way, with the aid of appropriate NLP methods. The source materials for this process can be any texts, like book chapters or samples of medical descriptions. The only expectation is that the texts should describe symptoms from the domain of interest. The model built in this way will of course not be perfect, and some manual corrections will be necessary. The corrections should, however, be much easier than building the model manually from scratch.
4 The Model of Diagnostic Technologies

From the perspective of the semantic model of symptoms, a diagnostic technology should be viewed as a tool aimed at observing a collection of medical attributes together with all the features characterizing them. It should also be noted that some symptoms can be recorded with more than one DT. This frequently happens with the interview and the physical examination, because many symptoms can be both reported by the patient and observed by a physician. To avoid distributing identical diagnostic data between different semantic models of particular DTs, we create a single model of symptoms from the domain of interest (the ontology of symptoms). Each of the symptoms is then linked with the appropriate nodes representing the technologies used to record them (Fig. 3). As a result we get the ontology of diagnostic technologies. A particular DT model is a substructure of the whole DT ontology and can be easily extracted from the whole model.
Fig. 3 The general structure of the semantic model of diagnostic technologies
It should be noted here that the model presented in Fig. 3 does not take into account the internal structure of symptoms. The models of symptoms are built of elementary terms which are single nouns, adjectives or simple phrases. But the internal structure of a particular symptom is not important from the perspective of the whole DT model. Fig. 4 illustrates a fragment of the DT model with sample technologies and sample symptoms for each of them. Let us take for example a common symptom like cough. We can distinguish many different kinds of cough, like productive cough or non-productive cough. The two identified types of cough are two different symptoms, each consisting of the core symptom (cough) and an adjective characterizing it. In consequence the semantic model of each of these symptoms is built of two nodes. But viewed from the perspective of the DT model (Figs. 3, 4), each of the symptoms is a single node. This indicates that we have to deal with two conceptual spaces. The first space consists of elementary terms coming from natural language, from which the semantic models of symptoms are built. The second conceptual space is spanned by diagnostic technologies and symptoms. The transformation between the two spaces is just a reduction of the graph-structured semantic model of a symptom to a single node in the DT model. It should also be noted that all the semantic models of symptoms can be divided into two groups: simple symptoms and complex symptoms. Simple symptoms are built of a medical attribute and a single feature (or no feature at all). The mentioned cough symptom is a good example of a simple symptom. A complex symptom is a medical attribute combined with multiple features. The symptom presented in Fig. 1 is a good example of such a situation. In this model the medical attribute (Wydzielina śluzowa) is connected with three features and this combination makes the complete description.
Every complex symptom can be decomposed into a collection of simple symptoms, and vice versa: a complex symptom can always be built from a collection of its components (simple symptoms). From the perspective of the DT model the more sensible solution is to construct the model of simple symptoms. Symptoms observed in a particular patient, or symptoms specific to a particular disease, can always be built as a combination of simple symptoms from the model. Thus the DT model is built only of simple symptoms. Fig. 4 shows some examples of simple symptoms in the model. The discussed complex symptom from Fig. 1 is included in decomposed form as a collection of simple symptoms linked with the bronchoscopy DT.
Fig. 4 The DT model with sample technologies and sample symptoms. The figure links each technology to its simple symptoms:
– Interview: dyspnoea:appears in a situation:effort; cough:is characterized with:productive; cough:is characterized with:nonproductive
– Physical examination: deformation:appears in a place:chest; hoarseness
– Blood test: erythrocytes:is characterized with:high; erythrocytes:is characterized with:low
– Bronchoscopy: mucous:is characterized with:empurpled; mucus:is characterized with:residing; mucus:specified by:amount:is characterized with:huge; mucus:appears in a place:bronchial tree
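The decomposition of a complex symptom into simple symptoms can be sketched with the "attribute:relation:feature" string notation used in Fig. 4. The symptom data comes from the bronchoscopy example; the helper functions are our own illustrative additions.

```python
# Decomposing the complex mucus symptom of Fig. 1 into the simple
# symptoms listed under bronchoscopy in Fig. 4, using the figure's
# "attribute:relation:feature" string notation.

complex_symptom = {
    "attribute": "mucus",
    "features": [
        ("is characterized with", "residing"),
        ("specified by:amount:is characterized with", "huge"),
        ("appears in a place", "bronchial tree"),
    ],
}

def decompose(symptom):
    """One simple-symptom string per feature of the medical attribute."""
    attribute = symptom["attribute"]
    return [f"{attribute}:{rel}:{feat}" for rel, feat in symptom["features"]]

def compose(simple_symptoms):
    """The inverse direction: rebuild the complex symptom from its parts."""
    attribute = simple_symptoms[0].split(":", 1)[0]
    features = [tuple(s.split(":", 1)[1].rsplit(":", 1)) for s in simple_symptoms]
    return {"attribute": attribute, "features": features}

simple = decompose(complex_symptom)
# simple[0] == "mucus:is characterized with:residing"
assert compose(simple) == complex_symptom
```

The round trip in the last line mirrors the statement that the decomposition is lossless in both directions.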
5 NLP Methods for Building Semantic Models from Text

The process of building semantic models consists of several steps. Completing it requires preparing the domain vocabulary and the vocabulary of synonyms. In addition, we need to define a set of possible semantic relations and the repository of sentence schemas. All of these resources are used together with a set of NLP methods to identify the semantic models of symptoms in the text.
5.1 Building the Domain Vocabulary

The starting point of the process is the creation of the domain vocabulary. Such a vocabulary is a selection of words and phrases used in the considered domain; in our case the domain is the description of medical symptoms. The method of creating this vocabulary relies on comparing the statistical occurrence of particular words in texts from the domain and in texts on general subject matter [Rutkowski 2005]. As a result we are able to eliminate all the words which are unimportant for the assumed purposes. The vocabulary is not restricted to single words; it should also contain phrases. Some sample phrases appearing in the vocabulary are: struna głosowa (eng. vocal cord), drogi oddechowe (eng. airways), błona śluzowa (eng. mucosal membrane), etc. To identify the phrases, selected statistical methods are applied [Broda 2007].

5.2 Building the Vocabulary of Synonyms

The domain vocabulary is a source of the terminology used to build the semantic models. It should be noted, however, that many of the vocabulary entries have synonyms. Using synonyms to construct the models would lead to a situation in which different semantic models represent the same symptoms. Such a situation is not acceptable, so it is necessary to identify all the synonyms appearing in the vocabulary. Identification of synonyms is one of the most difficult problems in NLP; in our case it is even harder because we need to search for synonyms not only among single words but also among phrases. The first step of the work is to create synsets, i.e. sets of words and phrases with synonymous meaning. The creation of synsets is based on computing semantic similarity for the elements of the domain vocabulary using standard methods (see e.g. [Pedersen 2006]). The methods of semantic similarity computation compare the contexts in which particular phrases appear. As a result we get a numerical value, usually between 0 and 1.
The higher the value, the more similar the compared phrases. Some examples of synsets we have to deal with are: {wydzielina śluzowa, śluz} (Eng. mucus), {wydzielina ropna, ropa} (Eng. pus), {fałd głosowy, struna głosowa} (Eng. vocal cord), etc. For the members of the above sets of phrases the semantic similarity will be close to 1. It should be remembered, however, that semantic similarity detects only candidates for synonyms. Even if the computed similarity is very high, the actual meanings can differ. Consider the first two sample synsets: {wydzielina śluzowa, śluz} and {wydzielina ropna, ropa}. The computed semantic similarity will be high not only within the synsets, but also between members of the separate synsets, because all four expressions are used in similar contexts. The differences in meaning are sometimes so subtle that, for a non-expert, it is very hard to decide how to assign particular phrases to the appropriate synsets. Thus the final decision about the contents of particular synsets should be left to an expert in the field of interest. Having created the synsets, we need to identify in each of them the synonym which will be the representative of the whole synset. The optimal choice would be
M. Jaszuk, G. Szostek, and A. Walczak
the synonym which is most frequently used. This synonym becomes the name of the concept representing the whole synset. Having the set of concepts, we can start constructing the semantic models of symptoms.

5.3 Relations between Classes

The general structure of the symptom semantic model assumes a number of relations between classes from the lowest level of the ontology. The problem we need to solve is the identification of the set of possible relations between semantic classes. This task is performed manually, because the set of possible relations is relatively small, as is the number of classes from the higher levels of the ontology hierarchy. Some of the relations, like charakteryzuje_się, występuje_w_miejscu, nasila_się_w_czasie, have already been introduced. Every relation has, besides its name, a domain and a range, i.e. the two sets of classes which are linked by this relation. Unfortunately, identifying precisely the domain and range of each relation is relatively hard because of the huge number of classes. Thus we assume this task is completed with the support of automatic methods. The source material for finding the semantic relations is text from the domain of interest; in our case, text describing symptoms. First, a morphosyntactic analysis of the text is made. This process consists of a number of steps:

1. Segmentation of the text into separate sentences.
2. Tokenization – the text is divided into tokens (words, numbers, punctuation marks).
3. Morphological analysis – a set of inflectional analyses is generated for each of the words in the text with the aid of the morphological analyzer [WWW-4 2010].
4. Linking of prepositions – allows for partial disambiguation of words linked with the prepositions.
5. Analysis of nouns with linguistic rules – allows for disambiguation of the noun case.
6. Disambiguation of adjectives – allows finding noun–adjective pairs.
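The steps above can be sketched as a toy pipeline. The tiny lexicon and the (part-of-speech, case) tag format below are illustrative stand-ins, not the actual output of a real analyzer such as Morfeusz:

```python
import re

# Toy lexicon standing in for a real morphological analyzer;
# the tags (part of speech, grammatical case) are invented for illustration.
LEXICON = {
    "błona": ("noun", "nom"),
    "śluzowa": ("adj", "nom"),
    "w": ("prep", "loc"),
    "krtani": ("noun", "loc"),
}

def segment(text):
    """Step 1: split text into sentences."""
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]

def tokenize(sentence):
    """Step 2: split a sentence into word tokens."""
    return re.findall(r"\w+", sentence)

def annotate(tokens):
    """Step 3: attach a (part-of-speech, case) tag to each token."""
    return [(t, LEXICON.get(t.lower(), ("unknown", "?"))) for t in tokens]

sentence = segment("Błona śluzowa w krtani.")[0]
tagged = annotate(tokenize(sentence))
```

Steps 4–6 (preposition linking and case disambiguation) would then operate on the `tagged` sequence.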
Details of the above steps are not the subject of this paper and will be discussed separately. The most important thing is the result of the morphosyntactic analysis: after the whole process, each word in the text is annotated with its morphological tag (grammatical characteristics: part of speech, number, case, tense, gender, etc.). The morphosyntactic analysis also detects selected semantic relations: charakteryzuje_się (a relation between a noun and an adjective), występuje_w_miejscu (a relation between a noun and another noun in the locative case), and dotyczy (a relation between a noun and another noun in the genitive case). These three relations, however, are not all that are possible; to detect the remaining relations, other methods have to be used. The morphological annotations refer to single words. As mentioned earlier, the domain vocabulary contains not only single words but also phrases. Thus, when we move to the semantic level, it is necessary to identify phrases in each of the sentences. The source of phrases is the domain vocabulary.
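The grammar-based relations mentioned above can be sketched as a simple pass over tagged tokens. The tags and the matching conditions are heavily simplified stand-ins for the real analysis:

```python
# Simplified relation detection over (word, part-of-speech, case) tokens.
# charakteryzuje_się links a noun with a following adjective;
# występuje_w_miejscu links a noun with a following noun in the locative case.
def detect_relations(tagged):
    relations = []
    for i, (word, pos, case) in enumerate(tagged):
        if pos != "noun":
            continue
        for word2, pos2, case2 in tagged[i + 1:]:
            if pos2 == "adj":
                relations.append((word, "charakteryzuje_się", word2))
            elif pos2 == "noun" and case2 == "loc":
                relations.append((word, "występuje_w_miejscu", word2))
    return relations

# A toy tagged sentence: "wydzielina śluzowa (w) krtani"
sentence = [("wydzielina", "noun", "nom"), ("śluzowa", "adj", "nom"),
            ("krtani", "noun", "loc")]
rels = detect_relations(sentence)
```

A real implementation would additionally check number and gender agreement for the noun–adjective pair, which this sketch omits.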
After identifying the phrases, we can start detecting the remaining semantic relations in the sentences. As already mentioned, the morphosyntactic analysis identifies selected relations between the elements of the sentences. This is usually not enough to complete the model which can be detected in the sentence. In the whole sentence we should be able to identify a model which can be represented as a graph similar to the one presented in Fig. 1. The relations detected up to this point are subgraphs of the whole graph, and our task now is to find the relations linking the separate subgraphs. This problem can be solved using a set of sentence schemas [Polański 1980] and the assignment of the classes from the lowest level of the symptom ontology hierarchy to the classes from the higher levels. A detailed description of the algorithm using the sentence schemas goes beyond the scope of this article; it is enough to say that the key element of every sentence schema is a verb. After finding the verb and identifying the appropriate sentence schema, we are able to integrate the subgraphs into a complete model of a symptom.

5.4 Building the Ontology of Symptoms

A single sentence describing symptoms is source material for generating a single symptom graph, or multiple graphs if multiple symptoms are described in the sentence. The graphs are built of the words and relations found in the text, and describe individual symptoms. This collection of graphs is not an ontology yet. The words should be replaced with the synset names, i.e. the names of semantic classes. After this operation we have a set of graphs consisting of related instances of terms from the ontology, where the instances are particular appearances of a given term in the textual material. According to the definition (see Eq. 4-8), such a model can be considered a semantic network generated from the ontology to be built.
The last step in building the ontology is to integrate all the networks into a single model. Every triple taken from the network (a relation between two terms) brings a new element to the ontology. Of course, the same triple can occur many times in the semantic network, but only the first occurrence of a given triple extends the ontology. After collecting all the triples found in the semantic network, the ontology building process is finished. As one can note, the process can be considered an inverse transformation: it starts from a semantic network, from which an ontology is extracted. The ontology is, however, only a formal model; it defines the terms and the possible relations between them. For us the most important issue is the models of actual symptoms. These models already exist, because the symptoms are modeled by the semantic network extracted from the text. The only thing that remains to be done is clearing the network of the repeated symptoms which come from the source text.
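The triple-collection step can be sketched as follows. The relation names reuse the Polish examples from the paper, while the class names are invented for illustration:

```python
# A minimal sketch of the final integration step: the semantic network is a
# sequence of (term, relation, term) triples extracted from text, and the
# ontology keeps only the first occurrence of each triple.
network = [
    ("wydzielina", "charakteryzuje_się", "śluzowa"),
    ("wydzielina", "występuje_w_miejscu", "krtań"),
    ("wydzielina", "charakteryzuje_się", "śluzowa"),  # repeated symptom in the text
]

ontology = []
seen = set()
for triple in network:
    if triple not in seen:   # only the first occurrence extends the ontology
        seen.add(triple)
        ontology.append(triple)
```

The remaining, deduplicated network then serves directly as the set of symptom models.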
6 Discussion and Conclusions

We have described the concept of a system designed to semi-automatically build semantic models representing medical diagnostic knowledge. What distinguishes this system from other ontology building systems is that it is not a general-purpose system. During its design we knew exactly what kind of data would be extracted from the text. This allowed for creating appropriate vocabularies, and also made it possible to make general assumptions about the structure of the model. The model of symptoms built in the described way can have numerous applications. In our approach it is a foundation for building a model of diagnostic technologies, which can further be used for collecting patient data. The symptom model can also be a foundation for constructing a model of disease units. Given the patient data and semantic models of diseases, it is possible to build a decision support system for the diagnostic process. It should also be mentioned that, although the model of symptoms was designed for the Polish language, its general structure is quite universal. This means that there should be no problem with utilizing the structure for constructing similar models for languages other than Polish. The problem lies, however, in the natural language processing methods: the methodology presented in this paper is language specific. Polish belongs to the group of inflected languages, and it seems relatively easy to transfer the described methodology to other inflected languages, especially the other Slavonic languages. Unfortunately, transferring the methodology to languages which are not inflected is more complicated and requires significant changes. An alternative solution is translating the once-created models to other languages.
Acknowledgment

This work was financially supported by the European Union from the European Regional Development Fund under the Operational Programme Innovative Economy (Project no. POIG.02.03.03-00-013/08).
References

[Broda 2007] Broda, B., Derwojedowa, M., Piasecki, M.: Recognition of structured collocations in an inflective language. In: Proc. International Multiconference on Computer Science and Information Technology – 2nd International Symposium Advances in Artificial Intelligence and Applications (AAIA 2007), Wisła, Poland, pp. 237–246 (2007)

[Navigli 2004] Navigli, R., Velardi, P.: Learning domain ontologies from document warehouses and dedicated web sites. Computational Linguistics 30(2), 151–179 (2004)

[Pedersen 2006] Pedersen, T., Pakhomov, S.V.S., Patwardhan, S., et al.: Measures of semantic similarity and relatedness in the biomedical domain. J. of Biomedical Informatics 40, 288–299 (2006)

[Polański 1980] Polański, K.: The syntactic-generative dictionary of Polish verbs, vol. 1–7. Kraków (1980–1993) (in Polish)

[Rutkowski 2005] Rutkowski, W.: Automatic methods of semantic networks building for search systems. Master's thesis, Akademia Ekonomiczna w Poznaniu, Poznań (2005) (in Polish)
[Szczeklik 2006] Szczeklik, A.: Internal diseases. Medycyna Praktyczna, Kraków (2006) (in Polish)

[WWW-1 2010] OWL Web Ontology Language Reference, http://www.w3.org/TR/owl-ref/ (accessed November 27, 2010)

[WWW-2 2010] The Protégé ontology editor and knowledge acquisition system, http://protege.stanford.edu/ (accessed October 3, 2010)

[WWW-3 2010] Ontology, http://ontologyonline.org/ (accessed November 27, 2010)

[WWW-4 2010] The morphological analyser Morfeusz SGJP, http://sgjp.pl/morfeusz/ (in Polish) (accessed November 27, 2010)
Rule-Based Analysis of MMPI Data Using the Copernicus System

J. Gomuła 1,2, W. Paja 3, K. Pancerz 3, and J. Szkoła 3

1 The Andropause Institute, Medan Foundation, Warsaw, Poland
[email protected]
2 Cardinal Stefan Wyszyński University in Warsaw, Poland
3 Institute of Biomedical Informatics, University of Information Technology and Management in Rzeszów, Poland
{wpaja,kpancerz,jszkola}@wsiz.rzeszow.pl
Abstract. Our research concerns psychometric data coming from the Minnesota Multiphasic Personality Inventory (MMPI) test. MMPI is used to quantify personality-psychometric dimensions which help specialists in the diagnosis of mental diseases. In this paper, we present a part of the Copernicus system – a tool for computer-aided diagnosis of mental diseases based on personality inventories. This part is devoted to the rule-based analysis of MMPI data expressed in the form of so-called profiles. The paper characterizes the knowledge base embodied in Copernicus, which can be used for rule-based analysis of patients' MMPI data, as well as the functionality of the designed tool.
1 Introduction

For several decades, increasing attention has been focused on various methods and algorithms of data mining and data analysis. Research in the area of so-called computational intelligence is strongly developed, and a lot of computer tools for data mining and analysis have been proposed. However, the majority of such tools are general-purpose systems requiring some background in computer science from the user; this concerns their graphical user interfaces as well. In the case of tools supporting medical diagnosis, there is a need to develop dedicated and specialized computer systems with graphical user interfaces suitable for use in the medical community. In this paper, a part of the tool called Copernicus for analysis of MMPI data in the form of profiles of patients with mental disorders is presented. The clinical basis for this tool is the Minnesota Multiphasic Personality Inventory (MMPI) test [Dahlstrom et al. 1986], delivering psychometric data on patients with selected mental disorders. MMPI is one of the most frequently used personality tests in clinical mental health as well as psychopathology (mental and behavioral disorders).
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 191–203. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
The test builds upon multidimensional and empirical presumptions. It was designed and first published in a questionnaire form in 1943 by the psychologist S.R. Hathaway and the neuropsychiatrist J.C. McKinley from the University of Minnesota. Later the inventory was adapted in over fifty countries. The MMPI-WISKAD personality inventory is the Polish adaptation of the American inventory. It has been used, among other modern tools, for carrying out nosological differential diagnosis in psychiatric wards, counseling services, and prison ward services, as well as in public hospitals for collective screenings and examinations (e.g., of professional soldiers for the public/military services such as the army, fire brigades, police and airline pilots). Interestingly, the effectiveness of a prescribed therapy can be evaluated by means of such an inventory. MMPI is also commonly used in scientific research. The test is based upon the empirical approach and was originally translated by M. Chojnowski (as WIO) and elaborated by Z. Płużek (as WISKAD) in 1950; American norms were accepted there. The clinical scales in the MMPI-WISKAD are not scales of clear symptoms, since the questions (items) in most cases contribute simultaneously to several scales. On the basis of the received responses ("Yes", "Cannot Say", "No") to selected questions, we may make up a total count for the subject being examined, measuring the obtained raw results against the validity and clinical scales, as being directly related to specific questions (items).
The profile (graph) built for such a case always has a fixed and invariable order of its constituents as distributed on the scales:

- The validity part consists of three scales: L – the scale of lying, which is sensitive to all false statements aiming at presenting ourselves in a better light; F – the scale which detects atypical and deviatory answers to all items in the test; K – the scale which examines self-defensive mechanisms and detects subtler attempts of the subject being screened at falsifying and aggravation.
- Then the clinical part: the basic clinical scales have numbers attributed so that a profile can be encoded to avoid negative connotations connected with the names of the scales: 1. (Hp – Hypochondriasis), 2. (D – Depression), 3. (Hy – Hysteria), 4. (Ps – Psychopathic Deviate), 5. (Mf – Masculinity/Femininity), 6. (Pa – Paranoia), 7. (Pt – Psychasthenia), 8. (Sc – Schizophrenia), 9. (Ma – Hypomania), 0. (It – Social Introversion).

Each scale has a clinical profile offset assigned to allow characterization of the part called narrative diagnosis, used to verify which conditions fulfill/match a set of logical rules defined by a profile.

In the years 1998–1999 a team of researchers consisting of W. Duch, T. Kucharski, J. Gomuła, and R. Adamczak created two independent rule systems devised for the nosological diagnosis of persons screened with the MMPI-WISKAD test. The approach capitalizes on a set of rules induced by the C4.5 decision tree algorithm and by a resulting FSM neuro-fuzzy network. This type of self-taught network can grow upon tagged examples (cases), but under constant supervision or with a tutor. Such an approach appeared to be very intriguing and promising, because
by means of such rule sets (fulfilling assumed logical conditions) we can rationally and in a structured way present a diagnosis with a substantial degree of exactness and, more importantly, prove such a diagnosis using all the conditions fulfilling a chosen logical decision rule. It has also turned out that a few dozen logical rules are enough to devise a diagnostic rule system that effectively establishes a nosological diagnosis. Rule sets of this size usually constitute only small expert decision systems: across twenty nosological groups (including, of course, a clinical norm group), 61 diagnostic rules were derived with the aid of the C4.5 algorithm, compared to the FSM network, for which 98 diagnostic rules were necessary. As the next stage, a standard 10-fold cross-validation test was carried out to estimate accuracy. Having compiled the knowledge base of already diagnosed cases, we can easily construct the rule knowledge base enabling the diagnosis of new cases. The resulting knowledge base can be taught new cases, making it even more diagnostically useful in the future.
2 MMPI Data

Each case (patient) x is described by a data vector a(x) consisting of thirteen descriptive attributes: a(x) = [a1(x), a2(x), ..., a13(x)]. If we have training data, then to each case x we additionally add one decision attribute d – a class to which the patient is classified. Each class corresponds to one of the psychiatric nosological types: norm (norm), neurosis (neur), psychopathy (psych), organic (org), schizophrenia (schiz), delusion syndrome (del.s), reactive psychosis (re.psy), paranoia (paran), manic state (man.st), criminality (crim), alcoholism (alcoh), drug induction (drug), simulation (simu), dissimulation (dissimu), and six deviational answering styles (dev1, dev2, dev3, dev4, dev5, dev6). For the training data, we formally have a decision system (decision table) S = (U, A, d) in Pawlak's form [Pawlak 1991], where U is a set of cases (patients), A is a set of descriptive attributes corresponding to scales, and d is a decision attribute determining a nosological type (class, category). Additionally, for each nosological type y we have defined a pattern vector p(y) consisting of thirteen descriptive attributes, i.e., p(y) = [p1(y), p2(y), ..., p13(y)].
In data and pattern vectors, respectively, a1 (p1) corresponds to the lie scale L, a2 (p2) to the scale of atypical and deviational answers F, a3 (p3) to the scale of self-defensive mechanisms K, a4 (p4) to the scale of Hypochondriasis (1.Hp), a5 (p5) to the scale of Depression (2.D), a6 (p6) to the scale of Hysteria (3.Hy), a7 (p7) to the scale of Psychopathic Deviate (4.Ps), a8 (p8) to the scale of Masculinity/Femininity (5.Mf), a9 (p9) to the scale of Paranoia (6.Pa), a10 (p10) to the scale of Psychasthenia (7.Pt), a11 (p11) to the scale of Schizophrenia (8.Sc), a12 (p12) to the scale of Hypomania (9.Ma), and a13 (p13) to the scale of Social Introversion (0.It).
Values of attributes are expressed as so-called T-scores. The T-scores scale, traditionally attributed to MMPI, has the following parameters: a range from 0 to 100 T-scores, an average equal to 50 T-scores, and a standard deviation equal to 10 T-scores. Data or pattern vectors can be represented in graphical form as so-called MMPI profiles. The profile always has a fixed and invariable order of its constituents (attributes, scales). Let a patient x be described by the data vector a(x) = [56, 78, 55, 60, 59, 54, 67, 52, 77, 56, 60, 68, 63]; its profile is shown in Figure 1. The data set examined in the Copernicus system was collected by T. Kucharski and J. Gomuła from the Psychological Outpatient Clinic.
Fig. 1 MMPI profile of a patient (example); suppressors +0.5K, +0.4K, +1K, +0.2K – a correction value from raw results of scale K added to raw results of selected clinical scales
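As an illustration of how such profiles can be used computationally (Section 4 mentions locating patients in a profile space with distance measures), the sketch below matches the example profile against two pattern vectors. The pattern T-scores are invented for this example, not taken from the Copernicus knowledge base:

```python
import math

# Hypothetical pattern vectors p(y) for two nosological types (13 T-scores each);
# the real patterns come from the Copernicus knowledge base.
patterns = {
    "norm":  [50, 50, 55, 50, 52, 51, 50, 50, 49, 50, 50, 51, 50],
    "schiz": [55, 75, 52, 60, 62, 55, 68, 52, 75, 58, 72, 65, 62],
}

def distance(a, p):
    """Euclidean distance between a patient profile and a pattern profile."""
    return math.sqrt(sum((ai - pi) ** 2 for ai, pi in zip(a, p)))

# The patient profile a(x) from the example above
a = [56, 78, 55, 60, 59, 54, 67, 52, 77, 56, 60, 68, 63]
closest = min(patterns, key=lambda y: distance(a, patterns[y]))
```

Copernicus offers a wide variety of such measures; plain Euclidean distance is just the simplest choice for the sketch.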
3 Knowledge Base for Rule-Based Analysis

Organizing available domain knowledge, as well as dealing with the knowledge acquired through data mining methods, can be realized in many different ways [Cios et al. 2007]. The organization of knowledge is fundamental to all pursuits of data mining. We can consider the main categories of knowledge representation schemes, such as rules, frames, graphs and networks. Knowledge representation in the form of rules is the method closest to human activity and reasoning, among others in making a medical diagnosis. Therefore, in the Copernicus system, rule-based analysis of MMPI data is one of the most important parts of the tool. The knowledge base embodied in the Copernicus system consists of a number of rule sets generated by different data mining and machine learning algorithms. In the most generic format, medical diagnosis rules are conditional statements of the form:

IF conditions (symptoms) THEN decision (diagnosis).
A rule expresses the relationship between the symptoms determined on the basis of examination and the diagnosis which should be made for these symptoms before the treatment. In our case, symptoms are determined on the basis of the results of the patient's examination using the MMPI test, and they are expressed in the ten T-scores basic clinical scales described in Section 2. Additionally, we add the three validity scales detecting incorrect examinations. Values of all scales (validity and clinical) can be treated as continuous quantitative data. The total number of values covers a specific interval (from 0 to 100 T-scores). Building classification rules for such data can be difficult and/or highly inefficient. Therefore, for some rule generation algorithms, so-called discretization is a necessary preprocessing step [Cios et al. 2007]. Its overall goal is to reduce the number of values by grouping them into a number of intervals. In many cases, discretization makes it possible to obtain higher quality classification rules. Some discretization techniques based on rough sets and Boolean reasoning have been presented in [Bazan et al. 2000]. On the other hand, some algorithms applied to continuous data (especially those based on decision trees, such as C4.5, the best known decision tree generation algorithm) lead directly to rules with conditions in the form of intervals. In general, a rule R used in the Copernicus system has the form:

R: IF ai1(x) in [xli1, xri1] AND ... AND aik(x) in [xlik, xrik] THEN d(x) is dm,

where ai1, ..., aik are selected scales (validity and clinical), xli1, xri1, ..., xlik, xrik are the left and right endpoints of the intervals, respectively, d is a diagnosis, and dm is one of the psychiatric nosological types proposed for the diagnosis. In the literature, a lot of measures (factors) for the assessment of rule quality have been proposed. In the current version of Copernicus, we use the best known standard measures.
Let a set Rul(S) of classification rules be generated from the decision system S = (U, A, d), where the set U consists of n cases. For each rule R in Rul(S) of the form IF ai1(x) in [xli1, xri1] AND ... AND aik(x) in [xlik, xrik] THEN d(x) is dm, we can calculate two values, denoted nC and nC+D. nC is the number of objects x in U such that ai1(x) belongs to [xli1, xri1] and ... and aik(x) belongs to [xlik, xrik], whereas nC+D is the number of objects x in U which additionally satisfy d(x) is dm. Each rule R in Rul(S) can be characterized by the following factors:

• the accuracy factor ACC(R) = nC+D/nC,
• the support factor SUPP(R) = nC/n,
• the quality factor QUAL(R) = ACC(R) * SUPP(R),
• the length factor LENGTH(R) = card({ai1, ..., aik}),

where card denotes the cardinality of a given set. The accuracy factor expresses the certainty (confidence) of the rule: the closer the accuracy factor is to 1, the better the rule. The support factor expresses the strength of the rule in a given set of cases: the closer the support factor is to 1, the better the rule. The quality factor is an aggregate measure of the two previous factors. The length factor expresses the generality of the rule in a given set of cases: the smaller the length factor, the more general the rule. The last factor is important from the point of view of the Minimum Description Length (MDL) principle. Briefly, in this principle, learning classification rules is understood as the ability to come up with a compact description of the data. Moreover, one of the most prominent paradigms in data model building is Occam's razor [Cios et al. 2007]: it states that the simplest model of the observed phenomena, in a given domain, is most likely to be the correct one. This paradigm is not always applicable to decision trees and decision (classification) rules, which cannot be too general.

Currently, the knowledge base embodied in the Copernicus system has been created using different external algorithms and methods of rule generation and verification available in popular data mining and machine learning software tools: WEKA, the Rough Set Exploration System (RSES), RuleSEEKER, TreeSEEKER, and NGTS. WEKA is a collection of machine learning algorithms for data mining tasks [Witten and Frank 2005]; the best known decision tree algorithm, C4.5, is available in it. The Rough Set Exploration System (RSES) [Bazan and Szczuka 2005] is a software tool featuring a library of methods and a graphical user interface supporting a variety of rough set based computations. The RSES system enables us to generate rules using, among others, the genetic, covering and LEM2 algorithms. The first originates in an order-based genetic algorithm coupled with a heuristic (see [Bazan et al. 2000]) and uses reducts for rule generation. The other two algorithms are based on a covering approach; a covering algorithm is described in [Bazan et al. 2000]. The LEM2 algorithm was proposed by J. Grzymala-Busse in [Grzymala-Busse 1997].
Covering-based algorithms produce fewer rules than algorithms based on an explicit reduct calculation. They are also (on average) slightly faster. The TreeSEEKER system contains several algorithms for generating decision trees. One of them is the TVR (Tree-Via-Rules) algorithm, in which the decision tree is created from fragments, i.e., sequences of paths from selected attributes to the decision attribute; in this way, a set of decision rules is also generated [Knap 2009]. The NGTS system is developed to generate decision rules using the well-known algorithm called GTS (General-To-Specific), one of the sequential covering algorithms presented in [Hippe 1997]. The rule sets obtained were improved using the RuleSEEKER system. The main optimizing process was based on an exhaustive application of a collection of generic operations [Paja and Hippe 2005]: finding and removing redundancy, finding and removing incorporative rules, merging rules, finding and removing unnecessary rules, finding and removing unnecessary conditions, creating missing rules, discovering hidden rules, rule specification, and selecting the final set of rules.
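The rule format and the quality factors described above can be sketched as follows. The rule and the two-case decision table are invented for illustration; accuracy is taken as nC+D/nC, i.e. the fraction of matching cases whose decision agrees with the rule:

```python
# A rule is a list of (attribute index, left endpoint, right endpoint)
# interval conditions plus a decision; cases are (vector, decision) pairs.
def matches(conditions, x):
    """True when every condition a_i(x) in [l_i, r_i] holds for the case x."""
    return all(l <= x[i] <= r for i, l, r in conditions)

def factors(conditions, decision, cases):
    """ACC, SUPP, QUAL and LENGTH of a rule over a decision table."""
    n = len(cases)
    n_c = sum(1 for x, d in cases if matches(conditions, x))
    n_cd = sum(1 for x, d in cases if matches(conditions, x) and d == decision)
    acc = n_cd / n_c if n_c else 0.0   # certainty (confidence) of the rule
    supp = n_c / n                     # strength of the rule, n_C / n
    return {"ACC": acc, "SUPP": supp, "QUAL": acc * supp,
            "LENGTH": len(conditions)}

# IF a2(x) in [70, 100] AND a11(x) in [65, 100] THEN d(x) is "schiz"
# (indices are 0-based: a2 -> x[1], a11 -> x[10]); data is made up.
rule = ([(1, 70, 100), (10, 65, 100)], "schiz")
cases = [
    ([56, 78, 55, 60, 59, 54, 67, 52, 77, 56, 72, 68, 63], "schiz"),
    ([50, 52, 55, 48, 51, 50, 49, 52, 50, 50, 50, 51, 50], "norm"),
]
q = factors(*rule, cases)
```

Here the rule matches only the first case, which also carries the rule's decision, so ACC is 1.0, SUPP is 0.5, QUAL is 0.5 and LENGTH is 2.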
Each set of generated classification rules was evaluated by testing the classification accuracy on unseen cases. Several examples are stated in Tables 1–4.
Table 1 Results for the NGTS algorithm

Training set  Cases  Rules   Testing set  Cases  Correctly classified  Incorrectly classified  Not classified
L01           1535   510     T01          170    116 (68.24%)          54 (31.76%)             0 (0.00%)
L02           1535   502     T02          170    113 (66.47%)          57 (33.53%)             0 (0.00%)
L03           1535   510     T03          170    110 (64.71%)          60 (35.29%)             0 (0.00%)
L04           1535   510     T04          170    119 (70.00%)          51 (30.00%)             0 (0.00%)
L05           1535   510     T05          170    106 (62.35%)          64 (37.65%)             0 (0.00%)
L06           1535   499     T06          170    111 (65.29%)          59 (34.71%)             0 (0.00%)
L07           1535   508     T07          170    114 (67.06%)          56 (32.94%)             0 (0.00%)
L08           1535   521     T08          170    119 (70.00%)          51 (30.00%)             0 (0.00%)
L09           1535   504     T09          170    124 (72.94%)          46 (27.06%)             0 (0.00%)
L10           1530   510     T10          175    114 (65.14%)          61 (34.86%)             0 (0.00%)

Table 2 Results for the TVR algorithm

Training set  Cases  Rules   Testing set  Cases  Correctly classified  Incorrectly classified  Not classified
L01           1535   416     T01          170    136 (80.00%)          34 (20.00%)             0 (0.00%)
L02           1535   466     T02          170    131 (77.06%)          39 (22.94%)             0 (0.00%)
L03           1535   508     T03          170    134 (78.82%)          36 (21.18%)             0 (0.00%)
L04           1535   415     T04          170    142 (83.53%)          28 (16.47%)             0 (0.00%)
L05           1535   411     T05          170    140 (82.35%)          30 (17.65%)             0 (0.00%)
L06           1535   405     T06          170    137 (80.59%)          33 (19.41%)             0 (0.00%)
L07           1535   477     T07          170    136 (80.00%)          34 (20.00%)             0 (0.00%)
L08           1535   469     T08          170    137 (80.59%)          33 (19.41%)             0 (0.00%)
L09           1535   466     T09          170    134 (78.82%)          36 (21.18%)             0 (0.00%)
L10           1530   409     T10          175    137 (78.29%)          38 (21.71%)             0 (0.00%)

Table 3 Results for the NGTS algorithm with optimization

Training set  Cases  Rules   Testing set  Cases  Correctly classified  Incorrectly classified  Not classified
L01           1535   423     T01          170    117 (68.82%)          53 (31.18%)             0 (0.00%)
L02           1535   427     T02          170    110 (64.71%)          60 (35.29%)             0 (0.00%)
L03           1535   426     T03          170    109 (64.12%)          61 (35.88%)             0 (0.00%)
L04           1535   419     T04          170    117 (68.82%)          53 (31.18%)             0 (0.00%)
L05           1535   424     T05          170    106 (62.35%)          64 (37.65%)             0 (0.00%)
L06           1535   408     T06          170    109 (64.12%)          61 (35.88%)             0 (0.00%)
L07           1535   427     T07          170    115 (67.65%)          55 (32.35%)             0 (0.00%)
L08           1535   419     T08          170    118 (69.41%)          52 (30.59%)             0 (0.00%)
L09           1535   416     T09          170    122 (71.76%)          48 (28.24%)             0 (0.00%)
L10           1530   430     T10          175    118 (67.43%)          57 (32.57%)             0 (0.00%)

Table 4 Results for the LEM2 algorithm

Training set  Cases  Rules   Testing set  Cases  Correctly classified  Incorrectly classified  Not classified
L01           1535   267     T01          170    124 (73.10%)          46 (26.90%)             11 (6.40%)
L02           1535   225     T02          170    135 (79.50%)          35 (20.50%)             10 (5.90%)
L03           1535   256     T03          170    135 (79.50%)          35 (20.50%)             16 (9.60%)
L04           1535   241     T04          170    129 (76.00%)          41 (24.00%)             6 (3.80%)
L05           1535   227     T05          170    136 (80.10%)          34 (19.90%)             15 (8.80%)
L06           1535   257     T06          170    117 (69.00%)          53 (31.00%)             13 (7.60%)
L07           1535   223     T07          170    139 (81.90%)          31 (18.10%)             10 (5.70%)
L08           1535   248     T08          170    127 (74.90%)          43 (25.10%)             13 (7.80%)
L09           1535   226     T09          170    135 (79.50%)          35 (20.50%)             10 (5.90%)
L10           1530   219     T10          175    124 (70.80%)          51 (29.20%)             15 (8.30%)
To determine the ability of the generated rules to classify new cases, a cross-validation method was used. Cross-validation is frequently used for evaluating classification models. It comprises several training and testing runs. First, the data set is split into several, possibly equal in size, disjoint parts. Then, one of the parts is taken as the test set for rule validation, and the remainder (the sum of all other parts) becomes the training set for rule generation. In our experiments a standard 10-fold cross-validation test was used (CV-10).
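The CV-10 protocol can be sketched as follows. The split below is a generic even one; since 1705 is not divisible by 10, the fold sizes (170 or 171) differ slightly from the 170/175 split visible in Tables 1–4:

```python
def cv10_folds(cases):
    """Split cases into 10 disjoint folds; each run trains on 9 folds
    and tests on the remaining one (standard 10-fold cross-validation)."""
    folds = [cases[i::10] for i in range(10)]
    for i in range(10):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(1705))   # stand-in for the 1705 MMPI cases
splits = list(cv10_folds(data))
```

Each of the ten (train, test) pairs would then drive one rule-generation and one rule-validation run.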
4 The Copernicus System The Copernicus system, a tool supporting clinical psychologists in differential and clinical diagnosis based on the overall analysis of profiles of patients examined by means of personality inventories, is designed for the Java platform. This makes it a modern, platform-independent, object-oriented, user-friendly application.
Fig. 2 The Graphical User Interface of Copernicus
The current version of this tool offers the following main functions: • Locating patients in a profile space using a wide variety of measures and indexes (e.g., general distance measures, specialized measures, psychopathology indexes). • Matching patient profiles to patterns of disorders using dendrograms generated by different clustering methods with a suitable visualization. • Creating diagrams for psychopathological indexes defined in the professional literature (e.g., Leary’s indexes, Diamond’s indexes, cf. [Dahlstrom et al. 1986]). • Visualizing patient profiles on the background of patterns of disorders as well as decision rules generated by popular data mining systems. An important thing is a unique visualization of decision rules (in the form of stripes put on profiles) supporting the nosological diagnosis. • Making diagnostic decisions for patients described by MMPI profiles on the basis of classification functions [Cios et al. 2007] obtained using the discriminant analysis module from the STATISTICA package and/or on the basis of rules embodied in the knowledge base. This knowledge base consists of several rule sets generated by different data mining and machine learning algorithms (see Section 3).
200
J. Gomuła et al.
The design and implementation of the presented tool take its modularity into consideration. Therefore, the tool can easily be extended with other intelligent methods used in data mining and analysis, as well as with other kinds of data, for example, coming from other inventories. The graphical user interface of Copernicus is shown in Figure 2.
5 Rule-Based Analysis of MMPI Profiles The current status of research supports the idea that visualization plays an important role in professional data mining. Pictures often represent data better than expressions or numbers. Visualization is very important in dedicated and specialized software tools used in different (e.g., medical) communities.
Fig. 3 Visualization of rules and profiles in Copernicus (example)
In the Copernicus system, special attention was paid to the visualization of the analysis of MMPI data to make diagnostic decisions easier. A unique visualization of classification rules, in the form of stripes put on profiles, has been designed and implemented. The visualization surface is a two-dimensional space which will be called a profile space. The horizontal axis is labeled with the ordered validity and clinical scales whereas the vertical axis is labeled with T-scores. An exemplary profile in this space is shown in Figure 1.
Each rule R of the form IF ai1(x) in [xli1, xri1] AND ... AND aik(x) in [xlik, xrik] THEN d(x) is dm can be graphically presented as a set of stripes placed in the profile space. Each condition aij(x) in [xlij, xrij], where j=1, ..., k, of the rule R is represented as a vertical stripe on the line corresponding to the scale aij. This stripe is bounded from the bottom and the top by the values xlij and xrij, respectively. For each case x, the profile of x is said to match a rule R if and only if ai1(x) belongs to [xli1, xri1] and ... and aik(x) belongs to [xlik, xrik]. The fact that x matches R will be denoted by x |= R. Matching can easily be seen thanks to the visualization of rules and profiles in the same space (see Figure 3). Let us consider a patient x described by the data vector a(x)=[56, 78, 55, 59, 58, 54, 67, 52, 77, 56, 86, 68, 59] (cf. Section 2). If we have two rules of the form: • R1: IF a3(x) in [54, 59] AND a4(x) in [56, 59] AND a11(x) in [85, 108] THEN d(x) is paran, • R2: IF a8(x) in [20, 55] AND a9(x) in [78, 79] AND a13(x) in [58, 59] THEN d(x) is norm, then we have two situations of the mutual arrangement of rules and profiles. In the first case, the profile of x fully matches the rule R1, whereas in the second case, the profile of x only partly matches the rule R2. On the basis of rules, a proper diagnostic decision can be made. Let Rul(S) be the set of rules constituting the knowledge base embodied in the Copernicus system and Rulmx(S) be the subset of Rul(S) including all rules matched by the profile of x. In general, rules from Rulmx(S) indicate different nosological classes to which a patient x can be classified. Therefore, there is a need to select only one best main decision. To make such a decision for a given patient x, proper aggregation factors are calculated. An aggregation factor is a weighted sum of different factors characterizing the rules which x matches.
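The matching relation x |= R can be sketched as follows (a minimal Python illustration using the example profile and the rules R1 and R2 from the text; the rule encoding is ours, not the Copernicus data format):

```python
# Profile of the example patient x (T-scores of scales a1..a13).
profile = [56, 78, 55, 59, 58, 54, 67, 52, 77, 56, 86, 68, 59]

# A rule: conditions (scale index, lower bound, upper bound) plus
# the indicated nosological class (illustrative encoding).
R1 = {"conds": [(3, 54, 59), (4, 56, 59), (11, 85, 108)], "class": "paran"}
R2 = {"conds": [(8, 20, 55), (9, 78, 79), (13, 58, 59)], "class": "norm"}

def matches(profile, rule):
    """x |= R iff every condition interval contains the scale value
    (scale indices are 1-based, as in the text)."""
    return all(lo <= profile[i - 1] <= hi for i, lo, hi in rule["conds"])

# The profile fully matches R1 but only partly matches R2,
# since a9(x) = 77 lies outside [78, 79].
```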
Let di be one of the nosological classes indicated by some rules from Rulmx(S). The class indicated by a given rule R will be denoted by CLASS(R). In the Copernicus system, we have implemented, among others, the following aggregation factors for the rules indicating the class di:
• AGGR1(di) = card({Rj: x |= Rj and CLASS(Rj) = di}) / card({Rj: x |= Rj}),
• AGGR2(di) = 0.8·max({QUAL(Rj): x |= Rj and CLASS(Rj) = di}) + 0.2·card({Rj: x |= Rj and CLASS(Rj) = di}) / card({Rj: x |= Rj}),
• AGGR3(di) = 0.6·max({QUAL(Rj): x |= Rj and CLASS(Rj) = di}) + 0.2·avg({1 − LENGTH(Rj): x |= Rj and CLASS(Rj) = di}) + 0.2·card({Rj: x |= Rj and CLASS(Rj) = di}) / card({Rj: x |= Rj}),
where the functions used in the formulas have the following meaning: card is the cardinality of a set, max the maximum value, avg the average value, and 'and' is the conjunction operator. The meaning of |= was explained earlier. The aggregation factor AGGR1(di) is the simplest one. It expresses the relative number of rules indicating the class di in the set Rulmx(S) of all rules matched by x. The second aggregation factor, AGGR2(di), also takes into consideration the maximal value of the quality factors of the rules from Rulmx(S) indicating the class di.
The third aggregation factor, AGGR3(di), additionally takes into consideration the average length of the rules from Rulmx(S) indicating the class di. In this case, the smaller the average length, the better the set of rules. The weighting coefficients can be flexibly adjusted. For a given patient x, the class dk for which the value of the selected aggregation factor is the greatest among all classes indicated by rules from Rulmx(S) is taken as the best main diagnostic decision. If the classes indicated by rules from Rulmx(S) are sorted in non-increasing order of their aggregation factor values, then the consecutive classes (with adequately large values of the aggregation factors) can constitute a kind of supplementary diagnostic decision.
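The three aggregation factors can be sketched as follows (a Python illustration; QUAL and LENGTH are assumed here to be rule attributes already normalised to [0, 1], and the weighting coefficients are the ones quoted above):

```python
def aggr(matched_rules, d, which=1):
    """AGGR1-AGGR3 for class d over the rules matched by x.
    Each rule is a dict with keys 'class', 'qual', 'length';
    'qual' and 'length' are assumed normalised to [0, 1]."""
    same = [r for r in matched_rules if r["class"] == d]
    if not matched_rules or not same:
        return 0.0
    ratio = len(same) / len(matched_rules)            # AGGR1
    if which == 1:
        return ratio
    best_qual = max(r["qual"] for r in same)
    if which == 2:
        return 0.8 * best_qual + 0.2 * ratio          # AGGR2
    avg_short = sum(1 - r["length"] for r in same) / len(same)
    return 0.6 * best_qual + 0.2 * avg_short + 0.2 * ratio   # AGGR3
```

The best main diagnostic decision is then the class maximising the chosen factor over all classes indicated by the matched rules.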
6 Conclusions In this paper, we have described a part of the Copernicus system - a new computer tool for the analysis of MMPI profiles of patients with mental disorders. This part is focused on rule-based analysis of MMPI data. We have examined several algorithms of logical rule generation for compiling data on mental disorders coming from the MMPI test. The rules generated using these algorithms constitute the knowledge base for the rule-based analysis and decision making implemented in the Copernicus system. Copernicus is evolving continuously. It will be extended with interpretation systems for the MMPI based on a quantitative intra- and inter-profile approach, as well as a user-friendly visualization of the results obtained in this way, useful for the psychologist-diagnostician. Our main goal is to deliver to diagnosticians and clinicians an integrated tool supporting the comprehensive diagnosis of patients. The Copernicus system is flexible and can also be diversified to support differential diagnosis of profiles of patients examined using other personality inventories.
Acknowledgment The research has been partially supported by the grant from the University of Information Technology and Management in Rzeszów, Poland.
References
[Bazan et al. 2000] Bazan, J.G., Nguyen, H.S., Nguyen, S.H., Synak, P., Wroblewski, J.: Rough set algorithms in classification problem. In: Polkowski, L., Tsumoto, S., Lin, T.Y. (eds.) Rough Set Methods and Applications, pp. 49–88. Physica-Verlag, Heidelberg (2000)
[Bazan and Szczuka 2005] Bazan, J.G., Szczuka, M.S.: The rough set exploration system. In: Peters, J.F., Skowron, A. (eds.) Transactions on Rough Sets III, pp. 37–56. Springer, Heidelberg (2005)
[Cios et al. 2007] Cios, K., Pedrycz, W., Swiniarski, R.W., Kurgan, L.A.: Data mining. A knowledge discovery approach. Springer, Heidelberg (2007)
[Dahlstrom et al. 1986] Dahlstrom, W., Welsh, G., Dahlstrom, L.: An MMPI handbook, vol. 1-2. University of Minnesota Press, Minneapolis (1986)
[Grzymala-Busse 1997] Grzymala-Busse, J.: A new version of the rule induction system LERS. Fundamenta Informaticae 31, 27–39 (1997)
[Hippe 1997] Hippe, Z.S.: Machine learning – a promising strategy for business information processing? In: Abramowicz, W. (ed.) Business Information Systems, Academy of Economics Editorial Office, Poznan, pp. 603–622 (1997)
[Knap 2009] Knap, M.: Research on new algorithms for decision trees generation. Ph.D. Thesis, AGH University of Science and Technology, Krakow (2009) (in Polish)
[Paja and Hippe 2005] Paja, W., Hippe, Z.S.: Feasibility studies of quality of knowledge mined from multiple secondary sources. I. Implementation of generic operations. In: Klopotek, M., Wierzchon, S., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining, pp. 461–465. Springer, Heidelberg (2005)
[Pawlak 1991] Pawlak, Z.: Rough Sets. Theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht (1991)
[Witten and Frank 2005] Witten, I.H., Frank, E.: Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Francisco (2005)
Application of 2D Anisotropic Wavelet Edge Extractors for Image Interpolation

K. Adamczyk (1,2) and A. Walczak (2)

(1) Department of Applied Information Systems, University of Information Technology and Management, Rzeszow, Poland; [email protected]
(2) Information Systems Institute, Military University of Technology, Warsaw, Poland; [email protected]
Abstract. We present an interpolation algorithm for digital images based on extended Edge-Directed Interpolation. The proposed algorithm utilises the localisation and orientation of edges extracted with wavelet edge extractors. The algorithm chooses the interpolation method for a given point of an image depending on its localisation with respect to the extracted edges. We apply our method to grey-scale digital images. In this paper we also propose some modifications to the method. We present the results obtained with our algorithm in comparison to a few popular interpolation algorithms, and we propose a plan of further algorithm extensions.
1 Introduction The rapid development of digital image acquisition hardware is driven by increasing optical sensor resolution. It results in digital images of higher resolution that capture more details. The latest digital cameras and scanners have matrices with 12, 18 or even 20 megapixels. However, the resolution of digital sensors has its limits. To obtain higher resolution, software methods are employed. We should keep in mind that those methods do not increase the number of details in an image; they only reduce the effect of so-called 'pixelisation'. That is why it is important to choose a method which precisely restores the shapes of image structures at the highest possible zoom. Depending on the application, we also require adequate calculation speed. The main goal of this article is to present the first stage of a novel digital image interpolation method which extends Edge-Directed Interpolation (EDI) [Allebach and Wong 1996]. EDI methods utilise the localisation and orientation of edges extracted from the analysed images. The first part of this paper contains a classification of interpolation methods and a short introduction to the adaptive EDI-like interpolation method. The second part presents the Adaptive Edge Extraction Algorithm (AEEA) for images based
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 205–222. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
206
K. Adamczyk and A. Walczak
on two-dimensional anisotropic wavelets, which is used in our interpolation method. In the next part of the paper, we present the main concepts of the first stage of our interpolation method based on the AEEA, which is focused on the interpolation of image areas along edges without corners, and we specify the algorithm details. In the last part, we propose some modifications to the method and we compare various interpolation methods with the results obtained by the final stage of our algorithm. Finally, we present the conclusions and propose further algorithm development to achieve better results.
2 Interpolation Methods There are many digital image interpolation methods and there is constant development in this area. The methods can be divided into two categories: adaptive methods, which adjust their operation to the analysed image content, and non-adaptive methods, which process the whole image uniformly. Non-adaptive methods are very popular in devices such as cameras, camcorders, graphics cards and TV sets because of their processing speed and simple implementation. Examples of such methods are: nearest neighbour and bilinear interpolation, and the more complex bicubic, spline and sinc interpolation and their variations. They are often applied to image interpolation. They determine the luminance value of an unknown image pixel by analysing a different number of its neighbouring pixels. The most trivial method is based on two neighbouring pixels, while the most complex one, sinc, utilises a greater number of neighbouring pixels restricted by a specifically selected window function. The sinc method gives better results but it is slower than the trivial methods. Adaptive interpolation methods adjust their operation to the contents of the image. Results obtained with adaptive methods are superior to those of non-adaptive methods. The price that adaptive methods pay for this is deteriorated efficiency. Usually adaptive methods merge a few interpolation methods according to the contents of the analysed image part. A very important group among the adaptive algorithms are the Edge-Directed Interpolation algorithms [Allebach and Wong 1996]. Their common feature is that they analyse an image using edge extraction. The interpolation is done by a proper selection of interpolation coefficients to smooth the image along the extracted edges. They do not smooth the image perpendicularly to the direction of edges. As a result, EDI algorithms generate images with sharp edges.
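As an illustration of the non-adaptive family, bilinear interpolation of a single unknown pixel from its four known neighbours can be sketched as follows (a generic textbook formulation, not code from any of the systems discussed):

```python
def bilinear(p00, p10, p01, p11, fx, fy):
    """Luminance at fractional position (fx, fy) inside the unit
    square spanned by the four known pixels p00, p10, p01, p11."""
    top = p00 * (1 - fx) + p10 * fx        # along x, upper edge
    bottom = p01 * (1 - fx) + p11 * fx     # along x, lower edge
    return top * (1 - fy) + bottom * fy    # then along y
```

The same scheme applied uniformly to every unknown pixel is what makes such methods fast but blind to edge direction.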
Examples of such algorithms are: EDI [Allebach and Wong 1996], New Edge-Directed Interpolation (NEDI) [Li and Orchard 2001], Improved NEDI (iNEDI) [Asuni and Giachetti 2008], Markov Random Field Model-Based EDI (MRF EDI) [Li and Nguyen 2008], and other recent methods, which will be presented by the authors in further analyses.
Application of 2D Anisotropic Wavelet Edge Extractors for Image Interpolation
207
3 Edge Extraction Based on a Two-Dimensional Anisotropic Wavelet It is difficult to define an edge in an image, and several definitions can be found. According to one of them, an edge is a border between areas of different luminosity: the step edge. Another definition describes an edge as a line and considers it to be a local brightening or darkening in the image; it consists of two concurrent step edges. As presented in [Puzio and Walczak 2006], the wavelet transform may be used for edge detection at various scales, where an edge is located at the centre of the wavelet support for which the wavelet coefficient attains its maximum value. The interpolation method presented in this paper utilises information about the edges extracted from images; to be more precise, it utilises their localisation and direction. To determine edge localisation and direction, the two-dimensional anisotropic wavelet described in [Puzio and Walczak 2006] and the edge extraction algorithm AEEA based on this wavelet [Puzio and Walczak 2008] were used. As we have already mentioned, the edge extraction method utilises a two-dimensional wavelet. This wavelet has multi-resolution and anisotropic properties, thanks to which it is possible to extract an image's details at various scales and directional orientations. These properties make it possible to use the algorithm results presented in [Puzio and Walczak 2008] in our EDI-type image interpolation method.
4 Wavelet Edge-Directed Interpolation – The First Stage The presented algorithm is able to zoom interpolated images two times in any of the X and Y directions. The method, applied recurrently, enables us to zoom images n times, where n is a power of 2. The proposed method utilises the edge extraction results given in [Puzio and Walczak 2008]. Those results contain information about the detected edges: among the obtained data we have the edge localisation and the edge direction in the image plane. Besides, we receive information about the wavelet size (scale) utilised for edge extraction, which corresponds to the size of the detected edges. In this article we introduce the term 'edge point', defined as the centre of a wavelet support for which the coefficient calculated according to the wavelet transform has its maximum value. 4.1 Allocation of "Edge Areas" As any other interpolation method, our method determines the values of 'new' pixels based on the values of a number of neighbouring pixels. The calculation of a 'new' pixel value depends on its localisation in an image. The presented method works in so
called 'edge areas', which we define as circular areas around all 'edge points' (see Fig. 1). The area size (circle radius) depends on the size (wavelet support) of the wavelet utilised for image edge extraction. Other areas, which do not belong to 'edge areas', are interpolated with the bilinear or bicubic method. Depending on the wavelet support of size n which was used for edge extraction, we determine a circular area with radius r around an examined 'edge point'; n is the width of the wavelet support given in pixels. The circle radius r has to satisfy the following condition:

r = (n + 1) / 2   (1)
All pixels inside the circle are marked as points belonging to the 'edge area'. The distance d between a pixel and the corresponding 'edge point' is calculated between the two points P(x,y) and P'(x',y') as follows:

d = |PP'| = sqrt((x - x')^2 + (y - y')^2)   (2)
where P(x,y) is the 'edge point' and P'(x',y') is an 'edge area' pixel. This distance should satisfy the following condition:

d < r   (3)
By putting (1) and (2) into (3) we obtain:

sqrt((x - x')^2 + (y - y')^2) < (n + 1) / 2   (4)
Fig. 1 Location of 'edge areas' (an 'edge point' P(x,y), the radius r, and the distance d to a pixel P'(x',y'))
Because we analyse a digital image, we assume that the distance between neighbouring pixels which lie in the same vertical or horizontal line is equal to 1. In that case the coordinates of any pixel can be described as P(x,y), where x and y are natural numbers which represent the offset with respect to the left-upper image corner. The distance between pixels P(x,y) and P'(x',y') is labeled |PP'|. The P' pixel coordinates may be expressed as:

x' = x + i and y' = y + j, where i, j ∈ C   (5)
where i and j are the offsets of pixel P' with respect to pixel P in the horizontal and vertical direction, respectively. This leads to:

|PP'| = sqrt((x - x')^2 + (y - y')^2) = sqrt((x - (x + i))^2 + (y - (y + j))^2) = sqrt((x - x - i)^2 + (y - y - j)^2) = sqrt((-i)^2 + (-j)^2) = sqrt(i^2 + j^2)   (6)
The 'edge area' for the 'edge point' P(x,y) consists of all pixels which satisfy the following condition:

sqrt(i^2 + j^2) < r, for all integer offsets i, j with -r < i, j < r   (7)
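Conditions (1)–(7) amount to a circular membership test around each 'edge point'; a minimal sketch:

```python
def edge_area_offsets(n):
    """Offsets (i, j) of pixels belonging to the 'edge area' of an
    'edge point', for a wavelet support of width n pixels.

    Implements r = (n + 1) / 2 (eq. 1) and the membership test
    sqrt(i^2 + j^2) < r (eq. 7)."""
    r = (n + 1) / 2
    span = range(-int(r), int(r) + 1)
    return [(i, j) for i in span for j in span
            if (i * i + j * j) ** 0.5 < r]

# For a support of width n = 3 the radius is r = 2, giving the
# 3x3 block of offsets around the 'edge point'.
```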
Fig. 2 Example of ‘edge areas’ allocation in various wavelet scales: (a) scale n=3, (b) scale n=5
To each pixel belonging to an 'edge area', information about the spatial orientation of its 'edge point' is added. The spatial orientation is defined as a number between 1 and 16, each number denoting a different spatial orientation. Because the calculation of new pixel values is done using the neighbouring pixels in a 7×7-pixel area, we have 16 basic directions, as presented in Fig. 3.
Fig. 3 16 basic directions in 7x7 pixels neighbourhood
For every 'edge area' pixel we choose the direction number closest to the spatial orientation of its 'edge point'. As a result of the allocation of the image's 'edge areas' we obtain an image with an 'edge areas' map, as presented in Fig. 4.
Fig. 4 Example of an image with ‘edge areas’ map
Such a map points out the image areas which will be interpolated with the presented method. The areas of the image which do not contain edges will be interpolated with the bilinear or bicubic method. Thanks to this, the amount of calculation is reduced. The interpolation of corner areas will be presented in further analyses. 4.2 Classification of 'New' Pixels The next step of our method is to calculate the values of the 'new' pixels belonging to particular 'edge areas'. We define the 'new' pixels as all pixels of the outcome image which are calculated in the interpolation process. Every pixel which belongs to an 'edge area' has fixed information about its spatial orientation. This
information is utilised in the pixel value calculation process. The number and localisation of the neighbouring pixels used in the calculation of a new pixel value varies and depends on this new pixel's direction number. Assuming that we interpolate an image with double magnification, the outcome image will have 4 times more pixels than the original image. Interpolated pixels may be divided into 3 types, marked as A, B and C. The pixel type depends on its localisation with respect to the 'known pixels', which originate from the original, lower-resolution image. For every type A, B and C a group of 'known' directions is created. The 'known' directions are all those along which, in the neighbourhood of size 7×7, lie pixels which originate from the lower-resolution image (denoted as black dots in Fig. 5). These black pixels will be called 'known pixels'. By white dots we denote unknown pixels, which will be interpolated. A red dot denotes the unknown pixel under consideration. All unknown pixels may be classified as belonging to type A, B or C. For every type of pixel we assign a group of 'known pixels'; this group is marked with a blue dashed line in Fig. 5. The type A group consists of 4 pixels, while the type B and C groups consist of 6 pixels. Apart from the 'known pixels', for each type we determine so-called 'known directions', defined as the directions along which the 'known pixels' lie in the neighbourhood of size 7×7 pixels. We marked those directions with a red dashed line in Fig. 5. According to the above, for every type we obtain the following 'known direction' numbers: • Type A: 2, 5, 8, 10, 13, 16 • Type B: 1, 4, 7, 11, 14 • Type C: 3, 6, 9, 12, 15 For every type of pixel we calculate the unknown pixel value in two ways. The calculation method depends on whether the direction assigned to the pixel is known or not.
Fig. 5 ‘Known pixels’ and ‘known directions’ for three types A, B, C of ‘new’ pixels
4.3 Calculation of Pixel Value for 'Known Directions' We assume that I' is the image interpolated from the original image I with twofold zoom, and x', y' are its coordinates. For a 'known direction', the new pixel value depends on two neighbouring pixels only, which lie along a straight line in that direction, and is calculated as the arithmetic mean of the values of these two pixels. Because the coordinate offsets for every direction are constant, on the basis of the considerations presented in Fig. 5 we are able to calculate the matrix Z, which contains the offset coefficients with respect to the analysed pixel for all 'known directions' (rows are the direction numbers k = 1, ..., 16; columns are a1, b1, a2, b2):

  k :  a1  b1  a2  b2
  1 :   1   0  -1   0
  2 :   3  -1  -3   1
  3 :   2  -1  -2   1
  4 :   3  -2  -3   2
  5 :   1  -1  -1   1
  6 :   2  -3  -2   3
  7 :   1  -2  -1   2
  8 :   1  -3  -1   3
  9 :   0  -3   0   3
 10 :  -1  -3   1   3
 11 :  -1  -2   1   2
 12 :  -2  -3   2   3
 13 :  -1  -1   1   1
 14 :  -3  -2   3   2
 15 :  -2  -1   2   1
 16 :  -3  -1   3   1      (8)
Using equations (9) and (10) we are able to calculate the parameters p1 and p2:

p1 = I'(x' + Z[k][a1], y' + Z[k][b1])   (9)
p2 = I'(x' + Z[k][a2], y' + Z[k][b2])   (10)

The mean of the parameters p1 and p2 is equal to the unknown pixel value for the 'known direction' k:

I'(x', y') = (p1 + p2) / 2   (11)
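Equations (9)–(11) can be sketched directly from the offset matrix Z; here only two illustrative rows of Z are filled in (direction 1, horizontal, and direction 5, diagonal), and the image is modelled as a dictionary keyed by coordinates:

```python
# Offsets (a1, b1, a2, b2) for two illustrative rows of matrix Z:
# direction 1 (horizontal) and direction 5 (diagonal); the remaining
# rows follow the same pattern (eq. 8).
Z = {1: (1, 0, -1, 0), 5: (1, -1, -1, 1)}

def known_direction_value(img, x, y, k):
    """Eqs. (9)-(11): the unknown pixel at (x, y) is the mean of the
    two known pixels lying along 'known direction' k."""
    a1, b1, a2, b2 = Z[k]
    p1 = img[(x + a1, y + b1)]          # eq. (9)
    p2 = img[(x + a2, y + b2)]          # eq. (10)
    return (p1 + p2) / 2                # eq. (11)
```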
4.4 Pixel Value Calculation for 'Unknown Directions' In contrast to the previous case, the new pixel value for 'unknown directions' depends on 4 neighbouring pixels. Those pixels are marked in Fig. 5 with a blue dashed line. In this case we also calculate the mean of the parameters p1 and p2, but these parameters' values depend on 4 pixels. Every p1 and p2 value depends on 2 pixels P1 and P2 from the group marked in Fig. 5 with a blue dashed line, between which lies the direction assigned to the new pixel. For the p1 and p2 calculation we use the P1 and P2 values with some weights w1 and w2. A weight's value depends on its distance to the point P*. The point P* is the point where the straight line that connects pixels P1 and P2 crosses the straight line which lies along the direction assigned to the analysed pixel, as presented in Fig. 6.
Fig. 6 Weight calculation for pixels (P1(x1,y1), P2(x2,y2), the crossing point P*(x*,y*) and the angle α at pixel P(x,y))
The coordinates of point P* may be easily calculated because the pixels P1 and P2 always lie on a straight line parallel or perpendicular to the X axis. Therefore one of the values x* and y* is always known. We calculate the second value using formulas (12) and (13):

y* = x* · tg α, if x* is known   (12)
x* = y* / tg α, if y* is known   (13)
where α is the angle between the X axis and the straight line crossing the point P* and the interpolated pixel. From the coordinates of pixels P1 and P2 and the point P* we may calculate the weights w1 and w2. Assuming that w1 + w2 = 1, we get formulas (14) and (15), respectively:
w1 = 1 - |P*P1| / |P1P2|   (14)
w2 = 1 - |P*P2| / |P1P2|   (15)
Because the distance between neighbouring pixels which lie in one horizontal or vertical line is equal to 1, the distance |P1P2| is always equal to 2. By putting (2) into (14) and (15) we obtain new formulas for calculating the weights:

w1 = 1 - sqrt((x* - x1)^2 + (y* - y1)^2) / 2   (16)
w2 = 1 - sqrt((x* - x2)^2 + (y* - y2)^2) / 2   (17)
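Equations (14)–(17) reduce to a distance computation between P* and the two known pixels; a sketch (the example coordinates are ours, for illustration):

```python
def weights(p_star, p1, p2):
    """Eqs. (16)-(17): weights of the known pixels P1, P2 given the
    crossing point P*; |P1P2| = 2 for neighbouring known pixels."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    w1 = 1 - dist(p_star, p1) / 2
    w2 = 1 - dist(p_star, p2) / 2
    return w1, w2   # w1 + w2 == 1 when P* lies on the segment P1P2
```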
Just like in the previous case, the coordinates of the 4 pixels whose values will be used in the calculation of the new pixel value are constant. Therefore, we may calculate the matrix N, which contains the coefficients of the coordinate offsets with respect to the analysed pixel for all 'unknown directions' of any type (rows are the cases i = 1, ..., 8 defined in (21); columns are a1, b1, a2, b2, c1, d1, c2, d2):

  i :  a1  b1  a2  b2  c1  d1  c2  d2
  1 :   1  -1  -1  -1   1   1  -1   1
  2 :  -1  -1  -1   1   1  -1   1   1
  3 :   1   0  -1   0   1  -2  -1   2
  4 :  -1  -2  -1   2   1  -2   1   2
  5 :  -1   0   1   0  -1  -2   1   2
  6 :   2  -1  -2  -1   2   1  -2   1
  7 :   0  -1   0   1   2  -1  -2   1
  8 :   0  -1   0   1  -2  -1   2   1      (18)
When the w1 and w2 values are calculated, it is possible to calculate p1 and p2 using the formulas:

p1 = w1·I'(x' + N[i][a1], y' + N[i][b1]) + w2·I'(x' + N[i][c1], y' + N[i][d1])   (19)
p2 = w1·I'(x' + N[i][a2], y' + N[i][b2]) + w2·I'(x' + N[i][c2], y' + N[i][d2])   (20)
where:

i = 1 for type A, directions 1, 3, 4, 14, 15
i = 2 for type A, directions 6, 7, 9, 11, 12
i = 3 for type B, directions 2, 3, 5, 6
i = 4 for type B, directions 8, 9, 10
i = 5 for type B, directions 12, 13, 15, 16
i = 6 for type C, directions 1, 2, 16
i = 7 for type C, directions 4, 5, 7, 8
i = 8 for type C, directions 10, 11, 13, 14      (21)
The next stage is the unknown pixel value calculation with equation (11); it is done for both known and unknown directions. The last developed stage of the algorithm is the interpolation of pixels in parts of an image which do not contain 'edge areas'. These parts of the image are interpolated with the bilinear or bicubic method. Because our interpolation technique utilises a neighbourhood of size 7×7 pixels for the calculation of a 'new' pixel value, the values of pixels which lie closer than 4 pixels to the image border are calculated using the bilinear or bicubic method as well.
5 Modified Version of the Method In order to improve the presented method, some modifications were made to it. The original algorithm using the 2D anisotropic wavelet proposed in [Puzio and Walczak 2008] was modified so that it could be better suited to the needs of this method. The changed elements are the way of analyzing an image as well as the allocation of pixels to edges. The wavelet analysis is conducted for every pixel of an image. As a result, for every pixel the directional orientation for which the wavelet coefficient is maximal is determined. The bounds of edge allocation are much smaller than those of the original edge extraction algorithm, which results in the fact that even slight changes of brightness are treated as edges. Thanks to this, the image is interpolated more precisely. The modified version of the presented method consists of a few stages, the first of which is creating the directional map of the entire original image. Before the modification, the directional map was created as well, but only for the 'edge areas'. The map results from the wavelet analysis of every pixel of the original image. The directional map contains, for every pixel of the image, information about its directional orientation and the maximal value of the wavelet coefficient. Fig. 7 presents the visualisation of the directional map for some exemplary images. The pixel values in this visualisation are the directional orientations of the pixels; they are expressed in degrees (from 0 to 360) and normalised to 180.
Fig. 7 Exemplary visualisations of directional maps of some test images
The next step added in the modified version is to assign the directional orientation for the 'new' pixels of an image. They are assigned by using the directional orientation of the nearest neighbouring pixels. For A type pixels there are 4 neighbours; for B and C type pixels there are 2 neighbours, as presented in Fig. 5.
Fig. 8 The way of assigning the directional orientation for a 'new' A type pixel in the case when the values of the wavelet coefficients c11 and c12 are the biggest and the straight lines determining the directional orientation of the original pixels form (a) an acute angle, (b) an obtuse angle
The way of assigning the directional orientation for 'new' pixels (P in Fig. 8) depends on the type of the 'new' pixel (A, B or C) and on the maximal wavelet coefficient values (c11, c12, c21, c22) of the individual known neighbouring pixels (P11, P12, P21, P22). When the straight lines determining the directional orientations of the known pixels (k11, k22) form an acute angle, the directional orientation α of pixel P is expressed as:

α = (α11 + α22) / 2   (22)

when the straight lines k11, k22 form an obtuse angle, the directional orientation α of pixel P is expressed as:

α = (α11 + α22) / 2 + 90   (23)
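Equations (22) and (23) can be sketched as follows (angles in degrees, orientations taken modulo 180; deciding the acute/obtuse case by the difference of the two orientations is our assumption about how the criterion is evaluated):

```python
def new_pixel_orientation(a11, a22):
    """Eqs. (22)-(23): directional orientation of a 'new' pixel from
    its two dominant neighbours. Angles in degrees, taken modulo 180.
    The acute/obtuse case is decided here by the difference of the
    two orientations (an assumption, not spelled out in the text)."""
    mean = (a11 + a22) / 2
    if abs(a11 - a22) <= 90:            # acute angle: eq. (22)
        return mean % 180
    return (mean + 90) % 180            # obtuse angle: eq. (23)
```

Note that the +90 branch handles the wrap-around case: two nearly horizontal lines at 10 and 170 degrees average to a nearly horizontal orientation, not a vertical one.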
Fig. 8 presents the situation when the directional orientation of an A type pixel is calculated and the wavelet coefficients c11 and c22 are bigger than c12 and c21. Otherwise, the directional orientation α of pixel P is determined similarly, taking into consideration the two neighbouring pixels for which the values of the wavelet coefficients are the biggest. The last step of the method was also slightly modified. It consists in the calculation of the values of the 'new' pixels on the basis of their previously determined directional orientation. The values of the 'new' pixels are calculated on the basis of four neighbouring pixels, depending on the directional orientation of the pixel as well as its type (A, B or C). The values of the neighbouring pixels considered when calculating the 'new' one are taken with some weights. The weights (w1 and w2) depend on the distance between the straight line determining the direction of the 'new' pixel and the neighbouring pixels that take part in the interpolation. Fig. 9 presents the way of calculating the weights.
Fig. 9 Calculating weights (of interpolation coefficients) of A, B and C type pixels
218
K. Adamczyk and A. Walczak
The difference, in relation to the approach described in Section 4 of this article, is that when calculating the value of a 'new' pixel only the nearest neighbourhood is taken into consideration, as presented in Fig. 9. The next difference consists in the fact that the weights of the pixels taking part in the interpolation are determined each time, irrespective of the determined directional orientation. As a result of the presented modifications, the value of a 'new' pixel is determined according to formula (24):

P = (p1 + p2) / 2   (24)

where, for the situation presented in Fig. 9:

p1 = w1 P12 + w2 P22   (25)

p2 = w2 P11 + w1 P21   (26)
Substituting (25) and (26) into (24) gives the interpolated value of the considered 'new' pixel.
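Equations (24)-(26) can be sketched as follows; this is a hypothetical helper assuming the Fig. 9 configuration, with P11, P12, P21, P22 the neighbouring pixel values and w1, w2 the distance-based weights (assumed to satisfy w1 + w2 = 1):

```python
def interpolate_new_pixel(p11, p12, p21, p22, w1, w2):
    """Value of a 'new' pixel from its four nearest neighbours,
    for the configuration of Fig. 9."""
    p1 = w1 * p12 + w2 * p22    # Eq. (25)
    p2 = w2 * p11 + w1 * p21    # Eq. (26)
    return (p1 + p2) / 2.0      # Eq. (24)
```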
6 Initial Results

The presented method and its modification, discussed in the previous section, were implemented in the Visual Studio environment using the C# language. So far no work has been done on optimizing the execution speed, since further modifications are planned; for this reason the article does not include a comparison of execution speed with other interpolation algorithms. Only initial results and a comparison of image quality with some popular interpolation methods are given. The images known as 'Lena' and 'Barbara' were used in the tests. To compare the different interpolation methods, the PSNR (Peak Signal-to-Noise Ratio) was calculated between the original IH (image in high resolution) and I' (image interpolated from I, where I is the low-resolution image obtained from IH by reducing its resolution 2 times). PSNR is calculated according to:

PSNR(IH, I') = 10 · log10 [ max(max(IH), max(I'))² / MSE ]   (27)

where MSE is the Mean Squared Error given by:

MSE(IH, I') = 1/(N·M) · Σ_{i=1}^{N} Σ_{j=1}^{M} [IH(i, j) − I'(i, j)]²   (28)

where M and N are the width and height of the compared images.
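Formulas (27)-(28) translate directly into code. Below is a minimal sketch (not the authors' C# implementation) operating on greyscale images stored as 2-D lists:

```python
import math

def mse(ih, ip):
    """Mean squared error between two equal-sized images, Eq. (28)."""
    n, m = len(ih), len(ih[0])
    return sum((ih[i][j] - ip[i][j]) ** 2
               for i in range(n) for j in range(m)) / (n * m)

def psnr(ih, ip):
    """Peak signal-to-noise ratio in dB, Eq. (27); the peak value is the
    larger of the two image maxima, as in the text."""
    peak = max(max(map(max, ih)), max(map(max, ip)))
    return 10.0 * math.log10(peak ** 2 / mse(ih, ip))
```

A higher PSNR means the interpolated image I' is closer to the original IH.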
Fig. 10 Test images: a.) ‘Lena’, b.) ‘Barbara’
Fig. 11 Comparison of images before and after interpolation a.) Lena’s eye before interpolation (low resolution), b.) Lena’s eye after interpolation (high resolution)
Fig. 12 Comparison of images before and after interpolation a.) Barbara’s elbow before interpolation (low resolution), b.) Barbara’s elbow after interpolation (high resolution)
Fig. 13 Comparison of segments of the image 'Barbara' obtained through interpolation using various methods, shown together with the original image in low resolution and the corresponding PSNR values: bilinear PSNR = 30.42 dB, bicubic PSNR = 30.54 dB, iNEDI PSNR = 32.56 dB, presented method PSNR = 33.27 dB
When the criterion is the PSNR index, the results of the presented method are better than those of traditional methods such as bilinear or bicubic interpolation. For the image 'Barbara' the results are also better than those of another method from the EDI group, namely iNEDI, an improved New Edge-Directed Interpolation [Asuni and Giachetti 2008]. During the tests it was noticed that the method does not handle well the interpolation of images containing very small objects. That is one of the elements we are still working to improve.
7 Plans for Future Work

The presented algorithm is not the final product. There are still some problems to be solved, for example the interpolation of very small details of an image. The next issue to be addressed by the authors is a change in the way of assigning the directional orientation to 'new' pixels. Work is under way whose aim is to let more neighbouring pixels (close and distant) decide about the directional orientation of a 'new' one. Moreover, the way of determining the weights of the pixels taking part in the interpolation is going to be changed. There are also plans to modify the 2D anisotropic wavelet in order to suit it better to the interpolation requirements. Since the anisotropic wavelet used in the method has a large-scale property, the next element to be considered by the authors is the interpolation of an image with the edges defined at multiple resolutions. Each of these modifications should give a better representation of the original images and a decrease in the interpolation errors, although the calculation time may slightly increase.
8 Conclusions

The initial results show that the presented method has the potential to give good results at its final stage. Already at this stage of development the results are better than or comparable to other interpolation algorithms presented so far. The modification presented in Section 6 and further work should additionally improve the quality of interpolated images at a small cost of additional computation time. The presented results raise our hopes that future research will give satisfactory results.
Acknowledgment This work was financially supported by the European Union from the European Regional Development Fund under the Operational Programme Innovative Economy (Project no. POIG.02.03.03-00-013/08).
References [Allebach and Wong 1996] Allebach, J., Wong, P.: Edge-directed interpolation. In: IEEE Int. Conf. on Image Processing, vol. 3, pp. 707–710 (1996) [Asuni and Giachetti 2008] Asuni, N., Giachetti, A.: Accuracy improvements and artifacts removal in edge based image interpolation. In: Proc. 3rd Int. Conf. Computer Vision Theory and Applications VISAPP 2008 (2008) [Li and Orchard 2001] Li, X., Orchard, M.T.: New edge-directed interpolation. IEEE Trans. on Image Processing 10(10), 1521–1526 (2001)
[Li and Nguyen 2008] Li, M., Nguyen, T.: Markov random field model-based edge-directed image interpolation. IEEE Trans. on Image Processing 17(7), 1121–1128 (2008) [Puzio and Walczak 2006] Puzio, L., Walczak, A.: 2D anisotropic wavelet for edge extraction in 2D signals. In: Proc. of Military Communication and Information Systems Conference MCC 2006, Gdynia, Poland, pp. 8.3 (2006) [Puzio and Walczak 2008] Puzio, L., Walczak, A.: Adaptive edge detection method for images. Opto-Electronics Review 16(1), 60–67 (2008)
Experimental Results of Model-Based Fuzzy Control Solutions for a Laboratory Antilock Braking System

R.E. Precup1, S.V. Spătaru2, M.B. Rădac1, E.M. Petriu3, S. Preitl1, C.A. Dragoş1, and R.C. David1

1 Department of Automation and Applied Informatics, "Politehnica" University of Timisoara, Timisoara, Romania
{radu.precup,mircea.radac,stefan.preitl,claudia.dragos}@aut.upt.ro,
[email protected]
2 Department of Energy Technology, Aalborg University, Aalborg East, Denmark
[email protected]
3 School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada
[email protected]
Abstract. This chapter presents aspects concerning the design of model-based fuzzy control solutions dedicated to the longitudinal slip control of Antilock Braking System laboratory equipment. Continuous-time and discrete-time Takagi-Sugeno (T-S) fuzzy models of the controlled process are first derived on the basis of the modal equivalence principle. The consequents of the rules of the T-S fuzzy controllers are local state feedback controllers which are solutions to several linear quadratic regulator (LQR) problems, and the parallel distributed compensation is next applied. Linear matrix inequalities are solved to guarantee the global stability of the discrete-time fuzzy control systems and to give the optimal state feedback gain matrices of the LQR problems. A set of real-time experimental results is included to validate the new fuzzy control solutions.
1 Introduction

The Antilock Braking System (ABS) is an important safety subsystem in cars that prevents wheel locking during braking. Current approaches to ABS control can be classified into two categories, wheel acceleration control and tire slip control. The wheel acceleration approaches perform the slip control indirectly by controlling the deceleration/acceleration of the wheel through the brake pressure applied by the actuator. The current direct slip control approaches deal with modeling and measuring the friction effects using their physical characteristics, creating limit cycles around the peak friction slip points, sliding mode control, feedback linearization, grey-system theory, fuzzy control, and so on [Oniz et al. 2009; Wang et al. 2009; Rădac et al. 2009; Li et al. 2010].

Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 223–234. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
224
R.E. Precup et al.
This chapter presents model-based fuzzy control solutions for ABS laboratory equipment. Our approach differs from the approaches presented in the literature because it starts with the first-principle mathematical model of the process and takes advantage of fuzzy modeling to model the process nonlinearities. Two discrete-time dynamic Takagi-Sugeno (T-S) fuzzy models of the process specific to the ABS are suggested in [Precup et al. 2010]. They employ two approaches to obtain the continuous-time models: the calculation of the minimum and maximum values of the input variables, and the local linearization of the process model at several operating points. The continuous-time models are next discretized and two T-S fuzzy controllers are designed. The T-S fuzzy models of the discrete-time T-S fuzzy controllers have the same operators in the inference engine, the same defuzzification method, the same linguistic variables and linguistic terms, and the same rule antecedents as those of the T-S fuzzy models of the process. The consequents of the rules are local state feedback controllers, and the parallel distributed compensation (PDC) is applied to obtain the state feedback gain matrices in the consequents of the rules. Linear matrix inequalities (LMIs) are solved to guarantee the global stability of the fuzzy control systems (FCSs) in terms of common positive definite matrices and to offer the optimal state feedback gain matrices. Building upon our previous approach [Precup et al. 2010], we present new contributions expressed as two continuous-time dynamic T-S fuzzy models of the process and two continuous-time T-S fuzzy controllers. Together with the discrete-time T-S fuzzy models of the process and the discrete-time T-S fuzzy controllers, they represent attractive slip control solutions.

This chapter treats the following topics: the continuous-time and discrete-time dynamic T-S fuzzy models of the process are derived in the next section; the T-S fuzzy controllers are designed in Section 3; a set of real-time experimental results is presented in Section 4; the conclusions are given in Section 5.
2 Process Modeling

The INTECO ABS laboratory equipment consists of two wheels (Fig. 1), which makes it different from the electro-hydraulic actuators usually used in ABS. The lower wheel is accelerated to emulate the car speed, making the upper wheel gain speed. Reaching the speed threshold value causes the upper wheel to initiate the braking sequence. PWM-controlled DC motors are used to drive the lower wheel and the brake on the upper wheel. Iron and plastic surfaces are forced into contact, leading to the nonlinear slip-friction curve. The braking system contains a cable wound on the motor's shaft that acts upon a hand-brake mechanism with a lever, making the plates of the braking system create pressure upon braking. Therefore the wheels are slowed and additional nonlinearities appear.
Experimental Results of Model-Based Fuzzy Control Solutions
225
Fig. 1 Experimental setup and diagram of principle of ABS laboratory equipment
The first-principle equations of the dynamics of the process model are

J1 ẋ1 = Fn r1 μ(λ) − d1 x1 − M10 − M1,
J2 ẋ2 = −Fn r2 μ(λ) − d2 x2 − M20,   (1)

where J1 and J2 are the moments of inertia of the wheels, x1 and x2 are the angular velocities, d1 and d2 are the friction coefficients in the wheels' axes, M10 and M20 are the static friction torques that oppose the normal rotation, M1 is the brake torque, r1 and r2 are the radii of the wheels, Fn is the normal force with which the upper wheel pushes upon the lower wheel, μ(λ) is the friction coefficient depending on the slip, and ẋ1 and ẋ2 are the angular accelerations of the wheels. The identification in terms of measurements and experiments leads to the process parameters [Precup et al. 2010]

r1 = r2 = 0.99 m, Fn = 58.214 N, J1 = 7.53·10^−3 kg m², J2 = 25.6·10^−3 kg m²,
d1 = 1.1874·10^−4 kg m²/s, d2 = 2.1468·10^−4 kg m²/s,
M10 = 0.0032 N m, M20 = 0.0925 N m.   (2)

The definitions of the longitudinal slip λ, which plays the role of controlled output in the slip control system, and of the nonlinear factor S(λ) are

λ = (r2 x2 − r1 x1) / (r2 x2), x2 ≠ 0,
S(λ) = μ(λ) / {L [sin ϕ − μ(λ) cos ϕ]}   (3)

where L, L = 0.37 m, is the length of the arm upon which the upper wheel is fixed, and ϕ, ϕ = 65.61°, is the angle between the normal direction at the contact point of the wheels and the direction of L.
The nonlinear state-space equations of the controlled process are

ẋ1 = S(λ)(c11 x1 + c12) + c13 x1 + c14 + (c15 S(λ) + c16) s1 M1,
ẋ2 = S(λ)(c21 x1 + c22) + c23 x2 + c24 + c25 S(λ) s1 M1,
Ṁ1 = c31 (b(u) − M1),   (4)

where u is the control signal applied to the DC motor which drives the upper wheel, and the numerical values of the parameters are obtained according to the recommendations given in [Rădac et al. 2009]. The third equation in (4) highlights the nonlinearity b(u) of the actuator. The substitution of x1 from (3) into the model (4) leads to the state-space equation

     [ z1(λ, x2)   z3(λ, x2)    0     ]
ẋ =  [ 0           z40(λ)       z5(λ) ] x + [0  0  c31]^T b(u) + d(λ, x2),   (5)
     [ 0           0           −c31   ]

where the state vector is

x = [λ  x2  M1]^T,   (6)

T indicates the matrix transposition, z1(λ, x2), z3(λ, x2), z40(λ) and z5(λ) are nonlinear functions, and d(λ, x2) is the disturbance input vector.

The first step in the derivation of the dynamic T-S fuzzy models of the process deals with setting the following universes of discourse of the three state variables in (6), such that they include the possible domains of variation of the variables in all ABS operating regimes:

0.1 ≤ λ ≤ 1, 20 ≤ x2 ≤ 178, 0 ≤ M1 ≤ 10.   (7)

The first group of dynamic T-S fuzzy models of the process is characterized by the four input variables z1, z3, z40 and z5. These variables belong to the input vector

z = [z1  z3  z40  z5]^T.   (8)

The derivation of these dynamic T-S fuzzy models starts with the graphical calculation of the following sectors of the input variables:

0.6148 ≤ z1 ≤ 5.6851, 0.6167 ≤ z3 ≤ 5.7135,
−0.009 ≤ z40 ≤ −0.0084, −5.6139 ≤ z5 ≤ −5.4132.   (9)
Two linguistic terms Tv,j, v ∈ {z1, z3, z40, z5}, j = 1, 2, are then defined for each input variable. The modal equivalence principle is applied, and this leads to the modal values of the input membership functions being set to the minimum and maximum values in the sectors (9). Fig. 2 illustrates the membership functions of the linguistic terms that correspond to the input variable z5.
Fig. 2 Membership functions of z5
The complete rule base of this group of dynamic T-S fuzzy models consists of the rules R^i, i = 1…16. Each rule consequent is assigned a continuous-time state-space model (5) obtained for the modal values of the input membership functions. The complete rule base of the continuous-time dynamic T-S fuzzy model is

R^i: IF z1(t) IS T^i_z1,j AND z3(t) IS T^i_z3,j AND z40(t) IS T^i_z40,j AND z5(t) IS T^i_z5,j
     THEN ẋ(t) = Ai x(t) + Bi u(t), y(t) = Ci x(t), i = 1…nR,   (10)

where the disturbance input vector has been omitted, y is the controlled output, the general notation T^i_v,j is used for Tv,1 or Tv,2, v ∈ {z1, z3, z40, z5}, nR is the number of rules, nR = 16, and t is the independent time variable. The matrices in the consequents of the rules R1 and R16 of the T-S fuzzy model (10) are

     [ 0.6148   0.6167    0       ]
A1 = [ 0       −0.009    −5.6139  ],   B1 = B16 = [0  0  20.37]^T,
     [ 0        0        −20.37   ]

      [ 5.6851   5.7135    0      ]
A16 = [ 0       −0.0084   −5.4132 ],   C1 = C16 = [1 0 0].   (11)
      [ 0        0        −20.37  ]
All T-S fuzzy models of the process and of the T-S fuzzy controllers use the SUM and PROD operators in the inference engine, and the weighted average defuzzification method.

The 16 models in the consequents of the T-S fuzzy model (10) are discretized assuming a zero-order hold, with the sampling period set to Ts = 0.01 s for all fuzzy models. This leads to the complete rule base of the discrete-time dynamic T-S fuzzy model [Precup et al. 2010]
R^i: IF z1,k IS T^i_z1,j AND z3,k IS T^i_z3,j AND z40,k IS T^i_z40,j AND z5,k IS T^i_z5,j
     THEN xk+1 = Ad,i xk + Bd,i uk, yk = Ci xk, i = 1…nR,   (12)

nR = 16, with the following matrices in the consequents of the rules R1 and R16:

       [ 1.0062   0.0056    0       ]
Ad,1 = [ 0        1        −0.0508  ],   Bd,1 = [0.0006  −0.0053  0.1843]^T,
       [ 0        0         0.8157  ]

        [ 1.0585   0.0532    0      ]
Ad,16 = [ 0        1        −0.0491 ],   Bd,16 = [0.0055  −0.0052  0.1843]^T,
        [ 0        0         0.8157 ]

C1 = C16 = [1 0 0],   (13)
where k is the index of the current sampling interval.

The second group of dynamic T-S fuzzy models of the process is characterized by the three input variables λ, x2 and M1, belonging to the input vector

z = x.   (14)

Next, 20 operating points Ai(λ, x2, M1), i = 1…20, are set, where λ ∈ {0.1, 0.2, 0.4, 0.81, 1}, x2 ∈ {50, 178} and M1 ∈ {5, 10}. Two samples of such operating points are A1(0.1, 50, 5) and A20(1, 178, 10). The linguistic terms Tλ,l, l = 1…5, Tx2,m, m = 1, 2, and TM1,n, n = 1, 2, are defined for λ, x2 and M1, respectively. The modal equivalence principle is applied, therefore all membership functions of the input linguistic terms are set such that their modal values are the coordinates of the 20 operating points.

The complete rule base of this group of dynamic T-S fuzzy models consists of the rules R^i, i = 1…20. Each rule consequent is assigned a continuous-time state-space model (5) linearized at one of the 20 operating points. The complete rule base of the continuous-time dynamic T-S fuzzy model is

R^i: IF λ(t) IS T^i_λ,l AND x2(t) IS T^i_x2,m AND M1(t) IS T^i_M1,n
     THEN ẋ(t) = Ai x(t) + Bi u(t), y(t) = Ci x(t), i = 1…nR,   (15)

where the notations T^i_v,l, T^i_v,m and T^i_v,n, v ∈ {λ, x2, M1}, are used for the linguistic terms Tλ,l, l = 1…5, Tx2,m, m = 1, 2, and TM1,n, n = 1, 2, respectively, and nR = 20. The matrices exemplified for the consequents of the rules R1 and R20 of the T-S fuzzy model (15) are
     [ −4.8872    −0.0335    2.1978  ]
A1 = [ −88.4212   −0.009    −5.4582  ],   B1 = B20 = [0  0  20.37]^T,
     [ 0           0        −20.37   ]

      [ 0.3354     −0.0242    0.642   ]
A20 = [ −31.8385   −0.0084   −5.6139  ],   C1 = C20 = [1 0 0].   (16)
      [ 0           0        −20.37   ]
The discretization of the 20 models in the consequents of the T-S fuzzy model (15), under the same conditions accepted for the first T-S fuzzy models, results in the rule base of the discrete-time dynamic T-S fuzzy model [Precup et al. 2010]

R^i: IF λk IS T^i_λ,l AND x2,k IS T^i_x2,m AND M1,k IS T^i_M1,n
     THEN xk+1 = Ad,i xk + Bd,i uk, yk = Ci xk, i = 1…20,   (17)

nR = 20, with the following matrices exemplified for the consequents of the rules R1 and R20:

       [ 0.9524    0        0.0194  ]
Ad,1 = [ −0.863    1       −0.0583  ],   Bd,1 = [0.0021  −0.0058  0.1843]^T,
       [ 0         0        0.8157  ]

        [ 1.0034    0        0.0058  ]
Ad,20 = [ −0.3189   1       −0.0517  ],   Bd,20 = [0.0006  −0.0054  0.1843]^T,
        [ 0         0        0.8157  ]

C1 = C20 = [1 0 0],   (18)
The continuous-time and discrete-time dynamic T-S fuzzy models of the process are used in what follows in the design of the T-S fuzzy controllers.
3 Design of Takagi-Sugeno Fuzzy Controllers

The T-S fuzzy controller to be designed in what follows for the continuous-time dynamic T-S fuzzy model (10) of the process is characterized by the rules

R^i: IF z1(t) IS T^i_z1,j AND z3(t) IS T^i_z3,j AND z40(t) IS T^i_z40,j AND z5(t) IS T^i_z5,j
     THEN u(t) = λref(t) − K*i x(t), i = 1…nR,   (19)

where λref is the reference input, viz. the desired/imposed longitudinal slip, and K*i, i = 1…nR, nR = 16, are the optimal state feedback gain matrices. The T-S fuzzy
controller to be designed for the discrete-time dynamic T-S fuzzy model (12) of the process is characterized by the rules

R^i: IF z1,k IS T^i_z1,j AND z3,k IS T^i_z3,j AND z40,k IS T^i_z40,j AND z5,k IS T^i_z5,j
     THEN uk = λref,k − K*i xk, i = 1…nR,   (20)

where nR = 16. The complete rule base of the T-S fuzzy controller to be designed for the continuous-time dynamic T-S fuzzy model (15) of the process is

R^i: IF λ(t) IS T^i_λ,l AND x2(t) IS T^i_x2,m AND M1(t) IS T^i_M1,n
     THEN u(t) = λref(t) − K*i x(t), i = 1…nR,   (21)

where nR = 20. The complete rule base of the T-S fuzzy controller to be designed for the discrete-time dynamic T-S fuzzy model (17) of the process is

R^i: IF λk IS T^i_λ,l AND x2,k IS T^i_x2,m AND M1,k IS T^i_M1,n
     THEN uk = λref,k − K*i xk, i = 1…nR,   (22)
where nR = 20.

The PDC framework justifies the separate design of the local state feedback controllers in the rule consequents. In this context the optimal state feedback gain matrices K*i, i = 1…nR, are the solutions to the continuous-time LQR optimization problems

K*i = arg min_{Ki} I(Ki),  I(Ki) = ∫_0^∞ [x^T(t, Ki) Q x(t, Ki) + r u²(t, Ki)] dt,
Q ≥ 0, Q = [q_ij], q_ij = q_ji, i, j = 1…3, r > 0,   (23)

for the T-S fuzzy controllers with the rules defined in (19) and (21), and the solutions to the discrete-time LQR optimization problems

K*i = arg min_{Ki} I(Ki),  I(Ki) = Σ_{k=1}^∞ [x_k^T(Ki) Q x_k(Ki) + r u_k²(Ki)],
Q ≥ 0, Q = [q_ij], q_ij = q_ji, i, j = 1…3, r > 0,   (24)
for the T-S fuzzy controllers with the rules defined in (20) and (22).

The normalized firing strengths (membership functions of fuzzy sets) h_i are

h_i(z) = w_i(z) / [Σ_{i=1}^{nR} w_i(z)],  w_i(z) = Π_{α=1}^{nv} T^i_{v,α}(z),  i = 1…nR,   (25)

with nv = 4 for the T-S fuzzy controllers with the rules defined in (19) and (20), and nv = 3 for the T-S fuzzy controllers with the rules defined in (21) and (22).
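A sketch of how (25) combines with the PDC control laws in (19)-(22); this illustrates the mechanism with hypothetical inputs and is not the authors' implementation. The AND connective uses PROD, aggregation uses SUM, and normalizing the firing strengths realizes the weighted average defuzzification mentioned in Section 2:

```python
def firing_strengths(antecedent_degrees):
    """Normalized firing strengths h_i of Eq. (25); `antecedent_degrees`
    holds, for each rule, the membership degrees T^i_{v,alpha}(z) of its
    antecedent terms (PROD for AND, SUM for aggregation)."""
    w = []
    for rule in antecedent_degrees:
        wi = 1.0
        for degree in rule:
            wi *= degree
        w.append(wi)
    total = sum(w)
    return [wi / total for wi in w]

def pdc_control(h, gains, x, lam_ref):
    """T-S fuzzy controller output: weighted average of the local
    state-feedback laws u_i = lam_ref - K_i x, cf. (19)-(22)."""
    return sum(hi * (lam_ref - sum(kj * xj for kj, xj in zip(k, x)))
               for hi, k in zip(h, gains))
```

With two rules firing at strengths 0.2 and 0.8, the control signal blends the two local state-feedback laws accordingly.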
The equilibrium point of the FCS is globally asymptotically stable if there exists a common positive definite matrix P such that the following LMIs are fulfilled:

(Ai − Bi K*i)^T P + P (Ai − Bi K*i) < 0, i = 1…nR,
[(Ai + Aj − Bi K*j − Bj K*i)/2]^T P + P [(Ai + Aj − Bi K*j − Bj K*i)/2] ≤ 0,
∀ 1 ≤ i < j ≤ nR such that h_i ∩ h_j ≠ ∅,   (26)

for the T-S fuzzy controllers with the rules defined in (19) and (21), and

(Ad,i − Bd,i K*i)^T P (Ad,i − Bd,i K*i) − P < 0, i = 1…nR,
[(Ad,i + Ad,j − Bd,i K*j − Bd,j K*i)/2]^T P [(Ad,i + Ad,j − Bd,i K*j − Bd,j K*i)/2] − P ≤ 0,
∀ 1 ≤ i < j ≤ nR such that h_i ∩ h_j ≠ ∅,   (27)

for the T-S fuzzy controllers with the rules defined in (20) and (22).

Once the dynamic T-S fuzzy models of the process are derived, the model-based design of the T-S fuzzy controllers starts with setting the weights in the LQR optimization problems (23) and (24) to meet the desired/imposed FCS performance specifications. Next the LMIs (26) and (27) are solved to obtain the common positive definite matrix P. If no such matrix is obtained, it will be necessary to modify the weights in the LQR optimization problems, and the FCS performance will be affected. However, as shown in [Precup et al. 2010], the LMIs are sufficient stability conditions, and they can be relaxed from the point of view of the matrix P. The state feedback gain matrices K*i are finally obtained in terms of the following equations:

K*i = r^−1 (Bi)^T P, i = 1…nR,   (28)

for the T-S fuzzy controllers with the rules defined in (19) and (21), and

K*i = [r + (Bd,i)^T P Bd,i]^−1 (Bd,i)^T P Ad,i, i = 1…nR,   (29)

for the T-S fuzzy controllers with the rules defined in (20) and (22).
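Equation (29) requires the matrix P. One standard way to obtain it, shown here as a sketch rather than the LMI-based procedure the authors use, is to iterate the discrete-time Riccati difference equation to a fixed point and then evaluate K* = (r + B^T P B)^−1 B^T P A (single-input case, pure Python, hypothetical helper names):

```python
def t(m):
    """Matrix transpose."""
    return [list(col) for col in zip(*m)]

def mm(a, b):
    """Matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def dlqr_gain(a, b, q, r, iters=1000):
    """Gain of Eq. (29), with P from the Riccati iteration
    P <- Q + K^T r K + (A - B K)^T P (A - B K)."""
    n = len(a)
    p = [row[:] for row in q]
    for _ in range(iters):
        btp = mm(t(b), p)                      # B^T P, 1 x n
        s = r + mm(btp, b)[0][0]               # scalar r + B^T P B
        k = [[v / s for v in mm(btp, a)[0]]]   # K = s^-1 B^T P A
        acl = [[a[i][j] - b[i][0] * k[0][j] for j in range(n)]
               for i in range(n)]              # closed loop A - B K
        p = mm(mm(t(acl), p), acl)
        for i in range(n):
            for j in range(n):
                p[i][j] += q[i][j] + k[0][i] * r * k[0][j]
    btp = mm(t(b), p)
    return [v / (r + mm(btp, b)[0][0]) for v in mm(btp, a)[0]]
```

For the scalar test system A = 0.5, B = 1, Q = 1, r = 1 the fixed point gives K close to 0.2656, and the resulting closed loop A − BK is stable.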
4 Experimental Results

For the constant reference input λref = 0.5, samples of the real-time experimental results obtained for the FCSs with the T-S fuzzy controllers characterized by the rule bases (19) and (21) are presented in Fig. 3 and Fig. 4, respectively. Details concerning the weights, the state feedback gain matrices and the system matrices for the FCSs with the T-S fuzzy controllers characterized by the rule bases (20) and (22) are presented in [Precup et al. 2010].
Fig. 3 Real-time experimental results for the FCS with the Takagi-Sugeno fuzzy controller characterized by the rule base (19)
Fig. 4 Real-time experimental results for the FCS with the Takagi-Sugeno fuzzy controller characterized by the rule base (20)
The best control system performance indices are obtained for the discrete-time T-S fuzzy controllers because the FCSs are actually implemented as digital controllers. The assessment of the exact performance indices, including the overshoot and the settling time, cannot be done on the basis of the real-time experimental results alone, and the digitally simulated responses of the FCSs are employed in this regard.
5 Conclusions

Our new continuous-time and discrete-time T-S fuzzy models of the process and the new continuous-time and discrete-time T-S fuzzy controllers are important because of the relatively simple process modeling and controller design, resulting in low-cost T-S fuzzy controllers which can be implemented in other applications [Grzymała-Busse et al. 2005; Wilamowski 2009; Ridluan et al. 2009; Kulikowski 2009; Kouro et al. 2010].

The first limitation of the new model-based fuzzy control solutions concerns the optimality of the FCSs, which is guaranteed only in the vicinity of the considered operating points. Another limitation is the need for numerical algorithms to solve the LMIs (26) and (27). Future research will deal with the modification of the fuzzy modeling approach. Other stability conditions will also be derived.
Acknowledgment This work was supported by the CNCSIS and UEFISCSU of Romania and the co-operation between the Óbuda University, Budapest, Hungary, the University of Ljubljana, Slovenia, and the “Politehnica” University of Timisoara, Romania, in the framework of the Hungarian-Romanian and Slovenian-Romanian Intergovernmental Science & Technology Cooperation Programs. This work was partially supported by the strategic grant POSDRU 6/1.5/S/13 (2008) of the Ministry of Labor, Family and Social Protection, Romania, co-financed by the European Social Fund – Investing in People.
References [Grzymała-Busse et al. 2005] Grzymała-Busse, J.W., Hippe, Z.S., Mroczek, T., et al.: Data mining analysis of granular bed caking during hop extraction. In: Proc. 5th International Conference on Intelligent Systems Design and Applications, Wroclaw, Poland, pp. 426–431 (2005) [Kouro et al. 2010] Kouro, S., Malinowski, M., Gopakumar, K., et al.: Recent advances and industrial applications of multilevel converters. IEEE Trans. Ind. Electron 58(8), 2553–2580 (2010) [Kulikowski 2009] Kulikowski, J.L.: Decision making supported by fuzzy deontological statements. In: Proc. Int. Multiconference on Computer Science and Information Technology IMCSIT 2009, Mrągowo, Poland, pp. 65–73 (2009)
[Li et al. 2010] Li, F.Z., Hu, R.F., Yao, H.X.: The performance of automobile antilock brake system based on fuzzy robust control. In: Proc. 2010 Int. Conf. on Intelligent Computation Technology and Automation, Changsha, China, vol. 3, pp. 870–873 (2010) [Oniz et al. 2009] Oniz, Y., Kayacan, E., Kaynak, O.: A dynamic method to forecast the wheel slip for antilock braking system and its experimental evaluation. IEEE Trans. Syst. Man Cybern. B Cybern. 39(2), 551–560 (2009) [Precup et al. 2010] Precup, R.E., Spătaru, S.V., Rădac, M.B., et al.: Model-based fuzzy control solutions for a laboratory antilock braking system. In: Proc. 3rd Int. Conf. on Human System Interaction, Rzeszow, Poland, pp. 133–138 (2010) [Rădac et al. 2009] Rădac, M.B., Precup, R.E., Preitl, S., et al.: Tire slip fuzzy control of a laboratory anti-lock braking system. In: Proc. European Control Conf., Budapest, Hungary, pp. 940–945 (2009) [Ridluan et al. 2009] Ridluan, A., Manic, M., Tokuhiro, A.: EBaLM-THP - artificial neural network thermo-hydraulic prediction tool for an advanced nuclear components. Nucl. Eng. Des. 239(2), 308–319 (2009) [Wang et al. 2009] Wang, W.Y., Li, I.H., Chen, M.C., et al.: Dynamic slip-ratio estimation and control of antilock braking systems using an observer-based direct adaptive fuzzyneural controller. IEEE Trans. Ind. Electron 56(5), 1746–1756 (2009) [Wilamowski 2009] Wilamowski, B.M.: Neural network architectures and learning algorithms. IEEE Ind. Electron Mag. 3(4), 56–63 (2009)
Remote Teaching and New Testing Method Applied in Higher Education

L. Pyzik

University of Information Technology and Management, Rzeszów, Poland
[email protected]
Abstract. In this article the e-learning platforms applied in higher education are reviewed. A comparative analysis of selected platforms with respect to their use in the teaching process was made. The analysis was based on criteria chosen for the easiest management of groups of students, easier communication between lecturers and students, as well as for easy archiving and transformation of prepared materials into the SCORM/AICC standards. The aim of the article is to facilitate the school management's choice of the right system, depending on their needs, as well as to point out the possibilities of the e-learning platforms. The marking scale presented in the comparison of the platforms clearly confirms that the best platform is ZSZN. Three systems, BlackBoard, Moodle and ILIAS, are ranked after ZSZN. Depending on the range of the distance learning functions used, the selection among these three platforms depends on the financial capability of a given higher education institution. BlackBoard is a system that provides an exhaustive offer in both asynchronous and synchronous e-learning. ILIAS and Moodle are open systems that support many plug-ins providing functions that may be needed. The paper offers a comparison of the functionality of the most popular e-learning systems. The comparative analysis also covers the ZSZN system, which has been designed mainly with universities of technology in mind. On the basis of the comparative analysis results, university management will be able to choose an optimum e-learning system that is best adapted to both their needs and financial capabilities.
1 Introduction

In many universities, introducing learning support with an e-learning platform is at the moment the most important element in achieving the standards of a 21st-century university. Education that uses the newest, innovative practices expands the range of the university's activities, thereby adapting it to the needs of the present school society. University managers face the dilemma of choosing which system they should adopt to fulfil their requirements. The diversity of available applications makes this choice difficult.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 237–259. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
238
L. Pyzik
In this study, e-learning platforms are compared based on their usefulness for remote teaching tasks. Only asynchronous e-learning systems were considered, with the possibility of integration with synchronous systems.
2 Description of Criteria for Analysis of the Available Systems

In this article the most popular e-learning platforms used in higher education are presented. The comparison was based on the following criteria:

2.1 Managing the Courses and Users

In this section we compare the possibilities of creating courses using the materials from the systems' e-learning servers. The section also covers managing the course users, assigning groups, and exploring the potential for creating a timetable with the ability to access the research library throughout the course. We also grade each system's capability of using user databases and directory services such as MySQL, LDAP, SOAP, RADIUS, Shibboleth or a local database.

2.2 Creating Research Material and Marking Tools for the Course

This section allows us to grade each platform on how it publishes research material in different formats. Other elements taken into consideration include presentations, creating e-books, the use of multimedia in research materials, and the possibility of creating surveys/quizzes, etc. A subdivision of this section covers the tools for marking students' work. We analyse the possibilities of creating tests, the types of tests, different ways of testing, and creating statistics for particular courses as well as individual statistics.
Meeting these standards means:
• Trouble-free use of any course in a given system
• The ability to sell course content to other institutions on the market (after teaching one's own users with a self-created e-learning course)
• Communication between the course and any of the systems used in an organisation, which makes it easy to monitor both the marks and the behaviour of the users of electronic content
Remote Teaching and New Testing Method Applied in Higher Education
239
• Access standards – disabled people make up a significant percentage of society. To guarantee the best possible access to the advantages of modern technology, legal regulations are being introduced, and institutions and companies are required to respect them. American law (Section 508 of the Federal Rehabilitation Act) addresses access to electronic resources by disabled people, demanding solutions that help them use those resources. Section 508 requires that "institutions and government agencies prepare, maintain and use electronic technologies in a way that allows disabled employees and citizens to access the information and data in a similar way to those who are not disabled." Following this requirement, six articles were drawn up containing requirements for tools, products and electronic resources [Bednarek et al. 2007]. International organisations create standards to make sure these requirements are fulfilled. SCORM (Shareable Content Object Reference Model) is a standard for developing, packaging and delivering high-quality training materials for online courses. SCORM was developed around the idea of "using a common e-learning standard to modernize education and training courses". In short, SCORM is a set of specifications for developing, packaging and delivering education and training content wherever and whenever it is needed [Chmielewski 2006]. The AICC standard was originally developed for the aerospace industry and gradually expanded its scope to become one of the most widely used e-learning standards. Its significance today is largely historic; yet since a variety of LMS platforms, ready-made content and authoring tools still support it, it is still worthwhile, when selecting an LMS or authoring system, to ensure that it supports AICC.
2.4 Communication in Remote Teaching

The form of communication with students during the didactic process is extremely important. One of the most popular communication tools is chat, with the possibility of creating chat rooms. The systems are also analysed for the possibility of connecting to videoconferencing systems and to Internet messengers. This section also considers whether the platforms support Internet forums, electronic mail, collaboration between student groups on shared document collections (wiki), and file sharing between students.

2.5 Defining Individual Teaching Paths

This section considers the prospects of each platform for creating individual teaching paths based on the analysis of initial test results as well as the results of particular teaching stages. The analysis includes the possibilities for individualising the whole teaching process and for giving students the choice of a personalised form of teaching.
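The path-selection idea in Section 2.5 can be made concrete with a small sketch: an initial (placement) test result chooses a starting level, and the result of each teaching stage decides whether the student advances, stays, or receives remedial material. This is an illustration of the concept only, not any platform's actual algorithm; all thresholds and module names are hypothetical.

```python
# Hypothetical sketch of individual teaching paths: a placement test picks the
# starting level, then each stage result moves the student up, keeps them in
# place, or routes them to a remedial variant. Thresholds are illustrative.

def initial_level(placement_score: float) -> str:
    """Map a placement-test score (0-100) to a starting level."""
    if placement_score >= 75:
        return "advanced"
    if placement_score >= 40:
        return "intermediate"
    return "basic"

def next_module(level: str, stage_score: float) -> str:
    """Decide the next step after one teaching stage."""
    order = ["basic", "intermediate", "advanced"]
    if stage_score >= 80:                      # mastered: move up a level
        i = order.index(level)
        return order[min(i + 1, len(order) - 1)]
    if stage_score >= 50:                      # passed: stay at this level
        return level
    return f"remedial-{level}"                 # failed: remedial variant

level = initial_level(62)                      # -> "intermediate"
path = [level]
for score in (85, 55, 45):                     # three stage results
    step = next_module(level, score)
    path.append(step)
    if not step.startswith("remedial-"):
        level = step
print(path)  # ['intermediate', 'advanced', 'advanced', 'remedial-advanced']
```

A real implementation would draw the thresholds from the statistics the platforms collect per question and per student, as discussed later in Section 3.5.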
240
L. Pyzik
2.6 The Cost of the Platform and the Availability of Technical Support

This section grades the cost of implementation and the standard and availability of the platform's technical support. Economically it is a very important criterion, which in some cases determines the choice of platform used for remote teaching.
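The criteria above feed the scoring scheme used in the comparison that follows: each criterion is worth up to 10 points, except cost and support, which is worth up to 20, giving a 70-point maximum per platform. A minimal sketch of that aggregation (with purely illustrative scores, not the paper's results):

```python
# Sketch of the 70-point scoring scheme: five criteria at up to 10 points
# each, plus cost/support at up to 20. The example scores are illustrative.

MAX_POINTS = {"users": 10, "content": 10, "standards": 10,
              "communication": 10, "paths": 10, "cost": 20}

def total_score(scores: dict) -> int:
    """Validate each criterion against its maximum and sum the points."""
    for criterion, value in scores.items():
        if not 0 <= value <= MAX_POINTS[criterion]:
            raise ValueError(f"{criterion}: {value} out of range")
    return sum(scores.values())

example = {"users": 9, "content": 9, "standards": 10,
           "communication": 6, "paths": 7, "cost": 18}
print(total_score(example), "/", sum(MAX_POINTS.values()))  # 59 / 70
```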
3 Comparison of Platforms Based on Basic Criteria

The compared systems fall into two groups: open-source software distributed under the GPL licence, and commercial software. The highest possible mark for each criterion was 10 points; the criterion covering price and running costs was marked differently, with each tested platform able to reach up to 20 points. Each platform was thus marked on a scale with a maximum of 70 points. The cheapest platform receives the most points, whereas the most expensive one receives the least. For the analysis, five of the most popular LMS platforms used in higher education were chosen, both free and commercial.

ILIAS (Integriertes Lern-, Informations- und Arbeitskooperations-System) is a German LMS/LCMS-class system developed as an open-source project by the University of Cologne.

Moodle (Modular Object-Oriented Dynamic Learning Environment) is distributed under the public GPL licence and is widely used by universities and higher education schools [Palka 2010b].

OLAT (short for Online Learning And Training) is an Internet e-learning platform created mainly by the MELS team (Multimedia and Learning Services) at the University of Zurich [Palka 2010a].

Blackboard: the main solution is the Blackboard Academic Suite package, containing the following modules [WWW-1 2010]. Blackboard Learning Platform: a platform for managing the didactic process, from building content, through managing enrolment and marking students' work, to delivering content both asynchronously and synchronously (tools for lecturing in a Virtual Class mode). Blackboard Content System: a module that allows integrated management of resources and of both didactic and administrative content across the university.
The module provides hard drive space for student group work and file versioning, and incorporates document workflow mechanisms with multi-level authorisation (for example, an essay submitted by a student first goes to the person managing the group for preliminary marking; after positive authorisation, the essay is forwarded to a lecturer for a final grade). Blackboard Community System: a module that allows student communities to be created within the system and group projects to be carried out during lectures. In this way, lecturers can define virtual areas for teaching purposes,
where communication and resource sharing take place. Blackboard Outcomes: a module that monitors and reports the realisation of the university's strategic aims against defined criteria, intended to be assessed and to influence the university's ranking position or the mark given by PKA. Blackboard Portfolio: a module that allows students and university employees to build their own space and platforms for sharing knowledge and experience. In the portfolio, users can create their own repositories of resources from particular lectures, as well as run forums and blogs for dynamic content creation and the exchange of ideas.

The WBT 4 System platform contains two elements. WBTServer is an LMS allowing simple management of the entire remote teaching process. It simplifies the coordination of e-learning courses, gives fast and simple access to content, and provides fast communication with the system's users. Thanks to the WBTServer architecture, the functionality of the system can be adjusted to the individual needs of the client. WBTExpress is a program for creating advanced e-learning courses. Its tools make course creation easy and intuitive, even for people without computer-science knowledge. The software contains more than 120 components and ready-made web page templates that allow quick and easy creation of on-line courses. The tool includes very useful software for levelling tests, which allows teaching levels to be assigned to individual students; it is also well suited to creating language tests.

ZSZN, the Integrated System of Remote Teaching (Zintegrowany System Zdalnego Nauczania), is a system created by the author and her team [Pyzik 2010b]. It integrates three independent remote teaching systems: the LMS ILIAS, the synchronous DimDim system, and a virtual machines system.
It was created for the purpose of teaching operating systems, but it works well for other subjects too. Besides the aforementioned ILIAS system, an innovative virtual machines system was introduced, which makes it possible to organise virtual laboratories during remote teaching. A virtual laboratory is a replica of a stationary laboratory. With the help of the virtual machines system, the lecturer can help a student directly, as if the student had come up to a workstation in a stationary laboratory. Besides interacting with a student's machine, the lecturer has an overview of every student's workstation: they can monitor what a particular student is doing during the session, examine, help and explain. They can also freeze a student's machine and have the students watch what the instructor is doing on the instructor's machine, which appears on the students' desktops. The whole is supported by a synchronous videoconferencing system, in which the lecturer has a whiteboard, a voice communicator, a presentation system, a main chat and chat rooms at their command.

3.1 Managing the Courses and Students

ILIAS has a highly developed system for managing students and courses. It is based on roles, which are assigned to specific resources; students can be added to each role, and assigning specific roles regulates access to the courses. Additionally, we can assign students to a particular group and assign the access
to the courses and resources to this group. The system is also equipped with a timetable, private or group-wide. Within each resource we can regulate access to its particular attributes; in this way we can assign privileges at the attribute level, for example within a test: we can authorise a user to create a new bank of test questions while forbidding him from using the rest of the test, regardless of the role's access to the whole resource. This selective approach gives the system a huge advantage over the rest. ILIAS uses a local MySQL database, but it can also draw user data via LDAP, RADIUS, SOAP or Shibboleth. On a 10-point scale the system scores 9. This high mark reflects the ability to manage resource rights by means of roles, as well as to assign attributes to both local and global groups.

Moodle is a platform in which both internal and external authentication is used for managing users. We can use servers such as LDAP, RADIUS, IMAP and NNTP, as well as Oracle, MySQL, PostgreSQL and MSSQL databases. The system offers many mechanisms for student access to materials. We can create student groups and manage them flexibly, assigning users directly to the created courses. There is also the potential to create roles and assign them to resources; the default roles are administrator, course creator and lecturer. Creating new roles allows flexible regulation of access to a particular course. Thanks to these possibilities the system scores nine points in this section.

The OLAT platform also has many authentication mechanisms; for example, it can use LDAP, CAS and Shibboleth. The platform allows user groups to be created and assigned to specific courses. Unfortunately, this mechanism does not support the selective granting of authorisation to individual attributes of a resource. The system can grant authorisation by means of roles, but the roles are predefined, without the capability of adding new ones.
Among the predefined roles we can choose between user management, group management, author, administrator and resource manager. These roles are sufficient for remote teaching, but the option of creating roles with finely specified access is missing. The element that distinguishes the system from other LMSs is the possibility of regulating the assignment of hard drive space and data-sharing limits for normal and privileged users, groups, educational resources, folders and course elements. The OLAT platform scores seven points, due to the lack of additional roles beyond the predefined ones.

Blackboard is a commercial platform designed especially for the needs of higher education. A user can be assigned only one role within Blackboard. The Blackboard user roles are: Instructor: the instructor has master access over the Blackboard course functions. Teaching Assistant: teaching assistants have almost the same access level as instructors, with the exception that a teaching assistant cannot add instructors and TAs to a Blackboard course. Grader: the grader is able to access all areas under assessments. Course Builder: course builders have access to most content functions. They cannot build
assessments or pools, do not have access to the grade book or the digital drop box, cannot create calendar entries or course tasks, and cannot access course statistics. Student: enrolled students have access to all content, communication areas, and assessments, but not to the Control Panel. Only those with the Student role can take assessments and have grades recorded in the Blackboard online grade book. Blackboard supports multiple authentication methods, including LDAP, CAS and Shibboleth. The system does not offer the possibility of defining new roles; it also scores seven points in this category.

WBTServer is equipped with a module for easily managing users gathered in groups. Among its mechanisms, the server supports authorisation via LDAP and Active Directory, and it can also use MySQL, MSSQL and Oracle databases. Materials developed in the WBTExpress module are published on the WBT server, and authorisations for this module are given per user group. The system is rather limited when it comes to managing user groups and resource access; for these reasons it scores five points in this section.

ZSZN is based on the LMS ILIAS; it therefore inherits all its functionality in the domain of managing users and resource access. Additionally, the system has a built-in module for managing synchronous course users and for managing the virtual machines. A user account created in the ILIAS system is passed on to the DimDim synchronous system, where the user is assigned their prepared sessions and finishes work after the completion of the session. The ZSZN system combines management of resource access with the creation of synchronous sessions, as well as the creation of new virtual machines and management of virtual laboratory access. In this section the system scores ten points. The results of this section are shown in figure 1.
Fig. 1 Management of users and their access to resources (scores of ILIAS, Moodle, OLAT, Blackboard and WBT on a 0–10 scale)
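The role-and-attribute model described for ILIAS in Section 3.1 (roles granted permissions on resources, with selective grants on individual attributes such as a test's question bank) can be sketched as follows. This is an illustrative data model, not ILIAS's actual implementation; all names are hypothetical.

```python
# Illustrative sketch of role-based access with attribute-level grants:
# permissions attach to (resource, attribute) pairs per role, and an
# attribute-level grant can exist without the corresponding resource-level one.

from collections import defaultdict

class AccessControl:
    def __init__(self):
        # (resource, attribute) -> {role: set of permissions}
        self._grants = defaultdict(lambda: defaultdict(set))
        self._user_roles = defaultdict(set)

    def assign_role(self, user, role):
        self._user_roles[user].add(role)

    def grant(self, role, resource, permission, attribute=None):
        self._grants[(resource, attribute)][role].add(permission)

    def allowed(self, user, resource, permission, attribute=None):
        # check the attribute-level grant first, then fall back to the
        # resource-level grant
        for key in ((resource, attribute), (resource, None)):
            for role in self._user_roles[user]:
                if permission in self._grants[key][role]:
                    return True
        return False

acl = AccessControl()
acl.assign_role("anna", "tutor")
acl.grant("tutor", "course-101/test", "read")
# selective grant: tutors may add questions to the bank, but nothing else
acl.grant("tutor", "course-101/test", "create", attribute="question_bank")

print(acl.allowed("anna", "course-101/test", "create", "question_bank"))  # True
print(acl.allowed("anna", "course-101/test", "create"))                   # False
```

The design point this illustrates is the one the comparison rewards: the grant on `question_bank` does not imply `create` on the whole test, which is exactly the selective behaviour credited to ILIAS.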
3.2 Creating Course Material and Tools for Verification of Knowledge

ILIAS has an excellent mechanism for uploading resources. For creating didactic material we have the option of creating or attaching course
module content and HTML teaching modules. After creating the content, the author gains access to an editor in which he can shape the course content using multimedia materials. He can also enter metadata divided into many categories, from general through technical to legal. In certain categories it is possible to create external links and references, and specific material can be formatted in many ways. Didactic content can be created in the system using a very functional text and content editor; we can change the order of chapters and move chapters to different places. The material of a resource can be presented in printable form, highlighting particular chapters or single pages. We can upload material created with text editors such as OpenOffice Writer; the specially designed eLAIX plug-in allows us to export such documents to ILIAS. This gives a huge opportunity for creating e-books and didactic material of all kinds. The system also has a well-developed mechanism for uploading and playing multimedia and photo galleries. Another tool is the creation of tests and the regulation of access to them. The system supports various banks of questions: multiple choice and single choice, closed, numeric, mixed, positional and similar. Question banks can be exported and imported in the IMS-QTI format. We can also track the results of particular questions, comparing them with the rest of the group. A wide variety of settings is available, covering timing and scoring as well as the pass threshold for a given test. A further advantage of the system is the possibility of adding surveys, with a wide variety of reports and analyses. The course author can choose questions from many banks, merge banks, create new question banks, and add or remove questions from the available pool.
In the system settings we can activate result logging and analysis for a particular test, as well as set the difficulty level, set the time, and choose the way the test is conducted (a test with prompts, a test with the possibility of postponing questions, randomly chosen questions, an animated test, etc.). In this section ILIAS scores nine of the possible ten points.

Moodle, like ILIAS, has a well-developed mechanism for creating didactic content. We can use a built-in editor for creating resources; we can also add file resources in many formats and add question banks or lists. With the system's help we can create books: simple materials of a few pages for studying. A lesson allows content to be presented in an interesting format consisting of many pages; each page usually ends with a question and a few answers, and the stage the student reaches depends entirely on their personal progress. Lesson navigation can be simple or much more complicated, depending on the structure of the material. Lightbox Gallery: this resource allows the projection of many images via the Lightbox2 JavaScript library; edited pictures can be resized, rotated, and shared. Descriptions are not strictly active services: they are graphic elements that allow text and graphics to be placed among other activities on the course page. Quizzes: this module allows the lecturer to
create tests in quiz form, made up of multiple-choice questions, true/false questions and short-answer questions. These questions are categorised and stored in a database, from which they can be reused in a different course or even moved to another one. Quizzes can allow multiple attempts at a question; each attempt is checked automatically, and the lecturer can decide whether to add comments or reveal the correct answers. The module also offers grading tools. Moodle has a whole range of plug-ins for particular functions, such as using popular office packages, attaching graphics, films and messages from popular social networks, and creating DocBooks. Unfortunately, it lacks tools for adjusting a test's difficulty level or mechanisms for personalising tests. On a ten-point scale Moodle scores eight.

OLAT: the platform has good mechanisms for uploading teaching content into the system's resources, with a well-functioning editor for entering content and drawings. It does not provide the ability to upload multimedia, although the content formatting mechanism is good. Sadly, the tool for creating tests has limited possibilities: we can only create exercises in the form of single- or multiple-answer questions, or as a test with free space for an answer. Compared with the previously described platforms, it is a very poor device for creating tests; the inability to add multimedia to a question hugely limits its use in everyday practice. Test questions can be copied or moved into place, but there is no possibility of creating question banks that could be used in many tests, or of choosing questions from a bank for use in a different test. Due to these very limited test-creation possibilities, the platform scores four points.
Blackboard: the Content System module, described above, provides integrated management of resources and of didactic and administrative content across the university, including hard drive space for student group work, file versioning, and multi-level document workflow. The Blackboard platform, as one of only a few, recognises multimedia, audio, video, executable and graphic files, as well as spreadsheets and tests. Its question banks include a whole range of question types: standard questions (single or multiple choice), gap-fill questions, numeric questions, ordering questions, and true/false questions. An innovation that does not exist in other platforms is the "Hot Spot" question, requiring students to point to a coordinate field in a picture to give the right answer. With this method we can formulate questions such as pointing out a location on a map or, in medical tests, questions on anatomy. Additionally, there are jumbled-sentence questions, very useful in creating language tests. The variety of question types, the possibility of creating tests from different question banks, and the editors
equipped with many functional tools for creating didactic content place the Blackboard platform at the top of the pack in this section, with a score of ten points.

WBT: as described earlier, courses are authored in WBTExpress and published on WBTServer. This split into two basic elements is highly inconvenient, because the WBTExpress software must be delivered to every author of a course. Moreover, with WBTServer one can only manage users and groups; course content can only be modified on workstations with WBTExpress installed, which is a serious disadvantage when changes to course content are needed. The possibility of creating global question banks that could be used by other authors or system administrators in test creation is also limited. These limitations place the WBT platform in a low position, with only four points.

ZSZN inherits all the features connected with managing teaching content, as well as the marking mechanisms, from the ILIAS platform. Additionally, ZSZN was equipped with a test editor that works on Windows and Linux. It allows test sets to be downloaded from the ILIAS platform, modified, and extended by adding or removing questions.
The test editor also provides conversion of those tests into formats suitable for printed test editors. The next tool designed for this system is a testing and self-learning application, based on the built-in virtual machines system. It allows practical exercises to be built and carried out in either a learning or a test mode. The author is given an exercise editor, in which he enters the content of the practical exercise; then, using a special solution editor for that exercise, the application checks the correctness of the completed work. The result is shown according to the application's working mode: in test mode it is shown as pass or fail, while in teaching mode the mistakes made in the exercise are reported. This application is an integral part of the virtual machines system, managed by the system's main engine. Thanks to it, we can create tests linked to the solution of practical exercises, which distinguishes the ZSZN platform from the other remote teaching platforms. In this section it scored ten points.
Comparisons of the results in this section are shown in figure 2.
Fig. 2 Comparison of category 2 systems
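Two of the assessment mechanisms compared in Section 3.2 can be sketched briefly: a shared question bank from which a test draws a random subset (as in ILIAS and Moodle), and the grading of a Blackboard-style "Hot Spot" question, which checks whether the clicked point falls inside a region of the image. This is an illustrative sketch under simplified assumptions (rectangular hot-spot regions, a flat bank); all data is hypothetical.

```python
# Sketch of a question bank with random test assembly, plus grading of a
# "Hot Spot" question by point-in-rectangle containment. Illustrative only.

import random

def draw_test(bank: list, n: int, seed: int = 0) -> list:
    """Pick n distinct questions from a bank, reproducibly via a seed."""
    return random.Random(seed).sample(bank, n)

def grade_hotspot(click_xy, region):
    """region = (x_min, y_min, x_max, y_max) in image coordinates."""
    x, y = click_xy
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

bank = [{"id": i, "type": "multiple_choice"} for i in range(20)]
test = draw_test(bank, 5)
print([q["id"] for q in test])

# e.g. "click on Warsaw on the map": the correct area is a box around the city
print(grade_hotspot((312, 198), (300, 180, 340, 220)))  # True
print(grade_hotspot((10, 10), (300, 180, 340, 220)))    # False
```

Real platforms store such banks with categories and IMS-QTI metadata so that questions can be reused across courses, as the section describes.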
3.3 Compatibility of the Platforms with Standards

ILIAS is fully compatible with SCORM/AICC. It is equipped with export mechanisms for HTML, XML, SCORM and SCORM 2004, and it can also import resources in those formats. The platform can additionally adjust its appearance to the needs of disabled people: changing the "skin" allows the font, icons and other elements to be enlarged to a size that a person with vision problems finds satisfactory. A further advantage is the keyboard shortcuts, which allow quick access to the relevant parts of the system. Thanks to these features ILIAS scores ten points in this section.

Moodle is fully compatible with SCORM/AICC and is equipped with import mechanisms for HTML, XML, SCORM and SCORM 2004, although there is no possibility of exporting resources from Moodle to SCORM, IMS or other formats. The platform can be adjusted to the needs of disabled people; an example is the AGH Krakow platform, which supports easy access for people with visual impairments. Due to the lack of export to other formats, the platform scores eight out of ten points.

OLAT fulfils the SCORM/AICC standards but is not adapted for importing material in SCORM 2004. Resources can only be imported in these formats; exporting is not possible. There are also limitations on adjusting the look of the platform or the size of the icons, so its use by disabled people is limited as well. Given the extent of these limitations, the platform scores only four points in this section.
Blackboard can import materials in all of the SCORM, IMS and NLN formats, but it does not fulfil the SCORM export standard: the platform exports courses only in its own internal format, usable solely on the Blackboard platform, and exporting to other platforms in SCORM format is possible only with additional software. The platform can change its skin and be adjusted for use by disabled people. Due to the lack of export in the SCORM/AICC standard, the platform scores eight points.

WBT Express can publish created courses in the SCORM 1.2, SCORM 2004 and AICC standards. Additionally, this platform offers some rare dedicated export options to the Moodle and Oracle iLearning platforms. Using WBTServer we can also import materials in SCORM/AICC and other standards. WBT Express allows the style of the created materials to be managed, which is necessary for use by disabled people. WBT scores ten points in this section.

ZSZN, as an integrated system, uses the ILIAS platform mechanisms and is therefore compatible with the SCORM/AICC standards: materials prepared in SCORM can be both imported and exported. It is possible to change the styles of a particular educational resource, adjusting even single pages to the needs of people with sight problems. Additionally, keyboard shortcuts can be defined, which is crucial for improving the usability of the system. Taking all of these features into account, the platform scores ten points. Comparisons of the results in this section are shown in figure 3.
Fig. 3 Compatibility with standards
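The SCORM import/export abilities compared above all revolve around the package manifest: a SCORM package is a zip archive whose root contains an imsmanifest.xml describing the course structure and its launchable resources. The sketch below builds a deliberately simplified skeleton of such a manifest; a real SCORM 1.2 or 2004 package additionally needs the namespace and schema declarations and metadata required by the specification, which are omitted here.

```python
# Simplified skeleton of a SCORM-style imsmanifest.xml: an organization tree
# of items, each referencing a launchable resource. Omits the required SCORM
# namespaces/metadata; identifiers and titles below are illustrative.

import xml.etree.ElementTree as ET

def minimal_manifest(course_id: str, title: str, launch_href: str) -> str:
    manifest = ET.Element("manifest", identifier=course_id)
    orgs = ET.SubElement(manifest, "organizations", default="ORG-1")
    org = ET.SubElement(orgs, "organization", identifier="ORG-1")
    ET.SubElement(org, "title").text = title
    item = ET.SubElement(org, "item", identifier="ITEM-1", identifierref="RES-1")
    ET.SubElement(item, "title").text = title
    resources = ET.SubElement(manifest, "resources")
    ET.SubElement(resources, "resource", identifier="RES-1",
                  type="webcontent", href=launch_href)
    return ET.tostring(manifest, encoding="unicode")

xml = minimal_manifest("COURSE-OS-101", "Operating Systems", "index.html")
print(xml)
```

This is why export support matters in the comparison: a platform that emits such a manifest (plus the mandated metadata) produces courses any conformant LMS can import, whereas a proprietary export format, as noted for Blackboard, does not.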
3.4 Communication in Remote Teaching

ILIAS is capable of setting up chat rooms, private or public, and allows Internet forums to be created. In the system settings we can enable anonymous posting, as well as post statistics for particular users; there is also an option for posts to be activated by a moderator, and new moderators can be added for specific forums. ILIAS has a good e-mail mechanism. Unfortunately, the platform has no plug-in to connect it with synchronous e-learning systems; because of this, it scores six points in this section.

Moodle has an Internet forum mechanism, which particular people can moderate; a forum can be created by the administrator as well as by the author of a course. Besides the forum we have a chat which, in the standard option, offers new chat rooms or ones we have already used. A chat room can be scheduled to occur at the same time every day or every week; note that times in Moodle are always adjusted to the time zone of the user viewing them. The chat is configured so that everyone can view past chat sessions from the last 30 days. The system has built-in electronic mail, which serves all groups and course users. Additionally, built-in plug-ins allow the system to be connected with the DimDim and Elluminate videoconferencing systems. Moodle scores ten points in this category.

The OLAT system is not very flexible when it comes to communication tools. E-mail can be sent to selected or all teachers, but only to a whole group of users, which is a serious disadvantage. It carries no basic communication mechanisms, such as chat or a discussion forum, and it has no plug-ins for connecting to videoconferencing systems. These facts are responsible for the low score of only three points in this section.
Blackboard's collaboration tools and discussion boards allow users and instructors to engage in synchronous and asynchronous communication. The Discussion Board is a medium for posting and responding to messages; conversations are grouped into threads containing a main posting and all related replies, and an advantage of the Discussion Board is that threads are logged and organised. Forums are used to organise discussions on related topics; students and instructors click discussion links to access a forum from the main Discussion Board page, and a discussion started within a forum is called a thread. In the Virtual Classroom, users can ask questions, draw on the whiteboard, and participate in breakout sessions; the Session Admin establishes which Virtual Classroom tools users can access. The Chat lets users interact with each other via text-based chat; it is part of the Virtual Classroom but can also be accessed separately. Blackboard scores ten points.
WBT has a module built into the WBT server that allows a student to stay in contact with an advisor and obtain support at any time. It offers the possibility of contacting an advisor independently of time and place, efficient on-line help, and, with the Virtual Advisor functionality, support for clients independently of the availability of human advisors. The system makes it possible to carry out multiple consultations at the same time, to select the most competent advisor automatically (with the possibility of manual redirection to another advisor during a consultation), and to support advisors in their consultancy with an answer-generating system. The lack of full videoconferencing functionality leads to a score of seven points.

ZSZN, as an integrated system using the ILIAS and DimDim systems, inherits all the functionality of both. A mechanism was designed for attaching the DimDim synchronous system to the presented platform; it enables creation of, and work in, a fully synchronous mode, supporting audio, video and screen sharing, and offering a whiteboard and presentations in .ppt and PDF formats. Thanks to the ILIAS platform, the user can rely on discussion forums, an additional chat, and an e-mail system. The merger of asynchronous and synchronous e-learning systems allows us to manage users, groups and resource access; it also allows us to manage access to the synchronous sessions and the users' connection with the DimDim system for the duration of a session. ZSZN connects its synchronous sessions with virtual machines: to every open session a virtual laboratory is assigned, in which students have their own virtual machines. The lecturer is able to view the machines and can interact with a user's machine throughout the session [Pyzik 2010a]. The system is capable of sharing voice, camera picture, desktop and resources, as well as interacting with the virtual machines.
Due to these factors, it scores the maximum in this section. The comparison of the platforms is shown in Figure 4.
Fig. 4 Communication
Remote Teaching and New Testing Method Applied in Higher Education
3.5 Defining Individual Students' Paths ILIAS The platform allows marks to be reported according to ECTS (European Credit Transfer and Accumulation System) as well as creating individual certificates for users in PDF format. The constructor of a test either enforces fixed test questions or draws them randomly from a question bank. The system is not able to adjust the questions automatically based on the progress of the test. The constructors of a test can use different marking methods and criteria for the final mark, although a test created this way remains a passive test. Creating dynamic paths in this platform is still in its design stages, but its development will allow creating individual teaching paths. Although we do not have the possibility of creating dynamic paths, we can still assign individual programs to particular students by manual setting. ILIAS scores seven points in this category. Moodle The Moodle platform, like the others, is equipped with test and mark-reporting mechanisms. The report presents data about each quiz question in the form of a table, and analyses and marks the performance of each question. Based on those reports we can distinguish the difficulty levels of particular quizzes and check the progress of students' work. Having the statistics for specific students, we can manually set an individual teaching path for them. This has to be set at the beginning of the teaching process by first establishing, based on tests and surveys, the characteristic features and abilities of the students; it allows same-level students to be assigned into groups. In Moodle these processes can only be done manually; the platform is unequipped with automatic mechanisms to set teaching paths. It scores seven points in this section. OLAT As soon as a course participant has completed a test, self-test or questionnaire and you have conducted some data archiving, you will be able to see the results.
Results of self-tests and questionnaires are stored anonymously. After archiving, the following data are at your disposal: persons (anonymized by sequential number), questions dealt with, given answers, and score (self-test). The same applies to test results, but all data are stored in a personalized form (first name, last name, user name). Thanks to the possibility of browsing the reports, the author can manually assign particular resources to a student. Unfortunately, it is not possible to create dynamic teaching paths or set different marking systems in tests. The platform scores six points in this section. Blackboard When a Test is deployed, four options for Test Feedback appear on the Test Options page. • Score Only – Only the final score is presented. • Detailed Results – Allows users to see their answers, whether they are correct, and the final score.
The correct answers are not presented. • Show Correct Answers – Allows users to see their answers, the correct answers and the final score. • Detailed Results, Correct Answers and Feedback – Allows users to see their answers, the correct answers, feedback for the questions and the final score. When a Survey is deployed, only two options are available: • Status Only – Allows students to see if the Survey is complete or incomplete. • Detailed Results – Allows users to see the answers they submitted. Blackboard is also capable of creating reports, from which the lecturer can verify the assignment to a group and the knowledge level. Like the previously mentioned platforms, Blackboard allows manual assignment of resources based on test marks. Unfortunately, just like the other platforms, it does not have the capability of dynamically creating teaching paths. The platform scores seven points in this category. WBT The system is furnished with mechanisms whereby tests and questionnaires may be created. The Multiple Choice Area components allow the author to create exercises where check-box-based answers are randomly drawn inside a specified area; whenever the object is refreshed, it displays them in a different order. As this element represents the multiple-choice type of task, a student can mark one or more correct answers by checking the boxes next to particular entries. The WBTServer platform, unlike most LMS systems, allows the author to deliver both courses and exams. The system is based on the SCORM extensions; therefore it is possible, for example, to define exam rules for drawing test questions or to use the navigation that is built into the course. The system has an examination module – the author can define whether the course should be considered an exam and therefore whether it should end after a certain period of time with a special report.
The system does not have built-in automatic functionality for defining a learning path; likewise, manual attribution of learning resources is also difficult, because reports from the knowledge-testing modules can only show whether a given answer was correct or incorrect. No score thresholds can be set to determine the level of knowledge that a student represents. For these reasons, the platform is awarded four points in this category. ZSZN This integrated system is based on the ILIAS platform, with a supplementary module added to the tests and questionnaires available in ILIAS. The added elements include adaptation tests, which determine access to relevant didactic materials depending on the results obtained by individual students. The process of attribution is done in an automated and dynamic way [Sztejnberg et al. 2010]. The ideas for introducing additional interrelationships into the testing procedure, in order to combine the advantages offered by adaptive testing based on item response theory with mechanical selection of tasks according to an individual learning programme, are shown in the flowchart below (Figure 5).
Fig. 5 The general CAT (Computer Adaptive Testing) algorithm, which follows the mathematical strategy to select test questions, inside a personalized e-learning platform
In the proposed solution, the testing procedure begins with the inputting of the test parameters. Each test is described using a vector of parameters (DT) which denotes the thematic areas covered by the test's knowledge measurement. The test parameters are unquantifiable values, expressed either verbally or symbolically. Learners registered in the distance learning system have an individual profile created for them. The profile stores, inter alia, information about the areas covered by the learner's learning path. The profile area vector (DP) stores the symbols of the areas comprised by the individual learning path. It is assumed that the course's subject matter is subject to personalization and that some of the thematic areas may be omitted in the course of learning, while others represent a permanent component of a given educational program. In addition to its name defining the thematic area, each of the areas specified
in the student's learning profile determines the status of the area that is taught. The status parameter of an area ascribed to an individual learning path is also unquantifiable. This parameter plays an informative role, allowing for distinguishing between the substantive levels at which a given thematic area has been taught. In the example provided, the thematic area DP(k) comprised by the learning path as specified in the learner's profile is marked with status S(p), where the parameter p assumes a specific symbolic value denoting the following progress levels: b – basic, r – expanded, s – specialized, sp – specialized-practical. The area status does not denote its level of difficulty but is a parameter which identifies the level of specialization of a given subject matter that is taught. The possibility of distinguishing between thematic areas and progress levels is used in the mechanical part of the testing procedure, which is based on IRT (item response theory). After the test parameters are input, the parameters of the areas from the learner's profile are entered. Then, to each of these areas the relevant status, in keeping with the profile, is attributed. According to the assumption underlying personalized tests for e-learning platforms, knowledge in an area which has not been covered by the individual learning path of the examinee cannot be measured. To ensure this, the algorithm checks on every occasion whether a given test area DT(i) is covered in the learner's profile, which means that DT(i) is equal to DP(k). If a given area appears in the individual learning path, then the parameter of the area status S(p) is checked. After the area and the level of instruction for a given subject matter have been determined, the procedure of measuring knowledge in area DP(k) with status S(p) is started, based on the mathematical strategy for question selection, in line with the selected IRT model.
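The area-matching stage described above (checking every test area DT(i) against the profile areas DP(k) and reading off the status S(p)) can be sketched in a few lines of Python; the data structures and names below are hypothetical, chosen only to illustrate the flow, not taken from the ZSZN implementation.

```python
# Hypothetical sketch of the area-matching stage described above.
# DT      - thematic areas covered by the test
# profile - the learner's profile: area -> status S(p)
# Status symbols: 'b' basic, 'r' expanded, 's' specialized,
# 'sp' specialized-practical.

def areas_to_measure(DT, profile):
    """Return (area, status) pairs whose knowledge may be measured.

    An area DT(i) is measured only when it also appears in the learner's
    profile, i.e. DT(i) == DP(k); its status S(p) then selects the level
    at which the question bank is searched.
    """
    selected = []
    for area in DT:
        if area in profile:                        # DT(i) == DP(k)?
            selected.append((area, profile[area])) # attach status S(p)
    return selected

DT = ["algebra", "calculus", "statistics"]
profile = {"algebra": "b", "statistics": "s", "geometry": "r"}
print(areas_to_measure(DT, profile))  # → [('algebra', 'b'), ('statistics', 's')]
```

An empty result corresponds to the case, handled explicitly in the procedure, where none of the test's areas belongs to the learner's path.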
To run the procedure, it is necessary to prepare an adequately calibrated test task bank, in keeping with the assumptions of the relevant IRT model. If the Rasch model is employed, it is necessary to introduce a parameter for task difficulty, bi. To use the 2-PL model, it is necessary to introduce another parameter for differentiating between tasks, ai. If the 3-PL model is to be used, the test task bank needs a third parameter, ci, for each of the tasks, which determines the value of the 'pseudo-chance' parameter. The mechanical part of the testing procedure additionally requires that every test question be described using unquantifiable parameters to distinguish between thematic areas and determine the progress level in the thematic area taught. For an effective implementation of the procedure described here, the test task bank must consist of suitably detailed questions and must be of a sufficiently large size. The essence of knowledge measurement in the described procedure is the use of the mathematical strategy for question selection, pursuant to the assumptions of item response theory. After the material which is compulsory for every examinee has been specified in detail, the CAT algorithm is performed based on a mathematical strategy [Sereci 2003]. The questions are selected according to the maximum value of the information function, after the achievement level has been assessed. The procedure is completed when the assumed accuracy level has been achieved (SE < min). After the condition completing the examination of a given portion of the material is fulfilled, the final value of the achievement level is determined, as well as the test's information value and the standard error for a given thematic area. Subsequently, another cycle of measuring achievement in the remaining areas covered by the test is started, with the use of the mathematical strategy, after
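For reference, the item response functions behind the three models named above are standard IRT results (they are not stated explicitly in this chapter). For item $i$ and ability $\theta$, the 3-PL model is

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}},
```

where $b_i$ is the difficulty, $a_i$ the discrimination, and $c_i$ the pseudo-chance parameter; setting $c_i = 0$ gives the 2-PL model, and additionally setting $a_i = 1$ gives the Rasch model. The maximum-information selection rule uses the item information function

```latex
I_i(\theta) = a_i^2\,\frac{1 - P_i(\theta)}{P_i(\theta)}
  \left(\frac{P_i(\theta) - c_i}{1 - c_i}\right)^{2},
```

choosing at each step the item with the largest $I_i(\hat{\theta})$ at the current achievement estimate $\hat{\theta}$, until the stopping condition $SE < \min$ is met.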
ensuring that each area is included in the learner's profile. The test ends once the examinee's knowledge in all the test areas has been checked, provided that these areas have been included in the individual learning program. As the next step, the averaged final value of the examinee's achievement level (Θ) is determined, together with the value of the information function of the test in all the areas covered by the test (I) and the standard error (SE) of the examinee's final achievement level. In a situation when none of the test's thematic areas has been included in the examinee's learning path, information on the incorrect attribution of the learner to a given test is provided. The critical point of the proposed solution is how to develop a sufficiently large and structurally correct test question bank, which forms the very heart of any knowledge measurement system. In this solution, the test task bank, which normally contains universal questions, is a thoroughly thematically systematized set of test items. The questions should be calibrated in line with the applicable procedures used for a given IRT model. The mechanical part of the procedure requires that additional task attributes be introduced in order to determine the qualitative attributes of questions and tailor the test to an individual learning program. It is necessary to introduce a parameter denoting the area affiliation and the level of specialization in a given area, referred to as its status. The quantifiable attributes of the test questions are assessed using parameter estimation procedures for IRT models. The test task bank must necessarily contain the variables which represent the answer key to every question. For an automated personalization of knowledge measurement in line with an individual learning path, it is necessary to prepare an individual learner profile. As a minimum, it must include the subject matter and status of the thematic areas covered by a given course.
An additional attribute which could expand the test's personalization is the inclusion, in the mechanical part of the analyzed procedure, of parameters defining the learning goal or the special educational needs of the examinee. To adapt the test's thematic scope individually to the needs of a given examinee, it is necessary to assign to the test the parameters defining the subject matter of the areas covered by the measurement of achievements; such parameters are compared against the individual learning path and taken into account during the test only when there is conformity between them. The basic features of the test module modification include the creation of test modules (sets of questions) stating their level of difficulty (herein referred to as the 'weight') and the development of 'flexible' testing paths based on the modules which have been solved earlier. The extension refers to two aspects of question selection: mechanical – based on the 'weight' ascribed to a question or set of questions – and the mathematical strategy – where the test creator structures the selection of the next question or set of questions in such a way that the result of the solved test determines the selection of the subsequent test for the learner. We will refer to the sequence of tests solved by the examinee as the Individual Test Path (IST). This is a strictly abstract creation, which has no counterpart in the form of a separate object, as the IST is created dynamically, based on the results obtained by the examinee. The next step needed to adjust the IST is defining the leap rules – which in fact make up the IST. This operation is done in the newly created tab Settings/Progress level settings. The rules for the subsequent tests are defined by inputting one or several rules in consecutive lines. In the first case, in the Score range (points/score) fields, we enter the range of scores that can be obtained by the examinee when solving the test (Test1); in the
next field – Specific test – we indicate the specific test (out of a pool of tests which have been defined earlier) which will be made available once the score from the Score range (score/points) field is reached. Here, it should be taken into account that the Specific test field will show a list of tests with a predefined Progress rate higher than 0. In the second case, the test sequence is defined by the Leap rule (referring to the way the next test is selected): by inputting a numerical value in the Specific progress level field, we define the activation of tests with the specifically ascribed Progress level – predefined in every test created by the test creator. The number ranges provided in the Score range (score/points) field do not have to be mutually exclusive; neither do they have to be complete. Therefore, it is possible to select such ranges freely, e.g. so that the rules 'activate' more than one test simultaneously or no test at all. Due to the proposed solution, combined with all the attributes and features offered by ILIAS, the system is awarded the highest score of ten points. The results in this category are summarily presented in Figure 6.
Fig. 6 Personalized learning paths
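The leap-rule mechanism described above can be sketched as follows; the rule representation and names are assumptions for illustration only, not the ILIAS/ZSZN data model. Note how overlapping or incomplete score ranges, which the text explicitly permits, lead to several activated tests or to none.

```python
# Hypothetical sketch of the 'leap rules' described above: each rule maps
# a score range obtained in one test to the next test made available to
# the examinee. Ranges may overlap or leave gaps, so a score can activate
# several tests, exactly one, or none at all.

def next_tests(score, rules):
    """Return all tests activated by `score` under the given leap rules."""
    activated = []
    for low, high, test_name in rules:   # Score range -> Specific test
        if low <= score <= high:
            activated.append(test_name)
    return activated

rules = [
    (0, 10, "Test2-basic"),     # low scores lead to a remedial test
    (8, 15, "Test2-standard"),  # ranges may overlap...
    (16, 20, "Test2-advanced"),
]

print(next_tests(9, rules))   # → ['Test2-basic', 'Test2-standard']
print(next_tests(21, rules))  # → [] (no rule fires)
```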
3.6 The Costs of the Platform and Technical Support Availability ILIAS, Moodle, OLAT, ZSZN All these platforms are open-source systems made available under the public GNU GPL licence. This means that there is no fee for their installation, use, or technical and software support. Using such a licence makes it possible to modify the software freely to tailor it to the user's needs. The systems have been created and are dynamically developed by many different programmers, not by a closed team. We can check what the system code contains, find and correct errors, and develop the software further. Moodle is the most popular platform used in higher education, with the greatest number of plug-ins available to improve the platform's operation. All these platforms have solid support groups and discussion forums where the administrator or course author can find solutions to their problems. An additional advantage of all these platforms is their exhaustive system documentation. Owing to the above-mentioned aspects, these platforms are awarded the maximum score of 20 points.
WBT and Blackboard These are commercial, paid platforms furnished with professionally prepared documentation and technical support. The price of the Blackboard system for 100 users with an annual licence is about $5,000. The licence costs for the WBT system are negotiated separately with each prospective client, with prices oscillating around $2,000 per licence for the server and WBTExpres. Support services are subject to an additional fee, depending on their range. Blackboard is awarded 0 points for its price and 5 points for its excellent support in this category. WBT is awarded 10 points in this category. Figure 7 shows a summary of the points scored in this category.
Fig. 7 The costs of the platform and technical support availability
The composition of the results achieved by the individual platforms is shown in Table 1.

Table 1 Comparison of the results

Requirement   Ilias   Moodle   OLAT   BlackBoard   WBT   ZSZN
1               8       8        7       7           5     10
2               9       8        4      10           4     10
3              10       8        4       8          10     10
4               6      10        3      10           7     10
5               7       7        6       7           4     10
∑              40      41       24      42          30     50
6              20      20       20       5          10     20
∑-total        60      61       44      47          40     70
Figure 8 shows a summary comparison of the platforms discussed, exclusive of platform prices, while Figure 9 also takes the prices into account.
Fig. 8 Summary comparison based on content-related criteria
Fig. 9 Summary comparison with costs included
4 Conclusion ZSZN, the Integrated Distance Learning System, emerges as one of the best e-learning platforms. Its software connects synchronous e-learning (the DimDim system) with the ILIAS system, in which the management of rooms, groups and users is on the ILIAS side. There is an additional plug-in for the ILIAS system that offers the possibility of creating offline tests and sending them to the ILIAS base. Furthermore, there is an opportunity to create paper versions of every test created in ILIAS. ZSZN also has a built-in mechanism to create individual learning paths for individual students. A system for the operation of virtual machines has been designed, co-operating with and managed from the ILIAS base. It contains a mechanism for optimising and assigning work time in the virtual machine system. All the logins and the activity register are kept in the ILIAS base. From testing the work on a base of virtual machines, the concept arose of moving away from working in a client-server mode in an e-learning system and starting to work in a "console" mode. This prospect offers much quicker access to
the searched-for information, which is very useful when working with synchronous e-learning in a demo mode. Two of the available e-learning platforms clearly stand out among the open-source platforms under the GPL licence: Moodle and ILIAS. Both platforms are based on open source code, which allows changes that adapt the system to the final user's requirements, and both are equipped with very intuitive, easy-to-use interfaces. From the economic point of view, open-source platforms are much more advantageous than commercial platforms. The advantage of the commercial platforms is the full support from the companies which offer them; despite that, every little change in the system incurs an additional charge. Open-source platforms offer help groups in which solutions to most problems can be found.
References
[Bednarek et al. 2007] Bednarek, J., Lubina, E.: Distance education, the didactic background. MIKOM, Warszawa, Poland (2007) (in Polish)
[Chmielewski 2006] Chmielewski, J.M.: E-learning. Standardization platforms versus application quality. ABC Jakości nr 2-3, Poland (2006) (in Polish)
[Machado et al. 2007] Machado, M., Tao, E.: Blackboard vs. Moodle: comparing user experience of learning management systems. In: 37th ASEE/IEEE Frontiers in Education Conf., Milwaukee (2007)
[Morrison 2003] Morrison, D.: E-learning strategies. John Wiley & Sons Ltd., Chichester (2003)
[Palka 2010a] Palka, E.: OLAT platform as a tool of distance education. E-mentor 3(35), 36–42 (2010) (in Polish)
[Palka 2010b] Palka, E.: The comparison of Moodle and OLAT educational platforms. E-mentor 4(36), 27–32 (2010) (in Polish)
[Pyzik 2010a] Pyzik, L.: Virtualization for e-learning. E-mentor 1(33), 48–50 (2010) (in Polish)
[Pyzik 2010b] Pyzik, L.: The idea of an integrated remote teaching system. E-mentor 2(34), 42–46 (2010) (in Polish)
[Sereci 2003] Sereci, S.: Computerized adaptive testing: An introduction. ERIC Document No. ED480083, pp. 685–694 (2003)
[Sztejnberg et al. 2010] Sztejnberg, A., Hurek, J.: Improving the computer-based test measurement. Wydawnictwo Uniwersytetu Opolskiego, Opole, Poland (2010) (in Polish)
[WWW-1 2010] BlackBoard Instructor Guide, http://library.blackboard.com/docs/r6/6_1/instructor/bbls_r6_1_instructor/ (accessed April 10, 2010)
Points of View on Magnetic Levitation System Laboratory-Based Control Education

C.A. Dragoş¹, S. Preitl¹, R.E. Precup¹, and E.M. Petriu²

¹ Department of Automation and Applied Informatics, "Politechnica" University of Timisoara, Timisoara, Romania {claudia.dragos,stefan.preitl,radu.precup}@aut.upt.ro
² School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada
[email protected]
Abstract. The chapter offers some points of view concerning education in control engineering on the basis of a Magnetic Levitation System with Two Electromagnets (MLS2EM) laboratory equipment. The syllabus of control engineering includes the treatment of the following issues: plant analysis and modeling, development of low-cost control solutions and algorithms applied in real-time laboratory experiments, and the assessment used in control engineering courses. The presentation is application-oriented, focusing on the MLS2EM laboratory equipment, and the low-cost control solutions presented here deal with state feedback control, proportional-integral-derivative control and model predictive control. Our syllabus structure is attractive as it allows the proper assessment of students' knowledge. The real-time experiments highlighted in this chapter accompany the laboratory-based education in control engineering and ensure an improved understanding of the theoretical aspects taught in the lectures.
1 Introduction The purpose of this chapter is to illustrate how to make different types of (low-cost) control structures easy, accessible, better understandable and increasingly attractive for students in Control Engineering courses. Based on their proven previous results, three teams from the Bachelor program and one team from the Master program are selected to bridge the gap between theory and practice. The Control Engineering course is dedicated to graduate students who have a background in system theory and the basics of control engineering. Fourteen hours of direct laboratory activity, consisting of three weekly laboratory sessions of four hours each for Bachelor students and one laboratory session of two hours to sum up the received information, are scheduled in the framework of the Control Structures and Algorithms course taught at the "Politechnica" University of Timisoara, Romania [Preitl et al. 2009]. The weekly laboratory sessions are organized as follows:
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 261–275. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
(a) The first session involves the functional analysis of the plant and the modeling techniques that make use of analytical calculations, model order reduction, and the linearization of the plant. (b) The second session discusses the application of two types of disturbance inputs (i.e. sinusoidal and pseudo-random binary signals) to the bottom electromagnet, and develops the following three control solutions: • a state feedback control structure to stabilize the system, • a cascade control structure with a state feedback controller in the inner loop and a proportional-integral-derivative (PID) controller in the external loop to ensure zero steady-state control error, • a cascade control structure with a state feedback controller in the inner loop and a model predictive control (MPC) algorithm in the external loop to obtain better performance. (c) The implementation of the control algorithms (the controllers) and the testing of the control structures by digital simulations and by real-time experiments are conducted in the third session. (d) The last session involves plenary presentations of the solutions, critical analysis of the experimental results and comparative discussion of the different control solutions, including their advantages and shortcomings. The control engineering syllabus includes lectures on: (a) state feedback and PID control structures and algorithms; (b) MPC algorithms in their RST (Two-Degree-of-Freedom, 2-DOF) form. The main objective of the control engineering laboratory is to highlight the creative initiative of the students, to make them work in teams, and to reach a spirit of collaboration. The plant taken into consideration is a complete control laboratory system, namely a Magnetic Levitation System with Two Electromagnets (MLS2EM) [Inteco 2009]. The magnetic levitation problem for a metallic sphere maintained in an electromagnetic field is a classical nonlinear and unstable application. The theoretical support is offered by the lectures.
A real-time digital control environment with a hardware-in-the-loop magnetic levitation (Maglev) device for modeling and controls education, with emphasis on neural network feed-forward control, is used and discussed in [Shiakolas et al. 2004]. A low-cost magnetic levitation kit for introductory undergraduate courses is designed in [Lilienkamp 2004]. Based on their knowledge and on the assembly instructions, the students can build the magnetic levitation equipment and can modify the sensors, the magnets and the power electronics to improve the system performance. A short presentation of magnetic levitation equipment in analogue and digital mode, and some results obtained from the simulation of a web-based laboratory equipment, are given in [Naumović and Veselić 2008]. In order to perform the required tasks, the presented approach starts from the actual literature presented above concerning the educational aspects based on magnetic levitation systems. A systematic treatment of laboratory-based education in control engineering is offered, encompassing specific aspects that emerge from the MLS2EM laboratory application.
This chapter deals with the following topics: Section 2 is dedicated to the mathematical modeling of the MLS2EM viewed as the controlled plant. The theoretical aspects of nonlinearity and linearization are treated and discussed with the students. Section 3 presents the designed state feedback control structure; a combined PID and state feedback solution is adopted to ensure zero steady-state control error, and the MPC structure is analyzed as well. Several real-time experimental results are presented in Section 4 to test the control structures. The conclusions are highlighted in Section 5.
2 Mathematical Modeling of MLS2EM The MLS2EM laboratory equipment [Inteco 2009] is a magnetic levitation system consisting of a sensor, an actuator and two electromagnets between which a ferromagnetic sphere is kept in levitation (Fig. 1). The sphere position is determined by a position sensor, and it can be adjusted and set to the desired (reference) value by means of appropriately designed control systems [Preitl et al. 2009]. When both electromagnets are used, the lower one can be used to exert an additional force, and this feature is useful in robust control applications and in robustness analyses.
Fig. 1 The MLS2EM laboratory equipment [Inteco 2009]
The control software works in real time under MS Windows 2000/XP using Matlab-Simulink and the RTW and RTWT toolboxes [Inteco 2009]. The control software of the MLS2EM includes 4 groups of menu items: 1. The Identification menu includes the initial values of the parameters; the static and dynamic characteristics may be changed. 2. The Device drivers menu includes the real-time connection between Matlab and the acquisition board. 3. The Digital simulation menu includes an open-loop control structure and a state feedback control structure.
4. The Real-time experiments menu includes basically four control structures: state feedback control with the control signal applied to the upper electromagnet (EM1), state feedback control with the control signal applied to both electromagnets (EM1 and EM2), a cascade control structure with state feedback control in the inner loop and a PID controller in the external loop, and an MPC structure developed by the students. The principle diagram of the MLS2EM is presented in Fig. 2, where the two electromagnetic forces F_em1 and F_em2 and the gravity force F_g act on the ferromagnetic sphere. The electromagnetic force of the top electromagnet (F_em1) depends on two variables: the sphere position between the electromagnets and the current in the electromagnetic coil. This force must be set so as to balance the sphere's gravity force.
Fig. 2 Block diagram of MLS2EM [Inteco 2009]
The first-principle equations that characterize the dynamics of the MLS2EM laboratory equipment are [Inteco 2009]

\begin{aligned}
\dot{x}_1 &= x_2, \\
\dot{x}_2 &= -\frac{F_{em1}}{m} + g + \frac{F_{em2}}{m}, \\
\dot{x}_3 &= \frac{1}{f_i(x_1)}\,(k_i u_1 + c_i - x_3), \\
\dot{x}_4 &= \frac{1}{f_i(x_d - x_1)}\,(k_i u_2 + c_i - x_4), \\
F_{em1} &= x_3^2\,\frac{F_{emP1}}{F_{emP2}}\exp\!\Big(-\frac{x_1}{F_{emP2}}\Big), \\
F_{em2} &= x_4^2\,\frac{F_{emP1}}{F_{emP2}}\exp\!\Big(-\frac{x_d - x_1}{F_{emP2}}\Big), \\
f_i(x_1) &= \frac{f_{iP1}}{f_{iP2}}\exp\!\Big(-\frac{x_1}{f_{iP2}}\Big),
\end{aligned} \tag{1}
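A minimal numerical sketch of model (1) using the Table 1 parameter values may help in the laboratory sessions. Note that the distance between the two electromagnets, x_d, is not listed in Table 1, so the value used below is an illustrative assumption, as is the choice of operating point; the balance-current calculation simply solves F_em1 = m·g for x3 with the bottom coil switched off.

```python
import math

# Parameter values from Table 1; xd is NOT given there and is assumed.
m, g = 0.0571, 9.81
FemP1, FemP2 = 1.7521e-2, 5.8231e-3
fiP1, fiP2 = 1.4142e-4, 4.5626e-3
ci, ki = 0.0243, 2.5165
xd = 0.099  # assumed inter-electromagnet distance [m]

def f_i(gap):
    return (fiP1 / fiP2) * math.exp(-gap / fiP2)

def Fem(current, gap):
    # electromagnetic force as a function of coil current and air gap
    return current ** 2 * (FemP1 / FemP2) * math.exp(-gap / FemP2)

def derivatives(x, u1, u2):
    x1, x2, x3, x4 = x
    return [x2,
            -Fem(x3, x1) / m + g + Fem(x4, xd - x1) / m,
            (ki * u1 + ci - x3) / f_i(x1),
            (ki * u2 + ci - x4) / f_i(xd - x1)]

# Top-coil current that balances gravity at position x1 (bottom coil off):
# Fem1 = m*g  solved for x3.
def balancing_current(x1):
    return math.sqrt(m * g * (FemP2 / FemP1) * math.exp(x1 / FemP2))

x1 = 0.008
x3 = balancing_current(x1)
dx = derivatives([x1, 0.0, x3, 0.0], u1=0.5, u2=0.00498)
print(abs(dx[1]) < 1e-9)  # → True: zero sphere acceleration at balance
```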
where: x1 – sphere position, x1 ∈ [0, 0.016]; x2 – sphere speed, x2 ∈ ℝ; x3, x4 – currents in the electromagnetic coils, x3, x4 ∈ [0.03884, 2.38]; u1, u2 – the control signals for the electromagnets, u1, u2 ∈ [0.00498, 1]. The numerical values of the MLS2EM plant parameters are presented in Table 1 [Inteco 2009].

Table 1 Numerical values of plant parameters [Inteco 2009]

Variable   Numerical value
m          0.0571 [kg]
g          9.81 [m/s²]
FemP1      1.7521·10⁻² [H]
FemP2      5.8231·10⁻³ [m]
fiP1       1.4142·10⁻⁴ [ms]
fiP2       4.5626·10⁻³ [m]
ci         0.0243 [A]
ki         2.5165 [A]
To develop low-cost control solutions, the students linearized the nonlinear model (1) around several operating points, obtaining the state-space linearized model

\Delta \dot{x} = A \Delta x + b \Delta u_1, \qquad \Delta y = c^T \Delta x,

A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ a_{21} & 0 & a_{23} & a_{24} \\ a_{31} & 0 & a_{33} & 0 \\ a_{41} & 0 & 0 & a_{44} \end{pmatrix}, \quad
b = \begin{pmatrix} 0 \\ 0 \\ b_3 \\ b_4 \end{pmatrix}, \quad
c^T = [1 \;\; 0 \;\; 0 \;\; 0], \quad
\Delta x = \begin{pmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \\ \Delta x_4 \end{pmatrix},    (2)
where Δu_1 = u_1 − u_{10} and Δy = y − y_0 are the deviations of the input and output variables u_1 and y from their operating-point values u_{10} and y_0, respectively, Δx = x − x_0 = [Δx_1 Δx_2 Δx_3 Δx_4]^T is the state deviation vector, the superscript T indicates matrix transposition, and the expressions of the parameters are
a_{2,1} = \frac{x_{30}^2 F_{emP1}}{m F_{emP2}^2} e^{-\frac{x_{10}}{F_{emP2}}} + \frac{x_{40}^2 F_{emP1}}{m F_{emP2}^2} e^{-\frac{x_d - x_{10}}{F_{emP2}}},
a_{2,3} = -\frac{2 x_{30} F_{emP1}}{m F_{emP2}} e^{-\frac{x_{10}}{F_{emP2}}}, \quad a_{2,4} = \frac{2 x_{40} F_{emP1}}{m F_{emP2}} e^{-\frac{x_d - x_{10}}{F_{emP2}}},
a_{3,1} = -(k_i u + c_i - x_{30})(x_{10}/f_{iP2}) f_i^{-1}(x_{10}), \quad a_{3,3} = -f_i^{-1}(x_{10}),
a_{4,1} = -(k_i u + c_i - x_{40})(x_{10}/f_{iP2}) f_i^{-1}(x_d - x_{10}), \quad a_{4,4} = -f_i^{-1}(x_d - x_{10}),
b_3 = k_i f_i^{-1}(x_{10}), \quad b_4 = k_i f_i^{-1}(x_d - x_{10}).    (3)
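The analytic parameter expressions can also be cross-checked numerically. The sketch below forms A and b by central finite differences on the nonlinear right-hand side of (1) with u_2 = 0; the operating point and x_d are assumed illustrative values, so the numbers will not reproduce Table 2 exactly:

```python
import numpy as np

# Parameters from Table 1 [Inteco 2009]; xd is an assumed value
m, g = 0.0571, 9.81
FemP1, FemP2 = 1.7521e-2, 5.8231e-3
fiP1, fiP2 = 1.4142e-4, 4.5626e-3
ci, ki = 0.0243, 2.5165
xd = 0.016

def f(x, u1):
    # nonlinear right-hand side of model (1) with u2 = 0
    x1, x2, x3, x4 = x
    fi = lambda z: (fiP1 / fiP2) * np.exp(-z / fiP2)
    Fem1 = x3 ** 2 * (FemP1 / FemP2) * np.exp(-x1 / FemP2)
    Fem2 = x4 ** 2 * (FemP1 / FemP2) * np.exp(-(xd - x1) / FemP2)
    return np.array([x2,
                     -Fem1 / m + g + Fem2 / m,
                     (ki * u1 + ci - x3) / fi(x1),
                     (ci - x4) / fi(xd - x1)])

def linearize(x0, u0, eps=1e-6):
    # central differences: A = df/dx and b = df/du at the operating point
    n = x0.size
    A = np.zeros((n, n))
    for j in range(n):
        d = np.zeros(n); d[j] = eps
        A[:, j] = (f(x0 + d, u0) - f(x0 - d, u0)) / (2 * eps)
    b = (f(x0, u0 + eps) - f(x0, u0 - eps)) / (2 * eps)
    return A, b

A, b = linearize(np.array([0.008, 0.0, 0.285, ci]), 0.2)
```

The structural zeros and signs of (2) and (3) (for instance A[0,1] = 1, a_{2,3} < 0, b_3 > 0) appear directly in the computed matrices.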
The linearization is done in the vicinity of the operating points (x_{10}, x_{20}, x_{30}, x_{40}) detailed in Table 2 [Dragoş et al. 2010]. Controllability and observability tests must be conducted on the linearized model; they are necessary in the design of the speed estimators.

Table 2 Matrices and transfer functions of the linearized plant [Dragoş et al. 2010]

Operating point x_{10} = 0.007, x_{20} = 0, x_{30} = 0.754, x_{40} = 0.37:
A_x = \begin{pmatrix} 0 & 1 & 0 \\ 1860 & 0 & -24 \\ 15024 & 1878 & -150 \end{pmatrix}, \quad b_u = \begin{pmatrix} 0 \\ 0 \\ 375.6 \end{pmatrix}, \quad c^T = [1 \;\; 0 \;\; 0],
H_P(s) = \frac{-0.11}{(1 + 0.2 s)(0.000023 s^2 + 0.0034 s + 1)}.

Operating point x_{10} = 0.008, x_{20} = 0, x_{30} = 0.285, x_{40} = 0:
A_x = \begin{pmatrix} 0 & 1 & 0 \\ 186.0602 & 0 & -7.6031 \\ 18125 & 0 & -186.2891 \end{pmatrix}, \quad b_u = \begin{pmatrix} 0 \\ 0 \\ 468.7966 \end{pmatrix}, \quad c^T = [1 \;\; 0 \;\; 0],
H_P(s) = \frac{-0.0153}{(1 + 0.065 s)(1 + 0.011 s + 0.000066 s^2)}.

Operating point x_{10} = 0.009, x_{20} = 0, x_{30} = 0.6, x_{40} = 0:
A_x = \begin{pmatrix} 0 & 1 & 0 \\ 186.0602 & 0 & -7.6031 \\ 18125 & 0 & -186.2891 \end{pmatrix}, \quad b_u = \begin{pmatrix} 0 \\ 0 \\ 468.7966 \end{pmatrix}, \quad c^T = [1 \;\; 0 \;\; 0],
H_P(s) = \frac{-0.0117}{(1 + 0.051 s)(1 + 0.006 s + 0.0000289 s^2)}.
3 Control Structures Development

After the mathematical modeling of the plant, the students had to develop control structures in order to stabilize the system and to ensure the desired sphere position between the two electromagnets [Dragoş et al. 2010]:
(a) a state feedback control structure, which stabilizes the unstable plant;
(b) a PID control structure, in two basic variants: a classical PID controller and a non-homogenously acting PID controller [Preitl et al. 2009];
(c) an MPC structure.
To obtain bonuses, each group of students must use creativity and theoretical knowledge to improve the performance of the MLS2EM plant. Therefore, the groups of students can: (a) conduct a sensitivity analysis regarding two different operating points; (b) apply sinusoidal signals (with frequency varying in a given domain) as disturbances to the bottom electromagnet; (c) apply pseudo-random binary sequences to the bottom electromagnet and try to estimate the plant's state vector.
Due to the instability of the MLS2EM, a state feedback control structure (SF-CS) can be designed as in [Dragoş et al. 2010]. The block diagram of the state feedback control structure is illustrated in Fig. 3, where u_1 = u_{EM1} is the control signal, u_2 = u_{EM2} is the disturbance input, w_x is the reference input, e_x is the control error, y_x is the output, and k_c^T is the state feedback gain vector. The matrices of the state feedback control structure, A_x = A + b_u k_c^T k_{AS}, b_u, c^T, and the obtained transfer functions (t.f.s) are detailed in Table 2. The students have to design the SF-CS on the basis of the following reduced-order linear plant model, obtained from (2) by neglecting the currents in the electromagnet coils:

\begin{pmatrix} \Delta \dot{x}_1 \\ \Delta \dot{x}_2 \end{pmatrix} =
\begin{pmatrix} 0 & 1 \\ a_{21} & 0 \end{pmatrix}
\begin{pmatrix} \Delta x_1 \\ \Delta x_2 \end{pmatrix} +
\begin{pmatrix} 0 \\ a_{23} \end{pmatrix} \Delta u_I,
\qquad
\Delta y_1 = [1 \;\; 0] \begin{pmatrix} \Delta x_1 \\ \Delta x_2 \end{pmatrix},
(4)
The pole placement method is recommended and used for plant stabilization (for example, the imposed poles p_1^* = -0.25 and p_2^* = -240 result in the state feedback gain vector k_c^T = [40 \;\; 5] for the given example). Each team must design the controller for three operating points, making use of the calculated data given in [Inteco 2009]. When the control signal is applied to both electromagnets, the bottom electromagnet (EM2) can be used as an additional force, and its control signal plays the role of an external pulse excitation.
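This design step can be reproduced with standard tools. The sketch below applies SciPy's pole placement routine to the reduced-order model (4), using illustrative values of a_{21} and a_{23} in the order of magnitude appearing in Table 2, so the resulting gain need not equal the [40 5] quoted in the text:

```python
import numpy as np
from scipy.signal import place_poles

# Reduced-order model (4); a21, a23 are illustrative values
a21, a23 = 1860.0, -24.0
A = np.array([[0.0, 1.0],
              [a21, 0.0]])
b = np.array([[0.0],
              [a23]])

# Imposed poles from the example in the text
res = place_poles(A, b, [-0.25, -240.0])
kc = res.gain_matrix            # state feedback gain k_c^T (1 x 2)

# verification: eigenvalues of A - b*kc must match the imposed poles
cl = np.sort(np.linalg.eigvals(A - b @ kc).real)
```

Checking the closed-loop eigenvalues against the imposed poles, as done in the last line, is a habit worth instilling in every design exercise.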
C.A. Dragoş et al.
Fig. 3 Block diagram of state feedback control structure (with kAS =1) [Dragoş et al. 2010]
To ensure the desired sphere position, which is not guaranteed by the SF-CS alone, each group of students designed a cascade control structure with the state feedback controller (see the previous paragraph) in the inner loop and a PID controller in the external control loop [Dragoş et al. 2010], Fig. 4.
Fig. 4 Block diagram of the PID control structure for the current in the top electromagnet (neglecting the current in the bottom electromagnet) [Dragoş et al. 2010]
Taking into account the t.f. H_P(s) = c^T (sI - A_x)^{-1} b_u,
(5)
and the performance requirements (zero steady-state control error, a recommended phase margin of 60° and a small settling time), the students are requested to design a PID controller using pole-zero cancellation for each operating point. The obtained t.f. of the PID controller is

H_C(s) = \frac{k_r}{s (1 + s T_f)} (1 + 2 \zeta_r T_r s + T_r^2 s^2), \qquad \zeta_r < 1.    (6)
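A numerical sketch of this design for the second operating point in Table 2: the plant's complex pole pair 1 + 0.011 s + 0.000066 s^2 is cancelled by the numerator of (6) (T_r^2 = 0.000066 and 2 ζ_r T_r = 0.011, giving ζ_r ≈ 0.68 < 1), and the remaining loop gain is solved numerically for a 60° phase margin. The filter constant T_f is an assumed value, not course data:

```python
import numpy as np
from scipy.optimize import brentq

# pole-zero cancellation against
# H_P(s) = -0.0153 / ((1+0.065s)(1+0.011s+0.000066s^2))
Tr = np.sqrt(0.000066)          # cancels the plant's second-order factor
zeta_r = 0.011 / (2 * Tr)       # must satisfy zeta_r < 1 as in (6)
Tf = 0.001                      # derivative filter constant (assumed)

def loop_phase_deg(w):
    # phase of L(jw)/k for the remaining loop L(s) = k / (s(1+0.065s)(1+Tf*s))
    return -90.0 - np.degrees(np.arctan(0.065 * w)) - np.degrees(np.arctan(Tf * w))

# gain crossover where the phase is -120 deg -> 60 deg phase margin
wc = brentq(lambda w: loop_phase_deg(w) + 120.0, 1e-3, 1e4)

# set the loop gain so |L(j wc)| = 1, then recover the controller gain k_r
k = wc * abs(1 + 1j * 0.065 * wc) * abs(1 + 1j * Tf * wc)
kr = k / (-0.0153)              # negative, since the plant gain is negative
```

The negative k_r simply absorbs the negative static gain of the linearized plant, so that the open loop has positive gain.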
The MPC structure is designed by the first group of students to reduce the control error and to improve the performance of the control system dedicated to the MLS2EM. The MPC structure is used successfully in mechatronic system control due to advantages such as robustness, the capability to compensate measurable disturbances, the possibility to obtain a reduced-order transfer function (t.f.), and the small number of tuning parameters. In this chapter, the MPC structure is used in its RST (2-DOF) form, shown in Fig. 5.
Fig. 5 Block diagram of the MPC structure in its RST (2-DOF) form [Dragoş et al. 2010]
Since the designed MPC algorithm must be implemented using the ARX model

A(q^{-1}) y(k) = B(q^{-1}) u(k-1),    (7)

the students have to discretize the t.f. (5). To design the MPC, the students must minimize the one-step quadratic objective function

J = \frac{1}{2} [\hat{y}(k+1) - r(k+1)]^2,    (8)

and the obtained control algorithm is

u(k) = \frac{A T^{*}}{A R + q^{-1} B S}\, r(k+1) - \frac{C S / D}{A R + q^{-1} B S}\, e(k).    (9)

The closed-loop system poles are obtained by solving the Diophantine equation (i.e., the characteristic equation)

A(q^{-1}) R(q^{-1}) + q^{-1} B(q^{-1}) S(q^{-1}) = P_{RM}(q^{-1}),    (10)

where P_{RM}(q^{-1}) is the imposed denominator of the transfer function of the reference model.
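To make the Diophantine step concrete, the sketch below works through a deliberately simple assumed example, a first-order ARX model A(q^{-1}) = 1 + a q^{-1}, B(q^{-1}) = b_0 (not the course plant), for which R = 1 and a scalar S = s_0 solve (10) in closed form:

```python
# first-order ARX example: A(q^-1) = 1 + a q^-1, B(q^-1) = b0
a, b0 = -0.9, 0.1               # illustrative coefficients
p1 = -0.5                       # imposed reference model P_RM = 1 + p1 q^-1

# Diophantine (10) with R = 1, S = s0:
# 1 + (a + b0*s0) q^-1 = 1 + p1 q^-1  =>  s0 = (p1 - a) / b0
s0 = (p1 - a) / b0

# feedforward gain chosen for unit static gain of the closed loop
t0 = (1 + p1) / b0

# simulate the closed loop with u(k) = t0*r(k+1) - s0*y(k)
y, r = [0.0], 1.0
for _ in range(50):
    u = t0 * r - s0 * y[-1]
    y.append(-a * y[-1] + b0 * u)
```

The closed loop obeys y(k+1) = -p1 y(k) + b0 t0 r, so the output converges to the set-point with the imposed reference-model pole. For a higher-order plant the same equation becomes a linear (Sylvester) system in the coefficients of R and S.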
4 Experimental Results

Three control solutions are developed on the detailed mathematical model of the MLS2EM laboratory equipment and tested by simulations and real-time experiments. The real-time experiments are conducted by the students on the experimental setup built around the MLS2EM laboratory equipment in the Intelligent Control Systems Laboratory of the "Politehnica" University of Timisoara, Romania [Preitl et al. 2009; Dragoş et al. 2010].
To implement control structures on real-time laboratory equipment, each experiment requires good knowledge of the specific laboratory setup, and students must be able to collect and interpret experimental data. The experimental approach on the laboratory equipment involves different scenarios for each control solution. The developed solutions were tested by the students in different scenarios by both simulation and experimental testing. The testing scenarios include the evolutions versus time of the sphere position, the sphere speed, the control signals applied to EM1 and EM2, and the currents in the electromagnet coils. Only the real-time experiments are synthesized as follows. The experimental testing was preferred because it helps the students understand the theoretical notions and the ways these strategies can be implemented on real-world systems. The experimental results related to the state feedback control system are detailed in [Dragoş et al. 2010]; the students can analyze several aspects which result from them. The results obtained through laboratory experiments for the PID controller are shown for the first set-point (reference input) of 0.007 in Fig. 6, for the second set-point of 0.008 in Fig. 7, and for the third set-point of 0.009 in Fig. 8.
Fig. 6 Real-time experimental results for cascade control structure with PID controller designed for set-point 0.007: (a) sphere position, (b) sphere speed, (c) currents, (d) control signal
Fig. 7 Real-time experimental results for cascade control structure with PID controller designed for set-point 0.008: (a) sphere position, (b) sphere speed, (c) currents, (d) control signals
Fig. 8 Real-time experimental results for cascade control structure with PID controller designed for set-point 0.009: (a) sphere position, (b) sphere speed, (c) currents, (d) control signals
From the control system responses the students can measure the control system performance indices. The students could notice that the pair of complex-conjugate poles in the t.f. of the state feedback control system and the nonlinear characteristic of the plant lead to oscillations at the beginning of the real-time experiments, especially in the first case, where the oscillations are maintained during the entire experiment. The results for different set-points were presented in order to make the students aware of the different behaviors of nonlinear control systems with respect to different values of the system inputs. The students are expected to test different set-point values. After the design of the PID controller for each operating point, each group of students must improve the performance of the controlled plant. Therefore, the results for the first group concern the design of an MPC structure, and the experimental results are illustrated in Fig. 9. Analyzing the experimental results shown in Fig. 9, students can notice the presence of several disturbances at the beginning of the experiment, but after 5 s the sphere stabilizes around the reference value w. The control signal presented in the previous cases was applied to the upper electromagnet (EM1), while no voltage was applied to the bottom electromagnet (EM2). The students also experiment with and discuss the situations when the control signal is applied to EM1 and other voltage signals, playing the role of disturbance inputs, are applied to EM2. If sinusoidal signals or pseudo-random binary signals are applied to EM2, the students can notice that some oscillations appear while the disturbance is acting on the sphere. These results are presented in Fig. 10 and Fig. 11, respectively.
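The performance indices mentioned above can be extracted programmatically from a recorded response. The sketch below computes percent overshoot and the 2% settling time for a synthetic second-order step response (ω_n and ζ are assumed values, standing in for real MLS2EM data):

```python
import numpy as np

# synthetic underdamped step response (assumed wn, zeta; not measured data)
t = np.linspace(0.0, 10.0, 5001)
wn, zeta = 4.0, 0.3
wd = wn * np.sqrt(1.0 - zeta ** 2)
y = 1.0 - np.exp(-zeta * wn * t) * (np.cos(wd * t)
        + zeta / np.sqrt(1.0 - zeta ** 2) * np.sin(wd * t))

y_ss = y[-1]                                   # steady-state value
overshoot = (y.max() - y_ss) / y_ss * 100.0    # percent overshoot
outside = np.abs(y - y_ss) > 0.02 * abs(y_ss)  # samples outside the 2% band
t_settle = t[np.where(outside)[0][-1] + 1]     # 2% settling time
```

Applied to logged sphere-position data, the same few lines give the indices the students compare across the three set-points.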
Fig. 9 Real-time experimental results for the model predictive control structure designed for set-point 0.007: (a) sphere position, (b) sphere speed, (c) currents, (d) control signals
Fig. 10 Responses of real-time experimental results of PID control structure designed for set-point 0.008 for sinusoidal disturbance: (a) position, (b) speed, (c) currents, (d) control signals
Fig. 11 Responses of real-time experimental results of PID control structure designed for set-point 0.008 for pseudo-random binary signal disturbance: (a) position, (b) speed, (c) currents, (d) control signals
All simulation and experimental scenarios are conducted by the students to bridge the gap between theory and practice; therefore, the laboratory extends over half a semester. Statistical data regarding the students' results obtained in the framework of the laboratory activity are presented in [Dragoş et al. 2010]. They are useful to evaluate the students' results in two consecutive academic years, 2007-2008 and 2008-2009, and to point out the feedback received from the students. The most motivated students were the ones who implemented control strategies in real-time experiments; they obtained the best results compared to the other groups of students. Other discussions on the final conclusions take place in the classroom with all groups of students. They can be related to sensitivity and robustness studies relative to the modification of the mass of the sphere (+15%, +25%, ...) while maintaining the sphere's size. These analyses are required only of the best students.
5 Conclusions

This chapter has presented points of view concerning the application-based approach to the teaching of control systems; the corresponding control structures and algorithms are tackled. The main new contributions of this chapter with respect to [Dragoş et al. 2010] concern the additional educational points of view and the extended set of real-time experimental results. These points of view highlight the advantages of our laboratory-based education: application-oriented low-cost design and implementation, and a generality which makes it applicable to wide areas of control system applications [Grzymała-Busse et al. 2005; Wilamowski 2009; Kulikowski 2009; Kouro et al. 2010; McCarty et al. 2010]. The students must develop, implement and compare low-cost control structures and algorithms on a real and complex system dedicated to teaching control systems, as part of the laboratory during the semester. Based on the identified parameters of the plant and on the nonlinear model, the students linearize the model around several operating points. Two parts of the course were used to design the proposed control structures for the laboratory equipment: the first part includes the control strategies with state feedback control and a PID controller, and the second includes the MPC strategy in its RST (2-DOF) form. The students also applied several disturbances to the bottom electromagnet and measured the control system performance indices with respect to these disturbances. After this course the students should be able to analyze and compare the designed control strategies and to choose the solution which provides the best performance. These aspects are reflected in the additional knowledge acquired by the students and in the results obtained at the exams. At the end of the laboratory activity the best students are selected for diploma theses, and they must try other solutions for the laboratory equipment.
For example, robustness can be verified when the radius of the sphere is modified.
Acknowledgment

This work was supported by the CNCSIS and UEFISCSU of Romania and by the co-operation between the Óbuda University, Budapest, Hungary, the University of Ljubljana, Slovenia, and the "Politehnica" University of Timisoara, Romania, in the framework of the Hungarian-Romanian and Slovenian-Romanian Intergovernmental Science & Technology Cooperation Programs. This work was partially supported by the strategic grant POSDRU 6/1.5/S/13 (2008) of the Ministry of Labor, Family and Social Protection, Romania, co-financed by the European Social Fund – Investing in People.
References

[Dragoş et al. 2010] Dragoş, C.A., Preitl, S., Precup, R.E., Petriu, E.M.: Magnetic levitation system laboratory-based education in control engineering. In: Proc. 3rd Int. Conf. on HSI, Rzeszow, Poland, pp. 496–501 (2010)
[Grzymała-Busse et al. 2005] Grzymała-Busse, J.W., Hippe, Z.S., Mroczek, T., et al.: Data mining analysis of granular bed caking during hop extraction. In: Proc. 5th Int. Conf. on Intel. Sys. Des. and App., Wroclaw, Poland, pp. 426–431 (2005)
[Inteco 2009] Inteco Ltd.: Magnetic levitation system 2EM (MLS2EM), User's manual. Inteco Ltd., Krakow, Poland (2008)
[Kouro et al. 2010] Kouro, S., Malinowski, M., Gopakumar, K., et al.: Recent advances and industrial applications of multilevel converters. IEEE Trans. Ind. Electron 58(8), 2553–2580 (2010)
[Kulikowski 2009] Kulikowski, J.L.: Decision making supported by fuzzy deontological statements. In: Proc. Int. Multiconference on Computer Science and Information Technology, Mrągowo, Poland, pp. 65–73 (2009)
[Lilienkamp 2004] Lilienkamp, K.A.: Low-cost magnetic levitation project kits for teaching feedback system design. In: Proc. 2004 American Control Conference, Boston, MA, USA, vol. 2, pp. 1308–1313 (2004)
[McCarty et al. 2010] McCarty, K., Manic, M., Cherry, S., McQueen, M.: A temporal-spatial data fusion architecture for monitoring complex systems. In: Proc. 3rd Int. Conf. on HSI, Rzeszow, Poland, pp. 101–106 (2010)
[Naumović and Veselić 2008] Naumović, M.B., Veselić, B.R.: Magnetic levitation system in control engineering education. Autom. Control Robot 7(1), 151–160 (2008)
[Preitl et al. 2009] Preitl, S., Precup, R.E., Preitl, Z.: Process control structures and algorithms, vol. 1 & 2 (in Romanian). Editura Orizonturi Universitare Publishers, Timisoara (2009)
[Shiakolas et al. 2004] Shiakolas, P.S., Van Schenck, S.R., Piyabongkarn, D., Frangeskou, I.: Magnetic levitation hardware-in-the-loop and MATLAB-based experiments for reinforcement of neural network control concepts. IEEE Trans. Educ. 47(1), 33–41 (2004)
2D and 3D Visualizations of Creative Destruction for Entrepreneurship Education

E. Noyes^1 and L. Deligiannidis^2

^1 Blank Center for Entrepreneurship, Babson College, Babson Park, MA, USA
[email protected]
^2 Computer Science Department, Wentworth Institute of Technology, Boston, MA, USA
[email protected]
Abstract. Creative destruction, the creation of new industries and the destruction of old industries, is a very abstract concept. Those teaching entrepreneurship, where creative destruction is a central feature, often struggle to communicate the dynamism of industry evolution, where industry disruption can yield innovation, entrepreneurial opportunities and new wealth. This paper examines the application of human-computer interaction (HCI), and specifically information visualization, to entrepreneurship education, a specialized area of business education. We create and evaluate different 2-D and 3-D visualizations of industry evolution in the Popular Music Industry between 1951 and 2008 to determine which visualizations correspond to superior comprehension of creative destruction. In particular, our challenge was to represent the emergence of 13 major markets and 193 submarkets in the context of six decades of music industry evolution and disruption. The results suggest that information visualization is a resource for entrepreneurship education and that significant improvements can be made over current idiosyncratic methods of representing industry evolution.
1 Introduction

Creative destruction, the creation of new industries and the destruction of old industries, is a very abstract concept. According to the Austrian economist Joseph Schumpeter [Schumpeter, 1934], creative destruction is "the essential fact about capitalism", where new combinations of resources (human talent, physical resources and financial resources) give rise to new industries and wealth [McCraw, 2009]. Schumpeter argues creative destruction is the primary mechanism of economic development for societies and businesses. In his view, entrepreneurs are the dynamic figures who innovatively assemble and recombine vital resources to serve new customer needs, thereby creatively destroying the pre-existing economic order. For example, the explosive growth of Wal*Mart effectively destroyed the preexisting order in the traditional department store industry, effectively reallocating
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 277–294. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
financial and human resources, not to mention customers, to fit a new industry landscape [Paruchuri et al. 2009]. Similarly, business-to-consumer e-commerce (Buy.com, Amazon.com and Zappos.com) fostered the creation of new industries and threatened retail incumbents, particularly those slow to enter this new retail domain [Tangpong et al. 2009]. Schumpeter asserts entrepreneurs are heroic figures who, with uncommon foresight, insight and actions, fashion new innovations which allow for the creation of new industries and their constituent new markets [Schumpeter, 1934]. Creative destruction refers specifically to the economic and social impact of entrepreneurs who wrest control of high-value resources from existing uses to put them to more profitable uses and thereby reshape the industrial landscape [Christensen, 2001; Christensen et al. 2004; Jovanovic et al. 2006]. So-called Schumpeterian entrepreneurs are revered by entrepreneurship educators and entrepreneurship students alike, in part because these entrepreneurs spawn new ventures (i.e., experiments in the economic system) which foster the creation of new jobs and promote growth and progress in an economy. By contrast, Schumpeterian entrepreneurs are also feared by established industry players because they generally destroy existing sources of wealth and create new competition for vital resources. Broadly speaking, the concept of creative destruction is rarely taught directly in entrepreneurship education. Namely, students are taught about the importance of analyzing a new venture's industry environment, but there is rarely detailed discussion of creative destruction or Schumpeter's foundational ideas. More specifically, the rise and fall of industries as a motor for economic growth is rarely taught as a source, or driver, of entrepreneurial opportunity creation and innovation.
Rather, the careers and actions of Schumpeterian entrepreneurs (e.g., Bill Gates in the emerging personal computer industry) are studied outside of a broader generalized model of creative destruction and new industry emergence [Chiles et al. 2007]. Instead, the process of entrepreneurship is commonly taught as a series of interlinking steps (opportunity recognition, customer and market analysis, resource acquisition and then new venture creation) where representations of changing industry environments are stylized and non-standardized. For example, a specific entrepreneurship teaching case (a fact-based reenactment of an actual entrepreneur's challenges and decisions) may draw on a wide range of idiosyncratic cross-sectional and sometimes longitudinal industry information to incompletely describe an entrepreneur's industry environment at the moment of venture founding. Cross-sectional information may include snapshots of estimated market shares among competitors, "Five Forces" analysis, or perceptual maps showing segmentation in a market at specific times. Longitudinal information may include data tables capturing gross revenues or total unit sales for an industry (e.g., data charts), but it is the work of the entrepreneurship professor or student to construct and assess useful aids for analysis of the changing industry market structure. Rarely, if ever, are there rigorous scientific efforts to empirically represent long-term trends in industry evolution.
While most entrepreneurship students can readily list off industry-changing entrepreneurs (Steve Jobs of Apple, Ray Kroc of McDonald's, Herb Kelleher of Southwest Airlines), few can characterize the starting, intermediate, and late stages of industry evolution for those entrepreneurial stories, let alone the related reshaping or demise of related industries. The presentation and analysis of industry environments in entrepreneurship, both in entrepreneurship education and by actual entrepreneurs seeking startup funds, is highly idiosyncratic, with no standard method to represent the terrain of industry evolution. Individually, each representation often offers isolated, one-dimensional, and generally cross-sectional depictions of industry evolution, increasing the odds that entrepreneurship students fail to grasp the underlying dynamics of creative destruction. Rarely, if ever, do representations of industry evolution use rich relational longitudinal data from the industry (e.g., longitudinal representations of shifting strategic alliance networks or R&D collaborations) to offer strategic perspectives on changing industry market structure. Rather, students are pushed in entrepreneurship teaching cases to embrace the precise challenge faced by the studied entrepreneurs, with idiosyncratic information about the entrepreneur's industry. Different rates of change in industry evolution, for example in the software industry compared to the banking industry, are generally handled qualitatively. While incisive interpretations of industry changes and resulting entrepreneurial opportunities are celebrated, there is little rigorous development and testing of representations of changing industry market structure, let alone their impact on comprehension of the entrepreneurial process. In this paper, we detail efforts to develop 2-D and 3-D visualizations of creative destruction in the Popular Music Industry (1951-2008).
In key ways, the visualizations we develop and test focus on the comprehension of the broad forces and consequences of creative destruction across six decades. Our 2-D visualization research focuses on the growth, evolution and interrelationships among the 13 major markets in the industry, while our 3-D visualization research describes ongoing efforts to show greater granularity: interrelationships among 193 constituent markets in the industry. The Popular Music Industry is examined as one particular industry context in which to examine creative destruction and industry evolution. The broader research aim is to develop tools, approaches and evaluation metrics to represent and compare creative destruction across industries as varied as Nanotechnology, Biotechnology, Online Social Networks and Mobile Computing. The educational and research opportunity is to create comparative visualizations, akin to analytical artifacts in Comparative Zoology or Comparative Economics, to highlight the underlying and varying dynamics of creative destruction. Below we provide a short overview of the key benefits and challenges of information visualization when presenting complex phenomena such as industry evolution.
2 Background

Educational research on learning styles shows that 65% of individuals are visual, or visually-dominant, learners (Visual Teaching Alliance, 2010). Well-designed visualizations of complex phenomena can speed comprehension, facilitate the formation of novel hypotheses and questions and, equally importantly, focus and stimulate group dynamics. A successful visualization is one that lets humans analyze and query the data efficiently and, as a result, helps them comprehend the data faster and more easily. To accomplish this, one can convert data into a visual representation that allows users to dynamically explore the data. After all, humans possess great pattern recognition skills, especially when it comes to visual representations [DeFanti et al. 1989]. Invariably, an effective visualization is one where information is presented in a confined space. Restrictions such as the visual "real estate" of a computer monitor or a sheet of paper can become an obstacle in presenting data in such a way that humans, who are the primary judges of a visualization technique, can observe, query, and comprehend the data, and as a result can make intelligent decisions and carry out better reasoning. Presenting too much data can clutter the visualization medium (a sheet of paper, a poster, a computer monitor, etc.). Because of this, many visualization techniques present the data to the user as an overview of the dataset with drill-down capabilities to see details on demand [Card et al. 1999]. Relatedly, visualizing large datasets is a topic that has attracted much research. Two areas where large datasets need to be visualized are the Semantic Web and the evolution and growth of the World Wide Web itself. Because graph visualization does not scale well for large datasets [Frasincar et al. 2006], different techniques have been developed to circumvent this challenge.
Most visualization techniques present the data as an overview of the dataset [Taowei and Bijan 2006] and include functionalities that enable the user to zoom in and out to study the details of the data. Others offer simultaneous macro and micro levels of information, allowing the user to choose varying levels of abstraction. Small multiples, reoccurring visual structures which facilitate a "visual language" within and across visualizations, can provide users grounding points for broader exploration, theorizing, or question framing for a particular visualization. Yet other approaches to visualization give the user full control to explore the dataset one step at a time [Deligiannidis et al. 2007]. With this last technique, users often don't see the overall visualization but can effectively visualize sub-graphs of the dataset. Generally it is best for a visualization to be built around the chief structure, or domain, of the data. For example, data that contains geographic information should generally be presented over a map [Deligiannidis et al. 2008; Kapler and Wright 2004]. Even though the idea to utilize more dimensions to visualize a dataset seems attractive (i.e., 3-D versus 2-D), research shows that caution must be
taken when adding dimensions, since the added complexity of interpretation and/or navigation can confuse users. However, thoughtfully implemented and tested for effectiveness, heightened dimensionality can create added flexibility for visualizations and introduce new richness [Deligiannidis et al. 2008]. In our research, we aim to evaluate the trade-offs of 2-D versus 3-D visualizations of industry evolution in the Popular Music Industry. In particular, we develop, test and experiment with different visualizations to depict a radical change within the industry's market structure, where many new markets were spawned over a period of six decades.
3 Data Source

Our data come from allmusic.com, a top industry information provider whose database is the platform for both America Online's and Yahoo! Music's e-commerce websites. This exhaustive archive on the popular music industry provides data including each artist's markets, years of musical production in various markets and, most importantly, lists of artists that have influenced each artist (1951-2008). In total, there are 14,000+ "influenced by" ties for 676 major industry artists. These data allow for a seamless and complete network picture of all major artists in the industry, their artistic influences over the past six decades, and the identification of the emergence of the 13 major markets and 193 sub-markets in the context of this influence structure.

Industry Market Structure and Industry Evolution

According to the Recording Industry Association of America (2009), the Popular Music Industry in the United States has grown to be a $5 billion a year industry. Music is created and sold in the following 13 major markets:

1. Hard Rock
2. Soft Rock
3. Art-Rock/Experimental
4. Folk/Country Rock
5. Rock & Roll/Roots
6. Pop/Rock
7. Psychedelic/Garage
8. British Invasion
9. Punk/New Wave
10. Alternative/Indie-Rock
11. Foreign Language Rock
12. Europop
13. Dance
To give an example, the Rolling Stones create music which boosts Rock & Roll/Roots market sales, while Madonna produces music that lifts Pop/Rock market sales. Collectively, these 13 major markets, or sub-genres, comprise what is formally known as the Popular Music Industry. Although it is funny to think of it as such, the Popular Music Industry is a relatively new industry: it was not born until the early 1950s, with the musical innovations of such artists as Fats Domino, Little Richard and Elvis Presley. While the Popular Music Industry is roughly two decades older than the Personal Computer Industry, it is several decades younger than industries such as the Automotive Industry, the Commercial Aviation Industry and the Chemical Industry. Over the last 60+ years, the Popular Music Industry has emerged as a social and economic force, creating new wealth for companies in the music business and new creative outlets for the artists serving the industry.

Artistic Influences and Evolution of the Industry's Market Structure

The 13 major markets listed above did not always exist as categories in the Popular Music Industry. Rather, they evolved from prior markets and new musical innovations that collectively gave rise to the industry's market structure. John Lennon once remarked about The Beatles, "At least the first 40 songs we wrote were influenced by Buddy Holly". His comment suggests that it is hard to assess the originality and contributions of an artist without considering the artist's musical influences. Similarly, for decades artists have inspired each other to explore and blend new forms of music, all impacting the development of the music industry's market structure.
Fig. 1 Percentage of “Top Ten” Billboard100 Albums Coming from 13 Major Markets in the Popular Music Industry, 1951-2008. Source of data: allmusic.com
2D and 3D Visualizations of Creative Destruction for Entrepreneurship Education
283
Musicians are unique in that they cite and give credit to the influence of their artistic "forefathers" and "foremothers" when shaping industry-changing musical innovations. Despite the widespread image of the tortured, isolated artist, research shows that highly-social rather than socially-isolated artists are more likely to pioneer new markets. Compiling data for all 676 major artists and all 14,000+ "influenced by" ties, we assembled a complete network picture of all major artists in the industry and their artistic influences over the past six decades. Collectively, first, these data allow us to situate the emergence of new markets and changing industry market structure in the context of the complete influences network (co-mapping industry-level and artist-level data). Second, they allow us to examine each artist's network position within the complete network of artistic influences and ascertain their status as a "first mover" (or latecomer) in a market. Figure 1, a simple stacked bar chart, is a visualization that shows the percentage of "Top Ten" albums coming from the 13 major markets during 1951-2008 (source: allmusic.com). The figure was created by counting top albums from the Billboard100 for each major market for each year. Taken together, the markets reflect the industry's market structure, with major markets emerging, growing, contracting, and even dying over time. In aggregate, this stacked bar chart is a rough measure of major recognized musical accomplishments during the evolution of the industry (determined in part by album sales and album radio play). The figure shows the relative prominence of different markets during different decades, but it does not show the interrelationships among the 13 major markets.
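The counting behind Figure 1 — each market's share of "Top Ten" Billboard100 albums, per year — can be sketched in a few lines (the album tallies below are toy values for illustration, not the study's data):

```python
from collections import Counter

# One (year, market) pair per Top Ten album -- illustrative sample only.
top_ten_albums = [
    (1965, "British Invasion"), (1965, "Pop/Rock"),
    (1965, "British Invasion"), (1966, "Psychedelic/Garage"),
]

def market_shares(albums):
    """Percentage of Top Ten albums each market captured, per year."""
    albums_per_year = Counter(year for year, _ in albums)
    counts = Counter(albums)
    return {(year, market): 100.0 * n / albums_per_year[year]
            for (year, market), n in counts.items()}

shares = market_shares(top_ten_albums)
```

Stacking these per-year percentages across the 13 markets yields exactly the kind of bar chart shown in Figure 1.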
Correspondingly, as a visualization of industry evolution, Figure 1 cannot show the interrelating factors that shaped the evolution of the industry, and particularly how specific early markets influenced the creation of later markets—the layout is largely arbitrary. This shortcoming—an inability to show influences on industry market structure—is much of the motivation to explore other, richer visualizations of this phenomenon.
4 2-D Visualizations

Applying the network assembly methods described above, Figures 2, 3 and 4 below are three informationally-equivalent yet different visualizations that show how markets in the Popular Music Industry have influenced the development of other new markets in the industry (looking only at the 13 major markets). As one can see in Figure 2, all markets were influenced by the initial market, Rock & Roll/Roots, which itself was founded in 1951. A timeline runs along the left side of the network graphic to show when the respective new markets emerged. Each arrow shows the direction of influence between markets. An arrow with two heads indicates that influence occurred in both directions. In aggregate, these visualizations
of relationships among the 13 markets give us some sense of how the industry evolved and which markets emerged from prior markets—information that is not communicated in Figure 1 above. The connections were identified by analyzing all artists in the industry and the frequency with which artists in one market (e.g., Rock & Roll/Roots) created music in other markets (e.g., Pop/Rock), suggesting links between different markets. As one can see, all 13 markets that comprise the Popular Music Industry were in existence by 1970, but each market was influenced differently by the other markets.
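The link-identification step just described — counting how often artists in one market also create music in another — can be sketched as a co-membership count over market pairs (the artist assignments and the `min_artists` threshold here are hypothetical, for illustration only):

```python
from collections import Counter
from itertools import combinations

# Hypothetical artist-to-market assignments; the study maps all 676 artists.
artist_markets = {
    "Artist A": {"Rock & Roll/Roots", "Pop/Rock"},
    "Artist B": {"Rock & Roll/Roots", "Pop/Rock"},
    "Artist C": {"Pop/Rock", "Soft Rock"},
}

def market_links(assignments, min_artists=2):
    """Count artists shared by each market pair; keep pairs above a threshold."""
    pair_counts = Counter()
    for markets in assignments.values():
        for pair in combinations(sorted(markets), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_artists}

links = market_links(artist_markets)
```

Pairs of markets sharing many artists become the candidate edges drawn in Figures 2-4.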
Fig. 2 The “wire” graph. A visualization of how markets in the Popular Music Industry have influenced each other. Arrows show the direction of influence between markets. An arrow with two heads suggests influence occurred in both directions
Figure 2 was generated using Graphviz (www.graphviz.org), a graph visualization software package that generates a variety of graphs and implements layout algorithms. As is common with network visualization, we chose to avoid node overlap and minimize edge crossings. The produced graph is highly interconnected, and as such the relationships between the 13 markets are difficult to discern. After studying this graph and the underlying data, we believed that it was possible to cluster together some of the categories to produce a more readable—or at least less cluttered—graph, where the resulting graph would have even fewer edges and minimal edge crossings.
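Graphviz consumes plain-text DOT source, so a graph like Figure 2 can be emitted directly from the edge list. A minimal sketch (only a few edges shown; the bidirectional pair is a hypothetical example of the two-headed arrows mentioned above):

```python
def to_dot(edges, bidirectional=()):
    """Emit Graphviz DOT source; dir=both draws the two-headed arrows."""
    lines = ["digraph markets {", "  rankdir=TB;"]
    for a, b in edges:
        lines.append(f'  "{a}" -> "{b}";')
    for a, b in bidirectional:
        lines.append(f'  "{a}" -> "{b}" [dir=both];')
    lines.append("}")
    return "\n".join(lines)

dot_src = to_dot(
    edges=[("Rock & Roll/Roots", "Pop/Rock"),
           ("Rock & Roll/Roots", "British Invasion")],
    bidirectional=[("Hard Rock", "Psychedelic/Garage")],  # hypothetical pair
)
# Render with, e.g.: dot -Tpng markets.dot -o markets.png
```

Graphviz's `dot` layout engine then handles node placement and edge routing, including the overlap and crossing minimization mentioned above.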
Fig. 3 The "boxy" graph. A visualization of how markets in the Popular Music Industry have influenced each other. Arrows can point to an individual market or a group of markets, meaning that the market or group of markets, respectively, was influenced. Likewise, an arrow's origin can be either a market or a group of markets, meaning that the market or group of markets is the influence. An arrow with two heads suggests influence occurred in both directions
Fig. 4 The “curvy” graph. This graph is similar to the “boxy” graph above. Instead of using straight lines and boxes, we use curved lines and ellipses to show influences and clusters
This produced the graph shown in Figure 3, which depicts exactly the same information as the graph in Figure 2. We determined that there are two clusters of markets (shown in Figure 3 with background shading). Each cluster consists of four markets, and these markets are heavily interconnected among themselves. In Figure 3 one can see, for example, that a market created prior to a cluster influenced all markets within the cluster. This particular method eliminates the need to
draw a relationship between a previously created market and every market in the cluster. One can instead show a relationship to the entire cluster with a single arrow. In fact, we draw an arrow that starts either from a single market or a group of markets and points to either a single market or a group of markets. A third graph, shown in Figure 4, is a modified version of the graph in Figure 3. This graph was produced after our pre-trials. Its main improvements, we believed, were: a) the year of birth is shown more clearly as it applies to an entire group of markets; b) instead of straight lines we used curved lines to create a better flow when reading the graph; and c) instead of boxes to define groups of markets, we used ellipses. Most importantly, we hoped that with ellipses novice users of the visualization would more readily identify the diagram as a Venn diagram, potentially simplifying interpretation.
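The edge-reduction idea behind Figures 3 and 4 — replacing one arrow per cluster member with a single arrow to the cluster — can be sketched as follows (the edge list and cluster grouping are hypothetical illustrations):

```python
def collapse_edges(edges, cluster):
    """If a source influences every member of a cluster, replace those
    edges with one edge from the source to the cluster itself."""
    name, members = cluster
    out, by_source = set(edges), {}
    for a, b in edges:
        by_source.setdefault(a, set()).add(b)
    for source, targets in by_source.items():
        if members <= targets:                 # source covers the whole cluster
            out -= {(source, m) for m in members}
            out.add((source, name))
    return out

edges = {("Rock & Roll/Roots", "Hard Rock"),
         ("Rock & Roll/Roots", "Soft Rock"),
         ("Rock & Roll/Roots", "Pop/Rock")}
cluster = ("cluster A", {"Hard Rock", "Soft Rock"})   # hypothetical grouping
reduced = collapse_edges(edges, cluster)
```

Applied to both shaded clusters, this is what thins the "wire" graph into the "boxy" and "curvy" layouts.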
5 Evaluation Procedure

114 subjects volunteered to participate in the evaluation of our 2-D visualizations. We performed the evaluation in two different colleges and combined all the data for the analysis. The subjects were undergraduate students, and their average age was 19.5. We performed the evaluations in classrooms that could accommodate at least 45 students. At the beginning of the evaluation we distributed standard instructions, including very basic context about the music industry, and explained that the subjects would need to answer 11 questions. The directions instructed participants that it was better to spend more time on a question and get the answer correct than to finish quickly and get a question wrong. We divided our subjects equally into three groups. Each group received only one of the visualizations, but all received identical instructions and answered the same set of questions. The total number of questions was 11; the questions are shown in Table 1. Upon starting, participants were to write down the current time (to the precise second). At the end of question 9 we asked them to write the exact time again, and at the end of the eleventh question we asked them to capture the finish time. At the end of the questionnaire we asked each participant to note their academic year (Freshman, Sophomore, Junior, Senior), their gender, and their age. Additionally, in order to see if they understood the instructions, we asked them to respond to the following statement: "The instructions were clear and sufficient to answer all the questions in this questionnaire…" Responses were on a 5-point Likert scale (1. Strongly disagree … 5. Strongly agree). A few of the questionnaires were unusable because of missing data, and a couple of others indicated that the respondents did not understand the instructions. Thus, we removed these questionnaires and were left with three groups of subjects, each of 25 people.
Each group worked on one of the three visualizations. Of these 75 subjects, 24 were female and 51 male; there were 36 freshmen, 37 sophomores, and 2 juniors.
Table 1 Questionnaire consisting of 11 questions
Q1   Looking at the emergence of the industry as a whole, how many major markets evolved in the Popular Music Industry between 1951 and 2009?
Q2   Looking at the emergence of the industry as a whole, which market had the most influence on the formation of the overall industry?
Q3   Looking at the emergence of the industry as a whole, identify a market that had the least influence on the formation of the overall industry.
Q4   Looking at the emergence of the industry as a whole, what two years were particularly turbulent with respect to musical innovation and the creation of new markets?
Q5   How many markets did the "Folk/Country Rock" market directly influence?
Q6   How many new markets did the "British Invasion" directly influence after 1959?
Q7   How many markets did the "Pop/Rock" market directly influence?
Q8   What are the three most recent influences of the Soft Rock market?
Q9   Which markets directly influenced the development of the Europop market?
Q10  If the originality of a new musical market is determined by having the smallest number of identifiable influences (i.e., having few influences), which three of the 13 major markets are most original?
Q11  If the originality of new musical markets is determined by having the highest number of identifiable influences (i.e., drawing on and combining many diverse influences), which two of the markets emerging in 1966 are most original?
6 Results of 2-D Visualizations

We used a between-subject experimental design where our independent variable was the graph visualization (wire – Figure 2, boxy – Figure 3, and curvy – Figure 4). The dependent variables were the scores on the eleven questions. Table 2 shows the descriptive statistics for our measurements. We compared the means of the pooled performance scores using one-way analysis of variance (ANOVA). The score for each question was "1" if the question was answered correctly and "0" otherwise. From the last column of Table 2, we see that there are differences between the graphs and that these differences are statistically significant. The differences were for question 2 (F(2,72)=4.625, p<0.05), question 7 (F(2,72)=10.286, p<0.001), question 8 (F(2,72)=5.3617, p<0.01), and question 9 (F(2,72)=4.5, p<0.05). Then, we created a set of confidence intervals on the differences between the means of the levels of the "Graph" factor with the specified family-wise probability of coverage. Since our design is balanced (the same number of observations in all three groups), we used Tukey's "Honest Significant Difference" (HSD) method to create the confidence intervals. We executed Tukey's HSD method at the 95% confidence level for each of the questions with statistically significant differences (questions 2, 7, 8, and 9). For question number 2, Tukey's HSD method revealed that the difference is between the "wire" and the "boxy" graphs. Looking at Table 2, we see that all
the subjects who used the "wire" graph got this question right (significance only at p<0.05). This could be explained by the fact that the "wire" graph is so interconnected that the only node with many outgoing arrows sits at the top of the graph. For question number 7, Tukey's HSD method revealed that the difference is between two pairs: wire-boxy (p<0.001) and wire-curvy (p<0.001). Both graphs (boxy and curvy, Figures 3 and 4 respectively) outperformed the wire graph. There is, however, no significant difference between the boxy and the curvy graph.

Table 2 Performance measurement based on graph visualization
        wire          boxy          curvy
        M     SD      M     SD      M     SD      F(2,72)   p
Q1      0.80  0.41    0.84  0.37    0.76  0.44    0.2416    0.786
Q2      1.00  0.00    0.72  0.46    0.88  0.33    4.625     <0.05
Q3      0.96  0.20    0.92  0.28    0.84  0.37    1.0909    0.3414
Q4      0.80  0.41    0.84  0.37    0.80  0.41    0.0845    0.919
Q5      0.28  0.46    0.20  0.41    0.32  0.48    0.4641    0.6306
Q6      0.24  0.44    0.20  0.41    0.32  0.48    0.48      0.6207
Q7      0.04  0.20    0.52  0.51    0.52  0.51    10.286    <0.001
Q8      0.40  0.50    0.80  0.41    0.72  0.46    5.3617    <0.01
Q9      0.96  0.20    0.68  0.48    0.64  0.49    4.5       <0.05
Q10     0.68  0.48    0.80  0.41    0.64  0.49    0.8211    0.444
Q11     0.52  0.51    0.60  0.50    0.60  0.50    0.2105    0.8107
M and SD represent the mean and standard deviation respectively. F and p are from the ANOVA analyses that compare the means of the answers in “wire”, “boxy”, and “curvy” graphs.
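As a sanity check, the reported F statistics can be reproduced from Table 2's group means. The sketch below computes the one-way ANOVA F for question 2, with each group's 25 binary scores reconstructed from its reported mean (wire M=1.00, boxy M=0.72, curvy M=0.88):

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for k groups (lists of numeric scores)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Q2 scores reconstructed from the reported means, 25 subjects per group.
wire  = [1] * 25               # M = 1.00
boxy  = [1] * 18 + [0] * 7     # M = 0.72
curvy = [1] * 22 + [0] * 3     # M = 0.88
f_q2 = one_way_anova_f([wire, boxy, curvy])   # ≈ 4.625, matching Table 2
```

The same reconstruction reproduces the other F values in the table; in practice one would follow up with Tukey's HSD on the pairwise mean differences, as the authors did.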
Broadly, both of the clustered graphs (boxy and curvy) outperformed the wire graph (see Table 2 for the mean values and standard deviations); however, there were no significant differences between the boxy and the curvy graph, suggesting they are equivalent. Overall, we believe that the improved performance is due to the clustering of the markets in the graphs.
7 Discussion of the 2-D Visualizations

A core finding is that the boxy and the curvy graphs offer improvements over the wire graph, which suggests that clustering of information improves students' ability to understand the visualized data. Even though we found some differences between the three graphs, there were no differences between the boxy and the curvy graphs. We tried to provide an improved visualization by varying the boxy graph to produce the curvy graph, without much success, since we found no statistical differences between the two. To our disappointment, we found that none of the graphs enabled subjects to answer all of the questions correctly. This can be seen by plotting the mean values for each question, as shown in Figure 5. However, broadly this analysis reveals several potential benefits in using visualization for entrepreneurship education, and particularly for instruction on industry evolution. The boxy and curvy diagrams, which both use clustering, yield, on
average, more accurate answers on the question set overall. This suggests that, given a choice between these two general visualization types—a simple graph visualization and a clustered, layout-optimized one—the latter will improve entrepreneurship learning outcomes with respect to comprehension of industry evolution and new market creation. Moreover, the clustered layouts also required less time for correct answers.
Fig. 5 Plot of means of all the questions
Fig. 6 Plot of time means of all the questions (t1: mean time to complete the first 9 questions; t2: mean time to complete the last two questions)
An interesting question, one not examined here, is the possibility that the clustering layouts also improve long-term recall and comprehension about creative destruction. Different visualizations may, we suspect, have advantages in terms of memorability, both for the specific context and for broader insights about creative destruction. Only a small percentage of subjects answered questions 5 and 6 correctly. Perhaps we need to investigate alternative visualization techniques that would enable subjects to answer all the questions correctly. As shown in Figure 6, on average the subjects using the curvy graph finished faster on both the first 9 questions and the last 2 questions.
8 3-D Visualization with IndustryViewer™

Motivated by our 2-D findings, we developed a 3-D visualization tool called IndustryViewer™ to visualize the interrelationships among the 13 major markets and 193 sub-markets in the industry. Analogous to the 2-D effort, we exploited the artist influences data (14,000+ ties) to array each artist longitudinally and cross-sectionally in x-y-z space. The z-axis captures the historical timeline for when individual artists entered the industry, between 1951 and 2008. The x- and y-axis positions were determined through standard multidimensional scaling network algorithms that place actors with the most network ties at the center of a network visualization (i.e., those artists who are most commonly cited as creative influences by other artists). This method yields a result comparable to adding spring-embedding mechanisms in network simulations, which give physical properties (spring-like pressures) to various network ties.
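The placement rule — most-cited artists near the x-y center, entry year on the z-axis — can be illustrated with a much-simplified radial stand-in for the MDS/spring-embedding layout (the artist names, tie counts, and the radial formula are illustrative assumptions, not the study's actual algorithm):

```python
import math

def layout(artists):
    """Place most-cited artists near the x-y center; z encodes entry year.
    A simplified radial stand-in for the MDS / spring-embedding layout."""
    max_ties = max(ties for _, ties in artists.values())
    points, names = {}, sorted(artists)
    for i, name in enumerate(names):
        year, ties = artists[name]
        r = 1.0 - ties / max_ties            # more citations -> smaller radius
        theta = 2 * math.pi * i / len(names)
        points[name] = (r * math.cos(theta), r * math.sin(theta), year - 1951)
    return points

pts = layout({"Elvis Presley": (1954, 120),   # (entry year, times cited) -- toy values
              "Artist X": (1990, 12)})
```

A real implementation would derive x-y from pairwise network distances rather than citation counts alone, but the resulting point cloud has the same shape: central "influencer" artists with later entrants arrayed along the depth axis.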
Fig. 7 IndustryViewer™. Front-view of two visualized markets shown in two colors. Recent artists are shown in the foreground, early artists are shown in the background
Our software has two chief components: data loading and data rendering. The data the software reads include the artist name, the artist's location within the network (i.e., the x-y-z point cloud), and the major markets in which the artist creates music. Colors are assigned to artists (nodes) based on their inclusion in certain markets. The colors were obtained by spacing the 13 known subcategories evenly around an HSV color wheel and assigning each the hue at its angle around the wheel. Point sprites with additive blending were used to render the cloud, so that overlapping particles produce denser, brighter colors. A separate form, the Enabler, with category checkboxes (described below), enables and disables major markets from being rendered, which allows the user to view market overlaps in any combination. Unlike the 2-D visualizations of market relationships, IndustryViewer™ is interactive and gives the user controls to explore the dataset and form judgments about the level of interrelationship (overlap) between markets. Figure 7 above shows the visualization environment. It consists of two windows: the Enabler, where the user selects the markets to be visualized, and the Visualizer, which renders the visualization. The floor and the up vector are drawn for reference.
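The color-assignment scheme — one evenly spaced hue per market on an HSV wheel — is a one-liner with the standard library:

```python
import colorsys

def market_colors(n=13):
    """One RGB color per major market: evenly spaced hues, full saturation/value."""
    return [colorsys.hsv_to_rgb(i / n, 1.0, 1.0) for i in range(n)]

colors = market_colors()   # colors[0] is pure red: (1.0, 0.0, 0.0)
```

Evenly spacing hues maximizes the perceptual separation between the 13 market colors, which matters when additive blending mixes overlapping point sprites.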
Fig. 8 Side-view of two visualized markets shown in two colors. Historical founding artists (i.e., influences) are shown on the left; later artists are shown on the right
To visualize the data properly, the user needs to rotate the graph to "feel" the depth, which represents the year in our model. Figure 8 above shows a side view where a user can see when the markets originated and which individual artists (colored points) created music for a given major market. Together, these two views allow a user to explore interrelationships among markets (do they overlap or not?), the birth or timing of different markets (are they early or late in the history of the industry?) and, most importantly, the interrelationships or blending of the markets (how the artists who created music in the respective markets do, or do not, share comparable influences). Figure 9 below shows a floating plane hiding information before a certain year. The user can control the position of this plane (which specifies the year) by using the slider in the Enabler window.
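Because the plane sits at a fixed z (year), the filtering it performs reduces to a simple threshold over each artist's entry year — a sketch of the idea (toy data; the actual tool culls points on the GPU side):

```python
def visible_artists(artists, cutoff_year, show_before=True):
    """Mimic the floating plane: hide artists on one side of a chosen year."""
    if show_before:
        return [name for name, year in artists.items() if year <= cutoff_year]
    return [name for name, year in artists.items() if year > cutoff_year]

sample = {"Elvis Presley": 1954, "Madonna": 1982}   # toy entry years
early = visible_artists(sample, cutoff_year=1970)
```

Moving the slider simply re-runs this filter with a new `cutoff_year`, so the user can watch markets populate over time.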
Fig. 9 A movable plane used to show which artists produced music in selected markets before and after a given year
While this browser shows colored points representing individual artists, the future plan is to "wrap" market spaces around the participating artists (nodes) in a given market, thereby creating separate, distinguishable, and also overlapping volumes. In summary, IndustryViewer builds on the findings from the 2-D visualizations to create an environment in which to examine the potential strengths and weaknesses of 3-D visualizations.
9 Conclusions

This research shows that different two-dimensional visualizations of industry evolution have the potential to impact entrepreneurship learning outcomes. Specifically, two-dimensional visualizations may vary in their ability to facilitate the general comprehension of creative destruction, the specific comprehension of influences in new market creation, and comprehension of the rates and dynamics of changing industry market structure. While the results presented here suggest several limitations, this is nonetheless a notable improvement over the existing unsystematic, idiosyncratic representations currently used in entrepreneurship pedagogy. If a major pedagogical goal in entrepreneurship education is to highlight that understanding the dynamism of industry evolution is essential to the recognition of entrepreneurial opportunities, innovation and wealth-creation, then it is amply justified to study which visualizations of phenomena enable the strongest, fastest and most enduring comprehension of industry evolution. Broadly, the contribution to entrepreneurship education is the rigorous evaluation of applied representations of creative destruction compared to current ad-hoc teaching materials. The contribution to research on information visualization is an elaboration of a business context where key principles of information visualization (overview and zoom, macro-micro levels and small multiples) may aid students' understanding of industry evolution. Our research also suggests that three-dimensional visualizations of industry evolution may have the potential to impact entrepreneurship learning outcomes by adding interactivity and user controls. This is particularly true if the benefits exceed the drawbacks of adding an additional dimension. We are in the process of finalizing IndustryViewer to enable visualization of the wrapped market volumes defined by the participating nodes.
When the system is completed, we will examine the accuracy and speed of comprehension, the memorability of information, and potentially the ability of representations to relate industry dynamics across industries. Our 2-D findings provide a jumping-off point for evaluating new results. Future research should evaluate other visualization approaches and their impact on entrepreneurship learning outcomes. This includes examination of dynamics in other fast-changing industries, such as the newly emerging nanotechnology industry, historical developments in the Personal Computer industry, and ongoing developments in the Mobile industry. Broadly, there is an opportunity to systematize the visual treatment of entrepreneurship and innovation phenomena to improve learning outcomes, and particularly comprehension of creative destruction, for business students studying entrepreneurship.
Employing a Biofeedback Method Based on Hemispheric Synchronization in Effective Learning K. Kaszuba and B. Kostek Multimedia Systems Department, Gdansk University of Technology, Gdansk, PL {katkasz,bozenka}@sound.eti.pg.gda.pl
Abstract. The following Chapter presents a new approach to effective learning by employing a biofeedback method based on hemispherical synchronization. The proposed application uses a wireless EEG (electroencephalography) system to record the user's brain waves, together with powerful signal processing and classification, to produce reliable feedback. Alpha and beta brain rhythms are analyzed by applying the DWT (Discrete Wavelet Transform) and by calculating statistics for each analyzed window. EOG (electrooculogram) artifacts are eliminated from the signal through adaptive filtration in the time-frequency domain. Three different learning methods are implemented in the proposed application: mind maps, flash cards and non-linear notes. Several tests were performed with the users. Based on the brain feedback information and the user's learning profile test results, an optimized learning method is chosen for an individual user. Information about hemispherical synchronization provides vital input for system adjustments. The results obtained show a difference between traditional learning and learning using a feedback loop, indicating that synchronized hemispheres improve learning abilities. In conclusion, a critical evaluation of the method is given.
1 Introduction

Currently, biofeedback-based applications are still an uncertain area of research. It seems almost impossible to obtain a reliable feedback loop while providing comfortable conditions for the user. Since medical applications force the user to avoid any movement and minimize external stimuli, they cannot be classified as user-friendly. On the other hand, commercial applications do not guarantee quality feedback because the recorded signals are corrupted with muscle activity and other movement artifacts. Applying a powerful pre-processing module to filter those artifacts worsens the time resolution of the system, but it is still the only feasible way to produce a compromise between the two approaches.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 295–309. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
296
K. Kaszuba and B. Kostek
In this Chapter such a method is presented as the basis for an effective learning application. Information about hemispherical synchronization is used for creating a feedback loop. First of all, cortical activity during the synchronization process is described. Knowledge of hemisphere activity during various exercises provides the necessary information on how to divide signals into independent data sets. The next Section describes problems that occur when creating a user-friendly biofeedback application. A few such solutions are given and critically summarized. The optimum solution is then described, and a focus is placed on electrooculogram artifact reduction. Methods of signal acquisition and pre-processing are studied and detailed knowledge about the results is provided. The core of this study consists in feature extraction and classification methods. Section 5 focuses on the parameterization process, with an emphasis on band-pass filtration and the discrete wavelet transform (DWT). To obtain a highly efficient classification of the hemispheric state, three algorithms were tested and are presented. The next two Sections focus on a presentation of the engineered application. The implemented methods of an effective learning process are described and a full description of the feedback information is given. The proposed application creates an individual learning profile, while testing and updating are based on feedback information. All results are summarized with a critical conclusion. The proposed method is verified and its advantages and disadvantages are given.
2 Cortical Activity in Hemispheric Synchronization

A state of hemispheric synchronization occurs extremely rarely during ordinary daily activities. Typically people use the left and right hemispheres alternately, and usually one hemisphere is dominant. In the case of a learning process, most people use only the left hemisphere. During the day the synchronization state may naturally appear only just after waking up and then just before falling asleep. Still, those few seconds are not sufficient for hemispheric synchronization to be used in an application. However, by proper stimulation and training it is possible to force both hemispheres to cooperate [Sanei and Chambers 2008]. It should be remembered that the cerebral brain signal observed by EEG typically falls in the range of 1–30 Hz (activity below or above this range is likely to be artifactual under standard clinical recording techniques). The rhythmic bands comprise delta, theta, alpha (two or three alpha sub-bands), beta (low-beta and high-beta waves), gamma and Mu wave patterns (gamma and Mu waves are higher than 30 Hz). All the above-mentioned brainwave frequencies are present in the brain; however, the dominant frequency in the EEG pattern determines the state of the brain. The alpha rhythm, often referred to as alpha activity, reflects the speed of cognitive processes and memory performance, while beta waves are correlated with a state of concentration.
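The rhythm bands above can be captured with conventional boundary values; a sketch of classifying the dominant EEG frequency by band (the exact cutoffs vary across the literature — these are a common convention, not the authors' stated values):

```python
# Conventional EEG band boundaries in Hz (exact cutoffs vary in the literature).
BANDS = [("delta", 1.0, 4.0), ("theta", 4.0, 8.0),
         ("alpha", 8.0, 13.0), ("beta", 13.0, 30.0)]

def dominant_band(peak_freq_hz):
    """Name the rhythm band containing the dominant EEG frequency."""
    for name, lo, hi in BANDS:
        if lo <= peak_freq_hz < hi:
            return name
    return "outside 1-30 Hz (likely artifactual)"

band = dominant_band(10.0)   # -> "alpha"
```

In the application, a rest state with a 10 Hz peak would thus read as alpha activity, while a concentrated-work peak around 20 Hz would read as beta.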
To achieve the synchronization state it is crucial to activate different cognitive brain areas at the same time. Since the left hemisphere is known to be responsible for such fields as logical thinking, speech, mathematical skills, number coding and literality (understanding of the exact word meaning), while the right hemisphere focuses on the tasks of visualization, holistic thinking, metaphors and emotions, the essence of synchronization is the ability to join these features together. The simplest example of an effective learning exercise using hemisphere synchronization is a widely known method which associates the numbers from 1-10 with different pictures. The associations are presented in Table 1. As can be seen, all symbols correspond with the shapes of the numbers. While the left hemisphere memorizes numbers, the right one preserves information about shapes and their symbolic representation. Such an approach can be used for the quick and effective learning of number chains [Kemp et al. 1991].

Table 1 Association of numbers and object shapes [Sanei and Chambers 2008]

No.  Symbol
1    Candle
2    Swan
3    Apple
4    Chair
5    Hook
6    Cherry
7    Scythe
8    Snowman
9    Balloon
10   Sword and shield
3 Biofeedback in User-Friendly Application
Until recently, biofeedback applications were mainly used in medical clinics along with a traditional EEG recording headset. Such solutions use from 21 to over a hundred electrodes located on the subject's scalp. The sensors are attached to specialist equipment via flexible wires and secured to the head with an electrolyte interface. The user must remain completely motionless while using such applications, and no external stimulus should be present. The development of modern technologies has provided new wireless EEG recording systems, in which the number of sensors varies between one and fourteen. Typical wireless systems provide up to four electrodes, usually located on the subject's forehead. Such an approach does not guarantee good quality of the EEG signal, since the recorded data may be heavily corrupted with EOG and related artifacts such as eye-blinking, sweating or head movement [Sanei and Chambers 2008]. Many of the currently developed biofeedback applications do not distinguish between the real EEG signal that comes from brain activity and the EMG, a signal produced by muscle electrical activity. Often in such applications the user performs a mental activity along with the physical activity itself, e.g. when thinking about lifting a virtual object the user also lifts his hand or head. As a result, no brain signal is actually analyzed, since the noise generated by muscle movements is too strong. The developed application for effective learning uses a four-electrode wireless system. Advanced filtration techniques are used to achieve a compromise between the quality of the feedback and the comfort of the user. Thanks to a powerful adaptive filtration module, there is no need for the user to remain motionless or avoid blinking. It is still suggested to avoid external stimuli such as talking to other people or turning on a chair. Since the system uses only four electrodes, the main idea is to observe whether activities of each hemisphere appear, without determining the exact cortical activity area.
4 Biofeedback Signal Acquisition and Pre-processing
The recordings were performed for four individual subjects (three men and one woman, with an average age of 24) [Kaszuba et al. 2010b]. The experiment took place in the recording studio of the Multimedia Systems Department (MSD) of the Gdansk University of Technology to provide good isolation from the external environment. The four-electrode wireless system Starlab Enobio was used to provide maximum comfort to the user. Active electrodes were located on the subjects' foreheads at positions defined by the 10-20 standard electrode setting extended to 75 electrodes, i.e. AF7, Fp1, Fp2 and AF8, and the reference sensors were placed at the A1 and A2 locations. The positions of the electrodes are presented in Fig. 1 [Sanei and Chambers 2008]. Data acquisition was performed according to the following protocol:
1. mathematical calculation, eyes open: the beta waveband was expected to dominate,
2. 20 minutes of listening to classical music, eyes closed: stimulation of alpha waves,
3. mathematical calculation, eyes open: hemispherical synchronization expected to be observed.
The AF7 and AF8 electrodes were treated as the referential EOG signal, while the recordings from the Fp1 and Fp2 electrodes were treated as the corrupted EEG signal. With such an approach it was possible for the authors to apply adaptive filtration to eliminate EOG artifacts from the acquired raw signal.
Fig. 1 The location of electrodes in the EEG recording system
Fig. 2 illustrates details of the adaptive filtration process: both the recorded EEG/EOG mixture and the reference EOG are decomposed with a stationary wavelet transform (symlet 3 function, decomposition level 8), RLS adaptive filtration estimates the EOG component swt(EOG), signal subtraction yields swt(EEG), and an inverse wavelet transform reconstructs the clean EEG.

Fig. 2 Diagram of the adaptive filtration process
The RLS (Recursive Least Squares) adaptive filtration was used in the time-frequency domain. For both the EEG and EOG signals, a stationary wavelet transform with the symlet 3 function and decomposition level 8 was performed. Employing the approach proposed by Samar et al. [1999] made it possible for the authors of this chapter to extract the EOG component from the corrupted EEG and then to subtract it, producing the stationary wavelet transform of the clean EEG data. The final step was to apply an inverse wavelet transform and reconstruct the signal. The operation results in two channels of clean signals, which were then used in the brain source localization method proposed by Senthil et al. [2009]. For the source localization, the weighted minimum norm estimate (wMNE) was used. Using only two electrodes in the wMNE method gives only coarse information about the brain activity; the results of this operation were used solely to confirm the assumptions stated in the recording protocol. The recordings for which the expectations were not fulfilled were excluded from the training data set. The computation was performed using the open source Brainstorm 3 application [Mosher et al. 2005]. Fig. 3 presents the fMRI visualization of data for which synchronization occurred.
Fig. 3 fMRI view for source localization of data in which synchronization appeared
As can be seen in Fig. 3, light blue and yellow areas are present in both hemispheres, which means that synchronization occurred [Khemakhem et al. 2009].
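The EOG-removal stage can be illustrated with a minimal RLS noise canceller. For brevity this sketch works directly in the time domain rather than on stationary-wavelet coefficients as in Fig. 2, and the synthetic signals, filter length and forgetting factor are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def rls_cancel(d, ref, taps=8, lam=0.99, delta=100.0):
    """Remove the component of `d` correlated with `ref` using RLS.

    d   : corrupted signal (EEG plus EOG artifact)
    ref : reference artifact signal (EOG)
    Returns the cleaned signal (the RLS a-priori error).
    """
    n = len(d)
    w = np.zeros(taps)            # adaptive filter weights
    P = np.eye(taps) * delta      # inverse correlation matrix estimate
    u = np.zeros(taps)            # sliding window of reference samples
    out = np.zeros(n)
    for i in range(n):
        u = np.roll(u, 1)
        u[0] = ref[i]
        y = w @ u                 # estimated artifact sample
        e = d[i] - y              # a-priori error = cleaned sample
        k = P @ u / (lam + u @ P @ u)
        w = w + k * e
        P = (P - np.outer(k, u @ P)) / lam
        out[i] = e
    return out

# Synthetic demo: 10 Hz "EEG" plus a slow 1 Hz "EOG" artifact
fs = 256
t = np.arange(2 * fs) / fs
eeg = np.sin(2 * np.pi * 10 * t)
eog = np.sin(2 * np.pi * 1 * t)
corrupted = eeg + 0.8 * eog
cleaned = rls_cancel(corrupted, eog)
```

Here the a-priori error e serves as the cleaned EEG sample, since the filter converges to the path between the reference EOG and its contribution to the corrupted channel.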
5 Signal Parameterization and Classification
It was crucial for the designed biofeedback application to choose the information that would be used as the feedback signal. In this case, the information on whether hemispherical synchronization occurred was treated as such. Hemispherical synchronization was analyzed by applying the Discrete Wavelet Transform with the symlet 3 function to the alpha (8-12 Hz) and beta (12-30 Hz) bands and to the entire frequency band (0.5-44 Hz) in a 1 s time window without overlapping. In addition, the beta rhythm was divided into a lower beta range (12-16 Hz) and a higher beta range (16-30 Hz). Such an approach [Samar et al. 1999], [Sanei and Chambers 2008] may guarantee that a positive concentration state and the presence of stress are analyzed separately and can be distinguished from each other when hemispheric synchronization is occurring. For each time frame, approximation wavelet coefficients (CA) and detail wavelet coefficients (CD) are calculated. The approximation coefficients carry no valid information, since they describe the part of the data that was band-pass filtered out of the signal. From the set of coefficients, four statistical values are calculated: mean, maximum, minimum and variance. This results in a set of 20 features, different for each channel, in every 1 s time period. Three independent classifiers were trained by the authors on the obtained data set. The simplest classifier chosen was the One Rule classifier [WWW-1 2009], which picks only one feature from the parameter vector and, based on it, produces the most probable classification rule. The minimum error attribute was chosen as the criterion in this classifier. The second classifier used was the LADTree [WWW-1 2009], a classifier which combines a multi-class decision tree method with the logistic regression algorithm; the regression algorithm is used for establishing the splits of the tree.
The last classifier chosen to be implemented in the developed application is the Logistic Model Tree (LMT) [WWW-1 2009], which is also a combination of a decision tree algorithm and a regression method; however, in this approach each node of the tree contains a full regression model for a specific feature [Kaszuba et al. 2010a]. For training the classifiers, all features were globally normalized using the min-max normalization procedure. The tests performed with the cross-validation procedure showed high efficiency for each of the chosen methods. Tables 2-4 present the confusion matrix for each classifier [Kaszuba et al. 2010b].
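The per-window parameterization can be sketched as follows. This is a hedged approximation: the sub-bands (alpha, low beta, high beta, full band) follow the description above, but an FFT mask stands in for the symlet-3 wavelet decomposition, and the fifth coefficient set (the raw window) is an assumption introduced only to reach the stated 20 features per channel:

```python
import numpy as np

FS = 256  # sampling rate in Hz (assumed, not stated in the text)

def band(x, lo, hi, fs=FS):
    """Isolate a frequency band with an FFT mask (stand-in for DWT sub-bands)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

def stats(x):
    """The four per-set statistics named in the text."""
    return [np.mean(x), np.max(x), np.min(x), np.var(x)]

def features(window, fs=FS):
    """20 features for one 1 s window: 4 statistics x 5 coefficient sets."""
    sets = [
        band(window, 8, 12, fs),    # alpha
        band(window, 12, 16, fs),   # low beta
        band(window, 16, 30, fs),   # high beta
        band(window, 0.5, 44, fs),  # full analysis band
        window,                     # raw window (assumed fifth set)
    ]
    return [v for s in sets for v in stats(s)]

vec = features(np.sin(2 * np.pi * 10 * np.arange(FS) / FS))
```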
Table 2 The confusion matrix for the LMT classifier in a cross-validation procedure

            synch    no synch
synch       0.996    0.004
no synch    0.045    0.955

Table 3 The confusion matrix for the LADTree classifier in a cross-validation procedure

            synch    no synch
synch       0.996    0.004
no synch    0.034    0.966

Table 4 The confusion matrix for the One Rule classifier in a cross-validation procedure

            synch    no synch
synch       0.999    0.001
no synch    0.071    0.929
6 An Application for Effective Learning
Since the rich literature on hemispheric synchronization indicates that such a state helps in very efficient learning, the next step of this study was to develop an application that can use this information as a feedback signal in a learning process. An effective learning process can usually be observed when both hemispheres work simultaneously. In terms of signals, a decrease in the beta band and an increase in the alpha band is the most desired state to achieve, as it constitutes evidence of hemisphere synchronization. Depending on individual traits, some people learn best by listening, while others prefer watching pictures or just reading and writing. Therefore, several learning methods that take these facts into consideration were reviewed and examined first. Such methods usually force alternating brain areas (from the left and right hemispheres) to work at the same time. For the effective learning application, three different methods were implemented: mind map creation, flash cards and non-linear notes. The mind map is a solution which turns a learning topic into a net structure containing a single problem at each node, along with visual and sound symbols that help memorize knowledge. The nodes are arranged in such a way that the central one contains the most general topic, while those further from the center refer to more detailed information. The flash card method helps to convert each problem into a set of question-answer statements. Each card contains a question and an answer, together with an appropriate graphical symbol and optional sound information.
The non-linear notes method is mainly addressed to people who prefer learning via reading and writing techniques. The solution is based on working with text: highlighting fragments, adding keywords to a glossary, and creating extra notes and questions in the margin of the notes. Figs. 4-8 present the logging and learning application panels.
Fig. 4 Start panel of the application
The developed application creates an individual profile for each user. The profile stores information about learning topics, an individual learning profile and the feedback information. There are four learning profiles available: aural, visual, reading/writing and kinesthetic. The user can easily add a new learning session to start memorizing a new subject or edit an already existing session. The duration of each session depends on three features: the time left until the chosen deadline, the priority of the topic and the attitude the user has towards the topic. Time is calculated in minutes according to the formula:

t_session = t_base + Δp + Δa   (1)

where the base time t_base depends on the time left until the deadline (the conditions are evaluated top-down and the first matching one applies):

t_base = 30 min   if t_left > 14 d
         60 min   if 7 d < t_left <= 14 d
         90 min   if 4 d < t_left <= 7 d
         105 min  if t_left >= 4 d
         120 min  if t_left >= 3 d
         150 min  if t_left >= 2 d
         180 min  if t_left >= 1 d

the priority correction is Δp = +20 min if the priority p is very high, +10 min if p is high, -10 min if p is rather low and -20 min if p is low; the attitude correction is Δa = -20 min if the attitude a is very positive, -10 min if a is positive, +10 min if a is negative and +20 min if a is very negative;

where: t_left is the time in days left until the selected deadline, t_session is the session duration in minutes, p is the priority of the session, a is the attitude to the subject learnt, and d stands for days.
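One consistent reading of formula (1), with the thresholds evaluated top-down so that the first matching range wins, can be transcribed as follows (the priority and attitude label strings are illustrative):

```python
def session_minutes(t_left_days, priority="neutral", attitude="neutral"):
    """Session length in minutes per formula (1); first matching range wins."""
    if t_left_days > 14:
        base = 30
    elif t_left_days > 7:
        base = 60
    elif t_left_days > 4:
        base = 90
    elif t_left_days >= 4:
        base = 105
    elif t_left_days >= 3:
        base = 120
    elif t_left_days >= 2:
        base = 150
    else:
        base = 180
    # Priority and attitude corrections from formula (1)
    p = {"very high": 20, "high": 10, "rather low": -10, "low": -20}
    a = {"very positive": -20, "positive": -10, "negative": 10, "very negative": 20}
    return base + p.get(priority, 0) + a.get(attitude, 0)
```

For example, a session 5 days before the deadline with very high priority and a positive attitude gets 90 + 20 - 10 = 100 minutes.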
Fig. 5 Session management panel
Fig. 6 Learning by the mind map method
The type of feedback can be chosen: information about the lack of synchronization can be given via a visual annotation or signaled by an audio signal. In addition, the default audio feedback signal should support the synchronization state; therefore, classical music may be used as such a signal.
Fig. 7 Learning by the flash card method
Fig. 8 Learning by the non-linear notes method
7 Learning Optimization for an Individual User
The most interesting feature of the application is its ability to update a learning profile based on the feedback information. When a user works with the application for the first time, he/she must fill in a profile test, and the optimum learning method is suggested according to the test results. Still, it is strongly recommended to try all possible methods during the first session, since the application monitors brain states and can produce statistics about hemispherical synchronization for each learning method. These statistics are then used to determine the optimum learning method for the user. An example of such statistics is presented in Fig. 9.
Fig. 9 Statistics generated by the application
All statistics are calculated as absolute values. While creating a new session, the user is obliged to enter the deadline (the date when the session should end), the priority of the session and a personal attitude towards the topic. Based on this information, the length of the learning session is produced. The length of the session determines the number of possible feedback signals. Positive and negative feedback signals are counted for each method. Absolute values are obtained according to the following formula:

stats_i = (x_i / Σ_i x_i) · 100%   (2)

where: i is the index of the chosen method (mind map, flash cards, non-linear notes), stats_i is the synchronization statistic for the i-th method, and x_i is the duration in seconds for which synchronization occurred.
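Formula (2) amounts to the per-method share of synchronized time, e.g.:

```python
def synch_stats(durations):
    """Per-method synchronization share, formula (2): x_i / sum(x) * 100%."""
    total = sum(durations)
    return [100.0 * x / total for x in durations]

# Synchronized seconds for mind map, flash cards and non-linear notes
# (the durations are illustrative)
shares = synch_stats([300, 500, 200])  # -> [30.0, 50.0, 20.0]
```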
Statistics are then stored in the database for up to one week. These long-term statistics are used for the profile update. Another feature worth pointing out is that the application itself can suggest when a break in the learning process should be taken. This helps avoid the situation in which the user keeps repeating or reading the same piece of information endlessly without memorizing it. The learning session can end (be interrupted) in two scenarios: first, when its calculated time elapses, and second, if there is no synchronization within a 15-minute period. In the second case, a short break is suggested to the application user.
8 Case Study
To eliminate possible code errors and validate the provided learning methods, the software was tested on one person. It should be mentioned that the hardware part of the engineered application has several limitations. For example, it is impossible to obtain reliable feedback when the temperature is higher than 30 °C: due to sweating, the signals are too noisy in this case, and valuable information may be lost. When environmental conditions were well-controlled, the application was able to produce real-time feedback. Eye blinking and extra movements had little influence on the classification results. The headset used usually required about ten minutes to calibrate the electrodes, yet for some measurements this time was even doubled, which drastically reduced the comfort of using the application. Since the examined user was classified as one with a kinesthetic learning profile, it was possible to check the effectiveness of all three implemented methods described in Section 6. During the tests, the following scenarios were examined:
• learning with all three methods, with audio and visual feedback, together with break indication;
• learning with all three methods, with audio and visual feedback, without break indication;
• learning with only visual feedback.
A hemispherical synchronization state was observed during learning with the engineered application; still, it was not an easy task to obtain this state, which caused breaks to be indicated too often. Additional information about the concentration state should be supplied to the classifier to avoid such a situation. The best results were observed in the presence of audio feedback: giving an extra stimulus when a lack of synchronization was detected helped the user to focus and recover the synchronization state. On the other hand, putting headphones over the EEG headset was rather uncomfortable and, during longer learning sessions, it tired the user.
Learning with the implemented methods has shown better memorizing of the material. Knowledge gathered with mnemonic techniques was recalled much more easily even two weeks after the end of the training session, while material learned at the same time with traditional methods tended to be forgotten more easily. The statistics generated during the week of the session confirmed the profile test results. Fig. 10 presents the trend in which the statistics changed during one week. Still, as all of the examined methods were strictly dedicated to the kinesthetic profile, these results should only be treated as a demonstration of the system functionality.
Fig. 10 Trend of statistics during one week
9 Conclusion
Advanced pre-processing methods may provide a stable signal that can be used for the classification of the synchronization state. Using one signal as a reference and filtering it out of the useful data gives the opportunity to create fully functional, user-friendly software that may help in the learning process. The applied learning techniques can increase the effectiveness of the learning process. Applying a feedback loop based on information about the synchronization state gives good results for some subjects, while it may fail with others. In particular, people with a kinesthetic learning profile cannot fully take advantage of this version of the engineered application. In this case, the method should be applied together with at least a 21-electrode EEG system and a full localization of active cortical areas to obtain more reliable and efficient results. The application's ability to self-adapt to an individual user is definitely its biggest advantage and should be developed in future versions of biofeedback software. Signaling a break in the learning session seems to be a very good solution; still, some additional methods of diagnosing concentration may be added, since it often occurs that the user cannot obtain a synchronization state in the first 15 minutes of work.
Acknowledgment
Research funded within project No. POIG.01.03.01-22-017/08, entitled "Elaboration of a series of multimodal interfaces and their implementation to educational, medical, security and industrial applications". The project is subsidized by the European Regional Development Fund and by the Polish State budget.
References
[Kaszuba et al. 2010a] Kaszuba, K., Kopaczewski, K., Odya, P., Kostek, B.: Biofeedback-based brain hemispheres synchronizing employing man-machine interface. Inter J. of Artificial Intelligence Tools, Intelligent Decision Technologies: An International Journal 6, 59–68 (2010)
[Kaszuba et al. 2010b] Kaszuba, K., Kopaczewski, K., Odya, P., Kostek, B.: Brain hemispheres synchronization using biofeedback techniques. Gdańsk University of Technology Faculty of ETI Annals 8, 195–201 (2010)
[Kemp et al. 1991] Kemp, B., Varri, A., da Rosa, A., Nielsen, K.D., Gade, J., Penzel, T.: Analysis of brain synchronization based on noise-driven feedback models. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (13), 2305–2306 (1991)
[Khemakhem et al. 2009] Khemakhem, R., Zouch, W., Hamida, B.A., Taleb-Ahmed, A., Feki, I.: Source localization using the inverse problem methods. IJCSNS International Journal of Computer Science and Network Security 0(4), 408–414 (2009)
[Mosher et al. 2005] http://neuroimage.usc.edu/brainstorm (accessed November 11, 2010)
[Samar et al. 1999] Samar, V.J., Bopardikar, A., Rao, R., Schwarz, K.: Wavelet analysis of neuroelectric waveforms: A conceptual tutorial. Brain and Language 66, 7–60 (1999)
[Sanei and Chambers 2008] Sanei, S., Chambers, J.A.: EEG Signal Processing. Centre of Digital Processing, Cardiff University, UK (2008)
[Senthil et al. 2009] Senthil, K.P., Arumuganathan, R., Sivakumar, K., Vimal, C.: An adaptive method to remove ocular artifacts from EEG signals using wavelet transform. J. Applied Sciences Research, 741–745 (2009)
[WWW-1 2009] http://www.cs.waikato.ac.nz/ml/weka/ (accessed October 5, 2010)
Comparison of Fuzzy and Neural Systems for Implementation of Nonlinear Control Surfaces T.T. Xie, H. Yu, and B.M. Wilamowski Department of Electrical and Computer Engineering, Auburn University, Auburn, AL, USA
[email protected],
[email protected],
[email protected]
Abstract. In this paper, a comparison between different fuzzy and neural systems is presented. Instead of using traditional membership functions in fuzzy systems, such as triangular, trapezoidal and Gaussian ones, a monotonic piecewise-linear or sigmoidal activation function is used for each neuron. Based on the concept of area selection, neural systems can be designed to implement the same properties as fuzzy systems. All parameters of the proposed neural architecture are obtained directly from the specified design, and no training process is required. Compared with traditional neuro-fuzzy systems, the proposed neural architecture is more flexible and simplifies the design process by removing the division/normalization units.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 313–324. © Springer-Verlag Berlin Heidelberg 2012, springerlink.com

1 Introduction
Traditional methods, such as the PID (Proportion-Integration-Differentiation) algorithm, are relatively helpful for designing linear control systems, but they run into trouble when the system has nonlinear properties [Farrell and Polycarpou 2008]. Unfortunately, most systems in practice are nonlinear. For some nonlinear systems, by adding an inverse nonlinear function to compensate for the nonlinear behavior of the system, the input-output relationship becomes approximately linear; in those cases, the nonlinear problems can still be solved by the well-developed linear control theory. Otherwise, it is necessary to apply an adaptive change to match the nonlinear behavior of the system. Such adaptive systems are best handled with fuzzy systems and neural networks [Wilamowski 2002; Wilamowski and Binfet 2001]. In this paper, various fuzzy and neural systems are studied. The proposed neural architecture, using the concept of area selection in neural networks, is introduced and compared with classic fuzzy systems and traditional neuro-fuzzy systems. The comparison is based on a function approximation problem: use the given 25 points (Fig. 1b) to approximate the 1600 points (Fig. 1a) in the same range. All the required points satisfy the
relationship described by equation (1), and the approximation will be evaluated using the SSE (sum of squared errors) over the 1600 points.

f(x, y) = exp(−0.12(x − 9)² − 0.12(y − 5)²)   (1)

Fig. 1 Surfaces obtained from equation (1): (a) desired surface (40×40 = 1600 points); (b) given surface (5×5 = 25 points)
2 Fuzzy Systems
There are two commonly used architectures for fuzzy system development. One was proposed by Mamdani [Mamdani 1974; McKenna and Wilamowski 2001], as shown in Fig. 2, and the other, shown in Fig. 5, was proposed by Takagi, Sugeno, and Kang (TSK) [Takagi and Sugeno 1985; Wilamowski and Binfet 1999].
2.1 Mamdani Fuzzy System
As shown in Fig. 2 below, Mamdani fuzzy systems consist mainly of three parts: fuzzifiers, fuzzy rules and defuzzifiers.
Fig. 2 Architecture of Mamdani fuzzy system
In order to design a Mamdani fuzzy system, the first step is to perform fuzzification of the inputs, which means converting the analog inputs into sets of fuzzy variables using fuzzifiers. For each analog input, several fuzzy variables with values between 0 and 1 are generated, and their sum should be 1. There are various types of fuzzification methods, such as triangular, trapezoidal, Gaussian, sine, parabolic, or any combination of them. Triangular and trapezoidal membership functions are the simplest, and in most practical cases acceptable results can be obtained with these two approaches. More membership functions can be used for higher accuracy; however, too many membership functions cause frequent controller action (known as "hunting") and may lead to system instability. For the given problem in Fig. 1, we use 10 membership functions (5 for each direction); both triangular and trapezoidal membership functions are used, as shown in Fig. 3.
Fig. 3 Membership functions used for fuzzification: (a) x-direction; (b) y-direction
After fuzzification, the next step is to apply fuzzy rules to the fuzzy variables. Fuzzy logic rules have MIN and MAX operators, which can be treated as extended Boolean logic. For the binary values "0" and "1", the MIN and MAX operators in fuzzy logic perform the same calculation as the AND and OR operators in Boolean logic, respectively (Table 1); for fuzzy variables, the MIN operator returns the minimum value and the MAX operator returns the maximum value (Table 2).

Table 1 Fuzzy and Boolean logic rules for binaries

Binaries   MIN   AND   MAX   OR
0 0        0     0     0     0
0 1        0     0     1     1
1 0        0     0     1     1
1 1        1     1     1     1

Table 2 Fuzzy logic rules for fuzzy variables

Fuzzy variables   MIN   MAX
0.3 0.6           0.3   0.6
0.7 0.3           0.3   0.7
0.5 0.4           0.4   0.5
0.1 0.8           0.1   0.8
The last step is defuzzification, which converts the results of the "MAX of MIN" operations to an analog output value. Several defuzzification schemes are used, of which the most common is centroid defuzzification. For the Mamdani fuzzy system, the result surface for the given problem is shown in Fig. 4.
Fig. 4 Result surface obtained using Mamdani fuzzy system; SSE= 6.3555
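The "MAX of MIN" inference with centroid defuzzification can be condensed into a one-input toy example; the membership shapes and rule consequents below are invented for illustration:

```python
def tri(a, b, c):
    """Triangular membership function with corners a, b, c (peak at b)."""
    def mu(x):
        if not (a < x < c):
            return 0.0
        return min((x - a) / (b - a), (c - x) / (c - b))
    return mu

def mamdani(rules, memberships, x, out_universe):
    """Tiny one-input Mamdani step: MIN clips each rule's consequent,
    MAX aggregates the rules, and centroid defuzzification gives the output."""
    agg = [0.0] * len(out_universe)
    for label, out_mf in rules:
        fire = memberships[label](x)                 # antecedent degree
        for i, u in enumerate(out_universe):
            agg[i] = max(agg[i], min(fire, out_mf(u)))
    num = sum(u * m for u, m in zip(out_universe, agg))
    den = sum(agg)
    return num / den if den else 0.0

memberships = {"low": tri(-10, 0, 10), "high": tri(0, 10, 20)}
rules = [("low", tri(-5, 0, 5)), ("high", tri(5, 10, 15))]
universe = [i * 0.5 for i in range(21)]              # output sampled on [0, 10]
out = mamdani(rules, memberships, 10.0, universe)    # input fully "high"
```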
2.2 TSK Fuzzy System
Fig. 5 shows the architecture of the TSK fuzzy system; it also consists of three parts: fuzzification, fuzzy rules and normalization. The fuzzifiers and fuzzy rules are almost the same as those used in the Mamdani fuzzy system. The difference is that, unlike the "MAX of MIN" defuzzification in Mamdani fuzzy systems, TSK fuzzy systems do not require MAX operators; instead, a weighted average is applied directly to the results of the MIN operators. The TSK architecture is much simpler than the Mamdani architecture, because the output weights are proportional to the average function values in the regions selected by the MIN operators.
Fig. 5 Architecture of TSK fuzzy system
Fig. 6 shows the result surface using the TSK fuzzy system; one may notice that it is more accurate than the result obtained with the Mamdani architecture (Fig. 4).
Fig. 6 Result surface obtained using TSK fuzzy system; SSE= 2.2864
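The TSK weighted average that replaces the MAX stage can be written in a few lines; the firing strengths and consequent values below are illustrative:

```python
def tsk(rule_strengths, rule_outputs):
    """TSK inference: weighted average of rule consequents, the weights
    being the MIN firing strengths (normalization layer of Fig. 5)."""
    den = sum(rule_strengths)
    num = sum(w * y for w, y in zip(rule_strengths, rule_outputs))
    return num / den if den else 0.0

# Two-input rule strengths obtained via MIN of per-input membership degrees
strengths = [min(0.4, 0.9), min(0.6, 0.9)]  # illustrative degrees
outputs = [2.0, 4.0]                        # expected surface values per region
y = tsk(strengths, outputs)                 # -> (0.4*2 + 0.6*4) / 1.0 = 3.2
```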
3 Neural Networks
Neural networks are another way to implement nonlinear controllers [Narendra and Parthasarathy 1990]. A neural system is made up of neurons with weighted connections between them. For a given neuron, the relationship between the sum of the weighted inputs and the output is given by an activation function. The activation function is monotonic, and it can be sigmoidal, linear or of other shapes [Wilamowski and Yu 2010; Yu and Wilamowski 2009]. It has been proven that neural networks can be used for arbitrary function approximation. For the given problem in Fig. 1, Figs. 7, 8 and 9 show the result surfaces using different numbers of neurons in fully connected cascade (FCC) networks. Each neuron uses a unipolar sigmoidal activation function, and the neuron-by-neuron (NBN) algorithm is used for training.
Fig. 7 (a) Two neurons in FCC network; (b) Result surface with SSE= 5.1951
Fig. 8 (a) Three neurons in FCC network; (b) Result surface with SSE= 0.9589
Fig. 9 (a) Four neurons in FCC network; (b) Result surface with SSE= 0.0213
It can be seen that, with only three neurons (Fig. 8), neural networks obtain more accurate results than the fuzzy systems above. However, neural networks require a training/optimization process, which is complex. The neural network training tool "NBN 2.10" used in this paper was downloaded from: http://www.eng.auburn.edu/~wilambm/nnt/index.htm.
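A forward pass through an FCC network, in which every neuron sees all network inputs plus the outputs of all earlier neurons, can be sketched as follows; the weights here are random placeholders, not the NBN-trained values:

```python
import math
import random

def fcc_forward(x, weights):
    """Forward pass of a fully connected cascade (FCC) network.

    weights[k] holds neuron k's weights: one per network input, one per
    previous neuron output, plus a bias as the last element. Each neuron
    uses a unipolar sigmoid; the last neuron's output is the network output.
    """
    outputs = []
    for w in weights:
        inputs = list(x) + outputs            # cascade: inputs + earlier neurons
        net = sum(wi * vi for wi, vi in zip(w, inputs)) + w[-1]
        outputs.append(1.0 / (1.0 + math.exp(-net)))  # unipolar sigmoid
    return outputs[-1]

random.seed(0)
# A 3-neuron FCC for 2 inputs: neuron k has 2 + k weights plus a bias
weights = [[random.uniform(-1, 1) for _ in range(2 + k + 1)] for k in range(3)]
y = fcc_forward([0.3, 0.7], weights)
```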
4 Neuro-Fuzzy Systems
Neuro-fuzzy systems inherit properties from both fuzzy systems and neural networks. They attempt to further improve fuzzy systems by replacing the fuzzifiers and the MAX and MIN operators with weighted-sum approaches [Masuoka et al. 1990]. Compared with traditional neural networks, they have the advantage that all parameters are designed and no training process is required.
4.1 Traditional Neuro-Fuzzy System
The neuro-fuzzy system in Fig. 10 consists of four layers. The first layer is used for input fuzzification, the same process as in classic fuzzy systems. The second layer performs multiplication of fuzzy variables instead of fuzzy logic operations. The multiplication may help smooth the result surfaces, but it also requires more computation. The third and fourth layers perform weighted averages, similar to the normalization process in TSK fuzzy systems.
Fig. 10 Architecture of traditional neuro-fuzzy system
Fig. 11 shows the result surface using the traditional neuro-fuzzy system. Even though a smaller approximation error is obtained, the neuro-fuzzy architecture is not recommended, because the computation becomes more complex than in classic fuzzy systems.
Fig. 11 Result surface obtained using traditional neuro-fuzzy system in Fig. 10; SSE= 1.9320
The architecture in Fig. 10 attempts to present a fuzzy system in the form of a neural network. However, it differs from a neural network, because the units inside perform signal multiplication or division rather than activation functions, as neurons do.
4.2 Neuro-Fuzzy System without Normalization
In neural systems, a single neuron can separate the input space with a line, plane or hyperplane, depending on the input dimensionality. In order to select a region in n-dimensional space, more than (n+1) neurons should be used. For example, in order to select a rectangular area in two-dimensional space (Fig. 12a), at least 5 neurons are required, and the neural network can be designed as shown in Fig. 12b.
Fig. 12 Area selection using neural networks: (a) desired rectangular area; (b) neural network implementation with step function as activation functions
With this area selection concept, the fuzzifiers and fuzzy logic rules (MIN and MAX operators) used for region selection can be replaced by a simple neural network architecture, similar to that shown in Fig. 12b.
Fig. 13 Two-dimensional input plane separated vertically and horizontally by 6 neurons in each direction, obtained with 25 selection areas
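The area-selection idea of Fig. 12b (four half-plane neurons plus one AND-like output neuron, all with step activations) can be sketched as follows, with illustrative rectangle bounds:

```python
def step(net):
    """Step activation: fires when the net input is non-negative."""
    return 1 if net >= 0 else 0

def in_rectangle(x, y, x1, x2, y1, y2):
    """Select the axis-aligned rectangle [x1, x2] x [y1, y2] with 5 step
    neurons: four half-plane neurons plus one AND-like output neuron."""
    h = [
        step(x - x1),   # right of the left edge
        step(x2 - x),   # left of the right edge
        step(y - y1),   # above the bottom edge
        step(y2 - y),   # below the top edge
    ]
    return step(sum(h) - 4)  # fires only when all four half-planes agree
```

The output neuron's threshold of 4 plays the same role as the second-layer threshold of 3 mentioned below for the overlapping ramp memberships.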
For the given problem, there are two analog inputs and each input has 5 membership functions (Fig. 3). The two fuzzifiers and the fuzzy logic rules can be represented by 12 neurons (lines a to l) in the first layer and 25 neurons (areas 1 to 25) in the second layer, as shown in Fig. 13 and Fig. 14.
Fig. 14 Architecture of the proposed neuro-fuzzy system for the given problem in Fig. 1
Fig. 14 shows the architecture of the proposed neuro-fuzzy system [Xie et al 2010]. The first layer has 12 neurons, and each neuron represents a straight line, from a to l. The weight values on the inputs are all 1, and the thresholds of the neurons depend on the intersections of the lines. The activation functions of the neurons in the first layer are shown in Fig. 15a, and can be mathematically described by

\[ f(x) = \begin{cases} 1 & x \ge b \\ \dfrac{x-a}{b-a} & a < x < b \\ 0 & x \le a \end{cases} \tag{2} \]
By setting the weight values on the inputs of the second layer to 1 or -1, the activation functions can be combined to implement the triangular and trapezoidal membership functions, as shown in Fig. 15b and Fig. 15c.
Fig. 15 (a) activation function of each neuron in the first layer; (b) implementation of trapezoidal membership function; (c) implementation of triangular membership function
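As an illustrative sketch (not the authors' code; the names `ramp`, `trapezoid` and `triangle` are ours), the first-layer activation of Eq. (2) and its combination with +1/-1 weights into the membership functions of Fig. 15 can be written as:

```python
def ramp(x, a, b):
    """First-layer activation (Eq. 2): 0 below a, linear on [a, b], 1 above b."""
    if x >= b:
        return 1.0
    if x <= a:
        return 0.0
    return (x - a) / (b - a)

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership built from two ramps combined with +1/-1 weights."""
    return ramp(x, a, b) - ramp(x, c, d)

def triangle(x, a, b, c):
    """Triangular membership: the degenerate trapezoid whose plateau shrinks to b."""
    return ramp(x, a, b) - ramp(x, b, c)
```

For example, `trapezoid(x, 0, 2, 8, 10)` rises on [0, 2], stays at 1 on [2, 8], and falls back to 0 on [8, 10].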
The second layer has 25 neurons, and each of them represents a selected area as shown in Fig. 13. All neurons in the second layer have a threshold of 3. Fig. 16 gives the outputs of particular neurons in the first and second layers.
Fig. 16 (a) output of the neuron b in the first layer; (b) output of the neuron 17 in the second layer
The third layer has only one neuron, with a linear activation function. The weight values on the inputs of the third layer are set to the expected values in the corresponding areas. With this architecture, using the area selection strategy of Fig. 13, the result surface for the given problem can be obtained as shown in Fig. 17.
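A minimal sketch of the last two layers follows (our own simplified wiring, assuming that exactly one area neuron fires for any input): a second-layer neuron fires when the sum of its first-layer inputs exceeds the stated threshold of 3, and the third-layer linear neuron weights the area outputs by the expected surface values.

```python
def area_neuron(inputs, threshold=3):
    """Second-layer neuron: fires when the sum of its first-layer
    membership inputs exceeds the threshold of 3."""
    return 1.0 if sum(inputs) > threshold else 0.0

def output_neuron(area_outputs, expected_values):
    """Third-layer linear neuron: its weights are the expected surface
    values of the corresponding areas, so the firing area's value is output."""
    return sum(w * a for w, a in zip(expected_values, area_outputs))
```

Because only one area neuron is active at a time, the linear output neuron simply passes through the expected value assigned to the selected area.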
Fig. 17 Result surface obtained using the proposed neuro-fuzzy system in Fig. 14; SSE= 1.9320
One may notice that both neuro-fuzzy architectures in Fig. 10 and Fig. 14 obtain the same error, but the latter is much simpler since there is no normalization process.
5 Discussion and Conclusions

This paper studied the design process of both fuzzy and neural systems, and gave a practical function approximation problem as an example. The comparison results are presented in Table 3.

Table 3 Comparison of different architectures for the given problem in Fig. 1

Architectures                         SSE      Normalization   Training
Mamdani fuzzy                         6.3555   NO              NO
TSK fuzzy                             2.2864   YES             NO
Traditional neuro-fuzzy (Fig. 10)     1.9320   YES             NO
Proposed neuro-fuzzy (Fig. 14)        1.9320   NO              NO
2 neurons in FCC networks             5.1951   NO              YES
3 neurons in FCC networks             0.9589   NO              YES
4 neurons in FCC networks             0.0509   NO              YES
From the comparison results, it can be concluded that fuzzy systems produce rough results but are easier to design, while neural networks can achieve very precise approximation but require a training process (optimization). Both neuro-fuzzy architectures (Fig. 10 and Fig. 14) achieved the same error in the function approximation problem; however, the architecture in Fig. 14 does not require a normalization process and is much simpler than the traditional architecture in Fig. 10. In cases where only design rules are to be used and optimization is not desired, the neuro-fuzzy architecture in Fig. 14 can replace classic fuzzy systems and the traditional neuro-fuzzy architecture in Fig. 10.
References

[Farrell and Polycarpou 2008] Farrell, J.A., Polycarpou, M.M.: Adaptive approximation based control: unifying neural, fuzzy and traditional adaptive approximation approaches. IEEE Trans. on Neural Networks 19(4), 731–732 (2008)
[Mamdani 1974] Mamdani, E.H.: Application of fuzzy algorithms for control of simple dynamic plant. IEEE Proceedings 121(12), 1585–1588 (1974)
[Masuoka et al 1990] Masuoka, R., Watanabe, N., Kawamura, A., et al.: Neuro-fuzzy system – fuzzy inference using a structured neural network. In: Proc. of the Int. Conf. on Fuzzy Logic & Neural Networks, Iizuka, Japan, pp. 173–177 (1990)
[McKenna and Wilamowski 2001] McKenna, M., Wilamowski, B.M.: Implementing a fuzzy system on a field programmable gate array. In: Int. Joint Conf. on Neural Networks, Washington DC, pp. 189–194 (2001)
[Narendra and Parthasarathy 1990] Narendra, K.S., Parthasarathy, K.: Identification and control of dynamical systems using neural networks. IEEE Trans. on Neural Networks 1(1), 4–27 (1990)
[Takagi and Sugeno 1985] Takagi, T., Sugeno, M.: Fuzzy identification of systems and its application to modeling and control. IEEE Trans. on System, Man, Cybernetics 15(1), 116–132 (1985)
[Wilamowski 2002] Wilamowski, B.M.: Neural networks and fuzzy systems. In: Bishop, R.R. (ed.) Mechatronics Handbook, vol. 33(1), pp. 32–26. CRC Press, Boca Raton (2002)
[Wilamowski and Binfet 1999] Wilamowski, B.M., Binfet, J.: Do fuzzy controllers have advantages over neural controllers in microprocessor implementation. In: Proc. of 2nd Conf. on Recent Advances in Mechatronics, Istanbul, Turkey, pp. 342–347 (1999)
[Wilamowski and Binfet 2001] Wilamowski, B.M., Binfet, J.: Microprocessor implementation of fuzzy systems and neural networks. In: Int. Joint Conf. on Neural Networks, Washington DC, pp. 234–239 (2001)
[Wilamowski and Yu 2010] Wilamowski, B.M., Yu, H.: Improved computation for Levenberg–Marquardt training. IEEE Trans. on Neural Networks 21(6), 930–937 (2010)
[Xie et al 2010] Xie, T.T., Yu, H., Wilamowski, B.M.: Replacing fuzzy systems with neural networks. In: Proc. IEEE Conf. on Human System Interaction, Rzeszow, Poland, pp. 189–193 (2010)
[Yu and Wilamowski 2009] Yu, H., Wilamowski, B.M.: Efficient and reliable training of neural networks. In: Proc. 2nd IEEE Human System Interaction, Catania, Italy, pp. 109–115 (2009)
Hardware Implementation of Fuzzy Default Logic

A. Pułka and A. Milik

Institute of Electronics, Silesian University of Technology, Gliwice, Poland
{apulka,amilik}@polsl.pl
Abstract. The chapter presents a hardware implementation of a model of a commonsense reasoning system based on a new formalism, Fuzzy Default Logic (FDL). It briefly recalls the main definitions and algorithms of the FDL technique in the form of software-oriented procedures. The basic building blocks used for the implementation are presented. Then the software algorithms of the FDL model are transformed into hardware blocks. The entire hardware structure has been implemented in a Xilinx Virtex 5 FPGA device. The obtained results, examples of experiments and applications in a system verification platform are discussed, and further research tasks are formulated.
1 Fuzzy Default Logic – Software View

Deliberations on nonmonotonic reasoning [Brewka 1991] and answer set programming [Balduccini et al. 2006; Gelfond and Lifschitz 1988], and on the other hand on fuzzy logic and the generalized theory of uncertainty [Apt and Monfroy 2001], have led us to the formulation of Fuzzy Default Logic [Pułka 2009]. This approach combines techniques for modeling and handling cases of incomplete information with various types of imprecise information and vagueness.

1.1 Fuzzy Default Rules

The Fuzzy Default Rule (FDR) is the following inference rule:
\[ \frac{\alpha : \beta_1, \beta_2, \ldots, \beta_N}{\Phi_\lambda} \tag{1} \]
where: α, β1, …, βN are wffs (well-formed formulas) in a given propositional language L, and Φλ is a Fuzzy Hypothesis of the vector form:
\[ \Phi_\lambda = \left\{ \left[h_1^\lambda, Tw\!\left(h_1^\lambda\right)\right], \left[h_2^\lambda, Tw\!\left(h_2^\lambda\right)\right], \ldots, \left[h_m^\lambda, Tw\!\left(h_m^\lambda\right)\right] \right\} \tag{2} \]
where: hiλ (i = 1…m) are wffs in the propositional language L, and Tw(hiλ) denotes Trustworthiness, i.e. one of the modalities of generalized constraints in Zadeh's sense [Zadeh 2006] (bivalent, probabilistic, fuzzy, veristic etc.). To put it simply, the trustworthiness can be treated as a membership function or a probability. Moreover, we assume that the prerequisite (as in [Reiter 2001]) is represented by strong information (facts), while possible uncertainty or missing information is represented by the justifications β1…βN. Two assumptions reflect the nonmonotonicity of the inference system: NaF (Negation as a Failure) and CWA (Closed World Assumption). This scheme reduces the problem of inference path propagation and tracing for trustworthiness. If we would like to have an FDR based entirely on ignorance and/or vagueness, the prerequisite is an empty set (α ≡ ∅).

The fuzzy default rules are interpreted similarly to their classical predecessors (Reiter's default rules [Reiter 2001]). The main difference lies in the form of the hypothesis (FH), which consists of different aspects (views, extensions) of the same problem, and each of these sub-hypotheses has its own Tw coefficient. Trustworthiness reflects the significance of a given solution, which is usually subjective and can be modified. The elementary hypotheses (components) of a given FH Φλ, i.e. h1λ, h2λ, …, hmλ, are mutually exclusive. The initial process of employing a set of FDR rules can be summarized by the following software algorithm:

Algorithm 1 Fuzzy Default Rule
  forall fuzzy_default_rule(α, β, FHλ)
    if α is true then
      if β cannot be proved then
        assert (temporarily assume) every hypothesis hiλ ∈ FHλ

Z.S. Hippe et al. (Eds.): Human–Computer Systems Interaction, AISC 99, Part II, pp. 325–343. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
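Algorithm 1 can be sketched in software as follows (a hypothetical encoding, not the authors' PROLOG implementation): each rule is a triple (α, β, FHλ), where α is a set of prerequisite facts, β a list of justifications, and FHλ a list of (hypothesis, trustworthiness) pairs.

```python
def apply_fdr(rules, facts, proved):
    """Algorithm 1 sketch: for every fuzzy default rule, if the prerequisite
    holds and no justification can be proved (NaF), temporarily assert every
    component hypothesis of the fuzzy hypothesis."""
    asserted = []
    for alpha, betas, fh in rules:
        if alpha <= facts and not any(b in proved for b in betas):
            asserted.extend(fh)  # temporarily assume every (h_i, Tw(h_i))
    return asserted
```

A rule fires only when all its prerequisite facts are present and none of its justifications is among the proved statements, mirroring the NaF assumption.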
1.2 The Credible Set Extraction

Many examples taken from real life show that during the reasoning process, depending on circumstances, we usually prefer some conclusions over others. This is reflected in answer set programming by the preference relation [Gelfond and Lifschitz 1988]. In the case of the FDL mechanism we use a cut-off mechanism and generate (extract) the so-called Credible Set (CS) Hλ of the fuzzy hypothesis Φλ:

\[ H_\lambda \subseteq \Phi_\lambda \quad \text{and} \quad \forall\, h_i^\lambda \in H_\lambda : \; Tw\!\left(h_i^\lambda\right) \ge \mathrm{cut\_off} \tag{3} \]

In other words, the CS is the subset of Φλ consisting of those elements hiλ that are considered during the further inference process.

Algorithm 2 Credible Set Extraction (basic version¹)
  forall asserted hypotheses hiλ ∈ FHλ
    remove each one that does not meet the condition Tw(hiλ) > cut-off level;
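Algorithm 2 is a simple filter; a sketch (our own encoding of a fuzzy hypothesis as (hypothesis, trustworthiness) pairs):

```python
def credible_set(fh, cut_off):
    """Algorithm 2 sketch (Eq. 3): keep only the hypotheses whose
    trustworthiness meets the cut-off level."""
    return [(h, tw) for h, tw in fh if tw >= cut_off]
```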
Certainly, the above mechanism could be more complicated and based on other coefficients.
1.3 Hypotheses Reduction and Assessment Procedures

The transformation (simplification) of all Credible Sets inferred at a given level will be called the Hypotheses Reduction (HR) at this level. This reduction (a logical sum of Credible Sets) generates all good hypotheses (possible for further consideration) and reduces the number of trustworthiness values per single hypothesis. After the reduction we obtain all concluded hypotheses that can be considered for the final assessment, and each hypothesis contains just one trustworthiness:

\[ HR\!\left(\bigcup_i H_i^{\lambda}\right) = \left\{ \left[h_i, Tw(h_i)\right] : \exists\, H_k^{\lambda_k}\; \left( h_i \in H_k^{\lambda_k} \;\text{and}\; Tw(h_i) = opt\!\left(Tw\!\left(h_i^{\lambda_k}\right)\right) \right) \right\} \tag{4} \]

where opt(Tw(hiλk)) denotes the optimal value of the trustworthiness for a given element (hypothesis), selected from all Credible Sets. The selection of the optimal function (opt) is another very interesting problem, and it allows the user to interact with the inference process. The optimal function can be flexible, i.e. it can have a different meaning for various kinds of trustworthiness (bivalent, probabilistic, veristic, and fuzzy [Zadeh 2006]). The optimal function can also be strict (the same for every trustworthiness), which means that it corresponds to one of the following cases: maximum (optimistic approach), mean (no priorities), minimum (worst-case analysis), max-min (fuzzy approach) etc. The hypotheses reduction process can be controlled by different functions based on altered criteria. This makes it possible to obtain different solutions as the accepted hypothesis [Pułka 2009]. In this sense the assessment of hypotheses gives a lot of flexibility to the user and is application dependent. We can also split the hypotheses reduction process into two steps: first, selection of the optimal value of the trustworthiness for each hypothesis, and then the final assessment and selection of the single best solution. The criteria for each step can be different.

1.4 Multistage Structure, Feedbacks and System Stability

In [Pułka 2009] we presented the logic system Fuzzy Default Logic (FDL) as the theory Δfuzzy for modeling commonsense reasoning, which splits the inference process into stages (steps) Δsfuzzy; at every stage a given hypothesis is generated. The stage Δsfuzzy is represented by a quadruple: axioms, simple relations between the knowledge-base elements (classical logic relations), fuzzy default rules, and constraints [Apt and Monfroy 2001]. The stage Δskfuzzy is responsible for the generation of the hypothesis hsk. Formally, we have:

\[ \Delta_{fuzzy} = \left\{ \Delta^{s_1}_{fuzzy}, \Delta^{s_2}_{fuzzy}, \ldots, \Delta^{s_N}_{fuzzy} \right\}; \qquad \Delta^{s_k}_{fuzzy}\{A,\ Facts,\ FDRs,\ C\} \vdash h_{s_k} \tag{5} \]
Fig. 1 Inferring viewed as a multistage process
The problem of inference engine stability is very important [Gelfond and Lifschitz 1988]. To guarantee this stability we have to investigate the rules of the knowledge base in detail. Instability of the inference engine may lead to deadlocks and infinite loops: the system is not able to generate any hypothesis or, more frequently, generates inconsistent conclusions, i.e. its extension contains mutually exclusive elements.
Fig. 2 The idea of hierarchical hardware implementation of FDL
First of all we have split the entire process into independent stages (Fig. 2), where each stage is responsible for the generation of a given (single) hypothesis. Secondly, each stage is coherent in the sense of [Pułka 2009]:

\[ \forall\, h_i^\lambda \in H^\lambda \;\; \neg\exists\, \frac{\alpha^\varphi : \beta_1^\varphi, \beta_2^\varphi \ldots \beta_N^\varphi}{\Phi^\varphi} \;\; \left( h_i^\lambda \cap \alpha^\varphi \neq \emptyset \;\text{ or }\; h_i^\lambda \cap \left\{\beta_1^\varphi, \beta_2^\varphi \ldots \beta_N^\varphi\right\} \neq \emptyset \right) \tag{6} \]
This means that no hypothesis is turned back to the input of the stage on which it is generated (there is no direct loop-back within a single stage). This cut is possible and much desired; it simply reflects the initial preferences of the knowledge-base constructor, and this solution alludes to the ideas of answer set programming [Balduccini et al. 2006]. Feedbacks are allowed only outside a given stage, so it is possible to turn back a given hypothesis and apply it to the input of the previous stages via the external memory (Fig. 2). Moreover, some properties of the hardware architecture, especially the synchronized logic and the sequential processing of stages, stabilize the entire structure. Thirdly, there are some general questions, like: what does consistency mean in the presence of vague information? Or what is the sense of inconsistency in a multi-valued system, i.e. a system where a given solution is mapped into a set consisting of more than two elements? Prof. Zadeh [Zadeh 2006] has recognized this problem as a strange situation, and fuzzy logic naturally contradicts the term inconsistency. So, in our case, the modification of the hypothesis (now a combined, complex fuzzy hypothesis) and the splitting of the inference process into the initial generation phase (application of fuzzy default rules), the credible set extraction, and the assessment and selection of the final conclusion are the additional elements eliminating the danger of instability of the system.
2 Hardware Structures

The prototype PROLOG implementation of a fuzzy default logic system was discussed in [Pułka 2009]. It is very natural and well suited to a programming environment designed for AI purposes. However, a PROLOG-based implementation on sequential machines is far from optimal. Moreover, the inference process based on backtracking is consistent, but time and resource consuming. For these reasons, we have decided to convert the software-directed implementations and algorithms into hardware. This transformation allows us to obtain a faster and more parallel (concurrent) solution, which can be an application-specific implementation. Fig. 2 introduces the idea of the entire implementation divided into independent stages. Each stage is responsible for the generation of a given sub-solution (hypothesis), and the feedback is executed via external memory (discussed in the previous section). In such a case the hypothesis, coded in binary form and represented by its identifier, is applied to the input of the complex prerequisite (see Fig. 4). In practice this function is completed by a comparison between the current identifier and the flag stored in a rule, and the prerequisite signal (condition) is asserted in case of a match (see section IV). Fig. 3 presents the main building blocks of a single stage. The entire model has been constructed as a regular and structural hierarchy to obtain a repeatable implementation. Let us have a look at the subsequent elements of this hierarchy.

2.1 The Implementation of a Fuzzy Default Rule

Fuzzy default rules are the basic building blocks of our implementation; they provide the results of the commonsense reasoning in the form of bunches of trustworthiness values. Fig. 4 presents the hardware structure modeling this inference rule. The register file represents the fuzzy hypothesis, and each register stores the value of the appropriate trustworthiness. The register file is controlled by a prerequisite gate generating an enable (locking/unlocking) signal, which decides whether a given rule is used, i.e. considered in the inference process. Fig. 5 presents a detailed structure of the complex prerequisite look-up table, which works as a comparator comparing the input conditions (α1… αm) with the flag.
An additional element of the structure is the strength gate, generating a special signal that can be used during the hypothesis assessment process. This signal indicates whether the justification is based on solid grounds or on the negation-as-a-failure (NaF) assumption. We have decided to use 2 bits for coding each justification. One bit indicates the state of the information (true or false), while the other is responsible for the status (known or unknown), so we have used 2 RS flip-flops per single justification: IS-FF (Information Status Flip-Flop) and LS-FF (Logical State Flip-Flop), respectively (Fig. 6). In case of an unknown status of a given justification, the logical state of the LS-FF is interpreted as the preferred (default) state under the NaF assumption.
Fig. 3 Structure of a single stage FDL implementation
Fig. 4 Structure of FDR – basic building block

Fig. 5 Structure of the complex prerequisite LUT

Fig. 6 The implementation of a single justification
Fig. 7 The hardware completion of the Credible Set
2.2 Hardware Structure of the Credible Set

Equation (3) defines the basic methodology for obtaining the credible set of a given fuzzy hypothesis. This can be carried out with a set of comparators, but such a solution is relatively expensive and slow. To minimize the hardware resources and reduce the signal processing time, we have decided to use a special coding technique for the cut-off level register that resembles 'one hot' encoding. It gives us only K different cut-off levels (where K is the length of the cut-off register), but taking into account the fact that we deal with vagueness and imprecise information, this solution is quite sufficient. In other words, if we decide to put the cut-off level at the i-th position, this bit and each more significant bit, i.e. i+1, i+2 and so on, is asserted, while the bits less significant than the i-th one are set to '0'. Now a simple logical operation, a 'nor' of the bitwise 'and' between the bits of a given trustworthiness and the content of the cut-off register, allows us to obtain the reset signal (Fig. 7). If the reset is not active (the trustworthiness meets the cut-off level), a given tw_register is loaded with the trustworthiness value. For example, if the length of both registers (the trustworthiness and the cut-off) is 8 bits and the cut-off level corresponds to the value "11110000", we have:

reset = not(Tw7 or Tw6 or Tw5 or Tw4)   (7)

So the reset signal is active ('1') only if each of the four most significant bits of the trustworthiness (7th, 6th, 5th and 4th) is '0'; in other words, this means that the value of the trustworthiness is less than "00010000".
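A software model of this coding scheme (our own sketch, not the actual netlist):

```python
def cutoff_mask(i, width=8):
    """Cut-off register coding: bit i and every more significant bit set,
    all lower bits cleared (e.g. i=4, width=8 gives 0b11110000)."""
    return ((1 << width) - 1) & ~((1 << i) - 1)

def reset_signal(tw, cut_off_reg):
    """Eq. (7): reset is active ('1') when no trustworthiness bit overlaps
    the cut-off mask, i.e. the trustworthiness is below the cut-off level."""
    return int((tw & cut_off_reg) == 0)
```

With the mask `0b11110000`, any trustworthiness below `0b00010000` produces an active reset, exactly as in the example above.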
Fig. 8 The idea of hardware-based HR process
2.3 Evaluation of Hypotheses Reduction

The arithmetic-logic unit (ALU) is responsible for the process of hypotheses reduction. The input to the ALU is based on the trustworthiness values selected by the CSE stage (Fig. 8). This is a programmable unit that is able to find the minimal, maximal or average value of the trustworthiness for a given hypothesis. Only non-zero values are considered during the calculation, which is important for the minimum and average calculations. It is assumed that the circuitry collects 8 different values. This property greatly improves the calculation of the average value, thanks to the fact that a division by a power of 2 is performed as a simple shift by n positions. In a hardware implementation this corresponds to an appropriate connection of wires, and the result is obtained immediately. Because we take into account only non-zero items (actually positive values), a full division operation is required just after the completion of collecting data from all units. In general, the maximal value of the sum of trustworthiness values is correlated with the number of non-zero items. This observation allows reducing the number of cycles in the division process. The number of items (in this case the most significant bit different from 0) is correlated with the sum value. In general, the number of cycles in the division procedure can be reduced to the number of bits of the trustworthiness word (in our case 8). The entire processing time equals the time of passing all trustworthiness coefficients to the unit (at most one item per clock cycle). If the average value is required, a division operation is performed. This operation depends on the width of the trustworthiness word. In our experimental case, where the number of Tw coefficients is 8 and their length is 8 bits, the entire process is performed in 8 clock cycles for the minimal and maximal value, while the average value is calculated in 16 clock cycles.
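The shift-based division trick can be modeled as follows (our own sketch, not the actual ALU design):

```python
def average_nonzero(tws):
    """ALU average-mode sketch: only non-zero trustworthiness values are
    summed; when their count is a power of two the division reduces to a
    right shift, otherwise a full division is performed."""
    vals = [t for t in tws if t != 0]
    if not vals:
        return 0
    n, s = len(vals), sum(vals)
    if n & (n - 1) == 0:                 # power of two: divide by shifting
        return s >> (n.bit_length() - 1)
    return s // n                        # general case: full division
```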
2.4 Hypotheses Assessment – Hardware Implementation of the Sorting Procedure

The sorting procedure is supposed to find the best candidates for further processing. The number of elements kept for further processing is limited to 4–8 items from the entire set. A simple sorting algorithm based on insertion has been applied. This sorting approach is based on finding the correct place for the insertion of a new item in the result set. Depending on the place of insertion of the current element, the successive items are shifted by one position. In software-based approaches, tree search methods are usually applied; this allows finding the insertion place in the shortest time, with the number of necessary comparisons reduced to log2(n) [Wirth 1976]. After finding the correct place, the items are copied to the appropriate positions.
Fig. 9 The 4-word sorting system implementation: a) The sorting by insertion example; b) The iterative sorting system implementation
Hardware implementation of the sorting procedure with a limited number of selected items (4–8) allows taking advantage of parallel processing. That is why the idea of an iterative circuit implementation is justified. However, we have to remember that the main disadvantage of combinatorial iterative circuits is the propagation time, which increases with the number of processed items (e.g. a ripple-carry adder). The operation is based on finding the place where the input data should be stored. All items from the selected position are shifted by one position to allow inserting the new element. The item from the last cell (with the lowest credibility) is lost. The sorting unit (SU) consists of identical cells connected together to form the desired number of selected hypotheses for further processing. The function of a cell depends on the information from the previous cell, the delivered data, and the internal state. At the beginning of each selection cycle the INIT signal is asserted to flush the result register by setting the EM(pty) registers. Valid input data is delivered together with LD = 1. When a cell is empty, it loads the data from the input if the previous cell does not request a data shift (SH_I = 0), or from the previous cell's output otherwise. A cell changes its contents if the previous cell contains valid data and a new data item is delivered. The content of a given cell changes in two cases: when the value of a newly delivered item is greater than the current one, or when a shift is requested. The block diagram of the circuit is presented in Fig. 9. In contrast to combinatorial iterative circuits, our SU does not increase its propagation (response) time with the number of stages. This is an effect of the implemented sorting algorithm, which places registers along the path and reduces the length of the combinatorial path. Our SU can be further optimized by considering the data flow through the unit and the components available inside the FPGA device. The optimization of the data selector unit reduces the complexity of the combinatorial path inside the block. This reduces the propagation time and the required resources (Fig. 10b).
Fig. 10 Block diagram of the elementary sorting unit (SU) cell (a) and its optimized structure (b)
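The behavior of the fixed-length insertion queue can be sketched in software (our own model; the hardware does the shifting in parallel cells, while this sketch does it sequentially):

```python
def sorting_queue(items, capacity=4):
    """SU behavior sketch: insert each (hypothesis, Tw) pair into a
    fixed-length queue sorted by descending Tw; entries pushed past the
    last cell (lowest credibility) are lost."""
    queue = []
    for item in items:
        # find the insertion place; later items shift one position
        pos = next((i for i, q in enumerate(queue) if item[1] > q[1]), len(queue))
        queue.insert(pos, item)
        del queue[capacity:]             # the last cell's content is lost
    return queue
```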
3 The Implementation Experiments

The entire hardware model has been defined as a top-down hierarchical structure. This structure is very regular and the subsequent stages are similar, so the implementation is not very difficult and the model is readable. Every stage consists of the described elements: the FDR block, the cut-off register, the credible set extractor, the arithmetic-logic unit, and the trustworthiness registers. The only differences between stages are the number of elements, the type of signals (external inputs or hypotheses generated within other stages), and additional elements in the case of complex prerequisites.
The presented model of hardware blocks has been assembled into a final unit that is able to calculate hypotheses based on input conditions. To speed up the calculation we take advantage of a pipelined parallel architecture. The hardware implementation allows concurrent execution of several operations. Additional benefits are local data distribution and access time: the access time to variables is minimal, as they are stored in registers distributed within the entire unit. The calculation process can be divided into three phases: trustworthiness evaluation, average value calculation (an optional step dependent on the evaluation mode), and hypotheses sorting. The evaluation phase considers the trustworthiness values and gathers the minimal or maximal values, or sums them up for the average value calculation. The average value calculation step is executed conditionally, if the average mode is requested. In this step the total sum of valid trustworthiness values is divided by their number; this operation is performed in parallel for all of them. The last step of the calculation is the sorting of the obtained results in a special sorting queue.

3.1 FDL Unit Hardware Structure

The FDL unit block diagram is shown in Fig. 11. This pipelined unit can perform concurrent calculations on over 8 hypotheses. The architecture can be extended to the required number of hypotheses. The FDL units can form a chain that is able to generate the final hypothesis. Calculations are started by asserting the CALC_RQ input. The completion of the calculation process is notified by a pulse on the CALC_DONE output. This output can be connected to the CALC_RQ input of the next block, which will then start its calculation process. When the obtained hypothesis is not satisfactory, another possible hypothesis can be selected from the sorting queue by asserting the NEXT_HYP input. The assertion of the NEXT_HYP input not only selects another hypothesis from the sorting queue but also sends the calculation request to the underlying stage.
Fig. 11 Hardware structure of FDL unit
The most efficient hardware implementations are those where the number of considered hypotheses is equal to an integer power of 2 (i.e. 8, 16, …, 64). The calculation time depends on two main factors: the number of hypotheses that are evaluated and the length of the trustworthiness word. The latter factor matters only for the average value calculation, when the division of the result is performed. Calculations are performed in a pipelined fashion, and data are passed from one stage to another for partial processing. The pipelined approach not only speeds up the calculations but also reduces the hardware overhead required by multi-path data multiplexing.

3.2 Simulation Example

The simulation process allows validating all the assumptions made about the architecture and verifying the data processing. An example of the simulation waveforms obtained with ALDEC Active-HDL is shown in Fig. 12. This picture presents the results of the calculation process obtained in the AVERAGE mode. We can observe the data circulating through the ALU stages during the hypothesis evaluation (Q=2) process. The only exception is the division operation, which is performed just after the EVALUATION step (Q=4). In the SORT step (Q=8) data is shifted out from the ALU units to the sorting queue (see Fig. 12). A structure based on pipeline processing requires precise scheduling of the data flow through the processing units. The architecture has been repetitively refined during its implementation. Behavioral models for early verification of the concept were refined toward a mature hardware implementation. Then the synthesis results were investigated and compared with theoretical expectations. This iterative approach allowed refining and optimizing many blocks of the entire circuit. Some of them were redesigned several times to better fit the architecture and extend the functionality.
Fig. 12 Calculation process with option AVERAGE
3.3 Hardware Synthesis Results

The architecture was carefully written in HDL languages with a synthesis-oriented approach. Elements that are replicated in pipeline instances are carefully designed to fit the appropriate hardware resources in the target families of programmable devices and to reduce all possible overhead. To enable optimization, the description suggests to the synthesis tools the merging or sharing of logic and arithmetic components. The FDL unit has been implemented in a wide range of Xilinx FPGA devices, from Virtex/Spartan 2 representatives up to the Virtex 5 family. The obtained implementation results are gathered in Table 1. We can see that the resource requirements differ between families which offer similar architecture structures. This is caused by differences in the routing abilities of the device families and by the usage of logic components as so-called route-through items. The unit performance depends on the operation mode: in MIN/MAX mode only 16 clock cycles per stage are required to compute an output datum, while AVERAGE mode requires 25 clock cycles. This allows the evaluation of up to 3 or 2 million hypotheses per second, respectively, for the Spartan 2 family, while Virtex 5 exhibits roughly twice that performance.

Table 1 Logic resources required by a single FDL block

  Family      Version     LUTs   FFs   fMAX [MHz]
  Spartan 2   XC2S200     775    207   51.8
  Spartan 3E  XC3S500E    876    207   80.4
  Virtex 4    XC4VLX25    718    207   91.1
  Virtex 5    XC5VLX50T   575    207   99.2
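As a sanity check, the quoted throughput figures follow directly from the fMAX values in Table 1 and the per-result cycle counts; a quick back-of-the-envelope sketch (the dictionary layout and function name are our own shorthand, not part of the design):

```python
# Results per second = fMAX / cycles_per_result.
# fMAX values come from Table 1; 16 cycles (MIN/MAX) and 25 cycles (AVERAGE)
# per output, as stated in the text.
CYCLES = {"MIN/MAX": 16, "AVERAGE": 25}
FMAX_MHZ = {"Spartan 2": 51.8, "Spartan 3E": 80.4,
            "Virtex 4": 91.1, "Virtex 5": 99.2}

def throughput_mhz(family: str, mode: str) -> float:
    """Millions of evaluated hypotheses per second for a family and mode."""
    return FMAX_MHZ[family] / CYCLES[mode]

for fam in FMAX_MHZ:
    print(fam, {m: round(throughput_mhz(fam, m), 2) for m in CYCLES})
```

For the Spartan 2 this gives about 3.24 and 2.07 million results per second, matching the "3 or 2 million hypotheses per second" quoted above, and the Virtex 5 roughly doubles both figures.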
We have found that several FDL stages can be implemented within a single chip (Table 2). For comparison purposes it has been assumed that the usable logic capacity of a device is 80% of its nominal size. In the largest device used in our experiments (Xilinx Virtex 5) we can place up to 31 FDL stages.

Table 2 Implementation abilities

  Family      Version     Slices (total)   Slices (per FDL)   # of FDL stages per chip (at 80% of util.)
  Spartan 2   XC2S200     2352             446                4
  Spartan 3E  XC3S500E    4656             473                8
  Virtex 4    XC4VLX25    10752            396                22
  Virtex 5    XC5VLX50T   7200             185                31
A. Pułka and A. Milik
Fig. 13 The application of the Hardware FDL System in the verification process (the main idea)
3.4 Application Example I: Hardware-Supported Verification

As an application example of the presented methodology, we can point to a hardware-supported verification system. [Pułka 2010] describes a methodology for applying the FDL mechanism to planning a platform-based verification strategy for an SoC working with the AMBA bus. If we replace the software ATG (automated test generator) with a hardware component, we can speed up the simulation process; moreover, we can work with a real hardware prototype. Fig. 13 presents the overall idea of such a system. Shaded blocks correspond to hardware components, while the Verification Procedures Block contains software procedures responsible for verifying given system properties, i.e. VP = {A1, A2, …, AM} (where Ak = {Akj1,m1, Akj2,m2, …, AkjL,mL} and a given assertion Akjm covers a k-th subset of state transitions, i.e. a set {Skj, …, Skm} – see Fig. 15).
Fig. 14 The model of state transitions within the complex system
Fig. 14 models the possible state transitions within a complex system. The diagram presents n different basic trains of state transitions, with possible transfers between these sequences. Every transition in a given system has its probability
(possibility, verity, etc.), which is estimated from statistical experiments. The width of the lines for the first stage (transitions from the 'Initial state') symbolically denotes the possibility of a given transition. For the subsequent stages the transition lines are drawn identically, to preserve legibility, although the possibilities of these transitions also differ. It should be noted that these trains describe various behavioral schemes of the system, and it is allowed that two or more states within a given chain describe the same system state, i.e. Sij = Sik for j ≠ k.
Fig. 15 State transitions train covered by the assertion: Akjm
Δ1fuzzy{ A, Facts, FDRs, C }, with A = ∅; Facts = {C, D, E}; C = ∅, where the FDR set contains 3 rules:

(1) (C :B/ {[S1, 0.4], [S2, 0.35], [S3, 0.3]})
(2) (D :B/ {[S1, 0.2], [S2, 0.3], [S3, 0.2]})
(3) (E :B/ {[S1, 0.1], [S2, 0.15], [S3, 0.2]})

Fig. 16 State transitions selection idea based on the FDL mechanism
The FDL mechanism is used to select the most important verification scenarios, i.e. those that allow the system correctness to be checked efficiently. Fig. 16 presents a very simple example of how the FDL technique is applied to the verification scheme. Assuming that the system considers only transitions with trustworthiness above the level 0.1 (the other transitions are not presented), we obtain the following credible sets for states S1, S2 and S3: Tw(S1) = {0.4, 0.2, 0.1}; Tw(S2) = {0.35, 0.3, 0.15} and Tw(S3) = {0.3, 0.2, 0.2}. Now, using different CS selection keys, we can find various solutions: S1 for the maximum value, S2 for the average case and S3 for worst-case analysis. Then we have to move on to the next stage and select the next transition, and so on, until the entire train of states is complete. Fig. 17 shows the overall verification system architecture.
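This selection step can be reproduced in a few lines; a minimal sketch using the credible sets quoted above (the function and variable names are ours, not the chapter's):

```python
# Credible trustworthiness sets for states S1..S3, taken from the example.
tw = {
    "S1": [0.4, 0.2, 0.1],
    "S2": [0.35, 0.3, 0.15],
    "S3": [0.3, 0.2, 0.2],
}

def select(tw, key):
    """Pick the state whose trustworthiness set maximizes the given key."""
    return max(tw, key=lambda s: key(tw[s]))

best_max = select(tw, max)                        # maximum-value key
best_avg = select(tw, lambda v: sum(v) / len(v))  # average-case key
best_min = select(tw, min)                        # worst-case (maximin) key
print(best_max, best_avg, best_min)  # -> S1 S2 S3
```

With these sets the maximum key picks S1, the average key S2, and the worst-case key S3, as in the text.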
Fig. 17 The overall verification platform architecture
This methodology is currently being tested on a prototype platform. Complex FPGA structures allow the prototype system model and the verification tool to be placed together within a single chip.

3.5 Application Example II: Hardware-Supported SAT Solver

The next application example of the presented methodology concerns hardware-supported satisfiability checking of CNF formulas, known as the SAT-solving problem [Hyojung et al. 2010]. The problem can be formulated as the question whether there exists an assignment of the variables of a given CNF formula for which the formula is true. Satisfiability remains one of the most important problems in the fields of logic verification, synthesis and technology mapping. The proposed application of the FDL methodology to SAT solving requires interleaving the FDL stages with formula reduction stages (Fig. 18). Each FDL stage is responsible for the selection of a single variable (its assignment to true or false), which is treated as a conclusion, while the reduction stages simplify the analyzed CNF formula and, alternatively, indicate possible conflicts (inconsistencies) forcing loopbacks. In this sense, the FDL methodology, as a form of non-monotonic reasoning, is well suited to this purpose (SAT solving). The FDR rules analyze a given CNF formula, and their hypotheses may depend on the frequency of appearance of a given literal, the type of the variable (simple or inverted), possible decompositions of the formula into mutually exclusive subsets, etc. The assumed final hypothesis (a variable with its assignment) should lead to the maximal pruning of the search space (reduction of the next CNFi-1 formula). If the frequency of variable appearances is taken into account, the trustworthiness reflects this quantity.
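A software sketch of this variable-selection idea might rank literals by frequency and reduce the formula after each assignment; the clause encoding and helper names below are illustrative assumptions, not the authors' hardware design:

```python
from collections import Counter

# A CNF formula as a list of clauses; each clause is a list of non-zero
# integers, DIMACS style: 3 means x3, -3 means NOT x3 (illustrative encoding).
def pick_literal(cnf):
    """Choose the most frequent literal -- a simple trustworthiness proxy."""
    counts = Counter(lit for clause in cnf for lit in clause)
    return counts.most_common(1)[0][0]

def reduce_cnf(cnf, lit):
    """Assign `lit` true: drop satisfied clauses, strip the negated literal.
    An empty clause signals a conflict that would force a loopback."""
    out = []
    for clause in cnf:
        if lit in clause:
            continue                          # clause satisfied, pruned away
        out.append([l for l in clause if l != -lit])
    return out

cnf = [[1, 2], [-1, 3], [2, 3], [-2, -3]]
lit = pick_literal(cnf)   # literals 2 and 3 are the most frequent here
print(lit, reduce_cnf(cnf, lit))
```

Assigning the chosen literal prunes two of the four clauses in this toy formula; repeating the pick/reduce pair until the formula empties (or a conflict appears) mirrors the interleaved FDL and reduction stages.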
Fig. 18 The application of FDL logic in the SAT-solving procedure
Currently we are testing the methodology with data coming from the software implementation. We plan to extend the application by integrating the FDL engine with the tested formulas on a single chip.
4 Summary

This chapter has proposed a hardware implementation of formal inference rules within programmable logic structures. The research is focused on constructing an efficient and very fast hardware structure that mimics the behavior of an intelligent agent. Other solutions in the field are only partial and limited; they implement heuristics based solely on neural networks or fuzzy rules. In our opinion, our approach is more sophisticated, yet clear and readable, and it gives full control over the inference process. The structure is very regular, which simplifies the design and implementation processes and allows control to be kept over hypothesis generation. The structure is multistage, i.e. the reasoning process is split into stages responsible for generating different conclusions, and the order of these stages reflects the preferences (this solution corresponds to answer set programming [Balduccini et al. 2006]). Such a solution is somewhat rigid on the one hand, but on the other hand it prevents the generation of unstable conclusions that contradict one another. Moreover, if we assume reconfiguration of the system, we can very easily swap the contents between stages and thus change the preferences. The proposed hardware solution enables the parallel evaluation of fuzzy default rules (FDRs) and eliminates the very annoying problems of database searching and backtracking. In software-based solutions, each FDR inference rule requires a detailed analysis of the contents of the data and knowledge bases and the investigation of every instance of a given predicate (PROLOG clause). This problem is particularly evident when the program has to prove the negation of a given fact. These
operations consume program stack and additional clock cycles. In the case of a hardware implementation, each rule is implemented within a single cell and executed almost immediately, with a delay of 10 to 19.1 ns (depending on the target structure), so the hardware is many times faster. The same holds for the belief revision process, when a feedback to the previous stage is forced. In hardware, the process of hypothesis invalidation and rule re-evaluation is completed very quickly, without the necessity of keeping control of the inference tree and considering the dependencies between subsequent conclusions. The main drawback of the presented approach is, so far, the dedicated (limited) application of a given, implemented deduction structure. However, in our opinion, this problem can soon be overcome in a mixed hardware-software solution, where a universal structure is loaded with a set of detailed rules from external memory, i.e. the contents of the structure cells are replaced with appropriate data. The research and implementation of prototypes are ongoing. We are still working on refining the architecture implementation for better resource utilization and propagation time reduction. A careful use of the pipelining technique in long combinational data paths will possibly allow a further increase in the overall performance of the unit. Finally, we have pointed out two (of many) possible applications of the methodology, which are currently being tested.
References

[Apt and Monfroy 2001] Apt, K.R., Monfroy, E.: Constraint programming viewed as rule-based programming. Theory and Practice of Logic Programming 1(6), 713–750 (2001)
[Balduccini et al. 2006] Balduccini, M., Gelfond, M., Nogueira, M.: Answer set based design of knowledge systems. Annals of Mathematics and Artificial Intelligence 47(1-2), 183–219 (2006)
[Brewka 1991] Brewka, G.: Cumulative default logic: in defense of nonmonotonic inference rules. Artificial Intelligence 50, 183–205 (1991)
[Gelfond and Lifschitz 1988] Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: Kowalski, R.A., Bowen, K.A. (eds.) Proc. of the Fifth Int. Conf. and Symp. on Logic Programming, pp. 1070–1080. The MIT Press, Seattle (1988)
[Hyojung et al. 2010] Hyojung, H., Somenzi, F., Hoonsang, J.: Making deduction more effective in SAT solvers. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 29(8), 1271–1284 (2010)
[Pułka 2009] Pułka, A.: Decision Supporting System Based on Fuzzy Default Reasoning. In: Proc. of the Human Systems Interaction Conf., Catania, Italy, pp. 32–39 (2009)
[Pułka 2010] Pułka, A.: System on Chip Verification Strategy Based on FDL Mechanism. In: Proc. of the Int. Conf. on Signals and Electronic Systems, Gliwice, Poland, pp. 355–358 (2010)
[Reiter 2001] Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, Cambridge (2001)
[Xue et al. 2009] Xue, J., Sun, L., Liu, M., Qiao, C., Ye, G.: Research on high-speed fuzzy reasoning with FPGA for fault diagnosis expert system. In: Proc. of the 2009 IEEE Int. Conf. on Mechatronics and Automation, Changchun, China, pp. 3005–3009 (2009)
[Wirth 1976] Wirth, N.: Algorithms + Data Structures = Programs, 1st edn. Prentice-Hall, Englewood Cliffs (1976)
[Zadeh 2006] Zadeh, L.A.: Generalized theory of uncertainty (GTU) – principal concepts and ideas. Computational Statistics & Data Analysis 51, 15–46 (2006)
Dwulit's Hull as Means of Optimization of kNN Algorithm

M.P. Dwulit and Z. Szymański

Department of Computer Science, Warsaw University of Technology, Warsaw, Poland
[email protected]
[email protected]
Abstract. The paper describes a novel method for reducing the size of a training set in order to reduce memory requirements and classification complexity. Our method condenses the training set in such a way that it is both training-set consistent (it classifies all training data points correctly) and decision-boundary consistent (the decision boundary does not change after applying our method) for NN classifiers. Furthermore, the algorithm described here allows the utilization of the parallel computing paradigm in order to increase performance.
1 Introduction

Supervised machine learning methods are commonly used in tasks which require 'intelligent' data processing. There is a broad range of such methods, differing in computational complexity, generalization ability, and classification performance. Among the different approaches and solutions, one method stands out due to its simplicity. Nearest neighbor (NN) classification is widely addressed in the literature. K-nearest neighbors (KNN) is an example of a supervised learning algorithm which extends the NN classification rule to the K nearest neighbors. It is one of the most widely used and best known classification algorithms. The KNN algorithm is used in a broad range of classification problems, such as medical diagnosis, image processing, and predicting properties of amino acid sequences. The popularity of KNN is based on its simplicity and effectiveness. However, the KNN algorithm has two significant drawbacks. The first is the proper selection of the number of considered neighbors, denoted by the parameter K. The second is the high classification complexity and the high memory requirement during classification. The problem of properly selecting the number of considered neighbors has its background in the finite nature of the training set [Fayed et al. 2009], and some applications require dense sampling in order to obtain good classification performance. When a small number of neighbors is considered (a small K value) and
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 345–358. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
the data contains noise, the classifier might have poor generalization ability. To address this issue one can increase the number of points in the neighborhood. However, too large a neighborhood might also cause adverse effects. In this case the classifier becomes sensitive to the ratio of positively and negatively labeled examples in the data set. In the extreme case, when the number of neighbors is equal to the number of samples in the training set, all classified samples are assigned to the most frequent class in the training set, and the locality of the operation is lost. This problem was addressed in [Vincent and Bengio 2002], where the authors proposed an approach which stabilizes the classification while considering a small number of neighbors. The same problem was addressed in [Zhang et al. 2006]. The authors proved that with an increasing number of considered neighbors the classification approaches Support Vector Machine (SVM) classification, and in the extreme case is equal to it. However, the complexity of their solution is impractical for large K. The second disadvantage of KNN is the computational complexity of the classification phase and the memory size required during classification. The KNN algorithm requires dense sampling in order to obtain good classification. A direct consequence of this requirement is the necessity of keeping a large training set in memory and searching that set for the K nearest neighbors in each classification step. There are two well known approaches to this problem: editing and condensing. Editing reduces the training set in order to increase the generalization capabilities of the new classifier. Condensing, on the other hand, is focused on minimizing the number of samples in the training set while keeping the decision boundary [Fayed et al. 2009]. The problem of reducing the training set size has been addressed by many researchers. Hart proposed the Condensed Nearest Neighbor Rule (CNN) [Hart 1968].
CNN creates a consistent subset S ⊂ T, where a classifier built on S gives the same classification results as a classifier built on T. The CNN algorithm starts from a random sample, which is added to S. Then a classifier based on S is built. Next, all samples in T which were classified incorrectly by this classifier are added to S. These steps are then repeated; CNN terminates when no sample from T is added to S. The main drawback of CNN is that the algorithm is order dependent and may keep points which are far from the decision boundary. Gates proposed a modified version of CNN which he called the Reduced Nearest Neighbor Rule (RNN) [Gates 1972]. RNN's first step applies CNN in order to find a consistent set S. Then all the points in S are revisited and tentatively removed, to check whether their deletion creates any misclassification of the samples in T. Wilson and Martinez analyzed five algorithms [Wilson and Martinez 2000], based on a decremental removal procedure (DROP), which reduce the size of the training set T: DROP1, DROP2, …, DROP5. The decremental removal procedure is based on associate patterns. The associates of a pattern p are the patterns which have p as one of their K nearest neighbors [Fayed et al. 2009]. The decision about removing pattern p from the consistent subset S is based on the effect on the classification of its associates. The DROP1 algorithm is identical to RNN with one exception: the accuracy of classification is checked on the consistent subset S instead of T. DROP2 is based on DROP1; however, it is extended with additional
steps in order to eliminate instances which have a high probability of being noise. DROP2 considers the effect of removing an instance from S on the classification of all instances in the whole training set T, instead of considering just the instances in S. DROP3, DROP4 and DROP5 contain additional steps for processing noise; the main differences between them are where the noise filtering takes place and when the DROP2 algorithm is used. Fayed and Atiya describe a condensing algorithm based on nearest-neighbor chains [Fayed et al. 2009]. A nearest-neighbor chain is a sequence of nearest neighbors from alternating classes. One of the properties of a chain is that the distance between neighboring patterns in the chain does not increase. The authors based their algorithm on this property: the patterns which are farther along the chain are closer to the decision boundary, and a cutoff is set deciding which patterns are kept in the consistent set S. The empirical results show that the size of the training set is effectively reduced. Bhattacharya proposed two methods [Bhattacharya et al. 1992]: one based on Voronoi diagrams and the other on Gabriel graphs. The method based on Voronoi diagrams is especially interesting because it yields a consistent set which is both training-set consistent (it classifies T correctly) and decision-boundary consistent (the decision boundary is the same for the condensed set S and for the set T) for the NN classifier. The algorithm's first step is to construct a Voronoi diagram. In the second step all the patterns are visited, and a pattern is marked if at least one of its neighbors in the Voronoi diagram has a different label. In the third step all marked points are copied to the consistent set S. The main drawback of the algorithm based on a Voronoi diagram is its high complexity. In this paper we describe a geometric structure – Dwulit's hull – which we successfully apply to address both problems.
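For comparison, Hart's CNN rule summarized above is compact enough to sketch; this is an illustrative 1-NN version with our own toy data layout, not the chapter's implementation:

```python
import math

def nn_label(S, p):
    """Label assigned to p by the 1-NN rule over S (a list of (point, label))."""
    return min(S, key=lambda s: math.dist(s[0], p))[1]

def cnn(T):
    """Hart's Condensed Nearest Neighbor rule: grow S until every sample
    of T is classified correctly by 1-NN on S."""
    S = [T[0]]                              # start from an arbitrary sample
    changed = True
    while changed:
        changed = False
        for point, label in T:
            if nn_label(S, point) != label:
                S.append((point, label))    # misclassified -> keep it
                changed = True
    return S

# Two well-separated clusters: most interior points are discarded.
T = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.2, 0.1), "a"),
     ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.2, 5.1), "b")]
S = cnn(T)
print(len(S), "of", len(T), "samples kept")
```

On this toy set only one representative per cluster survives, yet the reduced set still classifies all of T correctly; the order dependence noted in the text comes from the arbitrary starting sample and the scan order.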
Dwulit's hull, applied to the problem of selecting the proper neighbors, allows creating a simple, easy-to-implement classification rule. We also use Dwulit's hull to reduce the size of the learning set. The new condensing algorithm is able to create a condensed training set which is both training-set consistent and decision-boundary consistent. One of the drawbacks of the proposed approach is the complexity of computing Dwulit's hull in a high-dimensional space. We deal with that problem by proposing an approximation of Dwulit's hull [Szymanski and Dwulit 2010]. The performance of the modified algorithm was verified on well known, real-world data sets. The preliminaries are described in Section 2. In Section 3 we describe the proposed algorithm. Section 4 contains the results of empirical testing and examples of the created decision boundary. The results are discussed in Section 5. Finally, conclusions are drawn in Section 6.
2 Preliminaries

Our method utilizes definitions and ideas from computational geometry. In this section we define all the necessary geometrical structures and present their graphical representations. In addition, we give the definition of inversion and list the properties which allow an intuitive understanding of the proposed algorithm. First we describe the
convex hull. The convex hull, along with inversion, is used to compute Dwulit's hull. Dwulit's hull is a subset of the Delaunay triangulation, which is the dual graph of the Voronoi diagram. It can be shown that the Dwulit's hull computed for any point of a set contains all of that point's neighbors in the Voronoi diagram constructed for the set. Our algorithm is based on this property.

Convex Hull [O'Rourke 1998]. Computing a convex hull is the most prominent problem of computational geometry. It may be stated as follows: for a given finite set $P \subset R^d$, find the polytope formed by the convex hull of P.

Definition 1. A convex combination of points $\{x_1, \ldots, x_n\}$ is a sum $\alpha_1 x_1 + \ldots + \alpha_n x_n$ where $\alpha_i \geq 0$ for all i and $\alpha_1 + \ldots + \alpha_n = 1$.

Definition 2. The convex hull of a finite set $P = \{x_1, \ldots, x_n\}$ of points in $R^d$ is defined as:

$\mathrm{conv}(P) = \left\{ \sum_{i=1}^{n} \alpha_i x_i \;\middle|\; \alpha_i \ge 0,\ \sum_{i=1}^{n} \alpha_i = 1 \right\}$    (1)
Fig. 1 Convex hull
Voronoi Diagram [Goodman 1997]. The Voronoi diagram (V) is one of the most important geometrical structures in computational geometry. Its definition utilizes the concept of proximity to a point, or to a discrete set of points, in order to partition space into regions. The most intuitive explanation is given by Figure 2, which shows a Voronoi diagram for three isolated points.
Fig. 2 Voronoi diagram defined by three isolated points
If we consider Euclidean n-dimensional space and a finite set of points $P = \{x_1, \ldots, x_n\}$, which is a subset of that space, the Voronoi diagram partitions the space in such a way that the region of each site $x_i$ consists of all points closer to $x_i$ than to any other site in P.

Definition 3. The Voronoi region of a site $x_i \in P$, denoted by $V(x_i)$, is the set of all points p whose Euclidean distance from $x_i$ is equal to or smaller than the distance to any other site in P:

$V(x_i) = \{\, p \mid \|x_i - p\| \le \|x_j - p\| \ \forall j \ne i \,\}$    (2)
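Definition 3 reduces membership in a Voronoi region to a nearest-site test; a tiny illustrative check (the sites and query points are made up):

```python
import math

def voronoi_site(P, p):
    """Index of the Voronoi region containing p: the nearest site in P
    (Definition 3); ties resolve to the lower index."""
    return min(range(len(P)), key=lambda i: math.dist(P[i], p))

# Three isolated sites, as in Fig. 2 (coordinates are illustrative).
P = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
print(voronoi_site(P, (0.5, 0.5)))   # -> 0, nearest to the first site
```

Note that this nearest-site test is exactly the 1-NN classification rule, which is why the Voronoi diagram of the training set determines the NN decision boundary.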
Definition 4. The Voronoi diagram of P is the collection of all Voronoi regions.

Delaunay Triangulation [Goodman 1997]. The Delaunay triangulation is a geometric structure which fulfills the following condition: the circumsphere of every full-dimensional simplex contains no sites in its interior. Another distinguishing feature of the Delaunay triangulation is that it is the dual of the Voronoi diagram and that the collection of all Delaunay faces partitions the convex hull into cells.
Fig. 3 Delaunay triangulation
Definition 5. A Delaunay face DF(T) is defined for a subset T of P whenever there is a sphere through all the sites of T with all other sites of P exterior.

Definition 6. The Delaunay triangulation DF(P) of a set P, a subset of the Euclidean space $R^d$, is the collection of its Delaunay faces.

Inversion [Blair 2000]. Let $S_r(x_0)$ be the sphere centered at $x_0$ with radius r:

$S_r(x_0) = \{\, x \in R^d \mid \|x - x_0\| = r \,\}$    (3)

Inversion in $S_r(x_0)$ is the mapping $g: R^d - \{x_0\} \to R^d$ defined as:

$g(x) = x_0 + r^2 \dfrac{x - x_0}{\|x - x_0\|^2}$    (4)
Inversion in $S_r(x_0)$ is an involution on the space $R^d - \{x_0\}$:

$g(g(x)) = x$    (5)

Inversion removes the center of inversion from the space, which is an undesired feature in our application. Therefore we define the following mapping as inversion:

$Inv(P, x_k)(x_i) := \begin{cases} g(x_i) & x_i \in P,\ i \ne k \\ x_k & i = k \end{cases}$    (6)

Figure 4 shows seven points in the plane. Figure 5 shows the same seven points after inversion with the center of inversion at x0. It is worth noticing that x and g(x) lie on the same ray emanating from the center of inversion x0, and that the product of the distances of x and g(x) from x0 is equal to r².
Fig. 4 Discrete space of seven points
Fig. 5 Scaled discrete space of seven points after inversion in sphere with center in p0 and radius 1
Further on in the paper, without loss of generality, we assume that the radius r is equal to 1. Therefore we can simplify formula (4) to the following:

$g(x) = x_0 + \dfrac{x - x_0}{\|x - x_0\|^2}$    (7)
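The involution property (5) and the distance-product property are easy to verify numerically for this mapping; the sample points below are illustrative:

```python
import math

def invert(x, x0, r=1.0):
    """Inversion in the sphere S_r(x0): g(x) = x0 + r^2 (x - x0) / ||x - x0||^2."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, x0))
    return tuple(b + r * r * (a - b) / d2 for a, b in zip(x, x0))

x0 = (0.0, 0.0)
x = (3.0, 4.0)            # ||x - x0|| = 5
g = invert(x, x0)         # image lies on the same ray, at distance 1/5
back = invert(g, x0)      # involution (5): g(g(x)) = x
print(g, back)
print(math.dist(x0, x) * math.dist(x0, g))   # product of distances equals r^2
```

The point at distance 5 maps to (0.12, 0.16) at distance 0.2, so the product of the two distances is r² = 1, and inverting twice returns the original point.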
Inversion theory provides four statements which describe the properties of inversion in Euclidean space and allow an intuitive understanding of it [Blair 2000].

Theorem 1.
• The inverse of a hyperplane passing through the center of inversion is the hyperplane itself.
• The inverse of a hyperplane not passing through the center of inversion is a sphere passing through the center of inversion.
• The inverse of a sphere passing through the center of inversion is a hyperplane not passing through the center of inversion.
• The inverse of a sphere not passing through the center of inversion is a sphere not passing through the center of inversion.

Dwulit's hull DH(P, xk) of a finite, discrete set of points P, with center $x_k$ where $0 \le k \le n$, is the geometric structure which corresponds to the convex hull of the inverted set:

$DH(P, x_k) := Inv(\mathrm{Conv}(Inv(P, x_k)), x_k)$    (8)
A geometric interpretation of Dwulit's hull is shown in Figures 6 and 7. Figure 6 represents a convex hull in a two-dimensional inverted space with the center of inversion at x0. Figure 7 shows the corresponding Dwulit's hull in the original space.
Fig. 6 Convex hull in inverse space
Fig. 7 Dwulit’s hull
It can be shown that DH(P, xi) contains all the neighbors of xi in Vor(P). A simple geometric interpretation of this fact is that each facet limiting the convex hull lies on a hyperplane in the inverted space. The hyperplane itself is the image of an empty sphere (in the original space) passing through the center of inversion (Theorem 1). Therefore all the points defining the facet in the inverted space are the images of points which lie on that empty sphere defining a Delaunay face. The center of that sphere is a vertex of the Voronoi diagram.
Fig. 8 Voronoi region V(x0) in a discrete space of seven points
Figure 8 illustrates the relationship between Dwulit's hull and the Voronoi diagram. It shows a discrete space of seven points, the center of inversion x0, the circles passing through x0 and its neighboring points (the centers of those circles are Voronoi vertices), and the Voronoi edges limiting the Voronoi cell containing x0.
3 Condensing the Training Set through the Inversion Procedure

The algorithm proposed here is simple. The input is the training data set, dataL. For each point xi in the data set, the k nearest neighbors are selected. Then the Dwulit's hull of the selected neighborhood is computed. Finally, the homogeneity of all points in the Dwulit's hull is checked. If the points in the Dwulit's hull and the point xi are not
homogeneous, the algorithm stores xi as a point defining the decision boundary, and that point becomes an element of the consistent set S.

CONDENSING ALGORITHM
Require: dataL   /* Labeled samples set */
         k       /* Number of considered neighbors in a single step */
Ensure:  S       /* Condensed set */

S ← ∅
KN ← ∅
DH ← ∅
for each point xi in dataL do
    KN = SelectKNearest(dataL, xi, k)
    DH = ComputeDwulitsHull(xi, KN)
    if not CheckIfSamplesAreHomogenic(xi, DH) then
        S = S ∪ {xi}
    end if
end for

The pseudocode contains three procedures which require a word of comment. The first is SelectKNearest(dataL, xi, k), which returns the set of the k nearest neighbors of xi in the dataL set. The second is ComputeDwulitsHull(xi, KN), which computes the Dwulit's hull of the given KN set, with xi as the inversion center. Finally, CheckIfSamplesAreHomogenic(xi, DH) returns True if all points in the DH set have the same label as the center of inversion xi, and False otherwise.
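A minimal 2-D software sketch of this loop, using the inversion-plus-convex-hull construction of Dwulit's hull from Section 2, might look as follows (the point set, tie handling and hull routine are our own illustrative choices; note that the hull of the inverted neighborhood may retain a few far points in addition to the Voronoi neighbors):

```python
import math

def invert(p, c):
    """Unit-radius inversion (7): g(p) = c + (p - c) / ||p - c||^2."""
    d2 = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return (c[0] + (p[0] - c[0]) / d2, c[1] + (p[1] - c[1]) / d2)

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone chain; returns the hull vertices of a 2-D set."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def chain(points):
        h = []
        for p in points:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = chain(pts), chain(pts[::-1])
    return lower[:-1] + upper[:-1]

def dwulits_hull(points, center):
    """DH (8): invert the neighborhood, take the convex hull, map back."""
    back = {invert(p, center): p for p in points}
    return [back[q] for q in convex_hull(list(back))]

def condense(dataL, k):
    """Keep each x_i whose Dwulit's-hull neighborhood is not label-homogeneous."""
    S = []
    for xi, label in dataL:
        kn = sorted((s for s in dataL if s[0] != xi),
                    key=lambda s: math.dist(s[0], xi))[:k]
        dh = dwulits_hull([p for p, _ in kn], xi)
        if {lab for p, lab in kn if p in dh} != {label}:
            S.append((xi, label))
    return S

# Two 2x3 grids of points, one per class; with k=6 the middle column of
# each grid is pruned away while the boundary-facing points are kept.
dataL = ([((x, y), "a") for x in (0, 1, 2) for y in (0, 1)]
         + [((x, y), "b") for x in (5, 6, 7) for y in (0, 1)])
S = condense(dataL, k=6)
print(len(S), "of", len(dataL), "points kept")
```

On this toy set the middle column of each 2×3 grid is dropped: its Dwulit's hull consists only of same-class points, even though its k-neighborhood reaches across the gap.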
4 Performance

Bhattacharya [Bhattacharya et al. 1992] has proven that the Voronoi editing algorithm is decision-boundary consistent and reference-set consistent. Our method is in principle the same; the only difference is the way the neighbors in the Voronoi diagram are computed. The Voronoi editing algorithm first computes the Voronoi diagram for the whole set. In contrast, our algorithm in each step limits the number of considered points to the k nearest neighbors and then computes the Voronoi-diagram neighbors within that set. In the extreme case, when the number k of considered neighbors in each step is equal to the size of the training set, the results of both methods are the same. Therefore we do not need to show
empirical confirmation of that fact. However, we need to show how a change of the k parameter influences the performance of our algorithm. For that purpose we use the well known Ripley data set [Ripley 1996]. The Ripley set consists of 1250 data points belonging to two non-linearly separable classes (positive, labelled as +1, and negative, labelled as -1). 416 points were assigned to the learning set (shown in Fig. 11) and 834 points to the test set.

Table 1 Influence of the k parameter on condensing performance

  Training set    k    Total accuracy (%)   Precision (%)   Recall (%)   Model size
  Condensed set   3    68.21                67.37           78.42        79
  Condensed set   9    88.58                87.77           89.93        130
  Condensed set   15   89.36                89.02           89.98        140
  Condensed set   21   89.42                89.19           89.90        142
  Condensed set   27   89.33                88.96           89.99        144
  Condensed set   33   89.38                89.03           90.02        144
  Original set    –    89.38                89.03           90.02        144
Table 1 shows the performance of the KNN classifier for the training set processed by our algorithm. In all tests k was constant and equal to 9 when the KNN classifier performance was measured; the k parameter in the table denotes the number of neighbors considered in each step of the Voronoi-neighborhood calculation performed by our algorithm. The performance of the KNN classifier grows rapidly with k when k is relatively small; however, at some point the performance saturates. Figure 9 shows the relation between the k used during condensing and the performance of the KNN classifier. The performance was measured by computing three indicators:

$\mathrm{Total\ accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN} \cdot 100\%$    (9)

$\mathrm{Precision} = \dfrac{TP}{TP + FP} \cdot 100\%$    (10)

$\mathrm{Recall} = \dfrac{TP}{TP + FN} \cdot 100\%$    (11)
where:
TP – correctly classified samples from the positive class,
TN – correctly classified samples from the negative class,
FP – incorrectly classified negative samples,
FN – incorrectly classified positive samples.

Similarly to the classification performance, the size of the condensed training set (the model size) grows rapidly with k when k is relatively small, and saturates as k grows further. Figure 10 illustrates this fact.
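The three indicators are straightforward to compute from a confusion matrix; a small sketch (the counts below are illustrative, not the experiment's actual results):

```python
def metrics(tp, tn, fp, fn):
    """Total accuracy, precision and recall (formulas (9)-(11)), in percent."""
    total = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return total, precision, recall

# Illustrative confusion-matrix counts only (sized like the 834-point
# Ripley test set, but not the chapter's measured figures).
print(metrics(tp=360, tn=379, fp=50, fn=45))
```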
Fig. 9 Classification performance as a function of the number of considered neighbors k

Fig. 10 Model size as a function of the number of considered neighbors k
Figures 11 through 14 show the training set before and after condensing for different values of the k parameter. Figure 11 represents the original training set, consisting of 416 points selected randomly out of the Ripley training set. Figure 12 represents the training set of 79 points which results from condensing with k=3 neighbors considered in each step. When the k parameter is so small, the algorithm selects only those points which lie close to the decision boundary; however, there are not enough points to determine where each class region should be.
Therefore the classification performance of the KNN classifier is unsatisfactory. With an increase in the number of considered neighbors the classification performance increases, and so does the size of the condensed training set. For k=9 the training set size equals 130 points. Figure 13 shows all the points selected out of the original training set into the condensed training set. Even though the algorithm still selects points lying close to the decision boundary, it is now possible to estimate where each class region should be; this is confirmed by the classification results achieved by the KNN classifier (Table 1).
Fig. 11 Original training set consisting of 416 points
Fig. 12 Condensed training set created for k=3
Fig. 13 Condensed training set created for k=9
Fig. 14 Condensed training set created for k = 21
Finally, Figure 14 shows the condensed training set created by the algorithm proposed here for k=21. The condensed training set consists of 140 points. Even though we increased the considered neighborhood more than twofold, only 10 points were added to the condensed training set. Extending the training set likewise had little influence on the KNN classifier performance (Table 1). Figure 14 shows how each noisy sample is surrounded by members of the proper class. All correctly labeled samples bound the decision boundary.
5 Discussion

So far we have described a set of geometrical structures and the algorithm itself. The main advantage of our solution in comparison with the Voronoi editing algorithm is computational complexity. The worst case for the Voronoi editing algorithm in d-space is

O(d³ n⌈d/2⌉ log n)   (12)

In contrast, the algorithm described here has complexity

O(nk n^(1−2/d))   (13)
where k is the number of neighbors considered in a single step. Our approach is based on the observation that for a large training set the number of direct neighbors in the Voronoi diagram is much smaller than the size of the whole training set T. Our empirical tests confirm this fact. Additionally, if the points are in general position, then the Voronoi editing algorithm produces a minimal-size decision-boundary-consistent set when the NN rule is applied [Bhattacharya et al. 1992]. Using the same proof we may say that when the number of neighbors considered in each step is large enough, our algorithm has the same property.
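The role of k can be illustrated with a simplified, boundary-oriented condensing rule: a point is kept only if its k nearest neighbors are not all of its own class, so only points near the decision boundary survive. This is an illustrative sketch of the general idea, not the authors' Dwulit's-hull construction:

```python
from math import dist

def condense(points, labels, k):
    """Keep the indices of points whose k nearest neighbors are not all of
    the same class, i.e. points lying near the decision boundary (sketch)."""
    kept = []
    for i, p in enumerate(points):
        # Indices of the k nearest other points by Euclidean distance.
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        if any(labels[j] != labels[i] for j in order[:k]):
            kept.append(i)
    return kept

# Tiny 1-D example: two classes meeting near x = 0.
pts = [(-4,), (-2,), (-1,), (1,), (2,), (4,)]
lab = [0, 0, 0, 1, 1, 1]
print(condense(pts, lab, k=2))  # [2, 3] — only the boundary points survive
```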
6 Summary

In this paper a new condensing method is proposed. Similarly to the Voronoi editing procedure, our method may be both training-set consistent (it classifies all training set points correctly) and decision-boundary consistent (the decision boundary does not change) for the NN classifier. By parameterizing the proposed procedure we give users a tool to choose between accuracy and computational complexity. Furthermore, parallel computing may be applied to increase the speed of execution, because each step may be computed independently.
References

[Bhattacharya et al. 1992] Bhattacharya, B.K., Poulsen, R.S., Toussaint, G.T.: Application of proximity graphs to editing nearest neighbor decision rules, Simon Fraser University, pp. 1–25 (1992)
[Blair 2000] Blair, D.E.: Inversion theory and conformal mapping. American Mathematical Society, Providence, Rhode Island (2000)
[Fayed et al. 2009] Fayed, H.A., Atiya, A.F.: A novel template reduction approach for the K-nearest neighbor method. IEEE Trans. Neural Netw. 20, 890–896 (2009)
[Gates 1972] Gates, G.W.: The reduced nearest neighbor rule. IEEE Trans. on Inform. Theory 18, 431–433 (1972)
[Goodman 1997] Goodman, J.E.: Handbook of discrete and computational geometry. CRC Press, Boca Raton (1997)
[Hart 1968] Hart, P.E.: The condensed nearest neighbor rule. IEEE Trans. on Inform. Theory 14, 515–516 (1968)
[O’Rourke 1998] O’Rourke, J.: Computational Geometry in C, 2nd edn. Cambridge University Press, Cambridge (1998)
[Ripley 1996] Ripley, B.D.: Pattern recognition and neural networks. Cambridge University Press, Cambridge (1996)
[Szymanski and Dwulit 2010] Szymański, Z., Dwulit, M.P.: Improved nearest neighbor classifier based on local space inversion. In: Proc. of the 3rd International Conf. on Human System Interaction, Rzeszow, Poland, pp. 95–100 (2010)
[Vincent and Bengio 2002] Vincent, P., Bengio, Y.: K-local hyperplane and convex distance nearest neighbor algorithms. In: Advances in Neural Information Processing Systems, vol. 14, pp. 985–992. MIT Press, Cambridge (2002)
[Wilson and Martinez 2000] Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-based learning algorithms. Mach. Learn. 38, 257–286 (2000)
[Zhang et al. 2006] Zhang, H., Berg, A.C., Maire, M., et al.: SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In: Proc. of the 2006 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp. 2126–2136 (2006)
OWiki: Enabling an Ontology-Led Creation of Semantic Data A. Di Iorio, A. Musetti, S. Peroni, and F. Vitali Department of Computer Science, University of Bologna, Italy {diiorio,musetti,speroni,fabio}@cs.unibo.it
Abstract. While the original design of wikis was mainly focused on a completely open free-form text model, semantic wikis have since moved towards a more structured model for editing: users are driven to create ontological data in addition to text by using ad-hoc editing interfaces. This paper introduces OWiki, a framework for creating ontological content within not-natively-semantic wikis. Ontology-driven forms and templates are the key concepts of the system, which allow even inexpert users to create consistent semantic data with little effort. Multiple and very different instances of OWiki are presented here. The expressive power and flexibility of OWiki proved to be the right trade-off for deploying authoring environments in such very different domains, ensuring at the same time editing freedom and semantic data consistency.
1 Introduction

The explosion of social software tools has changed the way most users access the World Wide Web. Even inexpert users can now publish their content with a few clicks and do not need to master complex technologies or use professional tools. This content is primarily targeted at being consumed by human users, as in YouTube videos, Twitter messages, FaceBook posts and so on. The creation of semantic web content – available for automatic search, classification and reasoning – is much more difficult and time-consuming. Technical competencies and domain-specific knowledge, in fact, are still required of authors. The shift of the Web from a human-understandable to a machine-understandable platform, as envisioned by the Semantic Web community [Berners-Lee et al. 2001], is far from being complete. The term “lowercase semantic web” [Munat 2004] has been coined to indicate research efforts aiming at bridging the gap between simplified authoring and semantic web data. Such a lowercase semantic web is not an alternative to the uppercase “Semantic Web”, but rather an intermediate step towards the same goal. While the Semantic Web aims at bringing full-fledged reasoning capabilities to intelligent software, the lowercase “semantic web” aims at encoding semantic data that can be accessed by everyday software and, above all, can be created by unsophisticated users. Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 359–374. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
360
A. Di Iorio et al.
Semantic wikis play a leading role in such a scenario. Semantic wikis are enhanced wikis that allow users to decorate pages with semantic data by using simplified interfaces and/or specialized wiki syntaxes. They provide users with sophisticated searching and analysis facilities, and maintain in full the original open editing philosophy of wikis in everything but the form in which the content is created. Many semantic wikis, though, are essentially designed for scholars and experts in semantic technologies, and are still difficult for average wiki contributors with no technical expertise to use. Forms are often used to mitigate this issue, providing an intuitive interface that is well known to most computer users. Unfortunately, semantic forms as they currently exist do not guarantee the simplicity and ease of use that average users may expect. In most cases, in fact, these forms use generic text fields that do not fit the domain the wiki is used for, or they only include a limited set of interface widgets. In this paper we introduce a methodology and a tool to simplify the process of authoring semantic web data through wikis, and we present two very different environments where that tool was delivered. The overall approach relies on ontologies and allows even inexpert users to create semantic data with little effort. The tool, called OWiki, uses Semantic Web technologies to handle the wiki knowledge-base and MediaWiki forms and templates to deliver intuitive interfaces to the final users. Forms and infobox templates are automatically generated from ontological data, and are completely customizable to change the nature, structure and constraints of the form widgets.
The paper is structured as follows: Section 2 gives an overview of the main approaches to semantic wikis; the following one, Section 3, presents the OWiki approach, focusing on its ontologies and their application in this context, while Section 4 goes into the details of a use-case and the internals of the system are discussed in Section 5. Two different instances of OWiki are presented in the last part of the paper, before drawing some conclusions.
2 Related Work: Semantic Wikis and Ontologies

The integration and interaction between ontologies and wikis is a hot research topic. Semantic wikis can be organized into two main categories according to their connections with ontologies: “wikis for ontologies” and “ontologies for wikis” [Buffa et al. 2006]. In the first case, the wiki is used as a serialization of the ontology: each concept is mapped into a page and typed links are used to represent object properties. Such a model has been adopted by most semantic wikis. SemanticMediaWiki [Volkel et al. 2006] is undoubtedly the most relevant one. It provides users with an intuitive syntax to embed semantics, i.e. RDF statements, within the markup of a page. SemanticMediaWiki allows users to freely edit the content without any limitation. The more the information is correctly encoded, the more semantic data are available, but no constraint is imposed on the authoring process. Although the syntax is very simple, SemanticMediaWiki authors still have to learn some new markup and, above all, manually write correct statements.
OWiki: Enabling an Ontology-Led Creation of Semantic Data
361
SemanticForms [Koren 2008] is an extension of SemanticMediaWiki that addresses this issue by allowing users to create semantic content via pre-defined forms. SemanticForms generates forms from templates – i.e. predefined structures, dynamically filled with content and rendered as tables within MediaWiki articles, whose fragments and data have been previously typed. The generation process exploits an embedded mapping between each datatype and each type of field (radio-buttons, checkboxes, textareas, etc.). Users do not need to manually write statements anymore, as they are only required to fill HTML forms. On the other hand, these forms have a fixed structure that cannot be customized and are still difficult for the administrators of the wiki to master. The idea of the second category – “ontologies for wikis” – is to exploit ontologies to create and maintain consistent semantic data within a wiki so that sophisticated analysis, queries and classifications can be performed on its content. IkeWiki [Schaffert 2006] was one of the first wikis to adopt this approach. Its deployment starts by loading an OWL ontology into the system, which is automatically translated into a set of wiki pages and typed links. IkeWiki strongly relies on Semantic Web technologies: it even includes a Jena OWL repository and a SPARQL engine used for navigation, search and display of the semantic content of the wiki. UFOWiki [Passant and Laublet 2008] aims at integrating wikis, ontologies and forms too. UFOWiki is a wiki farm, i.e. a server that allows users to set up and deploy multiple semantic wikis. The overall content is stored in a centralized repository as RDF triples that express both the actual content of each page and its metadata. Users are also provided with plain-text editors and forms to modify the ontology within the wiki. These forms are generated on-the-fly starting from the mapping between classes and properties of the ontology and types and fields of the form.
Administrators set this mapping through a graphical and intuitive interface. The ontological expressiveness of UFOWiki is another aspect worth remarking. While most other wikis only create assertions whose subject is represented by the subject of the wiki page containing that assertion, UFOWiki allows users to associate sub-forms to classes of the ontology and to handle these parts as separate resources. The result is a more fine-grained control over the ontology and its populating process.
3 OWiki: Ontology-Driven Generation of Templates and Forms for Semantic Wikis

OWiki is a Gaffe-based [Bolognini et al. 2009] extension of MediaWiki that supports users in creating and editing semantic data. The basic idea of OWiki is to exploit ontologies and MediaWiki editing/viewing facilities to simplify the process of authoring semantic wiki content. In particular, OWiki exploits MediaWiki templates, infoboxes and forms. A template is a set of key-value pairs, edited as a record and usually formatted as a table in the final wiki page. Templates are particularly useful to store structured information: very easy to edit, disconnected from the final formatting of a page, very
easy to search, and so on. Templates are defined in special pages that can be referenced from other pages. These pages include fragments with the same structure as the template but filled with instance data. The template-based component of a page is also called an infobox. OWiki exploits ontologies to represent the (semantic) knowledge-base of a wiki and creates templates to display that ontology through the wiki itself. This integration and interaction can be summarized in two points:
• each class of the ontology is associated with a template-page, and each property is mapped into a key of the infobox;
• each instance of that class is represented by a page associated with that template. Each line in the infobox then contains the value of a property for that instance. Data properties are displayed as simple text while object properties are displayed as links to other pages.
The actual mapping process is more complex and allows users to select the concepts and properties to be displayed, and to infer properties and their values. Further details will be provided in Section 3.1. OWiki templates are actually transparent to users. In fact, each template is associated with a form that allows users to create and edit the relative instances. Users do not modify the templates directly; they only access specialized form fields. The crucial point is that even forms are generated on-the-fly from ontological data. OWiki also includes a GUI ontology describing widgets and interface elements. The concepts and relations of the domain ontology are mapped into form elements that are delivered to the final user. During the installation phase OWiki creates a basic set of forms by merging the domain ontology with the GUI one. At the editing phase, the system shows a very basic form and saves it as a special page (template). This page can then be organized as a new form by adding dynamic behaviours, moving buttons, changing the field order and so on.
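As a rough illustration of the class-to-template mapping described above, the following sketch renders a MediaWiki template skeleton from a class name and its property keys. The function, its signature and the exact table layout are our assumptions; the real OWiki generator is a MediaWiki extension, not this code:

```python
def template_page(class_name, properties):
    """Render a MediaWiki infobox-template skeleton for an ontology class:
    one row per property, with a {{{param|}}} placeholder (sketch only)."""
    rows = "\n".join(
        "! %s\n| {{{%s|}}}" % (prop, prop) for prop in properties
    )
    return '{| class="infobox"\n|+ %s\n%s\n|}' % (class_name, rows)

print(template_page("Beer", ["Beer_beerType", "Beer_brewedBy"]))
```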
Before describing the internal architecture of the system, it is worth spending a few more words on the way OWiki uses ontologies. In fact, the extensive usage of ontologies makes it possible (i) to make OWiki independent of the domain it is used for, and (ii) to easily customize forms and templates.

3.1 Using Ontologies to Model the Domain

The domain of discourse – i.e., all the topics each page talks about – is handled by an OWL ontology, called the domain ontology, whose role is to express data that will later be transformed into the view/edit interfaces shown to the final users. The classes of the ontology are divided into two groups: those strictly related to articles and pages visualized by the wiki – called page-domain classes – and the others, which define additional data around the former ones – called data-domain classes. Each page-domain individual will be transformed into a wiki page containing text content (the content itself is stored in the MediaWiki internal database) and all semantic data directly related to that individual. Figure 1 shows a page about a particular beer (available at http://owiki.web.cs.unibo.it) that contains a text
description of it in the central page area, while the right-hand box contains all the metadata about this particular beer. OWiki automatically builds this page from the ontological data of the domain ontology.
Fig. 1 An example page about a Beer in OWiki
This mapping process is not ‘blind’ (one class into one page, without any check) but performs some intermediate controls and processing in order to provide users more flexibility. In particular, three strategies adopted by OWiki are worth discussing in more detail:
1. properties flattening
2. properties inheritance by class subsumption
3. classes and properties filtering

3.1.1 Properties Flattening

The process of property flattening consists of adding to page-domain individuals (which will be directly mapped into wiki pages) properties that are not specified in the class the individual belongs to, but are derived from other (non-page-domain) individuals related to the current one. Let us go back to the beer example in order to explain the need for and advantages of this process. While some metadata, such as “Beer Alcoholic content” or “Beer Brewed by”, belong to any beer directly (and are defined by OWL data or object properties having the class Beer as domain), that is not true for other metadata, such as “Winner Award” and “Winner Awarded on”. These properties are indirectly related to the beer and they are (correctly) defined on other classes of the ontology. In fact, there exists a class Awarding that represents an event, concerning a particular prize, in which a beer participated at a specific time. The final users, on the other hand, are probably not interested in viewing/editing a wiki page about an Award or in manipulating that connection directly. It is more immediate and intuitive to just find a “Winner Award” property in the page (or better, in the template) of a given beer and to update that property through the corresponding form.
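The flattening of the beer/awarding example can be sketched with a toy, dictionary-based triple store: values are pulled from the related awarding individual one hop closer to the page-domain individual. The real system navigates an RDF graph; the helper below and its names are purely illustrative:

```python
# Toy triple store mirroring the beer/awarding example (illustrative only).
graph = {
    ":carlsberg": {"a": ":Beer", ":hasAwarding": ":awardingEuropean2007"},
    ":awardingEuropean2007": {"a": ":Awarding",
                              ":hasAward": ":europeanBeerAward",
                              ":hasYear": "2007"},
}

def flatten(graph, individual, via, props):
    """Copy `props` from the individual reached through the `via` link
    onto the page-domain individual (sketch of property flattening)."""
    flat = dict(graph[individual])
    target = graph.get(flat.pop(via, None), {})
    for p in props:
        if p in target:
            flat[p] = target[p]
    return flat

page = flatten(graph, ":carlsberg", ":hasAwarding", [":hasAward", ":hasYear"])
print(page[":hasAward"], page[":hasYear"])  # :europeanBeerAward 2007
```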
The following excerpt (in Turtle syntax) shows the information stored in the domain ontology to handle the above-mentioned example:

:carlsberg a :Beer ;
    :hasAwarding :awardingEuropean2007 .

:awardingEuropean2007 a :Awarding ;
    :hasAward :europeanBeerAward ;
    :hasYear "2007" .

In fact, the values shown in the Carlsberg page are not directly extracted from the Carlsberg ontological individual: they are taken from the awarding event the Carlsberg beer participated in. OWiki allows users to mark properties to be “flattened” in order to simplify the final interfaces for the users, by reducing the overall number of wiki pages, and to automatically enrich selected page-domain individuals. Such enrichment needs to retrieve related data from non-page-domain individuals, by navigating the RDF graph represented by the model.

3.1.2 Properties Inheritance by Class Subsumption

When a user asks to add or edit an individual of a particular class, OWiki shows a series of form widgets derived from all the properties explicitly defined for that class, i.e. those properties having it as domain. Requiring ontology designers to define explicitly all properties for all classes is time-consuming and error-prone. OWiki is also able to infer, as part of a class C, all those properties that are not directly linked by any domain assertion to C, but that can be inferred by following the super-class hierarchy starting from C itself. Those properties are derived and eventually shown in the forms. In fact, ontology designers define classes organizing them in hierarchies through sub-class relationships. Of course, these subsumptions, such as Animal subsumes Person, are usually defined when a class shares some common principles or characterizations with another class – for example, the fact of having a genre, which concerns animals as well as persons.
Moreover, they represent the main mechanism to infer whether individuals of a particular class (e.g., Person) belong implicitly to a much wider and less-specific class (e.g., Animal). By means of subsumptions, an ontology engineer is free to define shared properties in a particular high-level class without specifying them again in each relative sub-class, simply assuming that those properties are implicitly inherited from the high-level class itself. Let us introduce an example:

:Animal a owl:Class .

:hasGenre a owl:DatatypeProperty ;
    rdfs:domain :Animal ;
    rdfs:range xsd:string .
:Person a owl:Class ;
    rdfs:subClassOf :Animal .

:Cat a owl:Class ;
    rdfs:subClassOf :Animal .

Here we are saying that animals, cats and persons alike may have a genre specified. The model defines this explicitly for the former kind of individuals (i.e., Animal), while it is implicitly inferable for the latter kinds of individuals (e.g., Cat and Person) because of the subsumptions that exist among those classes. OWiki is able to recognize all those inherited properties, therefore visualizing them in the template and form of each page-domain class they relate to, without requiring the ontology designers to explicitly define all of them.

3.1.3 Classes and Properties Filtering

A complete mapping of all properties and all classes of the domain ontology into form fields (and their corresponding infoboxes) would not be very helpful for the users. The resulting interfaces, in fact, would treat in exactly the same way data that could be processed transparently to the users, data that are worth viewing but not editing, and data that are actually editable. Consider, for instance, a wiki page about a Person: the creation date is information relevant to the system that should not be shown to the users; the URI of the same page, on the other hand, should be displayed in order to identify that page but should not be editable by the user; the name of the person (as well as many other properties) should be both included in the Person infobox and editable in the corresponding form. OWiki handles all these cases by allowing users to decide (i) which classes of the domain ontology are to be converted into wiki pages, and (ii) which properties of those classes are to be stored in the system but never displayed to the users, or only viewed but never edited, or fully editable. Such a process is called property filtering and exploits OWL annotation properties – metadata about the ontology properties themselves.
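Taken together, inheriting properties along the sub-class chain (Section 3.1.2) and filtering them by annotation (Section 3.1.3) can be sketched with plain dictionaries. Class names echo the examples in the text, the annotation values are those listed below, and the traversal itself is our simplification:

```python
# Sub-class links and per-class property declarations with annotations.
superclass = {":Person": ":Animal", ":Cat": ":Animal"}
declared = {":Animal": {":hasGenre": "#editable"},
            ":Person": {":hasName": "#editable",
                        ":creationDate": "#infoboxHidden"}}

def editable_properties(cls):
    """Walk up the super-class chain, collect declared and inherited
    properties, and keep only those annotated as editable (sketch)."""
    props = {}
    while cls is not None:
        for prop, annotation in declared.get(cls, {}).items():
            props.setdefault(prop, annotation)  # nearest declaration wins
        cls = superclass.get(cls)
    return sorted(p for p, a in props.items() if a == "#editable")

print(editable_properties(":Person"))  # [':hasGenre', ':hasName']
```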
In fact, users are allowed to classify both object properties and data properties of the domain ontology into: #infoboxHidden (never displayed), #fieldHidden (viewed in the template but not in the forms), #fieldLocked (viewed in the form too but not editable), and #editable (the default value). The OWiki engine parses such annotations and decides how to map ontological data into actual wiki pages. The use of these annotations contributes to speeding up and simplifying the viewing and editing of ontological data: final users, in fact, are required to fill in only the required data, while the system handles all the others. The distinction between viewable and editable data makes the final users’ interfaces even more intuitive and clear.

3.2 Using Ontologies to Model the Interface

OWiki exploits ontologies to also model end-user interfaces. In particular, the system includes a GUI ontology identifying all the components of web forms. The system instantiates and merges that ontology with the domain one in order to
generate the final forms. The definitive version of the GUI ontology is still under development, but the core concepts and relations are stable and already tested in the current prototype. Separating the GUI ontology from the domain one has a two-fold goal: (1) generating a declarative description of the interface widgets that can be reused across multiple domains, not being bound to specific data, and (2) allowing users to customize final interfaces by only changing the association between content and interface widgets. Note that the GUI ontology can be designed once and for all, while the domain one requires different expertise for different application scenarios. The GUI ontology defines two types of graphical elements: controllers and panels. Panels are containers for other elements (which can be panels, in turn) used to organize the overall interface, while controllers are single widgets allowing users to actually fill in metadata. Controllers can be grouped into two classes that, as expected, correspond to the data properties and object properties defined in the domain ontology.

3.2.1 Simple Controllers: Filling Data Properties

Simple controllers model the basic form elements provided to the users for inserting data properties. Some of them are: Textfield, a field of text for those properties whose value is a string or a number; Drop-down list, a menu for selecting a value among a finite set (whose values are defined in the domain ontology); ComboBox, which combines the previous two widgets; CheckBox and RadioButton, as used in HTML; and so on. Notice that this set is very extensible, so that newer and more specific needs can be covered, and the most recent developments in client-side programming make such extension easier.

3.2.2
Complex Controllers: Connections among Ontology Entities and Among Wiki Pages
Complex controllers reflect connections among ontology entities – object properties – and allow users to create links between the final wiki pages and to decide which content is to be shown. We identified three complex controllers – notice that this set is extensible too – that correspond to three different needs of the users: 1. ConnectFields model links to another wiki page. These controllers are usually associated with the majority of the object properties in the domain ontology (see Section 3.1 for more details about selecting classes from that ontology). When the user edits a wiki page corresponding to a given class, ConnectFields provide auto-completion features: the system suggests a set of linked pages she/he can choose from (or a link to a completely new resource can be created). These links are in fact derived from the relations in the domain input ontology.
2. ObjectContainers are used for properties flattening (as described in Section 3.1.1). This widget allows users to edit the properties of an individual whose corresponding page does not actually exist in the wiki, by setting that property in the form of another class including it. The overall process is transparent to the users, who actually use the same interface to modify the properties of two (or more) individuals. 3. InclusionLists are used to combine the information of multiple individuals into the same wiki page. If in the domain ontology there exist two individuals connected by a specific object property (temporarily called “includes_content_from”), the wiki page corresponding to the first one will include the content of the wiki page corresponding to the second one. When editing that page, the user won’t see the whole content of the second page but just an InclusionList controller allowing her/him to select the document to include. Notice that such inclusion is very different from a common link, derived from a different class of object properties, which is displayed as a link when either viewing or editing a page. These controllers proved to be very useful for creating and authoring compound documents (even from external content) as they just require users to select the material to be included without pasting and editing it manually.
4 Studying OWiki through a Use-Case

The main goal of OWiki is to simplify the creation of semantic data through and within wikis. The complexity of such a metadata authoring process, in fact, is hidden behind the application so as not to force users to learn new interfaces and tools. They can easily create semantic data by exploiting forms and templates that are automatically generated from ontological data. In this section we explain this generation process in detail, clarifying how ontological data are converted into (customized) interfaces. Basically, the overall OWiki process consists of three steps:
1. ontology import and forms generation;
2. forms customization;
3. templates and data generation.

4.1 From Ontologies to Forms

The first step consists of importing the input domain ontology into the wiki. Let us consider a sample application we will discuss throughout the following sections: an OWiki demo installation describing beers, breweries, ingredients, etc. Figure 2 shows some classes of a domain ontology suitable for such an application.
Fig. 2 A graphical representation of the OWiki domain ontology about beers
Classes and properties are mapped into wiki pages following the schema briefly described in the previous section: each concept is mapped into a page and properties are expressed through templates. In particular, data properties become lines of template infoboxes and object properties become typed links. Notice that such an ontology uses the facilities discussed in the previous section to define which classes need to be mapped into wiki pages (through the “hasOWikiPage” property) and which properties of each class need to be viewable or editable (through the annotation properties shown in the right-bottom corner of the picture). Notice also that the class Awarding does not have the “hasOWikiPage” property set: that class, in fact, is not meant to be mapped into a wiki page, but will only be used to “flatten” properties according to the strategies described in Section 3.1.1. The OWiki conversion process also produces forms to edit the ontological content. Forms are dynamically built by analyzing the class properties of the imported ontology and by mapping each property into the proper element of the GUI interface. Notice also that the overall status of the OWiki installation is consistent at the first deployment, assuming that the domain input ontology was consistent. The process is in fact a straightforward translation of classes and relations into pages and links. In the example, the class Beer defines three properties: name, beer_type and alcohol_content. According to the type of these properties OWiki generates text fields or radio buttons. The default element is a text field that allows any type of value. Since in the input ontology the only possible values of the property beer_type are Ale, Lager and Pilsner, the system adds to the form a RadioButton element specifying those values.
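The type-driven choice between text fields and radio buttons described above can be sketched as a lookup table consulted per property. The widget names follow the GUI ontology of Section 3.2, while the table, the dispatch function and its signature are our illustration, not OWiki's actual configuration format:

```python
# Default mapping from a data-property's range to a form widget
# (mirrors the behaviour described in the text; names illustrative).
DEFAULT_WIDGETS = {
    "string": "Textfield",
    "number": "Textfield",
    "enumeration": "RadioButton",
    "boolean": "CheckBox",
}

def widget_for(prop_range, enum_values=None):
    """Pick a widget for a property: enumerations get a RadioButton
    listing their allowed values, anything else falls back to a Textfield."""
    widget = DEFAULT_WIDGETS.get(prop_range, "Textfield")
    if prop_range == "enumeration":
        return (widget, list(enum_values or []))
    return widget

print(widget_for("string"))                                   # Textfield
print(widget_for("enumeration", ["Ale", "Lager", "Pilsner"]))
```

Changing an entry in `DEFAULT_WIDGETS` corresponds to the configuration-file change mentioned below: the same property type would then be rendered with a different widget, regardless of the domain ontology.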
For object properties OWiki chooses between two types of widgets according to their range: if the range class is a descendant of the oWiki class, the system adds a ConnectField to the form; otherwise it adds an ObjectContainer. The InclusionList controller is not used in this example. Since the Beer class has the object property brewed_by with the Brewery class specified as range, for example, the system adds to the form a widget that allows users to include a link to a corresponding brewery page. This widget will also provide auto-completion features built on top of the relations expressed in the input ontology. Finally, the relations between the individuals of the classes Beer, Award and Awarding will be processed to add information to each beer that won a prize: that information is “flattened” from the awarding event individual, as explained in Section 3.1.1. One point is very important: there is a default mapping between classes of the domain ontology and elements in the GUI ontology based on the type of the properties. The name of a property, or its meaning in a specific domain, is not relevant. There is actually a configuration file that specifies, for each type, which widget to use and how to configure it. In the previous case, for instance, there was an association between enumerations and radio buttons. That mapping is deployed whenever a class has a property which may only have a finite set of values, regardless of the actual domain ontology. In fact, a change in the OWiki configuration file would be reflected in using a different widget for the same property.

4.2 Forms Customization and Filling

Furthermore, OWiki includes a configuration interface that allows users to set a domain-specific mapping between the input (domain and GUI) ontologies, and to configure the overall organization of the form and its formatting properties. The first time a user edits a page, OWiki shows a basic form.
The author can then organize a new form by adding dynamic behaviours, moving buttons, changing field order and so on. Figure 3 shows a simple example of a customized form: while the original form only listed a set of plain text fields, this one is organized in panels and uses radio buttons, images and dynamic widgets. Customization can happen at different levels. The user can change the colour, font and background of the text to increase the appeal and impact of the form; she/he can change the position and the order of the elements to emphasize certain data; she/he can change the optionality of the elements, their default values, and so on. The current implementation requires users to customize forms by editing an XML configuration file, through the wiki itself. Even if such an approach is not optimal, the internal architecture of the system relies on a strong distinction between the declarative description of the form (through the GUI ontology) and its actual delivery. That makes it possible to implement a user-friendly graphical environment to create and customize forms. One of our future activities is the implementation of such an editor within the OWiki framework.
A. Di Iorio et al.
Fig. 3 A customized form generated by OWiki
4.3 From Semantic Data to Templates and Views

Automatically-generated forms are finally exploited by the wiki users to actually write the semantic data. As described in the previous section, data are stored as templates and templates are manipulated by forms in a transparent manner. Let us consider again the Beer class of the example. OWiki generates a form to create instances of that class showing three main components:

• a text field to insert the name of the beer;
• a radio button to select the type of the beer, whose values are directly extracted from the domain ontology;
• a text field to insert the brewery, which suggests breweries by exploiting information in the domain ontology.

These components can even be organized in multiple panels. Once the user fills the form, OWiki saves a template with the proper information. Infobox templates, in fact, are used to display metadata and to cluster information about the same document. Each infobox line corresponds to a field in the form that, in turn, corresponds to a parameter and its value in the domain ontology. As expected, the data properties of a class are displayed as simple text while the object properties are displayed as links to other documents. The page corresponding to the Carlsberg beer in the example, which is an instance of the class Beer and has been edited via the corresponding form, will contain the following (partial) infobox:

{{Infobox Beer
| hasoWikiNamePage=Carlsberg
| Beer_brewedBy=[[Brewery:Carlsberg|Carlsberg]]
| Beer_beerType=Lager
| Beer_hasAlcoholicContent=2.5° - 4.5°
| Hops_hasName=Galena
| …
}}
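The field-to-infobox-line correspondence can be illustrated with a small serialisation sketch. The `to_infobox` helper is hypothetical (it is not part of OWiki); the template name, field names and values are those of the example above.

```python
# Hypothetical sketch (not OWiki code) of serialising a filled form into
# the MediaWiki infobox template shown above.

def to_infobox(class_name, fields):
    """Serialise a {parameter: value} mapping into template wikitext."""
    lines = ["{{Infobox %s" % class_name]
    for name, value in fields.items():
        lines.append("| %s=%s" % (name, value))
    lines.append("}}")
    return "\n".join(lines)

wikitext = to_infobox("Beer", {
    "hasoWikiNamePage": "Carlsberg",
    "Beer_brewedBy": "[[Brewery:Carlsberg|Carlsberg]]",  # object property -> typed link
    "Beer_beerType": "Lager",                            # data property -> plain text
})
```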
Notice that the property Beer_brewedBy contains a link to the page Carlsberg, which is in turn an instance of the Brewery class. Relations in the input ontology are thus mapped into links between pages. The Carlsberg instance follows the same approach, being described by the infobox:

{{Infobox Brewery
| hasoWikiNamePage=Carlsberg
| Brewery_hasAddress=Valby 11 DK - 2500, Copenhagen
| Brewery_brews=[[Beer:Carlsberg|Carlsberg]]
}}

Some final considerations are worth making about the consistency of OWiki. First of all, note that OWiki forms only work on the instances of the underlying ontology, without any impact on the classes and the relations among them. The consequence is that, assuming users do not corrupt infoboxes (which are anyway available in the source code of a wiki page), the overall ontology remains consistent. The OWiki instance is in fact consistent by construction with the domain and GUI ontologies, and it is populated via forms in a controlled way. Thus, we can conclude – going back to the distinction between “wikis for ontologies” and “ontologies for wikis” proposed in the related works section – that OWiki currently belongs to the second group and does not properly use the wiki to build and update ontologies. In the future we also plan to investigate a further integration between the wiki and the ontology – and between the textual content of a wiki page and the related infoboxes – in order to also use OWiki as a full-fledged simplified authoring environment for ontologies.
5 The Architecture of OWiki

OWiki is an integrated framework composed of three modules, delivered with different technologies:

• a MediaWiki extension: a module integrated in MediaWiki, written in PHP, that adds the OWiki facilities;
• an Ontology manager: a Java web service that processes OWL ontologies to produce forms for editing metadata. This manager internally uses both the Jena API (http://jena.sourceforge.net) and the OWL API (http://owlapi.sourceforge.net);
• an Ajax-based interface: a client-side module that allows users to actually insert data through the forms generated by the OWiki engine.

The PHP OWiki module follows the same architecture as any MediaWiki extension: some scripts and methods are overridden to provide new features. In particular, the module implements a revised editor that initializes the OWiki environment variables, sets up the communication with the client and prepares the data necessary to store forms in the MediaWiki database without interfering with existing data. To manipulate ontologies, OWiki implements a web service that internally uses the Jena API. Jena is integrated with the Pellet reasoner (http://pellet.owldl.com), which is exploited to extract information about the instances in the ontology. Ranges
of some properties, as well as their values, are in fact derived from subsumptions or other relations expressed in the ontology itself. The web service actually generates templates from the ontological data, which are later sent to the PHP module and stored in the MediaWiki installation. The connection between the PHP and Java modules, and the core of the overall framework, is the OWiki client. The client is a JavaScript application, based on Mootools (http://mootools.net), in charge of actually generating and delivering forms. It is strongly based on the Model-View-Controller (MVC) pattern and its internal architecture can be divided into four layers:

• The Connection Layer manages the overall environment, the initialization phase and the communication between all other layers.
• The Model Layer (the Model of MVC) manages the data to be displayed on the page. It is composed of a factory that creates wrappers for each type of data and instantiates data from the ontology.
• The LookAndFeel Layer (the View of MVC) manages the final representation of the form, containing atomic and complex widgets, manipulators and decorators.
• The Interaction Layer (the Controller of MVC) implements the logic of the application, the communication with the web service, the generation of semantic data and the end-user interaction.
6 OWiki in Real-Life Scenarios

OWiki is successfully used for two instantiations of community-driven semantic wikis: PoiStory (http://www.poistory.it) and the ACUME2 Editor (http://acume2.web.cs.unibo.it). This section briefly describes these projects, sketching out the role of OWiki. More details can be found in [Di Iorio et al. 2010]. PoiStory is an enhanced wiki environment that allows users to write, share and print customized touristic guides. A set of OWL ontologies drives the overall content management process, according to the schema described in Section 3. In fact, the main ontology within PoiStory models all concepts of the domain (Points of Interest, touristic data, locations, itineraries, guides, etc.) that are manipulated by OWiki: the instances of the domain ontology are mapped into wiki pages, their properties are mapped into fields of infoboxes, and indirect properties are flattened. When editing a page, the PoiStory editor shows a textarea, where users can freely modify the content, and a form, where users can add ontological data that will be displayed in the infoboxes. These forms are automatically generated by OWiki too. Notice also that PoiStory users can customize forms (if they have the access permissions to do so) and change both object properties (adding typed links to pages corresponding to objects in the ontology) and data properties (changing atomic values in the form) in a transparent manner. The ACUME2 Editor is a customized wiki platform for the collaborative development of the ACUME2 epistemological grid. Such a grid is a table of definitions of terms created by researchers and connected to each other. The goal is to build an infrastructure where researchers from various disciplines can communicate without ambiguities. The epistemological grid in fact provides a free-text description
of each term for each discipline, and generates a network of labeled references to other terms and concepts. The ACUME2 Editor is an instantiation of OWiki. The domain ontology in this context models concepts and terms and their connections. Such ontological data are automatically converted into wiki pages, infoboxes and forms (to edit properties) by OWiki. The result is a friendly environment that researchers can use to further extend the grid, producing a rich and cross-disciplinary graph of relations among terms and concepts.
7 Conclusions

One of the criticisms of the Semantic Web is that the process of creating ontologically sound content is still complex and in the hands of expert users. Semantic wikis are a valid solution to address this issue: they exploit the free and open editing model of wikis to let users create a semantic knowledge base easily. OWiki makes a further step towards the same goal: allowing users to write ontological content within non-natively-semantic wikis too, through automatically generated forms and templates. Forms, templates and ontologies are strictly connected in OWiki: templates are automatically generated from ontological data (definitions of classes and properties), forms are automatically generated from templates, and eventually ontological data (individuals and property values) are populated through these forms. Such a strong connection does not exist between the free-form textual content of a page and the structures (templates and forms) that store ontological data within OWiki. The latter are actually used to create and manipulate semantic content, while the wiki text is mainly used to add notes and further explanations. The next step of our research will be to integrate these two editing approaches, letting authors use both plain wiki textareas and templates/forms to edit the same semantic data in a fully synchronized environment.
Acknowledgment. The authors would like to thank Silvia Duca and Valentina Bolognini for their previous work on GAFFE, which originated OWiki.
References

[Berners-Lee et al. 2001] Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American, 34–43 (2001)
[Bolognini et al. 2009] Bolognini, V., Di Iorio, A., Duca, S., et al.: Exploiting ontologies to deploy user-friendly and customized metadata editors. In: Proc. of the IADIS Internet/WWW 2009 Conference, Rome, Italy (2009)
[Buffa et al. 2008] Buffa, M., Gandon, F., Ereteo, G., et al.: SweetWiki: A semantic wiki. J. of Web Semantics 6(1), 84–97 (2008)
[Di Iorio et al. 2010] Di Iorio, A., Musetti, A., Peroni, S., Vitali, F.: Crowdsourcing semantic content: a model and two applications. In: Proc. of the 3rd Int. Conf. on Human System Interaction, Rzeszów, Poland (2010)
[Koren 2008] Koren, Y.: Semantic forms (2008), http://www.mediawiki.org/wiki/Extension:Semantic_Forms
[Munat 2004] Munat, B.: The lowercase semantic web: using semantics on the existing world wide web. Evergreen State College, Olympia (2004)
[Passant and Laublet 2008] Passant, A., Laublet, P.: Towards an Interlinked Semantic Wiki Farm. In: 3rd Semantic Wiki Workshop co-located with ESWC 2008, Tenerife, Spain (2008)
[Schaffert 2006] Schaffert, S.: IkeWiki: A Semantic Wiki for Collaborative Knowledge Management. In: Proc. of the 15th IEEE Int. Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Manchester, UK, pp. 388–393 (2006)
[Völkel et al. 2006] Völkel, M., Krötzsch, M., Vrandecic, D., et al.: Semantic wikipedia. In: Proc. of the 15th Int. Conf. on World Wide Web, Edinburgh, Scotland, pp. 585–594 (2006)
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case

A.P. Rotshtein¹ and H.B. Rakytyanska²

¹ Jerusalem College of Technology – Machon Lev, Jerusalem, Israel
[email protected]
² Vinnitsa National Technical University, Vinnitsa, Ukraine
[email protected]
Abstract. In this paper, a problem of MIMO object identification expressed mathematically in terms of fuzzy relational equations is considered. The identification problem consists of extracting an unknown relational matrix, together with the parameters of the membership functions included in the fuzzy knowledge base, which can be translated as a set of fuzzy IF-THEN rules. In fuzzy relational calculus this type of problem is an inverse problem and requires resolution of the composite fuzzy relational equations. The search for a solution amounts to solving an optimization problem using a genetic algorithm. The resulting solution is linguistically interpreted as a set of possible rule bases. The proposed approach is illustrated by a computer experiment and by examples of diagnosis and prediction.
1 Introduction

Generation of a system of fuzzy IF-THEN rules from readily available experimental data is a necessary condition for nonlinear object identification on the basis of fuzzy logic. Fuzzy relational calculus [Di Nola et al. 1989] provides a powerful theoretical background for knowledge extraction from data. A fuzzy rule base is modelled by a fuzzy relational matrix, discovering the structure of the data set [Higashi and Klir 1984; Pedrycz 1984]. Fuzzy relational equations, which connect membership functions of input and output variables, are built on the basis of the fuzzy relational matrix and Zadeh’s compositional rule of inference. The identification problem consists of extracting an unknown relational matrix which can be translated as a set of fuzzy IF-THEN rules. In fuzzy relational calculus this type of problem is an inverse problem requiring resolution of the composite fuzzy relational equations [Peeva and Kyosev 2004]. Solvability and approximate solvability conditions of simplified and multidimensional fuzzy relational equations are considered in [Pedrycz 1988; Peeva and Kyosev 2004]. While the theoretical foundations of fuzzy relational equations are well developed, their potential in system modelling calls for more efficient use. A non-optimizing approach [Branco and Dente 2000] is widely used for fuzzy relational identification. Such adaptive recursive techniques are of interest for most on-line applications.

Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 375–394. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
Under general conditions, an optimization environment is a convenient tool for fuzzy relational identification [Bourke and Fisher 2000]. In this paper, we consider a problem of MIMO object identification expressed mathematically in terms of fuzzy relational equations. The genetic algorithm as a tool to solve fuzzy relational equations was proposed in [Rotshtein et al. 2006]. The genetic algorithm [Rotshtein and Rakytyanska 2008] allows us to solve the inverse problem, which consists in the restoration of the unknown values of the vector of unobserved parameters through the known values of the vector of observed parameters and the known fuzzy relational matrix. In this paper, the genetic algorithm [Rotshtein and Rakytyanska 2008] is adapted to identify the relational matrix for a given inputs-outputs data set. The algorithm for fuzzy relation matrix identification is accomplished in two stages. At the first stage, the parameters of the membership functions included in the fuzzy knowledge base and the rule weights are defined using the genetic algorithm [Rotshtein 1998]. In this case, proximity between the linguistic approximation results and the experimental data is the criterion of the quality of the extracted relations. It is shown here that, in comparison with [Rotshtein 1998], a non-unique set of IF-THEN rules can be extracted from the given data. Following [Rotshtein et al. 2006; Rotshtein and Rakytyanska 2008], at the second stage the obtained null solution allows us to arrange the genetic search for the complete solution set, which is determined by the unique maximum matrix and a set of minimum matrices. After linguistic interpretation the resulting solution can be represented as a set of possible rule collections, discovering the structure of the given data. The proposed approach is illustrated by a computer experiment and by examples of diagnosis and prediction.
2 Problem Statement

Let us consider an object $Y = f(X)$ with $n$ inputs $X = (x_1, \dots, x_n)$ and $m$ outputs $Y = (y_1, \dots, y_m)$, for which the following is known:

– intervals of change of the input and output variables, $x_i \in [\underline{x}_i, \overline{x}_i]$, $i = \overline{1,n}$; $y_j \in [\underline{y}_j, \overline{y}_j]$, $j = \overline{1,m}$;

– classes of decisions $e_{jp}$ (types of diagnoses) for the evaluation of the output variable $y_j$, $j = \overline{1,m}$, $p = \overline{1,q_j}$, formed by digitizing the range $[\underline{y}_j, \overline{y}_j]$ into $q_j$ levels:
$[\underline{y}_j, \overline{y}_j] = [\underline{y}_j, y_{j1}] \cup \dots \cup [\underline{y}_{jp}, y_{jp}] \cup \dots \cup [\underline{y}_{jq_j}, \overline{y}_j]$;

– training data in the form of $L$ pairs of “inputs–outputs” experimental data $\langle \hat{X}_s, \hat{Y}_s \rangle$, $s = \overline{1,L}$, where $\hat{X}_s = (\hat{x}_1^s, \dots, \hat{x}_n^s)$ and $\hat{Y}_s = (\hat{y}_1^s, \dots, \hat{y}_m^s)$ are the vectors of the values of the input and output variables in experiment number $s$.

It is necessary to transfer the available training data into the following system of IF-THEN rules:
Rule $l$: IF $x_1 = a_{1l}$ and … $x_n = a_{nl}$ THEN $y_1 = b_{1l}$ and … $y_m = b_{ml}$, $l = \overline{1,N}$,  (1)

where $a_{il}$ ($b_{jl}$) is the fuzzy term describing the variable $x_i$ ($y_j$) in rule $l$, $i = \overline{1,n}$, $j = \overline{1,m}$; $N$ is the number of rules.
3 Fuzzy Rules, Relations and Relational Equations

This fuzzy rule base is modelled by the fuzzy relational matrix presented in Table 1.

Table 1 Fuzzy knowledge base

        IF inputs                 THEN outputs
        x1   …  xi   …  xn        y1: e11 … e1q1  …  yj: ej1 … ejqj  …  ym: em1 … emqm
                                  E1    …   Ek    …   EM
  C1    a11  …  ai1  …  an1       r11   …   r1k   …   r1M
  …     …                         …
  Cl    a1l  …  ail  …  anl       rl1   …   rlk   …   rlM
  …     …                         …
  CN    a1N  …  aiN  …  anN       rN1   …   rNk   …   rNM
This relational matrix can be translated as a set of fuzzy IF-THEN rules:

Rule $l$: IF $X = C_l$ THEN $y_j = e_{jp}$ with weight $r_{l,jp}$, $l = \overline{1,N}$,  (2)

where $C_l$ is the combination of input terms in rule $l$; $r_{l,jp}$ is the relation $C_l \times e_{jp}$, $j = \overline{1,m}$, $p = \overline{1,q_j}$, interpreted as the rule weight.
We shall redenote the set of classes of the output variables as $\{E_1, \dots, E_M\} = \{e_{11}, \dots, e_{1q_1}, \dots, e_{m1}, \dots, e_{mq_m}\}$, where $M = q_1 + \dots + q_m$. In the presence of the relational matrix $R \subseteq C_l \times E_k = [r_{lk},\; l = \overline{1,N},\; k = \overline{1,M}]$, the “inputs-outputs” dependency can be described with the help of Zadeh’s compositional rule of inference [Pedrycz 1984]:

$\mu^C(X) \circ R = \mu^E(Y)$,  (3)

where $\mu^C(X) = (\mu^{C_1}, \dots, \mu^{C_N})$ is the vector of membership degrees of the vector $X$ to the input combinations $C_l$; $\mu^E(Y) = (\mu^{E_1}, \dots, \mu^{E_M})$ is the vector of membership degrees of the variables $y_j$ to the classes $e_{jp}$; $\circ$ is the operation of max-min composition [Pedrycz 1984]. The system of fuzzy relational equations is derived from relation (3):

$\mu^{e_{jp}}(y_j) = \max\limits_{l=\overline{1,N}} \left( \min\left[ \mu^{C_l}(X),\; r_{l,jp} \right] \right)$,  (4)

where $\mu^{C_l}(X) = \min\limits_{i=\overline{1,n}} \left[ \mu^{a_{il}}(x_i) \right]$, $l = \overline{1,N}$.
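The max-min composition of Eqs. (3)–(4) can be illustrated with a few lines of Python. The function below is a direct transcription of Eq. (4); the activation degrees and rule weights are made-up toy values, not data from the paper.

```python
# Minimal illustration of Zadeh's max-min composition, mu_E = mu_C o R:
# each output degree is the maximum over rules of min(activation, weight).

def max_min_composition(mu_C, R):
    """mu_C: activation degrees of the N rule antecedents C_l.
    R: N x M matrix of rule weights r_lk. Returns the M degrees mu_E."""
    M = len(R[0])
    return [max(min(mu_C[l], R[l][k]) for l in range(len(R)))
            for k in range(M)]

mu_C = [0.8, 0.3]          # toy degrees of match with combinations C1, C2
R = [[0.9, 0.2],           # weights linking C1 to classes E1, E2
     [0.4, 1.0]]           # weights linking C2 to classes E1, E2
mu_E = max_min_composition(mu_C, R)   # [0.8, 0.3]
```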
Here $\mu^{a_{il}}(x_i)$ is the membership function of the variable $x_i$ to the fuzzy term $a_{il}$; $\mu^{e_{jp}}(y_j)$ is the membership function of the variable $y_j$ to the class $e_{jp}$. We use a bell-shaped membership function model in the form:

$\mu^{a_{il}}(x_i) = 1 \big/ \left( 1 + \left( (x_i - \beta^{a_{il}}) / \sigma^{a_{il}} \right)^2 \right)$,  (5)

where $\beta^{a_{il}}$ is the coordinate of the function maximum, $\mu^{a_{il}}(\beta^{a_{il}}) = 1$; $\sigma^{a_{il}}$ is a parameter of concentration [Rotshtein 1998]. The operation of defuzzification is defined as follows:

$y_j = \sum\limits_{p=1}^{q_j} \underline{y}_{jp} \cdot \mu^{e_{jp}}(y_j) \Big/ \sum\limits_{p=1}^{q_j} \mu^{e_{jp}}(y_j)$.  (6)

Relationships (4)–(6) define the generalized fuzzy model of an object as follows:

$Y = F_R(X, R, \mathrm{B}, \Omega)$,  (7)

where $\mathrm{B} = (\beta_1, \dots, \beta_K)$ and $\Omega = (\sigma_1, \dots, \sigma_K)$ are the vectors of the $\beta$- and $\sigma$-parameters of the fuzzy term membership functions in (2); $K$ is the total number of fuzzy terms; $F_R$ is the operator of inputs-outputs connection corresponding to formulae (4)–(6).
4 Solving Fuzzy Relational Equations

We propose an approach for MIMO object identification, which enables solving fuzzy relational equations together with generation of a system of fuzzy IF-THEN rules on the basis of experimental information.

4.1 Optimization Problem

Let us impose limitations on the volume of the knowledge base (1) in the following form: $N \le \overline{N}$, where $\overline{N}$ is the maximum permissible total number of rules. Since the content and the number of the linguistic terms $a_{il}$, $i = \overline{1,n}$, $l = \overline{1,N}$, used in the fuzzy knowledge base (1) are not known beforehand, we suggest interpreting them on the basis of the parameter values ($\beta^{a_{il}}$, $\sigma^{a_{il}}$) of the membership functions (5). Therefore, the synthesis of knowledge base (1) is reduced to obtaining the matrix of parameters shown in Table 2. It is necessary to find such a matrix, which satisfies the limitations imposed on the knowledge base volume and provides the least distance between the model and experimental outputs of the object:

$\sum\limits_{s=1}^{L} \left[ F_R(\hat{X}_s, R, \mathrm{B}, \Omega) - \hat{Y}_s \right]^2 = \min\limits_{R, \mathrm{B}, \Omega}$.  (8)
Table 2 Knowledge base parameters matrix

        IF inputs                                       THEN outputs
        x1                   …  xn                      E1   …   Ek   …   EM
  C1   (β^{a11}, σ^{a11})    …  (β^{an1}, σ^{an1})      r11  …   r1k  …   r1M
  …     …                        …                      …
  Cl   (β^{a1l}, σ^{a1l})    …  (β^{anl}, σ^{anl})      rl1  …   rlk  …   rlM
  …     …                        …                      …
  CN   (β^{a1N}, σ^{a1N})    …  (β^{anN}, σ^{anN})      rN1  …   rNk  …   rNM

(the output classes E1 … EM correspond to e11 … e1q1, …, em1 … emqm of the outputs y1, …, ym)
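A sketch of how criterion (8) can serve as a GA fitness function. The stand-in model and the two training pairs below are toy values; only the squared-error criterion itself follows Eq. (8).

```python
# Criterion (8) as a fitness function: the GA maximises fitness, hence
# minimises the sum of squared output errors over the training pairs.

def criterion(model, samples):
    """Sum of squared output errors over the training pairs, as in Eq. (8)."""
    total = 0.0
    for x, y_exp in samples:
        y_mod = model(x)
        total += sum((ym - ye) ** 2 for ym, ye in zip(y_mod, y_exp))
    return total

def fitness(model, samples):
    return -criterion(model, samples)

model = lambda x: (x[0], 1.0 - x[0] / 2)               # toy two-output model
samples = [((0.0,), (0.1, 1.0)), ((1.0,), (1.0, 0.5))]  # toy training pairs
err = criterion(model, samples)                         # 0.01
```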
4.2 Genetic Algorithm

The chromosome needed in the genetic algorithm for solving this optimization problem includes the real codes of the parameters $R$, $\mathrm{B}$, $\Omega$. The crossover operation is carried out by exchanging genes inside each variable $r_{lk}$, $\beta^{a_{il}}$, $\sigma^{a_{il}}$, $i = \overline{1,n}$, $k = \overline{1,M}$, $l = \overline{1,N}$. The multi-crossover operation provides a more accurate adjusting direction for the evolving offspring, which allows the size of the search region to be systematically reduced. The non-uniform mutation, whose action depends on the age of the population, provides generation of the non-dominated solutions. We used the roulette-wheel selection procedure, giving priority to the best solutions: the greater the fitness function of a chromosome, the greater the probability for that chromosome to yield offspring. The fitness function is built on the basis of criterion (8). While performing the genetic algorithm, the size of the population stays constant. That is why, after the crossover and
mutation operations, it is necessary to remove the chromosomes having the worst values of the fitness function from the obtained population. The genetic algorithm for solving optimization problem (8) transforms the initial relational equations (3) into solvable ones. If $R^0$ is a solution of the optimization problem (8), then $R^0$ is the exact solution of the composite system of fuzzy relational equations:

$\hat{\mu}^A(\hat{X}_s) \circ R = \hat{\mu}^B(\hat{X}_s)$, $s = \overline{1,L}$,  (9)

where the experimental input and output matrices

$\hat{\mu}^A = \begin{bmatrix} \hat{\mu}^{C_1}(\hat{X}_1) & \dots & \hat{\mu}^{C_N}(\hat{X}_1) \\ \dots & \dots & \dots \\ \hat{\mu}^{C_1}(\hat{X}_L) & \dots & \hat{\mu}^{C_N}(\hat{X}_L) \end{bmatrix}$, $\quad \hat{\mu}^B = \begin{bmatrix} \hat{\mu}^{E_1}(\hat{X}_1) & \dots & \hat{\mu}^{E_M}(\hat{X}_1) \\ \dots & \dots & \dots \\ \hat{\mu}^{E_1}(\hat{X}_L) & \dots & \hat{\mu}^{E_M}(\hat{X}_L) \end{bmatrix}$

are obtained for the given training data. Following [Peeva and Kyosev 2004], the system (9) has a solution set $S(\hat{\mu}^A, \hat{\mu}^B)$, which is determined by the unique maximal solution $\overline{R}$ and the set of minimal solutions $S^*(\hat{\mu}^A, \hat{\mu}^B) = \{\underline{R}_I, I = \overline{1,T}\}$:

$S(\hat{\mu}^A, \hat{\mu}^B) = \bigcup\limits_{\underline{R}_I \in S^*} [\underline{R}_I, \overline{R}]$.  (10)

Here $\overline{R} = [\overline{r}_{lk}]$ and $\underline{R}_I = [\underline{r}_{lk}^I]$ are the matrices of the upper and lower bounds of the fuzzy relations $r_{lk}$, and the union is taken over all $\underline{R}_I \in S^*(\hat{\mu}^A, \hat{\mu}^B)$. The problem of solving the fuzzy relational equations (9) is formulated as follows [Rotshtein et al. 2006; Rotshtein and Rakytyanska 2008]: the fuzzy relation matrix $R = [r_{lk}]$, $l = \overline{1,N}$, $k = \overline{1,M}$, should be found which satisfies the constraints $r_{lk} \in [0, 1]$ and provides the least distance between the model and experimental outputs of the object, that is, the minimum value of criterion (8). Following [Rotshtein et al. 2006; Rotshtein and Rakytyanska 2008], the formation of the intervals (10) is accomplished by solving a multiple optimization problem (8), and it begins with the search for its null solution. As the null solution of optimization problem (8) we designate $R^0 = [r_{lk}^0]$, where $r_{lk}^0 \le \overline{r}_{lk}$, $l = \overline{1,N}$, $k = \overline{1,M}$. The upper bound $\overline{r}_{lk}$ is found in the range $[r_{lk}^0, 1]$. The lower bound $\underline{r}_{lk}^I$ for $I = 1$ is found in the range $[0, r_{lk}^0]$, and for $I > 1$ in the range $[0, \overline{r}_{lk}]$, where the minimal solutions $\underline{R}_J$, $J < I$, are excluded from the search space. Let $R(t) = [r_{lk}(t)]$ be some $t$-th solution of optimization problem (8), that is, $F(R(t)) = F(R^0)$, since for all $R \in S(\hat{\mu}^A, \hat{\mu}^B)$ we have the same value of
criterion (8). While searching for the upper bounds $\overline{r}_{lk}$ it is suggested that $r_{lk}(t) \ge r_{lk}(t-1)$, and while searching for the lower bounds $\underline{r}_{lk}^I$ it is suggested that $r_{lk}^I(t) \le r_{lk}^I(t-1)$. The definition of the upper (lower) bounds follows the rule: if $R(t) \ne R(t-1)$, then $\overline{r}_{lk}$ ($\underline{r}_{lk}^I$) $= r_{lk}(t)$. If $R(t) = R(t-1)$, then the search for the interval solution $[\underline{R}_I, \overline{R}]$ is stopped. The formation of the intervals (10) goes on until the condition $\underline{R}_I \ne \underline{R}_J$, $J < I$, has been satisfied. The chromosome needed in the real-coded genetic algorithm for solving the fuzzy relational equations (9) includes only the real codes of the parameters $r_{lk}$, $l = \overline{1,N}$, $k = \overline{1,M}$. The crossover operation is carried out by exchanging genes inside each variable $r_{lk}$. The parameters of the membership functions are defined simultaneously with the null solution.
5 Computer Experiment

The aim of the experiment is to generate the system of IF-THEN rules for the target “two inputs ($x_1$, $x_2$) – two outputs ($y_1$, $y_2$)” model presented in Fig. 1:

$y_1 = ((2z - 0.9)(7z - 1)(17z - 19)(15z - 2)) / 10$, $\quad y_2 = -y_1/2 + 1$,

where $z = ((x_1 - 3.0)^2 + (x_2 - 2.5)^2) / 40$. The training data in the form of the interval values of the input and output variables is presented in Table 3.
Fig. 1 “Inputs-outputs” model-generator
Table 3 Training data $(\hat{X}_s, \hat{Y}_s)$

  s      Inputs                       Outputs
       x1           x2              y1          y2
  1   [0.2, 1.2]   [0.3, 1.6]     [0, 1.0]    [0.5, 1.0]
  2   [0.2, 1.2]   [1.3, 4.0]     [0, 0.8]    [0.6, 1.0]
  3   [0.7, 3.0]   [0.3, 1.6]     [0, 2.3]    [-0.15, 1.0]
  4   [0.7, 3.0]   [1.3, 4.0]     [0, 3.4]    [-0.7, 1.0]
  5   [3.0, 5.3]   [0.3, 1.6]     [0, 2.3]    [-0.15, 1.0]
  6   [3.0, 5.3]   [1.3, 4.0]     [0, 3.4]    [-0.7, 1.0]
  7   [4.8, 5.8]   [0.3, 1.6]     [0, 1.0]    [0.5, 1.0]
  8   [4.8, 5.8]   [1.3, 4.0]     [0, 0.8]    [0.6, 1.0]
The total number of fuzzy terms for the input variables is limited to three. The total number of combinations of input terms is limited to six. The classes for output variables evaluation are formed as follows:

$[\underline{y}_1, \overline{y}_1] = \underbrace{[0, 0.2)}_{e_{11}} \cup \underbrace{[0.2, 1.2)}_{e_{12}} \cup \underbrace{[1.2, 3.4]}_{e_{13}}$, $\quad [\underline{y}_2, \overline{y}_2] = \underbrace{[-0.7, 0)}_{e_{21}} \cup \underbrace{[0, 1.2]}_{e_{22}}$.
The null solution $R^0$ presented in Table 4 together with the parameters of the knowledge matrix is obtained using the genetic algorithm. The obtained null solution allows us to arrange the genetic search for the solution set of the system (9), where the matrices $\hat{\mu}^A(\hat{X}_s)$ and $\hat{\mu}^B(\hat{X}_s)$ for the training data take the following form:

μ̂A =
| [0.16, 0.74]  [0.16, 0.52]  0             [0.33, 0.61]  [0.28, 0.52]  0             |
| [0.21, 0.46]  [0.21, 0.46]  0             [0.35, 0.90]  [0.28, 0.52]  0             |
| [0, 0.50]     [0.16, 0.74]  0             [0, 0.50]     [0.33, 0.61]  0             |
| [0, 0.46]     [0.21, 0.46]  0             [0, 0.50]     [0.37, 0.95]  0             |
| 0             [0.16, 0.74]  [0, 0.50]     0             [0.33, 0.61]  [0, 0.50]     |
| 0             [0.21, 0.46]  [0, 0.46]     0             [0.34, 0.95]  [0, 0.50]     |
| 0             [0.16, 0.52]  [0.16, 0.74]  0             [0.28, 0.52]  [0.33, 0.61]  |
| 0             [0.21, 0.46]  [0.21, 0.46]  0             [0.28, 0.52]  [0.35, 0.90]  |

μ̂B =
| [0.33, 0.61]  [0.16, 0.74]  [0.30, 0.52]  [0.33, 0.61]  [0.30, 0.52] |
| [0.35, 0.86]  [0.21, 0.46]  [0.30, 0.52]  [0.35, 0.80]  [0.30, 0.52] |
| [0.21, 0.74]  [0.16, 0.50]  [0.33, 0.61]  [0.16, 0.74]  [0.33, 0.61] |
| [0.21, 0.46]  [0.16, 0.46]  [0.37, 0.95]  [0.21, 0.50]  [0.37, 0.95] |
| [0.21, 0.74]  [0.16, 0.50]  [0.33, 0.61]  [0.16, 0.74]  [0.33, 0.61] |
| [0.21, 0.50]  [0.16, 0.46]  [0.34, 0.95]  [0.21, 0.50]  [0.34, 0.95] |
| [0.33, 0.61]  [0.16, 0.74]  [0.30, 0.52]  [0.33, 0.61]  [0.30, 0.52] |
| [0.35, 0.90]  [0.21, 0.46]  [0.30, 0.52]  [0.35, 0.75]  [0.30, 0.52] |
The complete solution set for the fuzzy relation matrix is presented in Table 5, where the inputs $x_1$, $x_2$ and the outputs $y_1$, $y_2$ are described by the fuzzy terms Low (L), Average (A), High (H), higher than Low (hL), lower than Average (lA). The obtained solution provides the approximation of the object shown in Fig. 2. In the experiments, the crossover and mutation ratios were set to 0.6 and 0.01, respectively. Beginning with ten initial rule sets, the genetic algorithm reached the null solution of optimization problem (8) after 5000 generations. About 1000 generations were required to grow the complete solution set for the fuzzy relational equations (9) (45 min on an Intel Core 2 Duo P7350 2.0 GHz). The resulting solution can be linguistically interpreted as the set of the four possible rule bases (see Table 6), which differ in the fuzzy terms describing output $y_2$ in rule 1 and rule 3 with overlapping weights.

Table 4 Fuzzy relational matrix (null solution)

        IF inputs (β, σ)               THEN outputs
        x1             x2              e11    e12    e13    e21    e22
  C1   (0.03, 0.72)   (0.01, 1.10)    0.15   0.78   0.24   0.52   0.48
  C2   (3.00, 1.77)   (0.02, 1.14)    0.85   0.16   0.02   0.76   0.15
  C3   (5.96, 0.71)   (0.04, 0.99)    0.10   0.92   0.27   0.50   0.43
  C4   (0.00, 0.75)   (2.99, 2.07)    0.86   0.04   0.30   0.80   0.30
  C5   (3.02, 1.80)   (2.97, 2.11)    0.21   0.11   0.10   0.15   0.97
  C6   (5.99, 0.74)   (3.02, 2.10)    0.94   0.08   0.30   0.75   0.30

(the classes e11, e12, e13 evaluate output y1; e21, e22 evaluate output y2)
Table 5 Fuzzy relational matrix (complete solution set)

        IF inputs    THEN outputs
        x1   x2      y1: hL         y1: lA              y1: H        y2: lA         y2: L
  C1    L    L      [0, 0.21]     [0.74, 1.0]         [0, 0.30]    [0.33, 0.61]   [0, 0.52]
  C2    A    L      [0.74, 1.0]   [0, 0.16] ∪ 0.16    [0, 0.30]    [0.74, 1.0]    [0, 0.30]
  C3    H    L      [0, 0.21]     [0.74, 1.0]         [0, 0.30]    [0.33, 0.61]   [0, 0.52]
  C4    L    H      0.86          [0, 0.16]           0.30         0.80           0.30
  C5    A    H      0.21          0.16 ∪ [0, 0.16]    [0.95, 1]    [0, 0.16]      [0.97, 1]
  C6    H    H      [0.90, 1.0]   [0, 0.16]           0.30         0.75           0.30
Table 6 System of IF-THEN rules

  Rule    IF inputs      THEN outputs
          x1    x2       y1    y2
  1       L     L        lA    lA or L
  2       A     L        hL    lA
  3       H     L        lA    lA or L
  4       L     H        hL    lA
  5       A     H        H     L
  6       H     H        hL    lA
Fig. 2 “Inputs-outputs” model extracted from data
6 Diagnosis of Heart Diseases

The aim is to generate a system of IF-THEN rules for the diagnosis of heart diseases. The input parameters are: $x_1$ – aortic valve size (0.75–2.5 cm²); $x_2$ – mitral valve size (1–2 cm²); $x_3$ – tricuspid valve size (0.5–2.7 cm²); $x_4$ – lung artery pressure (65–100 mm Hg). The output parameters are: $y_1$ – left ventricle size (11–14 mm); $y_2$ – left auricle size (40–70 mm); $y_3$ – right ventricle size (36–41 mm); $y_4$ – right auricle size (38–45 mm). The training data obtained in the Vinnitsa clinic of cardiology is presented in Table 7.
Table 7 Training data

  s      Input parameters                      Output parameters
       x1         x2     x3      x4          y1       y2       y3       y4
  1   0.75-2      2      2       65-69      12-14    41-44    36       38
  2   2.0-2.5     2      2       65-69      11-13    40-41    36       38
  3   2.0-2.5     1-2    2       71-80      11       40       38-40    40-45
  4   2.0-2.5     2      2       71-80      11       50-70    37-38    38-40
  5   2.0-2.5     2      0.5-2   72-90      11-12    60-70    40-41    40-45
  6   2.0-2.5     1-2    2-2.7   80-90      11-12    40       40-41    38
  7   2.0-2.5     2      2       80-100     11       50-60    36       38
  8   2.0-2.5     1-2    2-2.7   80-100     11       40       40-41    38-40
In clinical practice, the number of combined heart diseases (aortic-mitral, mitral-tricuspid, etc.) is limited to six (N = 6). The classes for evaluating the output variables are formed as follows:

y1: [11, 12) ∪ [13, 14], where e11 = [11, 12) and e12 = [13, 14];
y2: [41, 50) ∪ [50, 70], where e21 = [41, 50) and e22 = [50, 70];
y3: [36, 38) ∪ [38, 41], where e31 = [36, 38) and e32 = [38, 41];
y4: [38, 40) ∪ [40, 45], where e41 = [38, 40) and e42 = [40, 45].
These classes correspond to the two types of diagnoses, e_j1 (low inflation) and e_j2 (dilation), of the heart sections y1 ÷ y4. The aim of the diagnosis is to translate a set of specific parameters x1 ÷ x4 into a decision e_jp for each output y1 ÷ y4. The null solution R0 presented in Table 8, together with the parameters of the knowledge matrix, is obtained using the genetic algorithm. The obtained null solution allows us to arrange the genetic search for the solution set of system (9), where the matrices μ̂A(X̂s) and μ̂B(X̂s) for the training data take the following form (rows correspond to the training samples s = 1..8, the columns of μ̂A to the six rules of Table 8, and the columns of μ̂B to the classes e11 ÷ e42):

μ̂A(X̂s) =
[0.62, 0.94]  [0.32, 0.74]  [0.30, 0.40]  [0.09, 0.31]  [0.07, 0.35]  [0.08, 0.29]
[0.35, 0.62]  [0.74, 0.90]  0.40          [0.09, 0.31]  [0.07, 0.35]  [0.08, 0.29]
[0.21, 0.54]  [0.20, 0.52]  [0.22, 0.56]  [0.31, 0.72]  0.35          [0.29, 0.77]
[0.21, 0.54]  [0.20, 0.52]  [0.22, 0.40]  [0.31, 0.72]  0.35          [0.29, 0.41]
[0.10, 0.54]  [0.08, 0.52]  [0.07, 0.56]  [0.31, 0.86]  [0.35, 0.89]  [0.29, 0.41]
[0.10, 0.21]  [0.08, 0.21]  [0.07, 0.22]  [0.72, 0.86]  [0, 0.35]     [0.41, 0.85]
[0, 0.21]     [0, 0.21]     [0, 0.22]     [0.72, 0.90]  0.35          0.41
[0, 0.21]     [0, 0.21]     [0, 0.22]     [0.72, 0.90]  [0, 0.35]     [0.41, 1.0]

μ̂B(X̂s) =
[0.32, 0.40]  [0.62, 0.94]  [0.62, 0.76]  [0.16, 0.35]  [0.62, 0.94]  [0.30, 0.40]  [0.62, 0.90]  [0.30, 0.40]
0.40          0.63          [0.74, 0.90]  [0.16, 0.35]  [0.74, 0.90]  0.40          [0.74, 0.85]  0.40
[0.35, 0.77]  [0.21, 0.54]  [0.29, 0.76]  [0.35, 0.59]  [0.31, 0.55]  [0.35, 0.77]  [0.31, 0.75]  [0.35, 0.56]
[0.35, 0.72]  [0.21, 0.54]  [0.29, 0.54]  [0.35, 0.59]  [0.31, 0.55]  [0.35, 0.41]  [0.31, 0.64]  [0.35, 0.40]
[0.35, 0.89]  [0.10, 0.54]  [0.29, 0.56]  [0.35, 0.89]  [0.31, 0.55]  [0.35, 0.89]  [0.31, 0.64]  [0.35, 0.89]
[0.72, 0.86]  0.37          [0.41, 0.76]  0.59          0.55          [0.41, 0.85]  [0.64, 0.75]  [0.26, 0.35]
[0.72, 0.90]  0.37          0.41          0.59          0.55          0.41          0.64          0.35
[0.72, 0.90]  0.37          [0.41, 0.76]  0.59          0.55          [0.41, 0.88]  [0.64, 0.75]  [0.26, 0.35]
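The class intervals defined above map a measured output value to a diagnosis decision e_jp. A minimal sketch (interval bounds are taken from the text; the function name is ours):

```python
# Decision classes per output, from the intervals in the text:
# e_j1 is the lower (half-open) class, e_j2 the upper (closed) one.
CLASSES = {
    "y1": (("e11", 11, 12), ("e12", 13, 14)),
    "y2": (("e21", 41, 50), ("e22", 50, 70)),
    "y3": (("e31", 36, 38), ("e32", 38, 41)),
    "y4": (("e41", 38, 40), ("e42", 40, 45)),
}

def diagnose(output, value):
    """Map a measured value of output y_j to its decision class e_jp."""
    (c1, lo1, hi1), (c2, lo2, hi2) = CLASSES[output]
    if lo1 <= value < hi1:
        return c1
    if lo2 <= value <= hi2:
        return c2
    return None  # value falls outside both classes

print(diagnose("y2", 45))  # -> e21 (left auricle of 45 mm: low inflation)
print(diagnose("y3", 40))  # -> e32 (right ventricle of 40 mm: dilation)
```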
Table 8 Fuzzy relational matrix (null solution)

| IF x1        | IF x2        | IF x3        | IF x4           | e11  | e12  | e21  | e22  | e31  | e32  | e41  | e42  |
|--------------|--------------|--------------|-----------------|------|------|------|------|------|------|------|------|
| (0.75, 1.30) | (2.00, 0.63) | (2.35, 0.92) | (65.54, 8.81)   | 0.21 | 0.95 | 0.76 | 0.16 | 0.95 | 0.10 | 0.90 | 0.10 |
| (2.50, 0.95) | (2.00, 0.65) | (2.44, 1.15) | (64.90, 9.57)   | 0.40 | 0.63 | 0.93 | 0.15 | 0.90 | 0.12 | 0.85 | 0.06 |
| (2.52, 1.04) | (1.00, 0.82) | (2.32, 0.88) | (69.32, 10.23)  | 0.92 | 0.20 | 0.86 | 0.08 | 0.31 | 0.75 | 0.14 | 0.82 |
| (2.55, 0.98) | (2.00, 0.72) | (2.36, 0.90) | (95.07, 21.94)  | 0.90 | 0.15 | 0.24 | 0.59 | 0.55 | 0.02 | 0.64 | 0.26 |
| (2.51, 1.10) | (1.92, 0.75) | (0.50, 0.90) | (100.48, 26.14) | 0.85 | 0.18 | 0.12 | 0.95 | 0.10 | 0.90 | 0.21 | 0.93 |
| (2.55, 0.96) | (1.00, 0.94) | (2.30, 1.20) | (95.24, 22.46)  | 0.37 | 0.76 | 0.31 | 0.22 | 0.88 | 0.75 | 0.14 | 0.80 |
The complete solution set for the fuzzy relation matrix is presented in Table 9, where the valve sizes x1 ÷ x3 are described by the fuzzy terms stenosis (S) and insufficiency (I), and the pressure x4 by the fuzzy terms normal (N) and lung hypertension (H). The obtained solution provides the results of diagnosis presented in Table 10 for 57 patients. The heart disease diagnosis achieved an average accuracy of 90% after 10000 iterations of the genetic algorithm (100 min on an Intel Core 2 Duo P7350 2.0 GHz). The resulting solution can be linguistically interpreted as a set of four possible rule bases (see Table 11), which differ in the fuzzy terms describing outputs y1 and y3 in rule 3, whose weights overlap.
Table 9 Fuzzy relational matrix (complete solution set)

| x1 | x2 | x3 | x4 | e11         | e12         | e21       | e22         | e31       | e32       | e41       | e42              |
|----|----|----|----|-------------|-------------|-----------|-------------|-----------|-----------|-----------|------------------|
| S  | I  | I  | N  | [0, 0.4]    | [0.94, 1]   | 0.76      | 0.16        | [0.94, 1] | [0, 0.3]  | 0.9       | [0, 0.3]         |
| I  | I  | I  | N  | 0.4         | 0.63        | [0.9, 1]  | [0, 0.35]   | [0.9, 1]  | [0, 0.3]  | 0.85      | [0, 0.3]         |
| I  | S  | I  | N  | [0.4, 1]    | [0, 0.54]   | [0.56, 1] | [0, 0.35]   | [0, 0.55] | [0.4, 1]  | [0, 0.31] | [0.56, 1]        |
| I  | I  | I  | H  | [0.9, 1]    | [0, 0.37]   | [0, 0.41] | 0.59 ∪ 0.37 | 0.55      | [0, 0.41] | 0.64      | 0.26 ∪ [0, 0.26] |
| I  | I  | S  | H  | [0.89, 1]   | [0, 0.54]   | [0, 0.56] | [0.89, 1]   | [0, 0.55] | [0.89, 1] | [0, 0.31] | [0.89, 1]        |
| I  | S  | I  | H  | [0.77, 0.9] | 0.37 ∪ 0.76 | [0, 0.37] | [0, 0.59]   | [0.85, 1] | 0.75      | [0, 0.55] | [0, 0.26] ∪ 0.26 |
Table 10 Genetic algorithm efficiency characteristics

| Output parameter | Type of diagnosis | Number of cases | Probability of the correct diagnosis |
|------------------|-------------------|-----------------|--------------------------------------|
| y1               | e11 (e12)         | 20 (37)         | 17/20 = 0.85 (34/37 = 0.92)          |
| y2               | e21 (e22)         | 26 (31)         | 23/26 = 0.88 (28/31 = 0.90)          |
| y3               | e31 (e32)         | 28 (29)         | 25/28 = 0.89 (27/29 = 0.93)          |
| y4               | e41 (e42)         | 40 (17)         | 37/40 = 0.92 (15/17 = 0.88)          |
Table 11 System of IF-THEN rules

| Rule | x1 | x2 | x3 | x4 | y1     | y2 | y3     | y4 |
|------|----|----|----|----|--------|----|--------|----|
| 1    | S  | I  | I  | N  | D      | L  | L      | L  |
| 2    | I  | I  | I  | N  | D      | L  | L      | L  |
| 3    | I  | S  | I  | N  | L or D | L  | L or D | D  |
| 4    | I  | I  | I  | H  | L      | D  | L      | L  |
| 5    | I  | I  | S  | H  | L      | D  | D      | D  |
| 6    | I  | S  | I  | H  | L      | L  | D      | L  |
7 Prediction of Diseases Evolution

The aim is to generate a system of IF-THEN rules for predicting the number of diseases. We consider information on the incidence of appendicular peritonitis according to the data of the Vinnitsa clinic of children's surgery in 1982-2009, presented in Table 12.
Table 12 Distribution of the diseases number

| Year               | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 |
|--------------------|------|------|------|------|------|------|------|------|
| Number of diseases | 109  | 143  | 161  | 136  | 161  | 163  | 213  | 220  |

| Year               | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 |
|--------------------|------|------|------|------|------|------|------|------|
| Number of diseases | 162  | 194  | 164  | 196  | 245  | 252  | 240  | 225  |

| Year               | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|--------------------|------|------|------|------|------|------|------|------|
| Number of diseases | 160  | 185  | 174  | 207  | 237  | 258  | 245  | 230  |

| Year               | 2006 | 2007 | 2008 | 2009 |
|--------------------|------|------|------|------|
| Number of diseases | 145  | 189  | 152  | 186  |
Analyzing the disease dynamics in Fig. 3, it is easy to observe the presence of four-year cycles, the third position of which is occupied by the leap year. These cycles are denoted as ... x4^(i-1)} {x1^i x2^i x3^i x4^i} {x1^(i+1) ..., where i is the number of a four-year cycle, x1^i is the number of diseases two years prior to a leap year, x2^i is the number of diseases one year prior to a leap year, x3^i is the number of diseases during a leap year, and x4^i is the number of diseases during the year following the leap year.
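The grouping into four-year cycles with the leap year in the third position can be sketched as follows (counts taken from Table 12; only the first two cycles are shown):

```python
counts = {1982: 109, 1983: 143, 1984: 161, 1985: 136,
          1986: 161, 1987: 163, 1988: 213, 1989: 220}

def four_year_cycles(year_counts):
    """Split (year, count) pairs into cycles (x1, x2, x3, x4), where x3
    falls on the leap year (third position of each cycle)."""
    items = sorted(year_counts.items())
    return [items[i:i + 4] for i in range(0, len(items), 4)]

for cycle in four_year_cycles(counts):
    assert cycle[2][0] % 4 == 0       # leap year sits in the third position
    print([c for _, c in cycle])      # e.g. [109, 143, 161, 136]
```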
Fig. 3 Disease dynamics
The network of relations in Fig. 4 shows that it is possible to predict the situation for the next four years: for the last two years of the i-th cycle and for the first two years of the succeeding (i+1)-th cycle using the data of the first two years of the i-th cycle.
Fig. 4 A network of relations for prediction
It is necessary to find knowledge matrices F1 ÷ F3 that satisfy the limitations imposed on the knowledge base volume and provide the least distance between the theoretical and experimental numbers of diseases:

Σ_{i=1..L} (x3^i − x̂3^i)² + Σ_{i=1..L} (x4^i − x̂4^i)² + Σ_{i=1..L−1} (x1^(i+1) − x̂1^(i+1))² + Σ_{i=1..L−1} (x2^(i+1) − x̂2^(i+1))² = min over R1-3, Β1-3, Ω1-3,
where x3^i, x4^i, x1^(i+1), x2^(i+1) are the predicted numbers of diseases, depending on the parameters Β1-3 and Ω1-3 of the membership functions and on the rule weights R1-3; x̂3^i, x̂4^i, x̂1^(i+1), x̂2^(i+1) are the experimental numbers of diseases; and L is the number of four-year cycles used to extract the model. The total number of fuzzy terms for the sickness rate is limited to five, and the total number of combinations of input and output terms is limited to four. The null solutions R1^0 ÷ R3^0 presented in Tables 13-15, together with the parameters of the knowledge matrices F1 ÷ F3, are obtained using the genetic algorithm. The obtained null solutions allow us to arrange the genetic search for the solution set of the relations F1 ÷ F3. The matrices μ̂A_1-3(X̂s) and μ̂B_1-3(X̂s), formed for each relation F1 ÷ F3 on the basis of the observations during L = 6 four-year cycles in 1982-2005, take the following form:

μ̂1A: 0.91 0 0 0 0 0 0.20 0 0 0 0 0 0 0.75 0.85 0 0 0.74 0 0 0.99 0.79 0.75 0 0.77 0.84 0 0 0 0.70 0 0 0.74;
μ̂2A, μ̂3A: 0 0 0.75 0.67 0.63 0.82 0 0 0 0 0 0.52 0.64 0 0.22 0.99 0 0.91 0 0 0.85 0 0.33 0.50 0 0.78 0 0 0 0 0 0 0 0.96 0.82 0.99 0 0.77 0 0.80 0 0.25 0 0.81 0 0.35 0 0.75 0 0 0.84 0 0 0.80;
μ̂1B, μ̂2B, μ̂3B: 0.64 0 0.80 0 0.70 0 0 0.87 0 0.27 0 0 0 0.68 0 0.89 0 0.95 0 0.50 0 0.94 0.61 0 0 0.98 0.
Table 13 Fuzzy relational matrix (null solution) for F1; each output column is an (x3^i, x4^i) term pair given by its Gaussian parameters

| IF x1^i         | IF x2^i         | (165.22, 21.15) / (138.84, 41.75) | (223.64, 21.58) / (221.12, 15.82) | (169.64, 10.17) / (201.04, 8.80) | (250.69, 21.92) / (235.18, 24.89) |
|-----------------|-----------------|------|------|------|------|
| (103.06, 18.55) | (150.17, 20.81) | 0.97 | 0    | 0    | 0    |
| (154.35, 22.68) | (179.89, 28.51) | 0.31 | 0.82 | 0.75 | 0.25 |
| (152.63, 21.08) | (191.57, 8.74)  | 0.17 | 0    | 0.69 | 0    |
| (248.27, 26.92) | (257.64, 9.81)  | 0    | 0.57 | 0    | 0.87 |
Table 14 Fuzzy relational matrix (null solution) for F2

| IF x4^i         | x1^(i+1): (155.08, 12.72) | (240.56, 10.21) |
|-----------------|------|------|
| (130.25, 9.86)  | 0.91 | 0    |
| (220.11, 6.98)  | 0.69 | 0    |
| (209.27, 20.56) | 0.77 | 0.89 |
Table 15 Fuzzy relational matrix (null solution) for F3

| IF x4^i         | IF x1^(i+1)     | x2^(i+1): (162.78, 6.09) | (190.20, 7.86) | (256.04, 8.21) |
|-----------------|-----------------|------|------|------|
| (135.24, 6.85)  | (156.84, 10.07) | 0.86 | 0    | 0    |
| (222.10, 14.78) | (152.38, 16.54) | 0    | 0.78 | 0    |
| (203.45, 12.57) | (241.18, 13.26) | 0    | 0    | 0.94 |
The complete solution sets for the relation matrices F1 ÷ F 3 are presented in Tables 16-18, where the sickness rate is described by fuzzy terms Low (L), lower than Average (lA), Average (A), higher than Average (hA), High (H).
Table 16 Fuzzy relational matrix (complete solution set) for F1; output columns are (x3^i, x4^i) term pairs

| x1^i | x2^i | (lA, L)          | (hA, hA)    | (lA, A)             | (H, hA)     |
|------|------|------------------|-------------|---------------------|-------------|
| L    | lA   | [0.91, 1.0]      | 0           | 0                   | 0           |
| lA   | lA   | 0.31 ∪ [0, 0.31] | [0.74, 1.0] | 0.75 ∪ [0, 0.75]    | 0.25        |
| lA   | A    | [0, 0.31] ∪ 0.31 | 0           | [0.64, 0.75] ∪ 0.75 | 0           |
| H    | H    | 0                | 0.57        | 0                   | [0.85, 1.0] |
Table 17 Fuzzy relational matrix (complete solution set) for F2

| x4^i | x1^(i+1) = lA       | x1^(i+1) = H |
|------|---------------------|------|
| L    | [0.75, 1.0]         | 0    |
| hA   | 0.77 ∪ [0.67, 0.77] | 0    |
| A    | [0.50, 0.77] ∪ 0.77 | 0.89 |
Table 18 Fuzzy relational matrix (complete solution set) for F3

| x4^i | x1^(i+1) | x2^(i+1) = lA | A    | H           |
|------|----------|---------------|------|-------------|
| L    | lA       | [0.85, 1.0]   | 0    | 0           |
| hA   | lA       | 0             | 0.78 | 0           |
| A    | H        | 0             | 0    | [0.91, 1.0] |
The obtained solution provides the prediction results up to 2013 presented in Fig. 5. Since the experimental numbers of appendicular peritonitis diseases in 2006-2009 were not used for fuzzy rule extraction, the proximity of the theoretical and experimental results for these years demonstrates the sufficient quality of the constructed prediction model from the practical viewpoint. A comparison of the simulation results with the experimental data is presented in Table 19. About 8500 generations were required to grow the complete solution set for the fuzzy relations F1 ÷ F3 (90 min on an Intel Core 2 Duo P7350 2.0 GHz).
Fig. 5 Comparison of the experimental data and the extracted fuzzy model

Table 19 Prediction of the number of diseases

| Year       | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 |
|------------|------|------|------|------|------|------|------|------|
| Experiment | 109  | 143  | 161  | 136  | 161  | 163  | 213  | 220  |
| Theory     |      |      | 170  | 140  | 168  | 175  | 220  | 229  |
| Error      |      |      | 9    | 4    | 7    | 12   | 7    | 9    |

| Year       | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 |
|------------|------|------|------|------|------|------|------|------|
| Experiment | 162  | 194  | 164  | 196  | 245  | 252  | 240  | 225  |
| Theory     | 174  | 205  | 170  | 183  | 236  | 244  | 255  | 211  |
| Error      | 12   | 11   | 6    | 13   | 9    | 8    | 15   | 14   |

| Year       | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|------------|------|------|------|------|------|------|------|------|
| Experiment | 160  | 185  | 174  | 207  | 237  | 258  | 245  | 230  |
| Theory     | 147  | 180  | 147  | 190  | 250  | 238  | 234  | 215  |
| Error      | 13   | 5    | 27   | 17   | 13   | 20   | 11   | 15   |

| Year       | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|------------|------|------|------|------|------|------|------|------|
| Experiment | 145  | 189  | 152  | 186  |      |      |      |      |
| Theory     | 172  | 200  | 161  | 200  | 239  | 247  | 252  | 216  |
| Error      | 27   | 11   | 9    | 14   |      |      |      |      |
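The Error rows of Table 19 are the absolute deviations between experiment and theory; the first cycle with predictions (1984-1989) can be checked directly:

```python
years      = [1984, 1985, 1986, 1987, 1988, 1989]
experiment = [161, 136, 161, 163, 213, 220]
theory     = [170, 140, 168, 175, 220, 229]

errors = [abs(e - t) for e, t in zip(experiment, theory)]
print(errors)  # -> [9, 4, 7, 12, 7, 9], matching the Error row of Table 19
```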
The resulting solution for relation F1 can be translated into a set of two possible rule bases (see Table 20), which differ in the combinations of the fuzzy terms describing outputs x3^i and x4^i in rule 2, whose weights overlap. To provide a more reliable forecast, the resulting solution for relation F2 can also be translated into a set of two rule bases (see Table 21), which differ in the fuzzy terms describing output x1^(i+1) in rule 3 with sufficiently high weights.
Table 20 System of IF-THEN rules for F1

| Rule | x1^i | x2^i | x3^i       | x4^i      |
|------|------|------|------------|-----------|
| 1    | L    | lA   | lA         | L         |
| 2    | lA   | lA   | hA (or lA) | hA (or A) |
| 3    | lA   | A    | lA         | A         |
| 4    | H    | H    | H          | hA        |
Table 21 System of IF-THEN rules for F2 and F3

F2:

| Rule | x4^i | x1^(i+1) |
|------|------|----------|
| 1    | L    | lA       |
| 2    | hA   | lA       |
| 3    | A    | lA or H  |

F3:

| Rule | x4^i | x1^(i+1) | x2^(i+1) |
|------|------|----------|----------|
| 1    | L    | lA       | lA       |
| 2    | hA   | lA       | A        |
| 3    | A    | H        | H        |
8 Conclusions

This paper proposes a method based on fuzzy relational equations and genetic algorithms to identify MIMO systems. In experimental data analysis, rule generation combined with solving fuzzy relational equations is a promising technique for restoring and identifying the relational matrix together with a rule-based explanation. The proposed method is focused on generating accurate and interpretable fuzzy rule-based systems. The results obtained by the application of the genetic algorithm depend on the randomness of the training data initialization, e.g., on the generation of the training intervals during execution. It may be the case that the model shows the highest rule performance only with the particular test and training data partition that was used to build and test it. In the course of the computer experiment the training intervals are generated artificially; for practical applications the intervals can be derived directly from the problem. Although the work presented here shows good practical results, some future investigations are still needed. While the theoretical foundations of fuzzy relational equations are well developed, they still call for more efficient and diversified schemes of solution finding. The issue of adapting the resulting solution while the samples of experimental data (training intervals) change remains unresolved. The genetically guided global optimization should be augmented by more refined gradient-based adaptation mechanisms to provide the invariability of the generated fuzzy rule-based systems. Such an adaptive approach envisages the development of a hybrid genetic and neuro algorithm for solving fuzzy relational
equations. With this hybrid approach it will be possible to avoid random effects caused by different partitions of training and test data by detecting a representative set of rule bases.
Server-Side Query Language for Protein Structure Similarity Searching

B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
{bozena.malysiak,stanislaw.kozielski,dariusz.mrozek}@polsl.pl

Abstract. Protein structure similarity searching is a complex process, which is usually carried out by comparing a given protein structure to a set of protein structures from a database. Since existing database management systems do not offer integrated exploration methods for querying protein structures, structural similarity searching is usually performed by external tools. This often lengthens the processing time and requires additional processing steps, such as the adaptation of input and output data formats. In this paper, we present our extension to the SQL language, which allows users to formulate queries against a database in order to find proteins having secondary structures similar to a structural pattern specified by the user. The presented query language is integrated with the relational database management system and simplifies the manipulation of biological data.
1 Introduction

Proteins are biological molecules that play a very important role in all biological reactions in living cells. They are involved in many processes, such as reaction catalysis, energy storage, signal transmission, maintenance of cell structure, immune response, transport of small biomolecules, and regulation of cell growth and division.

1.1 Basic Concepts and Definitions

In terms of their general construction, proteins are macromolecules built of amino acids (usually more than 100 amino acids, aa), which are linked to each other by peptide bonds, forming a kind of linear chain. Formally, in the construction of proteins we can distinguish four description or representation levels: primary structure, secondary structure, tertiary structure, and quaternary structure. Primary structure is defined by the amino acid sequence of the protein's linear chain. There are 20 standard amino acids found in most living organisms. Examples of the amino acid sequences of the myoglobin and hemoglobin molecules are presented in Fig. 1. Each letter in a sequence corresponds to one amino acid in the protein chain.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 395–415. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
396
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
>1MBN:A|PDBID|CHAIN|SEQUENCE VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALG AILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKY KELGYQG >4HHB:A|PDBID|CHAIN|SEQUENCE VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHV DDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Fig. 1 Protein primary structures (amino acid sequences) in FASTA format for two protein molecules: myoglobin (1MBN) and hemoglobin (4HHB, chain A)
Secondary structure describes the spatial arrangement of amino acids located closely in the sequence. This description level distinguishes some characteristic, regularly folded substructures within the spatial structure. Examples of secondary structures are α-helices, β-sheets and loops. The spiral shapes of α-helices are visible in the tertiary structures presented in Fig. 2. Tertiary structure (Fig. 2a, Fig. 2b) refers to the spatial relationships and mutual arrangement of amino acids located closely and distantly in the protein sequence. Tertiary structure describes the configuration of a protein structure caused by additional, internal forces, such as hydrogen bonds, disulfide bridges, attractions between positive and negative charges, and hydrophobic and hydrophilic forces. This description level characterizes the biologically active spatial conformation of proteins. Quaternary structure refers to proteins made up of more than one amino acid chain. This level describes the arrangement of subunits and the type of their contact, which can be covalent or non-covalent (Fig. 2c). The last three representation levels define the protein conformation, or protein spatial structure, which is determined by the location of atoms in 3D space. The biochemical analysis of proteins is usually carried out on one of the description levels and depends on the purpose of the analysis.
Fig. 2 Protein spatial structures represented by secondary structure elements: a) tertiary structure of myoglobin (1MBN), b) tertiary structure of hemoglobin (4HHB, chain A), c) quaternary structure of hemoglobin (4HHB, all chains)
Server-Side Query Language for Protein Structure Similarity Searching
397
1.2 Scope of the Paper

In this paper, we concentrate on secondary structures, which are a valuable source of information regarding the construction of protein molecules. Secondary structures provide more information about the spatial construction of proteins than primary structures. On the other hand, they are so straightforward that they allow the analysis of proteins at a general level, which is often used in protein similarity search tools. This organization level of protein structure allows studying the general shape of proteins and the formation of the amino acid chain caused by local hydrogen interactions [Allen 2008]. Visualizing protein spatial structures by secondary structure elements, as presented in Fig. 2, reveals what types of characteristic spatial elements are present in the protein conformation (whether there are only α-helices or only β-strands in the structure, or both) and what their arrangement is (whether they are heavily segregated or appear alternately). The secondary structure representation of proteins has become very important in the analysis of protein construction and function. Therefore, it is frequently used in protein structure similarity searching performed by various algorithms, like those presented in [Shapiro et al. 2004, Can and Wang 2003, Yang 2008]. If we compare the amino acid sequences of myoglobin and hemoglobin in Fig. 1, we can conclude that they are not very similar. However, if we compare their tertiary structures in Fig. 2a and Fig. 2b, represented by secondary structure elements, we can see that they are structurally similar. Since we know these two molecules have a similar function, which is oxygen transport, we can confirm the thesis that structural similarity often implies functional similarity. Moreover, functionally similar molecules do not have to possess similar amino acid sequences. This simple example shows how important secondary structures are.
1.3 Goal of Our Work

For scientists who study the structure and function of proteins, it is very important to be able to search for structures similar to a given structure. This is usually hindered by several factors:

• Data describing protein structures are managed by database management systems (DBMSs), which work excellently in commercial applications. However, they are not dedicated to storing and processing biological data. They do not provide native support for processing biological data with the use of the SQL language, which is the fundamental, declarative way of manipulating data in most database systems.
• Processing must be performed by external tools and applications, which is a big disadvantage.
• Results are returned in different formats, like table-form data sets, TXT or XML files, and users must adapt them.
• Secondary processing of the data is difficult and requires additional tools.
For these reasons, we have decided to develop an effective, dedicated language for querying protein structures on the secondary structure level. The Protein Secondary Structure – Structured Query Language (PSS-SQL) that we have designed and developed supports searching the database for proteins whose structure is similar to the structure specified in a user's query. Moreover, PSS-SQL extends the standard syntax of the SQL language and becomes a declarative method of protein similarity searching that is integrated with the database server. With the use of PSS-SQL, users are able to retrieve data from bio-databases in a standardized manner by formulating appropriate PSS-SQL queries, and they receive results in a uniform, tabular form (Fig. 3).
Fig. 3 Exploring bio-databases using the PSS-SQL language: user applications, web sites and tools send PSS-SQL queries to the PSS-SQL extension of the BIO database and receive table-format results
2 Effective Storage of Secondary Structures in Database

Retrieving protein structures by formulating queries in PSS-SQL requires a specific storage format for the data describing protein secondary structures. In our solution, we store protein structures as sequences of secondary structure elements (SSE). Each SSE corresponds to one amino acid in the primary structure. In Fig. 4 we show the amino acid sequence of the 4'-phosphopantetheinyl transferase acpT in Salmonella typhimurium and the corresponding sequence of SSEs. The particular elements have the following meaning: H denotes α-helix, E denotes β-strand, and C (or L) stands for loop, turn or coil. Such a representation of protein structure is very simple in terms of storing the structure in a database and allows effective processing and searching. Data describing the types and locations of SSEs in a protein structure may come from different sources: they can be extracted directly from the Protein Data Bank [Berman et al. 2000], taken from systems that classify protein structures, like SCOP [Murzin et al. 1995] or CATH [Orengo et al. 1997], or generated using programs that predict secondary structures on the basis of primary structures.
Q8ZLE2 ACPT_SALTY 4'-phosphopantetheinyl transferase acpT OS=Salmonella typhimurium GN=acpT PE=3 SV=1 MYQVVLGKVSTLSAGQLPDALIAQAPQGVRRASWLAGRVLLSRALSPLPEMVYGEQGKPAFSAGAPLWFNLSHSGDTIALLLS DEGEVGCDIEVIRPRDNWRSLANAVFSLGEHAEMEAERPEQQLAAFWRIWTRKEAIVKQRGGSAWQIVSVDSTLPSALSVSQC QLDTLSLAVCTPTPFTLTPQTITKAL CCCEEEECEEECCCCCCCCCEEECCCCCCHHHHHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCCEEEEECCCCEEEEEEC CCCCCEEEEEEECCCCCHHHHHHHHHCCCHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHCCCCCEEEEEECCCCCCCCCCCCC CCCEEEEEEECCCCCCCCCCCCCCCC
Fig. 4 Sample amino acid sequence of the protein 4'-phosphopantetheinyl transferase acpT in the Salmonella typhimurium with the corresponding sequence of secondary structure elements
Nevertheless, they should be represented in the common format as a sequence of H, E, C/L symbols. In our research, we store sequences of SSEs in the ProteinTbl table of the Proteins database. The schema of the table is presented in Fig. 5. This table will be used in all examples presented in the following chapters.

id   protID       protAC  name              length  primary           secondary
---  -----------  ------  ----------------  ------  ----------------  ----------------
799  ABCX_GUITH   O78474  Probable ATP-...  253     MKKKILEVTNLHA...  CCCCEEECCCHHH...
800  1A02_GORGO   P30376  Class I histo...  365     MAVMAPRTLLLLL...  CCCCHHHHHHHHH...
808  1A110_ARATH  Q9LQ10  Probable amin...  557     MTRTEPNRSRSSN...  CCCCCCCCCCCCC...
809  1A111_ARATH  Q9S9U6  1-aminocyclop...  460     MLSSKVVGDSHGQ...  CCCEEEECCCCCC...
810  ABCX_PORPU   P51241  Probable ATP-...  251     MSDYILEIKDLHA...  CCCHHHHHHHHHH...
Detailed description of particular fields of the ProteinTbl is presented in Table 1. Table 1 Description of particular fields of the ProteinTbl database table Field
Description
id
internal identifier of protein in a database
protAC
protein Accession Number
protID
protein IDentification in the popular SwissProt database
name
protein name and description
length
protein length in amino acids
primary
primary structure of a protein (amino acid sequence)
secondary
sequence of secondary structure elements of a protein
3 Alignment Method for Sequences of Secondary Structure Elements

In our approach, we assume that similarity searching is performed by a pairwise comparison of the query sequence of SSEs to a candidate sequence from the database. The PSS-SQL language makes use of the alignment procedure in order to
match two sequences of SSEs. In this section, we describe how the alignment method works. Suppose we have two proteins A and B, one of which represents the given pattern and the other a candidate protein from the database. We represent the primary structures of proteins A and B as P^A = p^A_1 p^A_2 ... p^A_n and P^B = p^B_1 p^B_2 ... p^B_m, where n is the length of protein A (in amino acids), m is the length of protein B, p_i ∈ P, and P is the set of the 20 common types of amino acids. We represent the secondary structures of proteins A and B as S^A = s^A_1 s^A_2 ... s^A_n and S^B = s^B_1 s^B_2 ... s^B_m, where s_i ∈ S is a single secondary structure element (SSE) corresponding to the i-th amino acid p_i, and S = {H, E, C, ?} covers the 3 types of secondary structures: H denotes α-helix, E denotes β-strand, and C stands for loop, turn or coil; the ? symbol corresponds to any of the mentioned SSEs.

The alignment is carried out using the Smith-Waterman method [Smith and Waterman 1981]. The method was originally intended to align two nucleotide sequences of DNA/RNA or two amino acid sequences of proteins. However, we modified the Smith-Waterman method to align two sequences of SSEs: one of the sequences is the query sequence, and the second one is a candidate sequence from a database. Moreover, the modified version of the Smith-Waterman method returns more than one optimal solution, by reason of the approximate character of the specified query pattern. In PSS-SQL queries, the pattern is represented by a sequence of segments, where each segment can be defined precisely or by an interval (details concerning the definition of query patterns are described in chapter 4.2). For example, in the pattern h(4),e(2;5),c(2;4) we can distinguish an α-helix containing exactly 4 elements, followed by a β-strand of length 2 to 5 elements, and a loop of length 2 to 4 elements. During the alignment phase the pattern is expanded to its full possible length; e.g., the given pattern takes the form HHHHEEEEECCCC. In this form it may take part in the comparison to candidate SSE sequences from the database. In the alignment process we build the similarity matrix D according to the following rules, for 0 ≤ i ≤ n and 0 ≤ j ≤ m:
D_{i,0} = D_{0,j} = 0,   (1)

D^(1)_{i,j} = D_{i-1,j-1} + δ(s^A_i, s^B_j),   (2)

D^(2)_{i,j} = max_{1≤k≤n} { D_{i-k,j} − ω_k },   (3)

D^(3)_{i,j} = max_{1≤l≤m} { D_{i,j-l} − ω_l },   (4)

D^(4)_{i,j} = 0,   (5)

D_{i,j} = max_{v=1..4} { D^(v)_{i,j} },   (6)
where δ(s^A_i, s^B_j) is an award δ+ if the two SSEs from proteins A and B match each other, or a penalty δ− if they do not:

δ(s^A_i, s^B_j) = 1 if s^A_i = s^B_j, and −1 if s^A_i ≠ s^B_j,   (7)

and ω_k is a penalty for a gap of length k:

ω_k = ω_O + k × ω_E,   (8)

where ω_O = 3 is the penalty for opening a gap and ω_E = 0.5 is the penalty for a gap extension. In Fig. 6 we show the scoring matrix for particular pairs of SSEs. This scoring system, with such values of the gap penalties, promotes longer alignments without gaps. Although the occurrence of gaps is still possible in the run of the algorithm, we assume users can determine the places of possible gaps by specifying optional segments in a query pattern.
Fig. 6 Scoring system for compared pairs of secondary structure elements
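Before alignment, a segment pattern such as h(4),e(2;5),c(2;4) is expanded to its full possible length (HHHHEEEEECCCC in the example above). A minimal parser for this syntax might look as follows; the helper name and the regular expression are our own sketch, not the PSS-SQL implementation:

```python
import re

SEGMENT = re.compile(r"\s*([hec?])\((\d+)(?:;(\d+))?\)\s*")

def expand_pattern(pattern):
    """Expand 'h(4),e(2;5),c(2;4)' to the maximum-length SSE string."""
    parts = []
    for seg in pattern.split(","):
        m = SEGMENT.fullmatch(seg)
        if not m:
            raise ValueError("bad segment: " + seg)
        sse, lo, hi = m.group(1).upper(), int(m.group(2)), m.group(3)
        parts.append(sse * (int(hi) if hi else lo))
    return "".join(parts)

print(expand_pattern("h(4),e(2;5),c(2;4)"))  # -> HHHHEEEEECCCC
```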
The filled similarity matrix D encodes many possible paths along which two sequences of SSEs can be aligned. Among these, the modified Smith-Waterman method finds and joins the paths that give the best alignment. Backtracking from the highest-scoring matrix cell and going along until a cell with score zero is encountered gives the highest-scoring alignment path (Fig. 7). However, in the modified version of the alignment method that we have developed, we find many possible alignments by searching consecutive maxima in the similarity matrix D. This is necessary, since the pattern is usually not defined precisely and contains ranges of SSEs or undefined elements; therefore, there can be many regions in a protein structure that fit the pattern. In the process of finding alternative alignment paths, the modified Smith-Waterman method follows the value of the internal parameter MPE (Minimum Path End), which defines the stop criterion: we find alignment paths until the next maximum in the similarity matrix D is lower than the value of the MPE parameter. The value of MPE depends on the specified pattern, according to the following formula.
    MPE = (MPL × δ+) + (NoIS × δ−),    (9)
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek

where MPL is the minimum pattern length and NoIS is the number of imprecise segments, i.e. segments for which the minimum length differs from the maximum length. E.g. for the structural pattern h(10;20),e(1;10),c(5),e(5;20), containing an α-helix of the length 10 to 20 elements, a β-strand of the length 1 to 10 elements, a loop of the length 5 elements, and a β-strand of the length 5 to 20 elements, MPL=21 (10 elements of the type h, 1 element of the type e, 5 elements of the type c, and 5 elements of the type e), NoIS=3 (the first, second, and fourth segment), and therefore MPE=18.
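The MPL/NoIS/MPE computation described above can be sketched as follows. This helper is our own illustration (not part of PSS-SQL), assuming δ+ = 1 and δ− = −1 as in the text.

```python
import re

def mpe(pattern, award=1, penalty=-1):
    """Compute (MPL, NoIS, MPE) for a pattern such as 'h(10;20),e(1;10),c(5),e(5;20)'."""
    mpl = nois = 0
    for typ, lo, hi in re.findall(r'([che?])\((\d+)(?:;(\d+|\*))?\)', pattern):
        mpl += int(lo)          # minimum length contributed by this segment
        if hi and hi != lo:     # interval (or open-ended) segment -> imprecise
            nois += 1
    return mpl, nois, mpl * award + nois * penalty
```

For the example pattern above, `mpe("h(10;20),e(1;10),c(5),e(5;20)")` returns `(21, 3, 18)`, matching the MPL, NoIS, and MPE values derived in the text.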
Fig. 7 Similarity matrix D showing one of the possible alignment paths
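A minimal sketch of the underlying Smith-Waterman local alignment on SSE strings is shown below. Note the simplifications relative to the authors' modified method: a linear gap cost is used instead of the affine penalty of Eq. (8), and only the single best path is traced back rather than consecutive maxima governed by MPE.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-3.5):
    """Local alignment of two SSE strings (e.g. 'CCHHHE').
    Simplified: linear gap cost, single best path only."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(0.0, D[i - 1][j - 1] + s,
                          D[i - 1][j] + gap, D[i][j - 1] + gap)
            if D[i][j] > best:
                best, pos = D[i][j], (i, j)
    # backtrack from the highest-scoring cell until a zero cell is reached
    i, j = pos
    aln_a, aln_b = [], []
    while i > 0 and j > 0 and D[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if D[i][j] == D[i - 1][j - 1] + s:
            aln_a.append(a[i - 1]); aln_b.append(b[j - 1]); i -= 1; j -= 1
        elif D[i][j] == D[i - 1][j] + gap:
            aln_a.append(a[i - 1]); aln_b.append('-'); i -= 1
        else:
            aln_a.append('-'); aln_b.append(b[j - 1]); j -= 1
    return best, ''.join(reversed(aln_a)), ''.join(reversed(aln_b))
```

For two identical SSE strings the method returns the full match with score equal to the string length, e.g. `smith_waterman("HHHEEECCC", "HHHEEECCC")` yields score 9.0.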
The Score similarity measure is calculated for each possible alignment path; it totals all similarity awards δ+, mismatch penalties δ−, and gap penalties ω_k according to the following formula:

    Score = Σ δ+ + Σ δ− − Σ ω_k.    (10)
4 Protein Secondary Structure – Structured Query Language

Protein Secondary Structure – Structured Query Language (PSS-SQL) extends the standard syntax of SQL with additional functions that allow searching for protein similarities on secondary structures. We disclose two important functions for this purpose, containSequence and sequencePosition, which are presented in this chapter. However, PSS-SQL also covers a series of supplementary procedures and functions, which are used implicitly, e.g. for extracting segments of particular types of SSEs, building additional segment tables, indexing SSE sequences, processing these sequences, aligning the target structures from a database to the pattern, validating patterns, and many other operations. The PSS-SQL extension was developed in the C# programming language. All procedures were gathered in the form of the ProteinLibrary DLL file and registered for Microsoft SQL Server 2005/2008 (Fig. 8).

[Figure 8 shows the DBMS Microsoft SQL Server with the PSS-SQL extension, the ProteinLibrary DLL, the protein database, and users' applications]
Fig. 8 General architecture of the system with the PSS-SQL extension
4.1 Indexing Sequences of SSEs

PSS-SQL benefits from additional indexing structures that should be set on columns storing sequences of SSEs. These indexing structures are not required, but they are strongly recommended as they accelerate the searching. Calling the usp_indexSSE procedure creates an additional segment table, which is stored in the structure of a B-Tree clustered index.

EXEC dbo.usp_indexSSE @columnName = 'secondary',
     @indexName = 'TIDX_Secondary';
The @columnName parameter indicates which column contains the indexed sequence of SSEs. Execution of the procedure creates a segment table whose name is specified in the @indexName parameter. The segment table (Fig. 9) contains extracted information about consecutive segments of particular types of SSEs (type), their lengths (length), and positions (startPos). This information accelerates the process of similarity searching through the preliminary filtering of protein structures that are not similar to the query pattern. In the filtering, we extract the most characteristic features of the query pattern and, on the basis of the information in the index, we eliminate proteins that do not meet the similarity criteria. In the next phase, proteins that pass the preselection are aligned to the query pattern.
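The extraction performed while building the segment table amounts to run-length encoding the SSE string. The server-side procedure does this in T-SQL/C#; the sketch below is our illustrative Python equivalent producing rows in the Fig. 9 layout.

```python
from itertools import groupby

def extract_segments(sse):
    """Run-length encode an SSE string into (type, startPos, length) rows,
    mirroring the layout of the segment table in Fig. 9."""
    segments, pos = [], 0
    for typ, run in groupby(sse):
        n = sum(1 for _ in run)
        segments.append((typ, pos, n))
        pos += n
    return segments
```

For example, `extract_segments("CCCHHHHHE")` returns `[('C', 0, 3), ('H', 3, 5), ('E', 8, 1)]`.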
id   protID  type  startPos  length
---  ------  ----  --------  ------
67   3       C     0         3
68   3       H     3         23
69   3       C     26        8
70   3       H     34        12
71   3       C     46        3
72   3       E     49        3
Fig. 9 Part of the segment table
Metadata describing the relationship between a particular segment table and the column storing secondary structures are stored in the MappingTbl table, whose structure is presented in Fig. 10.

id   columnName  indexName       maxLength
---  ----------  --------------  ---------
3    secondary   TIDX_Secondary  290

Fig. 10 Part of the MappingTbl table
4.2 Representation of Structural Pattern in PSS-SQL Queries

While searching for protein similarities on secondary structures, we need to pass the query structure (query pattern) as a parameter of the search process. Similarly to the storage format, in PSS-SQL queries the pattern is represented as a sequence of SSEs. However, the form of the sequence is slightly different. During the development of the PSS-SQL functionality we assumed that the new extensions should allow users to formulate a large number of various query types with different degrees of complexity. Moreover, the form of these extensions should be as simple as possible and should not cause any syntax difficulties. Therefore, we have defined a corresponding grammar that helps in constructing the query pattern. In PSS-SQL queries, the sequence of SSEs is represented by blocks of segments. Each segment is determined by its type and length. The segment length can be represented precisely or as an interval. It is possible to define segments for which the type is unimportant or undefined (wildcard symbol '?'), and for which the end value of the interval is not defined (wildcard symbol '*'). The grammar for defining patterns, written in the Chomsky notation, has the following form. The grammar is formally defined as the ordered quadruple Gpss = <Npss, Σpss, Ppss, Spss>, where the symbols respectively mean: Npss – a finite set of nonterminal symbols, Σpss – a finite set of terminal symbols, Ppss – a finite set of production rules, Spss – a distinguished symbol Spss ∈ Npss that is the start symbol.
Server-Side Query Language for Protein Structure Similarity Searching
405
Σpss = {c, h, e, ?, *, N+}
Npss = { <sequence>, <segments>, <segment>, <type>, <begin>, <end>, <length>, <whole_number_greater_than_zero_or_zero>, <any> }
Ppss = {
  <sequence> ::= <segments>
  <segments> ::= <segment> | <segment>,<segments>
  <segment> ::= <type>(<begin>;<end>) | <type>(<length>)
  <begin> ::= <whole_number_greater_than_zero_or_zero>
  <end> ::= <whole_number_greater_than_zero_or_zero> | <any>
  <length> ::= <whole_number_greater_than_zero_or_zero>
  <type> ::= c | h | e | ?
  <whole_number_greater_than_zero_or_zero> ::= N+ | 0
  <any> ::= *
}
Spss = <sequence>
Assumption: <begin> <= <end>

The following terms are compliant with the defined grammar Gpss:

• h(1;10) – representing an α-helix of the length 1 to 10 elements
• e(2;5),h(10;*),c(1;20) – representing a β-strand of the length 2 to 5 elements, followed by an α-helix of the length at least 10 elements, and a loop of the length 1 to 20 elements
• e(10;15),?(5;20),h(35) – representing a β-strand of the length 10 to 15 elements, followed by any element of the length 5 to 20 elements, and an α-helix of the exact length 35 elements

With such a representation of the query pattern, we can start the search process using the containSequence and sequencePosition functions described in the next sections.

4.3 Checking for Presence of Query Pattern in Protein Structures

The containSequence function allows checking whether a particular protein or set of proteins from a database contains the structural pattern specified as a sequence of SSEs. The function returns the Boolean value 1 if the protein from the database contains the specified pattern, or 0 if it does not. The header of the containSequence function is as follows:

FUNCTION containSequence (
  @proteinId int,
  @columnSSeq text,
  @pattern varchar(4000)
) RETURNS bit
Input arguments of the containSequence function are described in Table 2.

Table 2 Arguments of the containSequence function

Argument     Description
-----------  ---------------------------------------------------------------
@proteinId   unique identifier of a protein in the table that contains sequences of SSEs (e.g. the id field in the case of ProteinTbl)
@columnSSeq  database field containing sequences of SSEs of proteins (e.g. secondary)
@pattern     pattern that defines the query SSEs sequence, represented by a set of segments, e.g. h(2;10),c(1;5),?(2;*)
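To make the grammar of the @pattern argument concrete, here is a sketch of a validator/decomposer for pattern strings. The function and its (type, min, max) output format are our own illustration, not part of the PSS-SQL implementation.

```python
import re

# one <segment> of the grammar: a type (c|h|e|?) with an exact length
# or a begin;end interval whose end may be the '*' wildcard
_SEGMENT = re.compile(r'([che?])\((\d+)(?:;(\d+|\*))?\)')

def parse_pattern(pattern):
    """Split a PSS-SQL pattern into (type, min, max) tuples;
    max is None for '*' and equals min for exact-length segments."""
    segments = []
    for part in pattern.replace(' ', '').split(','):
        m = _SEGMENT.fullmatch(part)
        if m is None:
            raise ValueError('segment not compliant with the grammar: ' + part)
        typ, lo, hi = m.groups()
        lo = int(lo)
        if hi is None:
            segments.append((typ, lo, lo))       # e.g. c(5)
        elif hi == '*':
            segments.append((typ, lo, None))     # e.g. h(10;*)
        else:
            segments.append((typ, lo, int(hi)))  # e.g. e(2;5)
    return segments
```

For example, `parse_pattern("e(2;5),h(10;*),c(1;20)")` returns `[('e', 2, 5), ('h', 10, None), ('c', 1, 20)]`.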
The containSequence function can be used both in the SELECT and WHERE clauses of the SQL SELECT statement. In Fig. 11 we schematically show how to construct a PSS-SQL query with the containSequence function.

[Figure 11 annotates the following query: (1) DB table storing secondary structures, (2) processed field from the DB table, (3) pattern, (4) additional filtering criteria, (5) results]

SELECT id, protID, protAC, name
FROM ProteinTbl
WHERE containSequence(id,'secondary','c(10;20),e(7;20),c(1;20)')
  AND name like '%Arabidopsis thaliana%'

-- Results:
id   protID      protAC  name
---  ----------  ------  -------------------------------------
175  A494_ARATH  P43295  Probable cysteine proteinase A494 OS=
443  AAH_ARATH   O49434  Allantoate deiminase, chloroplastic O
522  AASS_ARATH  Q9SMZ4  Alpha-aminoadipic semialdehyde syntha
553  AAT1_ARATH  P46643  Aspartate aminotransferase, mitochond
560  AAT2_ARATH  P46645  Aspartate aminotransferase, cytoplasm
...
Fig. 11 Construction of PSS-SQL queries with containSequence function
Using the function in the SELECT statement allows displaying information about whether a protein or set of proteins contains a specified pattern. Below, we present an example of using the containSequence function to verify whether the structure of the Q9FHY1 protein has a structural region containing a β-strand of the length 7 to 20 elements, surrounded by two loops, one of the length 10 to 20 elements and the second of the length 1 to 20 elements – the pattern c(10;20),e(7;20),c(1;20).

SELECT id, protID, protAC, name,
  containSequence(id,'secondary','c(10;20),e(7;20),c(1;20)') AS containSeq
FROM ProteinTbl
WHERE protAC='Q9FHY1'
Results of the verification are shown in Fig. 12.

id   protID       protAC  name                                   containSeq
---  -----------  ------  -------------------------------------  ----------
964  ABIL4_ARATH  Q9FHY1  Protein ABIL4 OS=Arabidopsis thaliana  0
Fig. 12 Result of the verification for the protein Q9FHY1
The following query shows an example of using the containSequence function to display whether proteins from the Arabidopsis thaliana species contain the given pattern (containSeq=1) or not (containSeq=0). The structural pattern is the same as in the previous example.
SELECT id, protID, protAC, name,
  containSequence(id,'secondary','c(10;20),e(7;20),c(1;20)') AS containSeq
FROM ProteinTbl
WHERE name like '%Arabidopsis thaliana%'
Results of the search process are shown in Fig. 13.

id   protID      protAC  name                                   containSeq
---  ----------  ------  -------------------------------------  ----------
175  A494_ARATH  P43295  Probable cysteine proteinase A494 OS=  1
244  A9_ARATH    Q00762  Tapetum-specific protein A9 OS=Arabid  0
443  AAH_ARATH   O49434  Allantoate deiminase, chloroplastic O  1
522  AASS_ARATH  Q9SMZ4  Alpha-aminoadipic semialdehyde syntha  1
553  AAT1_ARATH  P46643  Aspartate aminotransferase, mitochond  1
560  AAT2_ARATH  P46645  Aspartate aminotransferase, cytoplasm  1
...
Fig. 13 Partial result of the search process for proteins from the Arabidopsis thaliana species
Using the containSequence function in the WHERE clause allows finding proteins that contain the specified pattern. Below is an example of using the function for searching proteins from Escherichia coli that contain the pattern h(5;15),c(3),?(6),c(1;*).

SELECT id, protID, protAC, name, primary, secondary
FROM ProteinTbl
WHERE containSequence(id,'secondary','h(5;15),c(3),?(6),c(1;*)')=1
  and name like '%Escherichia coli%'
Results of the searching process are shown in Fig. 14.

id    protID      protAC  name                     primary                  secondary
----  ----------  ------  -----------------------  -----------------------  ---------------------------
1294  ACCA_ECO24  A7ZHS5  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHCCH...
1295  ACCA_ECO57  P0ABD6  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHCCH...
1296  ACCA_ECOHS  A7ZWD1  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHCCH...
1297  ACCA_ECOK1  A1A7M9  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHCCH...
1298  ACCA_ECOL5  Q0TLE8  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHCCH...
1299  ACCA_ECOL6  Q8FL03  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHCCH...
1300  ACCA_ECOLI  P0ABD5  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHHHH...
1301  ACCA_ECOUT  Q1RG04  Acetyl-coenzyme A ca...  MSLNFLDFEQPIAELEAKID...  CCCCCCCCHHHHHHHHHHHHHHHH...
Fig. 14 Partial result of searching process for proteins from the Escherichia coli having given structural pattern h(5;15),c(3),?(6),c(1;*)
4.4 Locating Patterns in Protein Structures

The sequencePosition function allows locating the specified pattern in the structure of a protein or group of proteins in a database. Pattern searching is performed with the use of the segment table and through alignment of protein secondary structures. For this purpose, we have adapted the Smith-Waterman alignment method.
The header of the sequencePosition function is as follows:

FUNCTION sequencePosition (
  @columnSSeq text,
  @pattern varchar(4000),
  @predicate varchar(4000)
) RETURNS @resultTable table (
  proteinId int,
  startPos int,
  endPos int,
  length int,
  gapsCount int,
  sequence text
)
Input arguments of the sequencePosition function are described in Table 3.

Table 3 Arguments of the sequencePosition function

Argument     Description
-----------  ---------------------------------------------------------------
@columnSSeq  database field that contains sequences of SSEs, e.g. secondary
@pattern     pattern that defines the query SSEs sequence, represented by a set of segments, e.g. h(2;10),c(1;5),?(2;*)
@predicate   optional, simple or complex criteria that limit the list of proteins processed during the search, e.g. name LIKE '%phosphogluconolactonase%'
The sequencePosition function returns a table containing information about the location of the query pattern in the structure of each database protein. The fields of the output table are described in Table 4.

Table 4 Output table of the sequencePosition function

Field      Description
---------  ------------------------------------------------------------------
proteinId  unique identifier of the protein that contains the specified pattern; using the identifier we can join the resultant table with data from other tables
startPos   position where the pattern starts in the target protein from the database
endPos     position where the pattern ends in the target protein from the database
length     length of the segment that matches the given pattern
sequence   sequence of SSEs which matches the pattern defined in the query
The sequencePosition function is used in the FROM clause of the SELECT statement. The resultant table is treated as one of the source tables used in query execution. In Fig. 15 we schematically show how to construct a PSS-SQL query with the sequencePosition function.
[Figure 15 annotates the following query: (1) processed field from the DB table, (2) pattern, (3) additional filtering criteria, (4) results, (5) matched sequence]

SELECT p.id, p.name, s.startPos, s.endPos,
  s.sequence AS [matched sequence], p.secondary
FROM ProteinTbl p JOIN sequencePosition('secondary',
  'h(5;20),c(0;*),e(1;*),c(0;*),e(1;*)','') AS s
  ON p.id = s.proteinId
WHERE p.name LIKE '%PE=4%' AND p.length > 100

id    name                startPos  endPos  matched sequence                                     secondary
----  ------------------  --------  ------  ---------------------------------------------------  ----------------------
3298  2-nitropropane ...  330       350     hhhhhccccccccceeeeee                                 CCCHHHHHHHEEEEECCCC...
3298  2-nitropropane ...  244       309     hhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccc...   CCCHHHHHHHEEEEECCCC...
3298  2-nitropropane ...  110       148     hhhhhhhhhhccccccccccccceeecccccceeeeee               CCCHHHHHHHEEEEECCCC...
3918  Acetoin utiliza...  115       146     hhhhhhhhhhhhhhcccccccccceeeeeee                      CCCHHHHHHEEEEEECCCH...
3918  Acetoin utiliza...  60        86      hhhhhhhhhhhcccccccccceeeee                           CCCHHHHHHEEEEEECCCH...
3918  Acetoin utiliza...  149       187     hhhhhhhhhhhheeeeeeeeeeecccccceeeeeeeee               CCCHHHHHHEEEEEECCCH...
...
Fig. 15 Construction of PSS-SQL queries with sequencePosition function
Below, we show an example of using the function to locate a pattern that contains a β-strand of the length 1 to 10 elements, an optional loop of up to 5 elements, an α-helix of the length at least 5 elements, an optional loop of up to 5 elements, and a β-strand of any length – the pattern e(1;10),c(0;5),h(5;*),c(0;5),e(1;*). The pattern is searched only in proteins with a length exceeding 150 amino acids, whose secondary structure was predicted (predicate PE=4).

SELECT p.protAC AS AC, p.name, s.startPos AS start, s.endPos AS end,
  s.sequence AS [matched sequence], p.secondary
FROM ProteinTbl AS p JOIN sequencePosition('secondary',
  'e(1;10),c(0;5),h(5;*),c(0;5),e(1;*)','') AS s
  ON p.id = s.proteinId
WHERE p.name LIKE '%PE=4%' AND p.length > 150
The query produces results as shown in Fig. 16. It should be noted that there may be many ways in which the pattern can be aligned to a protein structure from the database. The modified Smith-Waterman method returns a number of possible alignments based on the value of the internal MPE parameter. As a result, in the table shown in Fig. 16 the same protein may appear several times with different alignment parameters.

AC      name                start  end  matched sequence                     secondary
------  ------------------  -----  ---  -----------------------------------  -------------------------
P75747  Protein abrB OS...  72     107  eeeeeeeeehhhhhhhhhhhhhhhhhheeeeeeee  CCCEEEEEHHHHHHHHHHHHEE...
P75747  Protein abrB OS...  222    245  eeeeehhhhhhhhhhhhhhheee              CCCEEEEEHHHHHHHHHHHHEE...
P75747  Protein abrB OS...  136    158  eeeeehhhhhhhhhhhcceeee               CCCEEEEEHHHHHHHHHHHHEE...
P75747  Protein abrB OS...  172    202  eeeeccccchhhhhhhhhhhhhccceeeee       CCCEEEEEHHHHHHHHHHHHEE...
P75747  Protein abrB OS...  4      32   eeeeehhhhhhhhhhhheeeeeeeeeee         CCCEEEEEHHHHHHHHHHHHEE...
P75747  Protein abrB OS...  22     43   eeeeeeeeeecchhhhheeee                CCCEEEEEHHHHHHHHHHHHEE...
Q54GC8  Acyl-CoA-bindin...  172    197  eeeeeccchhhhhhhhhcccceeee            CCCHHHHHHHHHHHHHHHHCCC...
P32104  Transcriptional...  185    212  eeeecccchhhhhhhhhhhheeeeeee          CCCCCHHHHHHHHHHHHHHHHH...
P32104  Transcriptional...  120    144  eeeccccchhhhhhhccccceeee             CCCCCHHHHHHHHHHHHHHHHH...
P32104  Transcriptional...  98     123  eeeecccchhhhhhhhhhhhhheee            CCCCCHHHHHHHHHHHHHHHHH...
...
Fig. 16 Partial result of the search process for the given structural pattern e(1;10),c(0;5),h(5;*),c(0;5),e(1;*)
Predicates that filter the set of rows can be defined in the WHERE clause of the SELECT statement or passed as the @predicate argument of the sequencePosition function. However, regarding query performance, it is better to pass them directly as the @predicate argument when we call the function. This small extension forces the query processor to filter the set of proteins before creating the resultant table and before executing the Smith-Waterman method. Therefore, we do not waste time on time-consuming alignments that are unnecessary in some cases. A sample query with the filtering criteria specified in the function call is shown below.

SELECT p.protAC AS AC, p.name, s.startPos AS start, s.endPos AS end,
  s.sequence AS [matched sequence], p.secondary
FROM ProteinTbl AS p JOIN sequencePosition('secondary',
  'e(1;10),c(0;5),h(5;*),c(0;5),e(1;*)',
  ' p.name LIKE ''%PE=4%'' AND p.length > 150') AS s
  ON p.id = s.proteinId
5 Effectiveness and Efficiency of PSS-SQL Queries

We have performed a set of tests in order to verify the effectiveness and efficiency of PSS-SQL queries containing different patterns. The effectiveness of PSS-SQL queries was successfully confirmed by testing both functions – containSequence and sequencePosition – on a set of 6 230 protein structures stored in the ProteinTbl table. Tests were performed for more than one hundred different SSE patterns of different complexity, containing various numbers of segments, described precisely or roughly, and including SSEs of different types – defined explicitly or using wildcards. We also made a set of tests in order to examine the efficiency of PSS-SQL queries. These tests were performed on a PC with an Intel® 3.2 GHz Core Duo processor and 2 GB of memory, working under the Microsoft Windows XP operating system. Similarly to the effectiveness tests, efficiency was tested against the Proteins database containing data describing 6 230 primary and secondary structures of proteins, as well as some additional information. Primary structures and descriptions of proteins were downloaded from the popular SwissProt database [Apweiler et al. 2004]. Secondary structures were predicted on the basis of primary structures with the use of the Predator program [Frishman and Argos 1996]. The execution time of PSS-SQL queries calling the sequencePosition function, which localizes patterns in protein structures, ranges from single seconds up to several minutes. It depends on the pattern specified in the query. In Fig. 17 we show execution times for queries containing the following sample patterns:
• SSE1: h(38),c(3;10),e(25;30),c(3;10),h(1;10),c(1;5),e(5;10)
• SSE2: e(4;20),c(3;10),e(4;20),c(3;10),e(15),c(3;10),e(1;10)
• SSE3: h(30;40),c(1;5),?(50;60),c(5;10),h(29),c(1;5),h(20;25)
• SSE4: h(10;20),c(1;10),h(243),c(1;10),h(5;10),c(1;10),h(10;15)
• SSE5: e(1;10),c(1;5),e(27),h(1;10),e(1;10),c(1;10),e(5;20)
The SSE1 pattern represents protein structures with alternating α-helices and β-strands joined by loops. The SSE2 pattern represents protein structures built only of β-strands connected by loops. The SSE3 pattern contains an undefined segment of SSEs (the ? wildcard). Patterns SSE4 and SSE5 each have one unique region – h(243) and e(27), respectively. We have observed that the execution time depends strongly on the uniqueness of the pattern. The more unique the pattern, the more proteins are filtered out based on the segment table, the fewer proteins are aligned by the Smith-Waterman method, and the less time we need to obtain results. We can see this clearly in Fig. 17 for patterns SSE4 and SSE5, which have the precisely defined, unique regions h(243) and e(27). For universal patterns, for which we can find many fitting proteins or multiple alignments, we observe longer execution times of PSS-SQL queries. In such cases, the length of the pattern influences the alignment time – for longer patterns we experience longer response times. We have not observed any dependency between the type of SSE and the response time. However, specifying wildcards in the pattern increases the waiting period (sometimes up to several minutes).
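The preliminary filtering that makes unique regions such as h(243) so effective can be sketched as follows. This is a hypothetical illustration of the idea (function name and row layout are ours, following the Fig. 9 segment table), not the actual server-side implementation.

```python
def preselect(segment_rows, sse_type, min_length):
    """Return IDs of proteins whose segment table contains a run of the
    required SSE type at least min_length long; rows follow the Fig. 9
    layout (protID, type, startPos, length). Only these proteins need to
    be aligned by the Smith-Waterman method afterwards."""
    hits = set()
    for prot_id, typ, _start, length in segment_rows:
        if typ.lower() == sse_type.lower() and length >= min_length:
            hits.add(prot_id)
    return hits
```

For a rare requirement like an h segment of length 243, almost all proteins are discarded by this cheap scan, which explains the short execution times of SSE4 and SSE5.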
Fig. 17 Execution times of PSS-SQL queries containing different patterns SSE1-SSE5
This is typical for standard SQL queries in database systems, where execution times depend highly on the selectivity of the queries and the amount of data in the database. In Fig. 18 we present histograms of segment lengths for particular secondary structure elements (H, E, C). These histograms show the lengths of the segments which occur most and least frequently in the protein structures in our Proteins database.
[Figure: three histograms of segment lengths – a) Histogram for α-Helix SSE, b) Histogram for β-Strand SSE, c) Histogram for Loop SSE; x-axis: Length of segment, y-axis: Number of regions]
Fig. 18 Histograms of segment lengths for particular secondary structure elements (H, E, C)
Additional filtering criteria, which are commonly used in SQL queries, also decrease the execution time. In the case of the containSequence function, additional filtering criteria can be specified only in the WHERE clause. In the case of the sequencePosition function, they can be placed in the WHERE clause or passed as the @predicate parameter of the function. However, passing the criteria as parameters is better for the performance of PSS-SQL queries. The reason is that filtering criteria in the WHERE clause are applied to the resultant table of the sequencePosition function after it is constructed and populated. On the other hand, criteria passed as the @predicate parameter are applied before the construction of the resultant table. In Fig. 19 we present execution times for PSS-SQL queries using the sequencePosition function searching for the structural pattern SSE1: h(38),c(3;10),e(25;30),c(3;10),h(1;10),c(1;5),e(5;10), with additional filtering predicates defined as the @predicate parameter of the function (BUILT-IN) and in the WHERE clause:

• predicate P1: p.name like ''%Homo sapiens%''
• predicate P2: p.name like ''%Homo sapiens%PE=1%''
• predicate P3: p.name like ''%Homo sapiens%PE=1%SV=4%''
• predicate P4: p.primary like ''%NHSAAYRVDQGVLN%''
The additional predicate P1 causes the pattern to be compared only to proteins that occur in the Homo sapiens organism. In predicate P2 we added the condition that the candidate proteins must have the Protein existence attribute set to Evidence at protein level (PE=1). In predicate P3 we provided an additional filter on the sequence version, SV=4. Finally, predicate P4 sets a simple filter on the primary structure of proteins (the amino acid sequence). Analyzing the execution times of queries with additional predicates in Fig. 19 (BUILT-IN) and comparing them to the execution time of the query containing the SSE1 pattern in Fig. 17, we can notice that appropriately formulated filtering criteria significantly increase the performance of the search process and reduce the search time from several minutes even to several seconds (P3 and P4). It is also worth noting that for the analyzed pattern SSE1 we benefit from specifying the additional filtering criteria as a parameter of the sequencePosition function. The WHERE clause is not as efficient in this case.
Fig. 19 Execution times of PSS-SQL queries containing only the SSE1 pattern and various filtering predicates P1-P4 passed as a parameter (BUILT-IN) or in the WHERE clause
6 Concluding Remarks

The PSS-SQL language provides ready-to-use and easy search mechanisms that allow searching for protein similarities on the secondary structure level. The syntax of PSS-SQL is transparent to users and flexible in the possibilities of defining query patterns. The pattern defined in a query does not have to be specified strictly:

• Segments in the pattern can be specified as intervals and they can have undefined lengths (users can use the '*' wildcard symbol).
• PSS-SQL allows specifying patterns with undefined types of SSE (using the SSE type wildcard symbol '?') or patterns where some SSE segments may occur optionally. Therefore, the search process has an approximate character, regarding the various possible options for segment matching.
• The possibility to define patterns that include optional segments allows users to specify gaps in a particular place.

For programmers and scientists involved in data processing it is not surprising that integrating methods of protein similarity searching with a database
management system makes it easy to manipulate biological data without the need for external data mining applications. The SQL extension presented in this paper is an example of such integration. There are many advantages of the proposed extension:

• The entire logic of data processing is removed from the user application and moved to the database server. The advanced analysis of biological data is then performed while retrieving data from the database with the use of PSS-SQL queries. Therefore, the amount of data returned to the user and the network traffic between the server and the user application are much reduced.
• Users familiar with the SQL syntax will easily manage to formulate PSS-SQL queries. We have designed a simple and understandable SQL extension and, in consequence, a very clear language for querying protein structures. This gives the PSS-SQL language an advantage over other known solutions. However, many implicit operations hide behind this simplicity and transparency, such as the alignment using the modified Smith-Waterman method, which belongs to the class of dynamic programming algorithms.
• As a result of PSS-SQL queries, users obtain pre-processed data. These data can then be used in further processing, e.g. users can treat the results as strictly selected proteins which meet specified criteria regarding their construction and will be analyzed in more detail.

In our research, we use the presented extension in similarity searching of protein tertiary structures. In this process, PSS-SQL queries allow us to roughly preselect proteins on the basis of their secondary structures. Future work will cover further development of the PSS-SQL language. In particular, we plan to focus on improving the efficiency of PSS-SQL queries through the use of intelligent heuristics.
Acknowledgment

This scientific research was supported by the Ministry of Science and Higher Education, Poland, in the years 2008–2010, under Grant No. N N516 265835: Protein Structure Similarity Searching in Distributed Multi Agent System.
References

[Allen 2008] Allen, J.P.: Biophysical chemistry. Wiley-Blackwell, London (2008)
[Apweiler et al. 2004] Apweiler, R., Bairoch, A., Wu, C.H., et al.: UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. (Database issue), D115–119 (2004)
[Berman et al. 2000] Berman, H.M., Westbrook, J., Feng, Z., et al.: The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)
[Can and Wang 2003] Can, T., Wang, Y.F.: CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features. In: Proc. 2003 IEEE Bioinf. Conf., Stanford, CA, pp. 169–179 (2003)
[Frishman and Argos 1996] Frishman, D., Argos, P.: Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng. 9(2), 133–142 (1996)
[Murzin et al. 1995] Murzin, A.G., Brenner, S.E., Hubbard, T., et al.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995)
[Orengo et al. 1997] Orengo, C.A., Michie, A.D., Jones, S., et al.: CATH – a hierarchic classification of protein domain structures. Structure 5(8), 1093–1108 (1997)
[Shapiro and Brutlag 2004] Shapiro, J., Brutlag, D.: FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web. Nucleic Acids Res. 32, 536–541 (2004)
[Smith and Waterman 1981] Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
[Yang 2008] Yang, J.: Comprehensive description of protein structures using protein folding shape code. Proteins 71(3), 1497–1518 (2008)
A New Kinds of Rules for Approximate Reasoning Modeling

M. Pałasiński, B. Fryc, and Z. Machnicka
University of Information Technology and Management, Rzeszów, Poland
{mpalasinski,bfryc,zmachnicka}@wsiz.rzeszow.pl
Abstract. In this paper we prove some properties of a new kind of rules, called generalized rules.
1 Introduction

The notions of information system and decision system are basic notions of the theory of rough sets [Pawlak 1991]. This theory is an effective methodology for extracting rules from information and decision tables. Different algorithms for rule generation give different sets of rules. The problem is that we get too many rules. In this paper we introduce a new kind of rules – generalized rules – and some of their properties. We also show how to use this new kind of rules for modeling approximate reasoning.
2 Basic Definitions

2.1 Information and Decision Systems

An information system is a pair S = (U, A), where U is a nonempty finite set of objects called the universe and A is a nonempty finite set of attributes, i.e. mappings a : U → Va for a ∈ A, where Va is called the value set of a. The set V = ∪_{a∈A} Va is said to be the domain of A. A decision system is any information system of the form S = (U, A ∪ {d}), where d ∉ A is a distinguished attribute called a decision. Elements of A are called conditional attributes (conditions). Any decision system S = (U, A ∪ {d}) can be represented by a data table with the number of rows equal to the cardinality of the universe U and the number of columns equal to the cardinality of the set A ∪ {d}. The value a(u) appears in the position corresponding to the row u and the column a.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 417–428. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
Let S = (U, A') be an information system, where A' = A ∪ D, and let V' be the domain of A'. Pairs (a, v), where a ∈ A', v ∈ V', are called descriptors over A' and V' (or over S, in short). Instead of (a, v) we also write a = v. The set of terms over A' and V' is the least set containing all descriptors (over A' and V') and closed with respect to the classical propositional connectives NOT (negation), OR (disjunction) and AND (conjunction), i.e., if τ, τ' are terms over A' and V', then (NOT τ), (τ AND τ'), (τ OR τ') are terms over A' and V'. The meaning τ_S (or τ, in short) of a term τ in S is defined inductively as follows:

if τ is of the form a = v, then τ = {u ∈ U : a(u) = v},
τ OR τ' = τ ∪ τ',
τ AND τ' = τ ∩ τ',
NOT τ = U − τ.

2.2 Example 1
Let us consider an example of a decision system S = (U, A ∪ D) representing some data about 6 patients, as shown in Table 1.

Table 1 Decision system S

U/A∪{Flu}   Headache   Muscle-pain   Temperature   Flu
u1          Yes        Yes           Normal        No
u2          Yes        Yes           High          Yes
u3          Yes        Yes           Very-high     Yes
u4          No         Yes           Normal        No
u5          No         No            High          No
u6          No         Yes           Very-high     Yes
In the decision system S, the universe U = {u1, u2, …, u6}, where each object represents one patient. Each object is described by three conditional attributes: Headache, Muscle-pain and Temperature. The decision attribute is denoted by Flu. The sets of possible attribute values in S are: VHeadache = VMuscle-pain = VFlu = {Yes, No} and VTemperature = {Normal, High, Very-high}. For simplicity of the presentation, let us consider the encoded decision system S1 from Table 2. In this system, the attributes H, M, T, F correspond to the attributes Headache, Muscle-pain, Temperature, Flu from the system S, respectively. The values 1 and 2 of the attributes H, M and F encode Yes and No, respectively, and the values 1, 2, 3 of the attribute T encode Normal, High and Very-high.
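The meanings of terms from Sect. 2.1 can be computed directly on the data of Table 1. The sketch below is our own illustration (it is not part of the original paper); the dictionary layout and helper names are our assumptions.

```python
# Illustrative sketch: evaluating the meaning of terms over the decision
# system S of Table 1. Attribute and object names follow the example.
S = {
    "u1": {"Headache": "Yes", "Muscle-pain": "Yes", "Temperature": "Normal",    "Flu": "No"},
    "u2": {"Headache": "Yes", "Muscle-pain": "Yes", "Temperature": "High",      "Flu": "Yes"},
    "u3": {"Headache": "Yes", "Muscle-pain": "Yes", "Temperature": "Very-high", "Flu": "Yes"},
    "u4": {"Headache": "No",  "Muscle-pain": "Yes", "Temperature": "Normal",    "Flu": "No"},
    "u5": {"Headache": "No",  "Muscle-pain": "No",  "Temperature": "High",      "Flu": "No"},
    "u6": {"Headache": "No",  "Muscle-pain": "Yes", "Temperature": "Very-high", "Flu": "Yes"},
}
U = set(S)

def meaning(a, v):
    """Meaning of the descriptor a = v: the set {u in U : a(u) = v}."""
    return {u for u in U if S[u][a] == v}

# The connectives act as set operations on meanings:
AND = set.intersection      # tau AND tau'  ->  intersection
OR = set.union              # tau OR tau'   ->  union
NOT = lambda t: U - t       # NOT tau       ->  complement in U

# (Headache = Yes) AND (Temperature = High)
print(AND(meaning("Headache", "Yes"), meaning("Temperature", "High")))  # {'u2'}
```

Running the last line confirms that u2 is the only patient matching both descriptors, in agreement with Table 1.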
Table 2 Coded decision system S

U/A∪{Flu}   H   M   T   F
u1          1   1   1   2
u2          1   1   2   1
u3          1   1   3   1
u4          2   1   1   2
u5          2   2   2   2
u6          2   1   3   1
3 Two Valued Information and Decision Systems

Here we present a procedure converting any information system S = (U, A) into a uniquely determined information system S" = (U, A") satisfying the condition that, for each attribute a" ∈ A", the value set of a" is a two-element set. Starting from the information system S = (U, A), we define the set of attributes A" in the following way: A" = ∪_{a∈A} ({a} × Va). Now, if v ∈ Va, we write av instead of (a, v). We put (av)(ui) = 1 if and only if a(ui) = v. Below we show a procedure for generating the two valued information system S" on the basis of the system S.

Procedure:
Input: An information system S = (U, A), where V = ∪_{a∈A} Va.
Output: A two valued information system S" = (U, A").
Begin
  Create an empty set of attributes A".
  Create an empty set of values V".
  Copy the set of objects U to the new set U".
  For each vk ∈ V: create a new attribute a" = vk and add it to A".
  For each object u ∈ U and each attribute a" = vk ∈ A" (derived from a ∈ A):
    if u(a) = vk then u(a") = 1 else u(a") = 0.
End

A two valued decision system is any two valued information system of the form S" = (U, A" ∪ D"), where D", disjoint from A", is a set of distinguished attributes called decisions.
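The conversion procedure above can be sketched in a few lines; the following Python is our own illustration (the paper gives only pseudocode) and uses the coded flu data of Table 2.

```python
def two_valued(S):
    """Convert an information system S = (U, A) into its two valued
    counterpart S'' = (U, A''), where A'' = union over a of {a} x Va
    and the binary attribute (a, v) takes value 1 on u iff a(u) = v."""
    # Collect the new binary attributes a_v = (a, v) that actually occur.
    attrs = sorted({(a, row[a]) for row in S.values() for a in row})
    return {u: {av: int(row[av[0]] == av[1]) for av in attrs}
            for u, row in S.items()}

# Coded decision system S from Table 2 (H, M, T conditions, F decision).
S = {
    "u1": {"H": 1, "M": 1, "T": 1, "F": 2},
    "u2": {"H": 1, "M": 1, "T": 2, "F": 1},
    "u3": {"H": 1, "M": 1, "T": 3, "F": 1},
    "u4": {"H": 2, "M": 1, "T": 1, "F": 2},
    "u5": {"H": 2, "M": 2, "T": 2, "F": 2},
    "u6": {"H": 2, "M": 1, "T": 3, "F": 1},
}
S2 = two_valued(S)
# Row u1 of Table 3: H1=1, H2=0, M1=1, M2=0, T1=1, T2=0, T3=0, F1=0, F2=1
print(S2["u1"][("T", 1)], S2["u1"][("F", 2)])  # 1 1
```

Each row of the result contains exactly one 1 per original attribute, reproducing Table 3 of Example 2.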
3.1 Example 2
Let us consider the decision system S = (U, A ∪ D) presented in Table 2, in which:
• the set of objects U = {u1, u2, u3, u4, u5, u6},
• the set of conditional attributes A = {H, M, T},
• the set of decision attributes D = {F},
• the sets of attribute values: VH = VM = {1, 2}, VT = {1, 2, 3}, VF = {1, 2}.
The system from Table 2 gives us the two valued system S" = (U, A" ∪ D") presented in Table 3, in which:
• the set of all attributes A" ∪ D" = {H1, H2, M1, M2, T1, T2, T3, F1, F2},
• the sets of values of all attributes are equal to {0, 1}.

Table 3 Two valued decision system

U/A"   H1   H2   M1   M2   T1   T2   T3   F1   F2
u1     1    0    1    0    1    0    0    0    1
u2     1    0    1    0    0    1    0    1    0
u3     1    0    1    0    0    0    1    1    0
u4     0    1    1    0    1    0    0    0    1
u5     0    1    0    1    0    1    0    0    1
u6     0    1    1    0    0    0    1    1    0
4 Rules in Decision Systems

4.1 Rules in Standard Decision Systems

Rules express some of the relations between values of attributes in a decision system. This subsection contains the definitions of the different kinds of rules considered in the paper, as well as other related concepts. Let S = (U, A') be a decision system, where A' = A ∪ D, and let V' be the domain of A'. Any expression r of the form IF φ THEN ψ, where φ and ψ are terms over A' and V', is called a rule in S. φ is referred to as the predecessor of r and denoted by Pred(r); ψ is referred to as the successor of r and denoted by Succ(r). We extract two types of rules from a decision table using rough set methods. The first type, called decision rules, represents the relations between the values of conditional attributes and the decision. The second type, called conditional rules, represents relations between the values of conditional attributes. We additionally assume that each of the two types of rules mentioned above can be deterministic or non-deterministic. A numerical factor, called the certainty
factor, associated with a given rule, determines whether the rule is deterministic or non-deterministic. Let S = (U, A') be a decision system, where A' = A ∪ D, and let IF φ THEN ψ be a rule in S. The number

CF = card(φ ∩ ψ) / card(φ)    (1)

is called the certainty factor (CF) of the given rule. It is easy to see that CF ∈ [0, 1]. If CF = 1, then we say that the given rule is deterministic; otherwise (i.e. for CF < 1) we say that it is non-deterministic. In order to generate the set of deterministic rules we can use the standard rough set methods proposed, among others, in [Skowron 1993].

4.2 Example 3
Let us consider the encoded decision system S from Example 1. We can compute the following rules for this system.

Deterministic decision rules:
R1: IF T = 1 THEN F = 2, CF = 1.0;
R2: IF T = 2 AND H = 1 THEN F = 1, CF = 1.0;
R3: IF T = 2 AND M = 1 THEN F = 1, CF = 1.0;
R4: IF T = 3 THEN F = 1, CF = 1.0;
R5: IF M = 2 THEN F = 2, CF = 1.0;
R6: IF T = 2 AND H = 2 THEN F = 2, CF = 1.0.

4.3 Rules in Two Valued Decision Systems
To describe the relation between rules for S and rules for the associated two valued system S", we need to describe the translation of a rule for S into a rule for S". Let S = (U, A), where A = (a1, …, an) and the set of values of the attribute ai is {0, …, ki − 1}, for i = 1, …, n. In the two valued system S" associated with S we have attributes (a10, …, a1,k1−1, …, an0, …, an,kn−1), each assuming the value 0 or 1. For any rule r for S we define the corresponding rule r" for S" by replacing each expression of the form ai = j by aij = 1 and, in case the rule is an inhibitory one, each expression ai ≠ j by aij = 0. It is not difficult to see that:

Lemma 1. For any rule r, if r is true and realizable in S, then r" is true and realizable in S".
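Lemma 1 can be checked mechanically on the running example. The sketch below is our own illustration (not from the paper): it encodes rule R2 of Example 3, its translation into S", and a truth test over both tables.

```python
# Translate a rule of S (attribute -> required value) into the rule of S''
# (binary attribute (a, v) -> 1) and check truth: every object satisfying
# the predecessor must also satisfy the successor.
S = {   # coded decision system of Table 2
    "u1": {"H": 1, "M": 1, "T": 1, "F": 2},
    "u2": {"H": 1, "M": 1, "T": 2, "F": 1},
    "u3": {"H": 1, "M": 1, "T": 3, "F": 1},
    "u4": {"H": 2, "M": 1, "T": 1, "F": 2},
    "u5": {"H": 2, "M": 2, "T": 2, "F": 2},
    "u6": {"H": 2, "M": 1, "T": 3, "F": 1},
}
# Two valued system S'': the pair (a, v) plays the role of the attribute a_v.
S2 = {u: {(a, v): int(val == v)
          for a, val in row.items()
          for v in (1, 2, 3) if any(rr[a] == v for rr in S.values())}
      for u, row in S.items()}

def true_in(system, pred, succ):
    sat = lambda row, cond: all(row[a] == v for a, v in cond.items())
    return all(sat(row, succ) for row in system.values() if sat(row, pred))

# R2 of Example 3: IF T = 2 AND H = 1 THEN F = 1 ...
r = ({"T": 2, "H": 1}, {"F": 1})
# ... and its translation r'': IF T_2 = 1 AND H_1 = 1 THEN F_1 = 1.
r2 = ({("T", 2): 1, ("H", 1): 1}, {("F", 1): 1})
print(true_in(S, *r), true_in(S2, *r2))  # True True
```

As Lemma 1 states, the rule and its translation are true together.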
The translation in the other direction goes as follows. For any rule r" for S", we replace each expression aij = 1 by ai = j and each expression aij = 0 by ai ≠ j. The rule obtained this way is denoted by r.

Lemma 2. For any rule r", if r" is true and realizable in S", then r is true and realizable in S.

Theorem. For all two valued decision systems S1 = (U1, A) and S2 = (U2, A) and the sets Rul1 and Rul2 of valid and realizable rules for S1 and S2, respectively, with Rul1 ≠ ∅ and Rul2 ≠ ∅, we have S1 = S2 if and only if Rul1 = Rul2.

The two valued decision system associated with the information system S allows one to consider additional types of rules (for S) of the form φ → ψ, where φ is a conjunction of a finite number of formulas aij = bij and ψ is a formula of the form aij = bij or ¬(aij = bij). In this paper we consider so called generalized rules, defined as follows:

Definition 1. By a generalized rule we mean any implication of the form φ → ψ, where φ is a conjunction of formulas of two types:
1. aij = bij,
2. ¬(aij = bij),
and ψ is a formula of the form aij = bij or ¬(aij = bij).

Remark. If we assume that the set of values is finite, then any subformula of φ (under the notation above) containing as a factor all formulas of the form ¬(aij = bij) for a fixed i can be replaced by a disjunction of formulas of the form ai = bik, where bik does not appear in any factor ¬(ai = bij). Using this remark, we will use in the sequel the following equivalent definition of a generalized rule.

Definition 2. By a generalized rule we mean any implication of the form φ → ψ, where φ is a conjunction of formulas of two types:
1. aij = bij,
2. a disjunction of formulas of the form ai = bik,
where each attribute ai can appear only once, in a factor of type 1 or of type 2, and ψ is a formula of the form aij = bij or ¬(aij = bij).

The definitions of a generalized rule true in u, where u is an object of the system S, and of a generalized rule true in the system S are exactly the same as in the case of "ordinary" rules, as both are implications.
Definition 3. We say that a generalized rule r: φ → ψ is a minimal generalized rule of the system S if and only if r is true in S and, after removing any factor of φ or any summand of any disjunction appearing in φ, the resulting rule is no longer true in S.

Now, for any generalized rule

r: φ1 ∧ … ∧ φk−1 ∧ ((ak = vk1) ∨ … ∨ (ak = vkp)) ∧ φk+1 ∧ … ∧ φn → ψ, with p > 1,

we define, for j = 1, …, p:

rkj: φ1 ∧ … ∧ φk−1 ∧ (ak = vkj) ∧ φk+1 ∧ … ∧ φn → ψ.

Let us note the following.

Fact 1. If r is any generalized rule true in S, then for j = 1, …, p, the rule rkj is true in S.

Proof by contradiction. Suppose that for some t, 1 ≤ t ≤ p, rkt is not true in S. This means that there is an object u in S at which rkt is not true, i.e. the predecessor of rkt is true in u and the successor of rkt is not true in u. But then the predecessor of r is true in u and the successor of r is not true in u. This means that r is not true in u, a contradiction.

Fact 2. If each of the generalized rules rk1, …, rkp is true in S, then r is true in S.

Proof by contradiction. If r is not true in S, then there is an object u in S at which r is not true, i.e. the predecessor of r is true in u and the successor of r is not true in u. So all the formulas φ1, …, φk−1, (ak = vk1) ∨ … ∨ (ak = vkp), φk+1, …, φn are true in u and ψ is not true in u. The former implies that at least one of the formulas ak = vk1, …, ak = vkp is true in u. Thus at least one of the rules rk1, …, rkp is not true in u, so it is not true in S.

As a conclusion we get:

Corollary. A generalized rule r is true in S if and only if for all j = 1, …, p, the rule rkj is true in S.

Let us state the obvious.
Fact 3. If for all j = 1, …, p, the rule rkj is realizable and true in S, then the generalized rule r is realizable and true in S.

Fact 4. If r is a minimal generalized rule in S, then for all j = 1, …, p, the rule rkj is realizable and true in S.

Proof. Here we prove only realizability, as the second part is obvious. Suppose, to the contrary, that there is j, 1 ≤ j ≤ p, such that rkj is not realizable in S. Take U' := {u ∈ S : u |= φ1 ∧ … ∧ φk−1 ∧ φk+1 ∧ … ∧ φn}. Each u ∈ U' has its k-th coordinate different from vkj. So the rule r' is true in S, where r' is the rule obtained from r by removing the summand ak = vkj from φk. This means that r is not minimal.

The last fact concerns the rules of the two valued information system S" associated with a given system S. In such a system we do not consider generalized rules; however, as the notion of a generalized rule appeared while dealing with a system S and its associated two valued system S", it is worth presenting. Let us end with a fact partly describing rules in S".

Fact 5. If a rule r: (aj1 = q1) ∧ … ∧ (ajn = qn) → (aj = q), where q, q1, …, qn ∈ {0, 1}, is a minimal and realizable rule in S", and in the predecessor of r we have ajk = 1 for some k, then the predecessor contains no factor ajt = 0, for t ≠ k, referring to the same attribute of S.

Proof by contradiction. Suppose that we have a rule r: … ∧ (ajk = 1) ∧ … ∧ (ajt = 0) ∧ … → ψ, k ≠ t, which is minimal and realizable in S". Removing ajt = 0 from the predecessor of r gives us a rule r' which is not true in S" (this follows from the minimality of r). This means that there is an object u of S" at which the predecessor of r' is true and the successor of r' is not true. Adding the factor ajt = 0 back to the predecessor of r' we get a rule true in S", which means that the predecessor of r has logical value 0 at u, i.e. u ⊭ (ajt = 0), i.e. u |= (ajt = 1). But then u ⊭ (ajk = 1), which is not possible.
4.4 Example 4
Let us consider the two valued decision system S" from Example 2. The set of rules Rul" computed for this system is as follows:

R1: IF M1 = 1 ∧ T1 = 1 THEN F2 = 1
R2: IF H1 = 1 ∧ M1 = 1 ∧ (T2 = 1 ∨ T3 = 1) THEN F1 = 1
R3: IF M1 = 1 ∧ T3 = 1 THEN F1 = 1
R4: IF H2 = 1 ∧ (T1 = 1 ∨ T2 = 1) THEN F2 = 1

Each rule from Rul" can be transformed into a rule valid and realizable in S. For example:

R1: IF ¬(M = 2) ∧ ¬(T = 2) ∧ ¬(T = 3) THEN F = 2 is obtained from rule 1,
R2: IF ¬(H = 2) ∧ ¬(M = 2) ∧ ¬(T = 1) THEN F = 1 is obtained from rule 2,
R3: IF ¬(M = 2) ∧ ¬(T = 1) ∧ ¬(T = 2) THEN F = 1 is obtained from rule 3,
R4: IF ¬(H = 1) ∧ ¬(T = 3) THEN F = 2 is obtained from rule 4.
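One can check mechanically that the rules of Rul" hold in every object of the two valued system of Table 3. The sketch below is our own verification code; the clause encoding (a list of disjunctions over binary attribute names) is an assumption of ours.

```python
# Check rules of Rul'' against the two valued decision system of Table 3.
names = ["H1", "H2", "M1", "M2", "T1", "T2", "T3", "F1", "F2"]
data = {
    "u1": [1, 0, 1, 0, 1, 0, 0, 0, 1],
    "u2": [1, 0, 1, 0, 0, 1, 0, 1, 0],
    "u3": [1, 0, 1, 0, 0, 0, 1, 1, 0],
    "u4": [0, 1, 1, 0, 1, 0, 0, 0, 1],
    "u5": [0, 1, 0, 1, 0, 1, 0, 0, 1],
    "u6": [0, 1, 1, 0, 0, 0, 1, 1, 0],
}
rows = {u: dict(zip(names, r)) for u, r in data.items()}

def true_in_S(pred, succ):
    """pred: list of clauses, each a list of attributes of which at least
    one must equal 1 (a disjunction); succ: attribute that must equal 1."""
    return all(r[succ] == 1
               for r in rows.values()
               if all(any(r[a] == 1 for a in clause) for clause in pred))

# R2: IF H1 = 1 AND M1 = 1 AND (T2 = 1 OR T3 = 1) THEN F1 = 1
print(true_in_S([["H1"], ["M1"], ["T2", "T3"]], "F1"))  # True
```

The same call with the remaining three rules of Rul" also returns True.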
5 Approximate Petri Nets

In this section, approximate Petri nets (AP-nets) are recalled. The formal definition of AP-nets and their dynamic properties are presented in [Fryc et al. 2004].

5.1 General Description

AP-nets are high-level nets. The structure of an AP-net is a directed graph with two kinds of nodes: places (drawn as ellipses) and transitions (drawn as rectangles), interconnected by arcs in such a way that each arc connects two different kinds of nodes (i.e., a place and a transition). The places and their tokens represent states, while the transitions represent state changes. The data value attached to a given token is referred to as the token colour. The declaration of the net tells us about colour sets and variables. Each place has a colour set attached to it, which means that each token residing on that place must have a colour belonging to that colour set. Each net inscription is attached to a place, a transition or an arc. Places have four kinds of inscriptions: names, colour sets, initialization expressions and current markings. Transitions have four kinds of inscriptions: names, guards, thresholds and certainty values, while arcs have only one kind of inscription: arc expressions. The initialization expression of a place must evaluate to a fuzzy set over the corresponding colour set. The guard of a transition is a Boolean expression which must be fulfilled before the transition can occur. The arc expression (like the guard) may contain variables, constants, functions and operations that are defined in the declarations. When the variables of an arc expression are bound, the arc expression must evaluate to a fuzzy set over the colour set attached to the place of the arc. A distribution of tokens (on the places) is
called a marking and is denoted by M. The initial marking M0 is the marking determined by evaluating the initialization expressions. A pair whose first element is a transition and whose second element is a binding of that transition is called an occurrence element. If an occurrence element is enabled in a given marking, then we can talk about the next marking, which is reached by the occurrence of the occurrence element in the given marking. The formal definition of AP-nets is given below. An approximate Petri net is a tuple

APN = (Σ, P, T, A, Nin, Nout, C, G, Ein, Eout, I, f)    (2)

satisfying the following requirements:
• Σ is a nonempty, finite set of types, called colour sets,
• P is a finite set of places,
• T is a finite set of transitions,
• A is a finite set of arcs,
• Nin : A → (P × T) is an input node function,
• Nout : A → (T × P) is an output node function,
• C : P → Σ is a colour function,
• G is a guard function,
• Ein is an input arc expression function,
• Eout is an output arc expression function,
• I is an initialization function,
• f : T → [0, 1] is a certainty factor function.

In the next example we show how to use the new kind of rules for creating an approximate Petri net.

5.2 Example 5
Let us consider the set of rules for the system S" computed in Example 4. For this set of rules we can create an AP-net as a graphical model of approximate reasoning (see Fig. 1). In the AP-net from Fig. 1, the places pH, pM, pT represent the conditional attributes H, M, T from S, respectively, while the place pF represents the decision F. The colour sets (types) are as follows: H = {H1, H2}, M = {M1, M2}, T = {T1, T2, T3}, F = {F1, F2}. The transitions t1, …, t4 represent the rules R1, …, R4, respectively. For example, transition
t4 represents the decision rule IF ¬(H = 1) ∧ ¬(T = 3) THEN F = 2. The input arc expressions are e41 = {xH}, e42 = {xT}, where xH, xT are variables of types H and T, respectively. The output arc expression has the form e43 = CF · max(μ(xH), μ(xT))/yF, where yF is a variable of type F. The guard expression for t4 is g4 = [¬(xH = H1) ∧ ¬(xT = T3) ∧ (xF = F2)]. Moreover, CF4 = 1 (because the rule is deterministic). Analogously we can describe the other transitions and arcs.
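As an illustration of the tuple (2), the fragment of the net around transition t4 can be written down as a record; the sketch below is our own (it is not the formalism of [Fryc et al. 2004]), and the guard and arc expression functions are deliberately kept abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

@dataclass
class APNet:
    """Skeleton of an approximate Petri net, mirroring the tuple (2).
    Guard/arc-expression/initialization functions are plain callables."""
    colour_sets: Dict[str, FrozenSet[str]]                  # Sigma
    places: List[str]                                       # P
    transitions: List[str]                                  # T
    arcs: List[str]                                         # A
    n_in: Dict[str, Tuple[str, str]]                        # N_in : A -> P x T
    n_out: Dict[str, Tuple[str, str]]                       # N_out: A -> T x P
    colour: Dict[str, str] = field(default_factory=dict)    # C : P -> Sigma
    guards: Dict[str, Callable] = field(default_factory=dict)   # G
    cf: Dict[str, float] = field(default_factory=dict)      # f : T -> [0, 1]

# The fragment of the net of Example 5 around transition t4.
net = APNet(
    colour_sets={"H": frozenset({"H1", "H2"}),
                 "T": frozenset({"T1", "T2", "T3"}),
                 "F": frozenset({"F1", "F2"})},
    places=["pH", "pT", "pF"], transitions=["t4"],
    arcs=["a41", "a42", "a43"],
    n_in={"a41": ("pH", "t4"), "a42": ("pT", "t4")},
    n_out={"a43": ("t4", "pF")},
    colour={"pH": "H", "pT": "T", "pF": "F"},
    cf={"t4": 1.0},   # the rule represented by t4 is deterministic
)
```

The arc and attribute names (a41, pH, …) are our own labels for the edges shown in Fig. 1.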
Fig. 1 Approximate Petri net corresponding to decision system S”
6 Conclusions

In this paper we have considered a new type of rules and used them for modeling of approximate reasoning. Such rules can also be used for reasoning in AP-nets, for modeling of concurrent systems [Pancerz 2008], in classification systems, and so on. In a future paper we will compare these rules with other kinds of rules in order to determine their characteristics.
References

[Delimata et al. 2008] Delimata, P., Moshkov, M., Skowron, A., Suraj, Z.: Inhibitory rules in data analysis: a rough set approach. Springer, Heidelberg (2008)
[Fryc et al. 2004] Fryc, B., Pancerz, K., Suraj, Z.: Approximate Petri nets for rule-based decision making. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 733–742. Springer, Heidelberg (2004)
[Fryc et al. 2010] Fryc, B., Machnicka, Z., Pałasiński, M.: Remarks on two valued information systems. In: Pardela, T., Wilamowski, B. (eds.) Proc. of the 3rd International Conference on Human System Interaction (HSI 2010), Rzeszow, Poland, pp. 775–778 (2010)
[Pancerz 2008] Pancerz, K., Suraj, Z.: Rough sets for discovering concurrent system models from data tables. In: Hassanien, A.E., Suraj, Z., Ślęzak, D., Lingras, P. (eds.) Rough Computing: Theories, Technologies and Applications, Information Science Reference, Hershey, pp. 239–268 (2008)
[Pawlak 1991] Pawlak, Z.: Rough sets: theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht (1991)
[Skowron 1993] Skowron, A.: Boolean reasoning for decision rules generation. In: Komorowski, J., Raś, Z.W. (eds.) Methodologies for Intelligent Systems, pp. 295–305. Springer, Heidelberg (1993)
Technical Evaluation of Boolean Recommenders

S. Chojnacki and M.A. Kłopotek
Institute of Computer Science, Polish Academy of Sciences
[email protected]
Abstract. The purpose of this paper is to describe a new methodology dedicated to the analysis of boolean recommenders. The aim of most recommender systems is to suggest interesting items to a given user. The most common criteria used to evaluate a system are its statistical correctness and completeness, which can be measured by accuracy and recall indices. In this paper we argue that technical performance is an important step in the process of a recommender system's evaluation. We focus on four real-life characteristics: the time required to build a model, the memory consumption of the built model, the expected latency of creating a recommendation for a random user and, finally, the time required to retrain the model with new ratings. We adapt a recently developed evaluation technique based on an iterative generation of bipartite graphs. In this paper we concentrate on the case where preferences are boolean, as opposed to value-based ratings.
1 Introduction

Recommender systems are an important component of the Intelligent Web. They make information retrieval easier and push users from typing queries towards clicking on suggested links. We experience real-life recommender systems when browsing for books, movies, news or music. Such engines are an essential part of websites like Amazon, MovieLens or Last.fm. Recommender systems are used to deal with tasks that are typical for statistical classification methods. They fit especially the scenarios in which the number of attributes, classes or missing values is large. Classic data-mining techniques like logistic regression or decision trees are well suited to predict which category of news is the most interesting for a particular customer; recommender systems are used to output more fine-grained results and point at concrete stories. In recent years we have observed a surge of interest of the research community in recommender systems. One of the events responsible for this phenomenon was the Netflix Prize challenge. The competition was organized by a large DVD retailer in the US. The prize of 1 million dollars was awarded to the team that managed to improve the RMSE (root mean squared error) of the retailer's Cinematch algorithm by more than 10%. The lesson we learned during the Netflix Prize is that the difference between the quality of simple methods and sophisticated ones is not as significant as we could have expected. Moreover, in order to lower
RMSE, an ensemble of complex and computationally intensive methods has to be built. Even though the organizers made much effort to deliver realistic and huge data, the setting did not envision the problems that we need to face in diverse real-life recommender system applications, such as:

• the Cold Start problem, i.e. the arrival of new users with a short history,
• the instant creation of new items (e.g. news, auction items or photos),
• real-time feedback from users about our performance.

These drawbacks were overcome during the Online Task of the Discovery Challenge organized as a part of ECML 2009 (European Conference on Machine Learning). The owners of the www.BibSonomy.org bookmarking portal opened its interfaces to the recommender systems taking part in the evaluation. Whenever a user of BibSonomy bookmarked a digital resource (a publication or a website), a query was sent to all the systems and the tag recommendation of a randomly chosen one was displayed to the user. After the action, a feedback with the user's actions was sent to all systems. The systems could be maintained during the challenge, because they were configured as web services. The results showed that all of the teams found it difficult to deliver the majority of their recommendations within the time constraint of 1 000 milliseconds. Our research was motivated by the above result and by the observation that the development of recommender systems is limited by the fact that there are not enough possibilities to test the algorithms with various datasets. The data structure used by recommender systems is a sparse user-item matrix with ratings. It is a hard exercise to randomly generate such matrices. We have challenged this problem recently [Chojnacki and Kłopotek 2010a]. We proposed to look at the matrix with ratings as if it were a bipartite graph, with nodes of the two modalities representing users and items respectively. A rating from the matrix is mapped onto an edge in the bigraph.
We proposed an algorithm in which we can control not only simple statistics like the numbers of users, items or rankings, but also obtain skewed distributions and correlations among users or items. Moreover, the asymptotic properties of our random bigraph generator were verified by means of formal and numerical tools, and we can add users or items to the graph without losing the properties of the original datasets. In this paper we apply the generator to produce several random bigraphs with various properties and evaluate how these properties impinge on the performance of the analyzed recommender systems. We analyze four features of the systems that in our opinion are responsible for the success of an algorithm in a real-life setting: (1) the time required to build a model from scratch, (2) the memory consumption of the trained model, (3) the latency of creating a recommendation and (4) the time of updating the model with new ratings. We focus our attention on the situation when users' preferences are boolean. This situation occurs when we only possess information on whether a user expressed an interest in an item or not. When we have access to information about the strength of a preference, we say that preferences are value-based. The magnitude of a preference (or ranking) is often expressed by the number of stars given to an item. It is sometimes advisable to build a boolean model if the quality of rankings
is low. We proposed to utilize random graphs for the analysis of value-based recommenders in [Chojnacki and Kłopotek 2010b]. In this paper we compare the performance of recommender systems in both the boolean and the value-based setting. We considered four algorithms during the tests: UserBased, SlopeOne, KnnItem and SVD. We used high-performance implementations of the algorithms delivered in the Mahout system [Owen et al. 2010]. The rest of the article is organized as follows. In Section 2 we describe in detail the differences between value-based and boolean recommenders. In Section 3 we outline the details of the applied random bigraph generator. The fourth section contains the results of extensive experiments. The fifth and last section is dedicated to concluding remarks.
2 Value-Based vs Boolean Recommenders

Recommender algorithms are generic and can be used with both value-based and boolean preferences. At the abstract level, the preferences of a particular user are represented by an n-dimensional vector, where n is the number of items in the system. The fields of the vector represent the values of the user's preferences for the items. Virtually any measure can be utilized to measure the distance between any two users or items. However, when we consider an optimal implementation of such an abstract structure, the difference between value-based and boolean preferences becomes clear. In the case of value-based implementations, HashMaps are utilized to store the vectors. In the case of boolean implementations, it is reasonable to use a more memory-efficient structure, and HashSets are utilized. The selection of a distance measure for the value-based scenario is not constrained; in the boolean scenario, however, only set-based measures are allowed. Pearson, Euclidean and Spearman are examples of measures that can be used only in the value-based setting. LogLikelihood, Jaccard or Tanimoto measures can be used in both settings. In our experiments we compare the technical performance of three variants of implementation:

1. vectors are represented by HashMaps and the distance is calculated with the Pearson similarity,
2. vectors are represented by HashMaps and the distance is calculated by means of the LogLikelihood similarity,
3. vectors are represented by HashSets and the distance is calculated by means of the LogLikelihood similarity.
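The set-based measures mentioned above can be illustrated with a short sketch. The following is our own Python illustration (Mahout itself ships Java classes such as TanimotoCoefficientSimilarity for this purpose); boolean preferences are stored as plain sets, mirroring the HashSet representation.

```python
# Boolean preferences: each user is just the set of item ids they touched
# (a HashSet-like structure), in contrast to value-based dicts of
# item -> rating (a HashMap-like structure).
prefs_bool = {
    "alice": {"i1", "i2", "i3"},
    "bob":   {"i2", "i3", "i4"},
    "carol": {"i5"},
}

def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient: |A ∩ B| / |A ∪ B|."""
    union = prefs_bool[a] | prefs_bool[b]
    return len(prefs_bool[a] & prefs_bool[b]) / len(union) if union else 0.0

print(tanimoto("alice", "bob"))    # 0.5  (2 shared out of 4 distinct items)
print(tanimoto("alice", "carol"))  # 0.0
```

Note that the measure needs only set intersections and unions, which is why it remains applicable when ratings carry no magnitude.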
3 Bipartite Random Graph Generator

In this section we describe the algorithm used to generate random bigraphs. The algorithm was introduced and described in detail in [Chojnacki and Kłopotek 2010a]. The generative procedure consists of three steps: (1) new node creation, (2) edge attachment type selection and (3) running the bouncing mechanism. The
steps are run after an initialization of the bigraph. The procedure requires specifying eight parameters.

Table 1 The parameters of the random graph generative procedure

Parameter   Interpretation
m           the number of initial loose edges with a user and an item at the ends
T           the number of iterations
p           the probability that a new node is a user
1-p         the probability that a new node is an item
u           the number of edges created by each new user
v           the number of edges created by each new item
alpha       the probability that a new user's edge is connected to an item with the preferential attachment mechanism
1-alpha     the probability that a new user's edge is connected to an item with the random attachment mechanism
beta        the probability that a new item's edge is connected to a user with the preferential attachment mechanism
1-beta      the probability that a new item's edge is connected to a user with the random attachment mechanism
b           the fraction of preferentially attached edges that are created via the bouncing mechanism
In the preferential attachment mechanism, the probability that a node is drawn is linearly proportional to its degree. The opposite of preferential attachment is random attachment, in which the probability of selection is equal for all nodes. The model is based on an iterative repetition of three steps.

Step 1. If a random number is less than p, create a new user with u loose edges; otherwise create a new item with v loose edges.

Step 2. For each edge, decide whether to join it to a node of the second modality randomly or with preferential attachment. The probability of selecting preferential attachment is alpha for a new user and beta for a new item.

Step 3. For each edge that is supposed to be created with preferential attachment, decide whether it should also be generated via the bouncing mechanism. Bouncing is performed in three micro steps: (1) a random node is drawn from the nodes that are already joined with the new node, (2) a random neighbor of the drawn node is chosen, (3) a random neighbor of that neighbor is selected for joining with the new node. The bouncing mechanism was injected into the model in order to parameterize the level of transitivity in the graph. Transitivity is a
feature of real datasets and, in terms of recommender systems, represents the correlations between the items ranked by different users. In unipartite graphs, transitivity is measured by the local clustering coefficient, which is calculated for each node as the number of edges among the direct neighbors of the node divided by the number of all possible pairs of those neighbors. In bipartite graphs this coefficient is always zero, hence it is substituted by the bipartite local clustering coefficient (BLCC). The BLCC of a node j takes the value of one minus the proportion of the node's second neighbors to the potential number of second neighbors of the node. The steps of the generator are depicted in Fig. 1.
Fig. 1 For each edge of a new node that is to be connected with an existing node in accordance with the preferential attachment mechanism, a decision is made whether to create it via the bouncing mechanism. In the case of attaching a new user node, u new edges are created. On average u·alpha edge endings are to be drawn preferentially, and a fraction b of them are to be obtained via bouncing from the nodes that are already selected
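The three steps can be sketched as follows. The seed bigraph of m disjoint user-item edges, the collapsing of duplicate targets into a single edge, and all names are assumptions of this illustration, not the authors' implementation:

```python
import random

def generate_bigraph(T, m=100, p=0.9, u=7, v=7, alpha=0.5, beta=0.5, b=0.0, seed=0):
    """Sketch of the three-step bipartite generator described above."""
    rng = random.Random(seed)
    users = {i: {i} for i in range(m)}   # user id -> set of item neighbors
    items = {i: {i} for i in range(m)}   # item id -> set of user neighbors
    u_ends, i_ends = list(range(m)), list(range(m))  # edge-endpoint multisets

    def add_node(own, other, own_ends, other_ends, n_edges, pref_prob):
        nid = len(own)
        own[nid] = set()
        for _ in range(n_edges):
            pref = rng.random() < pref_prob          # Step 2: preferential?
            if pref and rng.random() < b and own[nid]:
                # Step 3, bouncing: already-joined node -> its random
                # neighbor -> that neighbor's random neighbor is the target
                x = rng.choice(sorted(own[nid]))
                y = rng.choice(sorted(other[x]))
                t = rng.choice(sorted(own[y]))
            elif pref:
                t = rng.choice(other_ends)           # degree-proportional draw
            else:
                t = rng.randrange(len(other))        # uniform random draw
            own[nid].add(t)
            other[t].add(nid)
            own_ends.append(nid)
            other_ends.append(t)

    for _ in range(T):
        if rng.random() < p:                         # Step 1: new user
            add_node(users, items, u_ends, i_ends, u, alpha)
        else:                                        # Step 1: new item
            add_node(items, users, i_ends, u_ends, v, beta)
    return users, items
```

Keeping a multiset of edge endpoints (one entry per incident edge) makes the degree-proportional draw of Step 2 a uniform choice from a list.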
One can see that after t iterations the bigraph consists of U(t) = m+pt users, I(t) = m+(1-p)t items, and E(t) = m+t(pu+(1-p)v) edges. It can be shown that:
• as alpha/beta grows, the item/user degree distribution becomes more power-law-like than exponential-like
• as the bouncing parameter b grows, the average BLCC grows
• both alpha and beta impinge on the average number of second neighbors in the bigraph
• the influence of alpha and beta on the above quantity is opposite.
The above observations can be used to show that two features of data structures derived from the social networks domain have an impact on the technical performance of recommender systems. The features are a heavy-tailed node degree distribution and positive clustering. It is worth mentioning that the formal tools used to analyze the algorithms are based only on the numbers of users, items and ratings [Jahrer et al. 2010].
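The closed-form expectations above can be checked against Table 2 in the Appendix; row 1 uses m = 100, T = 10 000, p = 0.9 and u = v = 7:

```python
# Expected node and edge counts after T iterations (Table 2, row 1).
m, T, p, u, v = 100, 10_000, 0.9, 7, 7
U = m + p * T                      # expected users:  9 100 (observed 9 086)
I = m + (1 - p) * T                # expected items:  1 100 (observed 1 114)
E = m + T * (p * u + (1 - p) * v)  # expected edges: 70 100 (observed 70 100)
```

The edge count is exact whenever u = v, since every iteration adds the same number of edges regardless of which modality is drawn.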
S. Chojnacki and M.A. Kłopotek
4 Experiments
In order to evaluate the performance of the analyzed algorithms we generated 83 artificial bipartite graphs. The statistics describing the graphs are contained in Table 2. In the case of the HashMap representation each edge of a graph was augmented with a random integer from the set of possible rankings {0; 1; 2; 3; 4; 5}. After the last iteration (usually T = 10 000) one hundred additional steps were run for each graph with unchanged parameters, creating extra edges. This enabled us to preserve the asymptotic properties of the graphs within the set of rankings used to batch-update the models. The experiments were run in-memory within separate threads on a 64-bit Fedora operating system with a quad-core 2.66 GHz Intel(R) Core(TM) i5 CPU.
4.1 Evaluated Systems
We evaluated four recommender algorithms implemented in the Mahout Java library. Mahout contains highly efficient open-source implementations of machine-learning algorithms maintained by a vibrant community. It powers several portals, e.g. SpeedDate, Yahoo! Mail, AOL and Mippin. The algorithms are: GenericUserBasedRecommender [Herlocker et al. 1999], SlopeOneRecommender [Lemire and Maclachlan 2005], KnnItemBasedRecommender [Bell and Koren 2007] and SVDRecommender [Zhang et al. 2005]. The algorithms cover a wide spectrum of approaches to the problem of Collaborative Filtering.
4.2 Building Models
We measured the time required to build a model as the number of milliseconds needed to load the whole bigraph from a text file and train the model; after this period of time the model is ready to create recommendations. We measured the memory consumption of a built model in megabytes. The times and memory requirements needed to train the four considered recommenders are depicted in Fig. 2. There is a strong relationship between time and memory.
However, we do not observe major changes in behavior between the three analyzed variants: (1) HashMap and Pearson similarity, (2) HashMap and LogLikelihood similarity, and (3) HashSet and LogLikelihood similarity. The fact that the UserBased and KnnItem models are trained immediately in the second variant raises our concern. This observation shows that random graphs can be used not only to compare various algorithms, but also to identify potential bugs in their implementations. The fact that memory consumption is usually lower in variant three than in the first two variants is consistent with our expectations. There is one surprising result, i.e. the memory consumption of the SVD recommender is the highest in variant three.
Technical Evaluation of Boolean Recommenders
Fig. 2 Time of building models and memory consumption of built models. Top row contains the first variant with HashMap data structure and Pearson similarity. Middle row contains the second variant with HashMap data structure and LogLikelihood similarity. Bottom row contains the third variant with HashSet data structure and LogLikelihood similarity. Left column contains times of building the models. Right column contains memory requirements
4.3 Creating Recommendations
The time required to create a list of top recommended items for a random user is the most important technical criterion in many settings. We measured this latency as the average time in milliseconds required to output the five best recommendations for a sample of 500 users. We can see in Fig. 3 that the longer it took to train a model, the faster recommendations can be expected. The shortest latency is obtained in the first implementation variant, the longest in the second variant. We do not observe in Fig. 3
qualitative changes of behavior as we proceed from variant one to variant two and variant three.
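The measurement protocol above (average over 500 sampled users, top-5 lists) can be sketched with a generic harness; the recommend(user_id, how_many) interface is an assumption, modeled loosely on the interface of Mahout's recommenders:

```python
import random
import time

def mean_latency_ms(recommender, user_ids, sample=500, top_n=5, seed=0):
    """Average wall-clock time in milliseconds needed to produce a
    top_n recommendation list for `sample` randomly drawn users.
    `recommender` is any object exposing recommend(user_id, how_many)."""
    rng = random.Random(seed)
    chosen = [rng.choice(user_ids) for _ in range(sample)]
    start = time.perf_counter()
    for uid in chosen:
        recommender.recommend(uid, top_n)
    return (time.perf_counter() - start) * 1000.0 / sample
```

Timing the whole sampled batch with a monotonic clock and dividing once avoids the per-call overhead that would dominate when a single recommendation takes well under a millisecond.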
Fig. 3 Expected latency of creating a recommendation differentiated by size of the dataset and proportion of the number of users to the number of items. Top row contains the first variant with HashMap data structure and Pearson similarity. Middle row contains the second variant with HashMap data structure and LogLikelihood similarity. Bottom row contains the third variant with HashSet data structure and LogLikelihood similarity
The only qualitative difference between the three variants that we managed to identify is drawn in Fig. 4. The SlopeOne recommender was consistently slower than UserBased in the first variant, but UserBased slows down significantly in the second and the third variants. The results in Fig. 5 suggest that the regions of low and high latency that are visible for various configurations of alpha and beta are preserved across the variants for each algorithm.
Fig. 4 Latency differentiated by the skewness of node degree distributions. Top row contains the first variant with HashMap data structure and Pearson similarity. Middle row contains the second variant with HashMap data structure and LogLikelihood similarity. Bottom row contains the third variant with HashSet data structure and LogLikelihood similarity
Fig. 5 Latency of recommender algorithms for various values of alpha and beta. The bottom left corner of each figure represents the variant with alpha=beta=0. The upper right corner of each figure represents the variant with alpha=beta=1. The values of alpha and beta change gradually along the horizontal and vertical axes. Left column contains the first variant with HashMap data structure and Pearson similarity. Middle column contains the second variant with HashMap data structure and LogLikelihood similarity. Right column contains the third variant with HashSet data structure and LogLikelihood similarity
4.4 Additional Analyses
We performed several additional analyses to see what happens as we switch from value-based to boolean data structures and similarity measures. We checked the influence of the density of a graph and the level of clustering. In both cases the behavior of the algorithms in the second and the third variants was qualitatively consistent with the first variant. We also confirmed this fact by evaluating the time required to update the models with new users and items.
5 Discussion and Conclusions
In the paper we identified two factors that may impinge on the technical performance of recommender systems when we switch from value-based to boolean preferences. The factors are the data structure implementation and the similarity measure. We proposed three settings to compare value-based and boolean recommenders. In the first two variants datasets are implemented with a HashMap, which is characteristic of value-based models. However, only in the first case does the similarity measure utilize the values of ratings. In the third variant both the data structure implementation and the similarity measure are optimal for boolean preferences. Our observations can be summarized in four points:
• recommender systems based on a HashSet data structure require less memory than HashMap-based implementations
• the time required to create a recommendation is longer for purely boolean recommenders than for purely value-based ones; the longest time is needed for mixed implementations (i.e. the second variant)
• random datasets enable us to identify potential bugs in implemented algorithms
• the only qualitative difference in the behavior of the algorithms was observed for the UserBased model, which slows down faster than e.g. SlopeOne in the boolean similarity variants.
The second point is in our opinion the most surprising result. It shows that in the case of recommender systems the time needed to process a smaller amount of information may be longer than the time required to process enriched information. This is because, even though value-based models have access to more information than boolean ones, they can utilize fast vector similarity measures. In the case of boolean recommenders only a set-based distance between vectors can be calculated.
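The last observation can be illustrated with the two profile representations. Pearson correlation stands for the value-based (HashMap) variant; Jaccard is used below merely as a simple stand-in for a set-based measure (the experiments themselves used log-likelihood similarity):

```python
from math import sqrt

def pearson(r1, r2):
    """Similarity of two rating profiles stored as item -> value maps
    (the HashMap, value-based variant)."""
    common = r1.keys() & r2.keys()
    n = len(common)
    if n == 0:
        return 0.0
    m1 = sum(r1[i] for i in common) / n
    m2 = sum(r2[i] for i in common) / n
    num = sum((r1[i] - m1) * (r2[i] - m2) for i in common)
    d1 = sqrt(sum((r1[i] - m1) ** 2 for i in common))
    d2 = sqrt(sum((r2[i] - m2) ** 2 for i in common))
    return num / (d1 * d2) if d1 > 0 and d2 > 0 else 0.0

def jaccard(s1, s2):
    """Similarity of two item sets (the HashSet, boolean variant)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0
```

The set variant stores less per edge, yet computing it still walks both neighbor sets, which is one way the "less information, more time" effect noted above can arise.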
Acknowledgment This work was partially supported by Polish state budget funds for scientific research within research project Analysis and visualization of structure and dynamics of social networks using nature inspired methods, grant No. N516 443038.
References
[Bell and Koren 2007] Bell, R.M., Koren, Y.: Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In: Proc. of ICDM, pp. 43–52. IEEE Computer Society, Los Alamitos (2007)
[Chojnacki and Kłopotek 2010a] Chojnacki, S., Kłopotek, M.A.: Random graph generator for bipartite networks modeling (2011), http://arxiv.org/abs/1010.5943
[Chojnacki and Kłopotek 2010b] Chojnacki, S., Kłopotek, M.A.: Random graphs for performance evaluation of recommender systems (2011), http://arxiv.org/abs/1010.5954
[Herlocker et al. 1999] Herlocker, J.L., Konstan, J.A., Borchers, A., et al.: An algorithmic framework for performing collaborative filtering. In: Proc. of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230–237. ACM Press, New York (1999)
[Jahrer et al. 2010] Jahrer, M., Toscher, A., Legenstein, R.: Combining predictions for accurate recommender systems. In: KDD, pp. 693–702. ACM Press, New York (2010)
[Lemire and Maclachlan 2005] Lemire, D., Maclachlan, A.: Slope one predictors for online rating-based collaborative filtering. In: Proc. of SIAM Data Mining (2005)
[Owen et al. 2010] Owen, S., Anil, R., Dunning, T., et al.: Mahout in Action. Manning (2010)
[Zhang et al. 2005] Zhang, S., Wang, W., Ford, J., et al.: Using singular value decomposition approximation for collaborative filtering. In: Proc. of the 7th IEEE Conf. on E-Commerce, pp. 257–264 (2005)
Appendix

Table 2 Synthetic bigraphs used in experiments (columns m–b: parameters of the bigraph generator; columns users–edges: properties of the obtained graphs)

Lp. |   m |       T |   p | u | v | alpha | beta | b |  users | items |   edges
  1 | 100 |  10 000 | 0.9 | 7 | 7 |   0.5 |  0.5 | 0 |  9 086 | 1 114 |  70 100
  2 | 100 |  10 000 | 0.8 | 7 | 7 |   0.5 |  0.5 | 0 |  8 159 | 2 041 |  70 100
  3 | 100 |  10 000 | 0.7 | 7 | 7 |   0.5 |  0.5 | 0 |  7 102 | 3 098 |  70 100
  4 | 100 |  10 000 | 0.6 | 7 | 7 |   0.5 |  0.5 | 0 |  6 185 | 4 015 |  70 100
  5 | 100 |  10 000 | 0.5 | 7 | 7 |   0.5 |  0.5 | 0 |  5 122 | 5 078 |  70 100
  6 | 100 |  10 000 | 0.4 | 7 | 7 |   0.5 |  0.5 | 0 |  4 098 | 6 102 |  70 100
  7 | 100 |  10 000 | 0.3 | 7 | 7 |   0.5 |  0.5 | 0 |  3 120 | 7 080 |  70 100
  8 | 100 |  10 000 | 0.2 | 7 | 7 |   0.5 |  0.5 | 0 |  2 083 | 8 117 |  70 100
  9 | 100 |  10 000 | 0.1 | 7 | 7 |   0.5 |  0.5 | 0 |  1 107 | 9 093 |  70 100
 10 | 100 |   1 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  1 008 |   192 |   7 100
 11 | 100 |   2 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  1 888 |   312 |  14 100
 12 | 100 |   3 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  2 788 |   412 |  21 100
 13 | 100 |   4 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  3 699 |   501 |  28 100
 14 | 100 |   5 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  4 568 |   632 |  35 100
 15 | 100 |   6 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  5 511 |   689 |  42 100
 16 | 100 |   7 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  6 419 |   781 |  49 100
 17 | 100 |   8 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  7 306 |   894 |  56 100
 18 | 100 |   9 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  8 178 | 1 022 |  63 100
 19 | 100 |  10 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 |  9 119 | 1 081 |  70 100
 20 | 100 |  25 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 | 22 576 | 2 624 | 175 100
 21 | 100 |  50 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 | 45 172 | 5 028 | 350 100
 22 | 100 | 100 000 | 0.9 | 7 | 7 |   0.1 |  0.1 | 0 | 90 211 | 9 989 | 700 100
 48 | 100 |  10 000 | 0.5 | 7 | 7 |   1   |  1   | 0 |  5 081 | 5 119 |  70 100
 49 | 100 |  10 000 | 0.5 | 7 | 7 |   1   |  0.8 | 0 |  5 078 | 5 122 |  70 100
 50 | 100 |  10 000 | 0.5 | 7 | 7 |   1   |  0.6 | 0 |  5 083 | 5 117 |  70 100
  … |   … |       … |   … | … | … |     … |    … | … |      … |     … |       …
 83 | 100 |  10 000 | 0.5 | 7 | 7 |   0   |  0   | 0 |  4 985 | 5 215 |  70 100
Interval Uncertainty in CPL Models for Computer Aided Prognosis
L. Bobrowski
Faculty of Computer Science, Białystok Technical University
Institute of Biocybernetics and Biomedical Engineering, PAS, Warsaw, Poland
[email protected]
Abstract. Multivariate regression models are often used for the purpose of prognosis. Parameters of such models are estimated on the basis of learning sets, where feature vectors (independent variables) are combined with values of a response (target) variable. In some important applications the values of the response variable can be determined only with some uncertainty. For example, in survival analysis the values of the response variable are often censored and can be represented as intervals. The interval regression approach has been proposed for designing prognostic tools under this type of uncertainty. The possibility of using convex and piecewise linear (CPL) functions in designing linear prognostic models on the basis of interval learning sets is examined in the paper.
1 Introduction
Multivariate regression models are widely used in statistics, pattern recognition and data mining contexts [Johnson and Wichern 1991; Duda et al. 2001]. The most important applications of regression models are linked to prognosis (prediction) goals. The value of the dependent (target) variable should be predicted on the basis of the values of the independent variables. The main role is played here by linear regression models, where the dependent variable is a linear combination of the independent variables. Linear regression models can be designed by using different methods depending on the structure of the learning data sets. In accordance with the classical least-squares approach, the parameters of the linear regression models are estimated on the basis of a learning sequence in the form of feature vectors combined with exact values of the dependent (target) variable [Johnson and Wichern 1991]. The exact value of the target variable represents additional knowledge about a particular object represented by a given feature vector. Logistic regression is typically used when the target variable is categorical. If the target variable is a binary one, the regression model is based on a linear division of feature vectors into two groups [Duda et al. 2001].
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 443–461. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
The ranked regression models are designed on the basis of a set of feature vectors with additional knowledge in the form of an ordering relation inside selected pairs of these vectors [Bobrowski 2009]. Linear ranked models can be designed through the minimization of the convex and piecewise linear (CPL) criterion function defined on differences of feature vectors. Special regression methods known as interval regression have been developed for the case when the values of the target variable are uncertain and can be represented in the form of intervals [Buckley and James 1979], [Gomez et al. 2003]. Uncertainty means in this case some missing information about the exact values of the target variable. The Cox proportional hazards model, developed in the context of survival analysis, is commonly applied in the case of censored data [Klein and Moeschberger 1997]. Algorithms based on the Expectation Maximization (EM) principle have been developed and used for the purpose of estimating the parameters of interval regression models. Designing linear regression models on the basis of interval learning data by using convex and piecewise linear (CPL) criterion functions has been proposed and analyzed in [Bobrowski 2010]. This paper describes a different approach to designing interval regression models, using different CPL criterion functions. This approach refers to the concept of linear separability of data sets [Duda et al. 2001].
2 Linear Regression Models
The pattern recognition terminology is used throughout this paper [2]. We are considering a set of m feature vectors xj[n] = [xj1,…,xjn]T belonging to a given n-dimensional feature space F[n] (xj[n] ∈ F[n]). The feature vectors xj[n] represent a family of m objects (events, patients) Oj (j = 1,..., m). The components xji of the vector xj[n] can be treated as the numerical results of n standardized examinations of the given object Oj (xji ∈ {0,1} or xji ∈ R1). Each vector xj[n] can also be treated as a point of the n-dimensional feature space F[n]. Linear regression models have the form of linear (affine) transformations of the n-dimensional feature vectors x[n] (x[n] ∈ F[n]) onto the points y of a line (y ∈ R1): y(x) = w[n]Tx[n] + θ
(1)
where w[n] = [w1,…, wn]T ∈ Rn is the parameter (weight) vector and θ is the threshold (θ ∈ R1). Properties of the model (1) depend on the choice of the parameters w[n] and θ. The weights wi and the threshold θ are usually computed on the basis of the data (learning) sets. In the classical regression analysis the learning sets have the below structure [Johnson and Wichern 1991]:
Cm′ = {xj[n]; yj} = {xj1,…., xjn,; yj}, where j = 1,….., m
(2)
Each of the m objects Oj is characterized in the set Cm′ by the values xji of n independent variables (features) xi, and by the observed value yj (yj ∈ R1) of the dependent (target) variable Y. In the case of classical regression, the parameters w[n] and θ are chosen in such a manner that the sum of the squared differences (yj - yj^)2 between the observed target variable yj and the modeled variable yj^ = w[n]Txj[n] + θ (1) is minimal [Johnson and Wichern 1991]. In the case of interval regression, additional knowledge about the particular objects Oj is represented by the intervals [yj-, yj+] (yj- < yj+) instead of the exact values yj (2) [Buckley and James 1979], [Gomez et al. 2003]: Cm = {xj[n], [yj-, yj+]}, where j = 1,….., m
(3)
where yj- is the lower bound (yj- ∈ R1) and yj+ is the upper bound (yj+ ∈ R1) of the unknown value of the target variable Y (yj- < yj+). Let us remark that the classical learning set Cm′ (2) can be transformed into the interval learning set Cm (3) by introducing the boundary values yj- = yj - ε and yj+ = yj + ε, where ε is a small positive parameter (ε > 0). Imprecise measurements of the dependent variable y can be represented in such a manner. The transformation (1) constitutes the interval regression model if the below linear inequalities are fulfilled in the best possible way for the elements of the set Cm (3): (∀j ∈ {1,…., m})
yj- < w[n]Txj[n] + θ < yj+
(4)
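Condition (4) is straightforward to check for a candidate model. The toy data used in the test below is an assumption for illustration, with infinite bounds encoding one-sided (censored) intervals:

```python
def fulfils_intervals(w, theta, X, bounds):
    """Check the interval-regression inequalities (4):
    y_j^- < w^T x_j + theta < y_j^+ for every feature vector x_j.
    X is a list of feature vectors, bounds a list of (lo, hi) pairs;
    float('-inf') / float('inf') encode one-sided intervals."""
    for x, (lo, hi) in zip(X, bounds):
        y_hat = sum(wi * xi for wi, xi in zip(w, x)) + theta
        if not (lo < y_hat < hi):
            return False
    return True
```

Strict inequalities are used, matching the sharp form of (4); the soft form with ≤ would only change the two comparisons.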
The formula (4) can be used, among others, for the representation of survival analysis problems, as shown in the below example.
Example 1: Traditionally, survival analysis data sets Cs have the below structure [Klein and Moeschberger 1997]: Cs = {xj[n], tj, δj} (j = 1,.......,m)
(5)
where tj is the observed survival time between the entry of the j-th patient Oj into the study and the end of the observation, and δj is an indicator of failure of this patient (δj ∈ {0,1}): δj = 1 means that the observation ended with the event of interest (failure), and δj = 0 means that the follow-up on the j-th patient ended before the event (a right censored observation). In this case (δj = 0) the information about the survival time tj is not complete. The real survival time Tj can be defined in the below manner on the basis of the set Cs (5): (∀ j = 1,.......,m) if δj = 1, then Tj = tj, and if δj = 0, then Tj > tj
(6)
The right censoring can mean that an unknown survival time Tj is greater than some lower bound tj- (Tj > tj-). Similarly, the left censoring can mean that an unknown survival time Tj of the j-th patient Oj is less than some upper bound tj+ (Tj < tj+). We can use the below inequalities (4) for the purpose of designing the linear prognostic model T = w[n]Tx[n] + θ (1) from the censored data Tj: if Tj is right censored, then w[n]Txj[n] + θ > tj-
(7)
if Tj is left censored, then w[n]Txj[n] + θ < tj+
(8)
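Following (5)-(8), each censored observation maps to an interval of the form (3); the width 2ε used below for exactly observed failures follows the earlier remark on representing exact values as [y - ε, y + ε], and is otherwise an assumption:

```python
def survival_to_interval(t, delta, eps=1e-6):
    """Map one survival observation (t_j, delta_j) from the set C_s (5)
    to an interval [y-, y+] of the form (3).

    delta == 1: failure observed, T_j = t_j, represented as a tight
                interval [t - eps, t + eps].
    delta == 0: right censored, T_j > t_j, i.e. the interval [t, +inf]."""
    if delta == 1:
        return (t - eps, t + eps)
    return (t, float('inf'))
```

A left censored time would analogously map to (-inf, t); it is omitted here because the indicator δj of (5) only distinguishes failure from right censoring.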
The right censored survival time Tj can be represented by the interval [tj-, +∞] (3). Similarly, the left censored survival time Tj can be represented by using the interval [-∞, tj+]. The parameters w[n] and θ of the interval regression model (1) are typically estimated from the data set Cm (3) by using Expectation Maximization (EM) algorithms. These are rather troublesome procedures with serious drawbacks concerning, among others, low efficiency, particularly in the case of a high dimensional feature space. An alternative approach to designing interval regression models, based on the minimization of convex and piecewise-linear (CPL) criterion functions, is described and analysed in this paper. This approach is linked to the perceptron criterion function and linear separability of data sets, basic concepts in the theory of neural networks and pattern recognition [Bobrowski 2010]. Let us also remark that the sharp inequalities (4) have been used in the definition of the interval regression model. A similar concept can also be defined on the basis of the soft inequalities yj- ≤ θ + w[n]Txj[n] ≤ yj+.
3 Linear Separability of the Sets R+ and R-
Let us modify the interval inequalities (4) for the purpose of defining the sets R+ and R-:
(∀j ∈ {1,…., m}) w[n]Txj[n] - yj- + θ > 0 and w[n]Txj[n] - yj+ + θ < 0
(9)
Two types of augmented feature vectors, xj+[n+2] and xj-[n+2], and the augmented weight vector v[n+2] (v[n+2] ∈ Rn+2) can be linked to the above inequalities:
(∀j ∈ {1,…., m})    (10)
if (yj- > -∞), then xj+[n+2] = [xj[n]T, 1, -yj-]T, else xj+[n+2] = 0, and
if (yj+ < +∞), then xj-[n+2] = [xj[n]T, 1, -yj+]T, else xj-[n+2] = 0
and v[n+2] = [v1,…,vn+2]T = [w[n]T, θ, β]T
(11)
where β is the interval weight (β ∈ R1). The inequalities (9) can be represented by using the symbols xj+[n+2], xj-[n+2] and v[n+2]:
(∀j ∈ {1,…., m})    (12)
(∀xj+[n+2] ≠ 0) v[n+2]T xj+[n+2] > 0, and (∀xj-[n+2] ≠ 0) v[n+2]T xj-[n+2] < 0
The above inequalities can be linked to the demand of linear separability of the sets R+ and R-. The positive set R+ is composed of m+ augmented vectors xj+[n+2] (10) which are different from zero (xj+[n+2] ≠ 0), and the negative set R- is composed of m- augmented vectors xj-[n+2] (10) which are different from zero (xj-[n+2] ≠ 0): R+ = {xj+[n+2]} and R- = {xj-[n+2]}
(13)
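The construction (10), (13) can be sketched directly; applied to the interval data of Table 1 below, it reproduces the counts m+ = 6 and m- = 5 stated in that table's caption:

```python
NEG_INF, POS_INF = float('-inf'), float('inf')

def augmented_sets(X, bounds):
    """Build the sets R+ and R- (13) of augmented vectors (10):
    x_j+ = [x_j, 1, -yj-] when yj- > -inf (goes to R+),
    x_j- = [x_j, 1, -yj+] when yj+ < +inf (goes to R-).
    Vectors for infinite bounds are simply not generated, since the
    zero vectors of (10) carry no constraint."""
    R_plus, R_minus = [], []
    for x, (lo, hi) in zip(X, bounds):
        if lo > NEG_INF:
            R_plus.append(list(x) + [1.0, -lo])
        if hi < POS_INF:
            R_minus.append(list(x) + [1.0, -hi])
    return R_plus, R_minus
```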
We will examine the possibility of separating the sets R+ and R- by such a hyperplane H(v[n+2]) in the (n+2)-dimensional feature space F[n+2] which passes through the point 0 (origin) of this space [Bobrowski 2005]: H(v[n+2]) = {x[n+2] ∈ F[n+2]: v[n+2]Tx[n+2] = 0}
(14)
Definition 1: The sets R+ and R- (13) are linearly separable in the feature space F[n+2] if and only if there exists such an augmented weight vector v′[n+2] (11) that the below inequalities hold for all the non-zero vectors xj+[n+2] and xj-[n+2] (10):
(∃ v′[n+2])    (15)
(∀xj+[n+2] ≠ 0) v′[n+2]T xj+[n+2] > 0, and (∀xj-[n+2] ≠ 0) v′[n+2]T xj-[n+2] < 0
If the inequalities (15) hold, then all the non-zero elements xj+[n+2] of the set R+ (13) are situated on the positive side of the hyperplane H(v′[n+2]) (14) and all the non-zero elements xj-[n+2] of the set R- are situated on the negative side of this hyperplane (Fig. 1).
Example 2: Let us take an example of seven values xj of the independent variable x (xj ∈ F[1], a one dimensional feature space) and the dependent variable y characterised by the intervals [yj-, yj+] (3).
Table 1 An example of interval data set (3) with seven elements xj (m+ = 6 and m- = 5)

 j |  xj  | yj-  | yj+
 1 | -1.0 | -3.5 | -1.0
 2 | -0.5 | -3.0 |  +∞
 3 |  1.0 |  2.0 |  2.5
 4 |  1.5 | -3.0 |  +∞
 5 |  2.5 |  1.5 |  2.5
 6 |  3.5 |  -∞  |  3.0
 7 |  4.5 |  2.0 |  4.0
We can remark in the above data set that the values y2 and y4 are right censored and the value y6 is left censored. The equation y = x - 1 fulfils all the inequalities (3) resulting from Table 1 and can be treated as the interval regression model (1) of this data (Fig. 1). The parameters of the model (1) are in this case w[1] = 1 and θ = -1.
Fig. 1 An illustration of the data set from Table 1 and the interval regression model y = x - 1 (3)
Remark 1: If the augmented vector v′[n+2] (11) linearly separates (15) the sets R+ and R- (13), then the interval weight β′ (11) is greater than zero (β′ > 0). This Remark results directly from the definition (10) of the vectors xj+[n+2] and xj-[n+2], the relation (9), and the inequality yj+ > yj-.
Lemma 1: If the hyperplane H(v′[n+2]) (14) with the interval weight β′ (11) equal to one (v′[n+2] = [w′[n]T, θ′, 1]) separates (15) the sets R+ and R- (13), then the linear model yj = w′[n]Txj[n] + θ′ (1) fulfils all the inequalities (4). This Lemma results directly from the definition of the augmented feature vectors xj+[n+2] and xj-[n+2] (10) and the augmented weight vector v[n+2] (11).
Lemma 2: All the inequalities yj- < w′[n]Txj[n] + θ′ < yj+ (4) can be fulfilled by some parameter vector v′[n+2] = [w′[n]T, θ′, 1] (11) if and only if the sets R+ and R- (13) are linearly separable (15).
Proof: If there exists such a weight vector v′[n+2] (11) that all the inequalities yj- < w′[n]Txj[n] + θ′ < yj+ (4) are fulfilled, then the sets R+ and R- (13) are linearly separable (15) with the interval weight β′ (11) equal to one (β′ = 1). This property results directly from the definition (10) of the vectors xj-[n+2] and xj+[n+2]. If the sets R+ and R- (13) are linearly separable (15), then there exists a hyperplane H(v′[n+2]) (14) which separates these sets. The separation (15) of the sets R+ and R- (13) by the hyperplane H(v′[n+2]) (14) means that the interval weight β′ is greater than zero (β′ > 0) (Remark 1). In this case, the sets R+ and R- (13) are also separated by the hyperplane H(v′′[n+2]) (14) with the interval weight β′′ equal to one (β′′ = 1), as results from the below inequalities (16): (∃ w′[n], θ′, and β′ > 0) (∀j ∈ {1,…., m}) w′[n]Txj[n] + θ′ - β′ yj- > 0 and w′[n]Txj[n] + θ′ - β′ yj+ < 0
(16)
By dividing the above inequalities by β′ (β′ > 0) we obtain the inequalities (4). The inequalities (16) can be represented equivalently in the augmented manner [Bobrowski 2005]: (∃ v′[n+2] = [w′[n]T, θ′, β′]T, where β′ > 0) (∀j ∈ {1,…., m}) v′[n+2]T xj+[n+2] ≥ 1 and v′[n+2]T xj-[n+2] ≤ -1
(17)
Such representation is used further in the definition of the CPL penalty functions.
4 Convex and Piecewise Linear (CPL) Penalty and Criterion Functions
The inequalities (17) can be a guideline in the definition of the CPL penalty functions φj+(v[n+2]) and φj-(v[n+2]) [Bobrowski 2005]. The upper penalty functions φj+(v[n+2]) are defined by the non-zero feature vectors xj+[n+2] (10):
(∀xj+[n+2] ≠ 0)    (18)
φj+(v[n+2]) = 1 - v[n+2]Txj+[n+2]   if   v[n+2]Txj+[n+2] < 1
φj+(v[n+2]) = 0                     if   v[n+2]Txj+[n+2] ≥ 1
Similarly, the lower penalty functions φj-(v[n+2]) are defined by the augmented feature vectors xj-[n+2] (10):
(∀xj-[n+2] ≠ 0)    (19)
φj-(v[n+2]) = 1 + v[n+2]Txj-[n+2]   if   v[n+2]Txj-[n+2] > -1
φj-(v[n+2]) = 0                     if   v[n+2]Txj-[n+2] ≤ -1
The perceptron criterion function Φ(v[n+2]) is defined as the sum of the penalty functions φj+(v[n+2]) (18) and φj-(v[n+2]) (19) [7]:
Φ(v[n+2]) = Σj αj φj+(v[n+2]) + Σj αj φj-(v[n+2])
(20)
where the nonnegative parameters αj (αj ≥ 0) determine the importance (price) of the particular feature vectors xj[n] (3). The function Φ(v[n+2]) (20) is convex and piecewise-linear (CPL) as the sum of penalty functions of this type. Designing the interval regression models (1) can be based on finding the minimal value Φ(v*[n+2]) and the optimal vector v*[n+2] of the criterion function Φ(v[n+2]) (20):
(∀v[n+2]) Φ(v[n+2]) ≥ Φ(v*[n+2]) = Φ* ≥ 0
(21)
where v*[n+2] = [w*[n]T, θ*, β*]T, and w*[n] = [w1*,…., wn*]T (1). The basis exchange algorithms, which are similar to linear programming, allow one to find the minimum of the CPL function Φ(v[n+2]) (20) and the optimal parameter vector v*[n+2] (21) efficiently, even in the case of large, multidimensional data sets [Bobrowski 1991]. The below theorems can be proved:
Theorem 1: The minimal value Φ* = Φ(v*[n+2]) (21) of the non-negative criterion function Φ(v[n+2]) (20) is equal to zero (Φ* = 0) if and only if the sets R+ and R- (13) are linearly separable (15). In this case, the hyperplane H(v*[n+2]) (14) defined by the optimal vector v*[n+2] (21) exactly separates the sets R+ and R-. The proof of a similar theorem has been given in the author's earlier works [7].
Theorem 2: If there exist such a weight vector w′[n] and threshold θ′ that all the inequalities yj- < w′[n]Txj[n] + θ′ < yj+ (4) are fulfilled for all feature vectors xj[n] (3), then the minimal value Φ(v*[n+2]) (21) of the criterion function Φ(v[n+2]) (20) is equal to zero (Φ(v*[n+2]) = 0).
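The penalties (18), (19) and the criterion (20), with all prices αj = 1 for simplicity, can be written down directly; in line with Theorem 1, Φ(v) vanishes exactly when v separates the augmented vectors with the margins of (17):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi_plus(v, x):
    """Upper penalty (18): 1 - v.x when v.x < 1, otherwise 0."""
    return max(0.0, 1.0 - dot(v, x))

def phi_minus(v, x):
    """Lower penalty (19): 1 + v.x when v.x > -1, otherwise 0."""
    return max(0.0, 1.0 + dot(v, x))

def criterion(v, R_plus, R_minus):
    """Perceptron criterion Phi(v) (20) with unit prices alpha_j = 1."""
    return (sum(phi_plus(v, x) for x in R_plus)
            + sum(phi_minus(v, x) for x in R_minus))
```

Minimizing this piecewise-linear function is what the basis exchange algorithms referenced above perform; the sketch here only evaluates it.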
The minimal value Φ(v*[n+2]) (21) of the criterion function Φ(v[n+2]) (20) is greater than zero if not all the inequalities (4) can be fulfilled. If the interval weight β* in the optimal weight vector v*[n+2] = [w*[n]T, θ*, β*]T (21) is greater than zero (β* > 0), then the below linear transformation of the feature vectors xj[n] on the line y (1) can be defined: (∀j ∈ {1,…., m})
yj^ = (w*[n] / β*)T xj[n] + θ*/ β*
(22)
We can infer from Theorem 2 that the minimal value Φ(v*[n+2]) (21) of the criterion function Φ(v[n+2]) (20) is equal to zero if the interval regression model (22) with β* > 0 fulfils all the constraints (4).
Theorem 3: If the sets R+ and R- (13) are linearly separated (15) by the optimal vector v*[n+2] = [w*[n]T, θ*, β*]T (21) and the interval weight β* is greater than zero (β* > 0), then the model (22) fulfils all the inequalities yj- < (w*[n]/β*)Txj[n] + θ*/β* < yj+ (4). This theorem can be proved on the basis of Theorem 1 and the equation (11).
5 Feature Selection for the CPL Interval Regression
Designing the linear regression model yj^ = (w*[n]/β*)Txj[n] + θ*/β* (22) can be based on the minimization of the CPL criterion function Φ(v[n+2]) (20) defined on the interval data set Cm (3). In practice, the interval data set Cm (3) often contains a small number m of multidimensional feature vectors xj[n] (m << n). In this case it is particularly important to apply a feature selection procedure aimed at reducing the feature space F[n+2] to some subspace F[n′] (F[n′] ⊂ F[n+2], n′ < n + 2). The feature selection procedure should allow one to neglect the maximal number of unimportant features xi while preserving a good quality of the resulting model (22). We have proposed and implemented the relaxed linear separability (RLS) method of feature selection [Bobrowski and Łukaszuk 2009]. In accordance with this approach, features xi are omitted in such a manner that the linear separability (15) of the sets R+ and R- (13) is preserved in a reasonable way. Those features xi are reduced (neglected) which are linked to weights wi* equal (or nearly equal) to zero in the optimal weight vector w*[n] (v*[n+2] = [w*[n]T, θ*, β*]T and w*[n] = [w1*,…, wn*]T (22)):
(wi* = 0) ⇒ (xi is reduced)
(23)
Reduction of features xi in accordance with the above rule does not change the model values yj^ (22). Let us introduce the modified CPL criterion function Ψλ(v[n+2]) with feature costs for the purpose of feature reduction [Bobrowski 2009]:
Ψλ(v[n+2]) = Φ(v[n+2]) + λ Σi∈{1,...,n+1} γi φi(v[n+2]) + γn+2 |vn+2 - 1|
(24)
where v[n+2] = [w[n]T, θ, β]T, w[n] = [w1,…, wn]T (1), Φ(v[n+2]) is defined by the formula (20), λ is the feature cost level (λ ≥ 0), γi is the cost of the i-th feature xi (γi > 0, i = 1,…, n), and the cost functions φi(v[n+2]) are defined by the unit vectors ei[n+2] = [0,…,1,…,0]T:

(∀i ∈ {1,…, n})
φi(v[n+2]) = |wi| = -ei[n+2]Tv[n+2] if ei[n+2]Tv[n+2] < 0, or
             ei[n+2]Tv[n+2] if ei[n+2]Tv[n+2] ≥ 0    (25)

The cost function φi(v[n+2]) is linked to the feature xi and is aimed at the reduction (23) of this feature. Let us remark that, in accordance with the above equations, the cost functions φi(v[n+2]) are related only to the n real features xi (xj[n] = [x1,…, xn]T (1)). The cost function φn+1(v[n+2]) can be defined in a similar manner:

φn+1(v[n+2]) = |θ| = -en+1[n+2]Tv[n+2] if en+1[n+2]Tv[n+2] < 0, or
               en+1[n+2]Tv[n+2] if en+1[n+2]Tv[n+2] ≥ 0    (25)

The cost function φn+1(v[n+2]) is aimed at diminishing the threshold θ to zero. We can remark that in some applications the reduction of the threshold θ is not required. Such an effect can be achieved by using a very small value of the parameter γn+1 (γn+1 > 0). The criterion function Ψλ(v[n+2]) (24) contains an additional cost function γn+2 |vn+2 - 1| = γn+2 |β - 1|. This cost function can serve as a reinforcement of the condition β′ = 1 (11), that the interval weight β′ should be equal to one (Lemma 1). The non-negative parameter γn+2 (γn+2 ≥ 0) allows regulating the level of this reinforcement. In accordance with the RLS method of feature selection, the reduction (23) of unimportant features xi in a cost-sensitive manner is based on the minimization of the modified CPL criterion function Ψλ(v[n+2]) (24) with different values of the cost level λ [4]. The criterion function Ψλ(v[n+2]) (24) is convex and piecewise linear (CPL) as the sum of the CPL function Φ(v[n+2]) (20) and the
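The criterion Ψλ (24) can be sketched numerically as follows. This is a hedged reconstruction: Φ is written as a sum of hinge penalties for violated augmented constraints v·xj+ ≥ 1 and v·xj- ≤ -1, matching (18)-(20) up to the (omitted) per-vector prices αj, which are assumed equal to one; all names are illustrative.

```python
import numpy as np

def phi(v, X_plus, X_minus):
    """Perceptron-like CPL criterion (20): hinge penalties for the
    augmented constraints v.x+ >= 1 and v.x- <= -1 (unit prices assumed)."""
    pen_plus = np.maximum(0.0, 1.0 - X_plus @ v)    # phi_j^+ terms (18)
    pen_minus = np.maximum(0.0, 1.0 + X_minus @ v)  # phi_j^- terms (19)
    return pen_plus.sum() + pen_minus.sum()

def psi(v, X_plus, X_minus, lam, gamma, gamma_beta):
    """Modified CPL criterion (24): Phi + lambda * feature costs + |beta - 1|.
    gamma holds the n+1 costs gamma_1..gamma_{n+1} for w_1..w_n and theta."""
    cost = float(np.dot(gamma, np.abs(v[:-1])))
    return phi(v, X_plus, X_minus) + lam * cost + gamma_beta * abs(v[-1] - 1.0)
```

With the data of Example 3 below (one constraint vector x1-[4] = [3, 2, 1, 2]T, λ = 1, γ1 = γ2 = γ3 = 1), this reproduces the values Ψ1(v1[4]) = 1.0 and Ψ1(v2[4]) = 1.5 quoted there.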
Interval Uncertainty in CPL Models for Computer Aided Prognosis
453
CPL functions λγiφi(v[n+2]) (25). The basis exchange algorithms allow finding the optimal vector vλ*[n+2] which constitutes the minimal value of the criterion function Ψλ(v[n+2]) (24):

(∃vλ*[n+2]) (∀v[n+2]) Ψλ(v[n+2]) ≥ Ψλ(vλ*[n+2]) = Ψλ*
(26)
Remark 2: The minimal value Ψλ* (26) of the non-negative criterion function Ψλ(v[n+2]) (24) with the cost level λ equal to zero (λ = 0) is equal to zero (Ψλ* = 0) if and only if the sets R+ and R- (13) are linearly separable (15). This Remark can be linked to Theorem 1. The CPL cost function φi(v[n+2]) (25) reinforces the condition wi = 0 and tends to reduce the feature xi (23) as a result of the minimization of the function Ψλ(v[n+2]) (24). The influence of the cost function φi(v[n+2]) (25) on the reduction of the feature xi increases with the values of the parameters γi and λ (24). An increase of the cost level λ leads to a greater number of features xi being reduced as a result of the minimization of the criterion function Ψλ(v[n+2]) (24). Successive increases of the parameter λ in the criterion function Ψλ(v[n+2]) (24) generate the descending sequence of feature subspaces Fk[nk]:

F[n] ⊃ F1[n1] ⊃ F2[n2] ⊃ … ⊃ Fk′[nk′]
(27)
where nk > nk+1. Each step Fk[nk] → Fk+1[nk+1] in the above sequence can be realized in a deterministic manner by an adequate increase λk → λk+1 = λk + Δλk of the cost level λ in the criterion function Ψλ(v[n+2]) (24). The minimization of the criterion function Ψλ(v[n+2]) (24) with the parameter λk+1 results in the feature subspace Fk+1[nk+1]. The quality of particular feature subspaces Fk[nk] in the sequence (27) should be evaluated during the feature selection process. In the RLS approach, the quality of the feature subspace Fk[nk] is evaluated on the basis of the optimal linear classifier designed in this subspace. For this purpose, the perceptron criterion function Φk(v[nk+2]) (20) is defined by using the feature vectors xj[nk] from the subspace Fk[nk] (xj[nk] ∈ Fk[nk]). Two types of augmented feature vectors xj+[nk+2] and xj-[nk+2] and the augmented weight vector v[nk+2] are defined in accordance with the rules (10) and (11):

(∀j ∈ {1,…, m})
if (yj- > -∞), then xj+[nk+2] = [xj[nk]T, 1, -yj-]T, else xj+[nk+2] = 0, and
if (yj+ < +∞), then xj-[nk+2] = [xj[nk]T, 1, -yj+]T, else xj-[nk+2] = 0    (28)
and v[nk+2] = [w[nk]T, θ, β]T, where v[nk+2] ∈ V[nk+2].    (29)

The basis exchange algorithm allows finding the optimal vector vk*[nk+2] which constitutes the minimum (21) of the function Φk(v[nk+2]) (20) in the weight subspace V[nk+2]:

vk*[nk+2] = [wk*[nk]T, θk*, βk*]T    (30)

The optimal vector vk*[nk+2] (30) allows defining both the interval regression model (22) and the following decision rule of the optimal linear classifier in the subspace Fk[nk+2]:

if vk*[nk+2]Tx[nk+2] ≥ 0, then x[nk+2] is allocated to the category ω+
if vk*[nk+2]Tx[nk+2] < 0, then x[nk+2] is allocated to the category ω-    (31)
In accordance with the above rule, the augmented feature vector x[nk+2] (x[nk+2] ∈ Fk[nk+2]) is allocated to the positive category ω+ if the scalar product vk*[nk+2]Tx[nk+2] is not negative. Otherwise, the vector x[nk+2] is allocated to the negative category ω-. We are considering the linear separability (15) of the sets Rk+ and Rk- (13) containing the augmented feature vectors xj+[nk+2] and xj-[nk+2] (28).

Remark 3: If the sets Rk+ and Rk- (13) are linearly separable (15) in the feature space Fk[nk+2], then the decision rule (31) based on the optimal vector vk*[nk+2] (30) allocates all the non-zero elements xj+[nk+2] of the set Rk+ to the positive category ω+ and all the non-zero elements xj-[nk+2] of the set Rk- to the negative category ω-. This Remark can be justified by using Theorem 2.

In accordance with the RLS method of feature selection, the quality of the feature subspace Fk[nk] (27) is evaluated on the basis of the optimal linear classifier (31) defined in this subspace by the parameter vector vk*[nk+2] = [wk*[nk]T, θk*, βk*]T (30) which constitutes the minimum of the criterion function Ψλ(v[nk+2]) (24). The quality of the linear classifier (31) can be evaluated by using the error estimator (apparent error rate) ea(vk*[nk+2]), the fraction of wrongly classified non-zero elements xj+[nk+2] and xj-[nk+2] (28) of the sets Rk+ and Rk- (8) [Duda et al. 2001]:

ea(vk*[nk+2]) = ma(vk*[nk+2]) / (m+ + m-)
(32)
where m+ is the number of non-zero elements xj+[nk+2] (28) in the set Rk+ (13), m- is the number of non-zero elements xj-[nk+2] in the set Rk-, and ma(vk*[nk+2]) is the number of elements from these sets which are wrongly allocated by the rule (31). A wrong allocation happens when the augmented feature vector xj+[nk+2] (28) is allocated to the negative category ω-, or the vector xj-[nk+2] is allocated to the positive category ω+. Because the same data xj[nk] are used both for designing the classifier (31) and for its evaluation, the evaluation result (32) is too optimistic (biased) [2]. The error rate ea(vk*[nk+2]) (32) evaluated on the elements xj+[nk+2] and xj-[nk+2] (28) of the learning sets Rk+ and Rk- (13) is called the apparent error (AE). In accordance with Remark 2, if the sets Rk+ and Rk- (13) are linearly separable (15) in the feature subspace Fk[nk+2], then the apparent error ea(vk*[nk+2]) (32) is equal to zero. But it is typically found in practical applications that the error rate of the classifier (31) evaluated on vectors xj[nk+2] (28) that do not belong to the learning sets Rk+ and Rk- (8) is higher than zero. For the purpose of reducing the classifier bias, cross-validation procedures can be applied [2]. The term p-fold cross-validation means that the data sets Rk+ and Rk- (13) are divided into p parts Pi, where i = 1,…, p. The vectors xj+[nk+2] and xj-[nk+2] (28) contained in p - 1 parts Pi are used for the definition of the criterion function Φk(v[nk+2]) (20) and in the computation of the optimal parameters vk*[nk+2] (30). The remaining vectors xj+[nk+2] and xj-[nk+2] (28), forming one part Pi′, are used as the test set for the evaluation of the error rate ei′(vk*[nk+2]) (32). This evaluation is repeated p times, each time with a different part Pi′ used as the test set. Finally, the mean value ec(vk*[nk+2]) of the error rates ei′(vk*[nk+2]) (32) on the elements of the test sets Pi′ is computed.
The cross-validation procedure makes it possible to use different vectors for designing the classifier (31) and for evaluating it, and, as a result, to reduce the bias of the error rate estimation (32). The error rate eCVE(vk*[nk+2]) (32) estimated during the cross-validation procedure is called the cross-validation error (CVE). A special case of the p-fold cross-validation method is the leave-one-out procedure, in which the number p of parts Pi is equal to the number of non-zero elements xj+[nk+2] and xj-[nk+2] (28) of the sets Rk+ and Rk- (13). In accordance with the RLS method of feature selection, the feature subspace Fk*[nk] in the sequence (27) selected as the optimal one is the subspace linked to the smallest value of the cross-validation error rate eCVE(vk*[nk+2]) (32) of the linear classifier (31) [Bobrowski and Łukaszuk 2009].
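The p-fold procedure described above can be sketched generically. Here `fit` and `error_rate` are placeholders supplied by the caller, standing in for the CPL minimization (20)-(21) and the error estimator (32); the function and parameter names are illustrative:

```python
import numpy as np

def p_fold_cv_error(X, y, p, fit, error_rate, seed=0):
    """Split the m examples into p parts P_i; train on p-1 parts, test on
    the held-out part; return the mean of the p test error rates (CVE).
    Leave-one-out is the special case p = m."""
    m = len(X)
    idx = np.random.default_rng(seed).permutation(m)
    parts = np.array_split(idx, p)
    errs = []
    for i in range(p):
        test = parts[i]
        train = np.concatenate([parts[j] for j in range(p) if j != i])
        model = fit(X[train], y[train])                    # e.g. minimize Phi_k (20)
        errs.append(error_rate(model, X[test], y[test]))   # fraction misclassified (32)
    return float(np.mean(errs))
```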
6 Hyperplanes and Vertices in the Parameter Space V[n+2]
Each non-zero feature vector xj+[n+2] (10) defines a hyperplane hj+ in the parameter space V[n+2]:
(∀j ∈ {1,…, m})
if xj+[n+2] ≠ 0, then hj+ = {v[n+2]: xj+[n+2]Tv[n+2] = 1} (33)
Similarly, feature vectors xj-[n+2] (10) define the hyperplanes hj-: (∀j ∈ {1,…, m})
if xj-[n+2] ≠ 0, then hj- = {v[n+2]: xj-[n+2]Tv[n+2] = -1}
(34)
The unit vectors ei[n+2] = [0,…,1,…,0]T define the hyperplanes hi0 in the (n+2)-dimensional parameter space V[n+2]:

(∀i ∈ {1,…, n+2}) hi0 = {v[n+2]: ei[n+2]Tv[n+2] = 0}    (35)

The hyperplanes hj+ (33), hj- (34) or hi0 (35) intersect in some points vr[n+2] (vr[n+2] ∈ V[n+2]), which are called vertices. Each vertex vr[n+2] in the (n+2)-dimensional parameter space V[n+2] is the geometrical place of intersection of at least n + 2 hyperplanes hj+ (33), hj- (34) or hi0 (35). Each vertex vr[n+2] can be defined by a set of n + 2 linear equations: xj+[n+2]Tvr[n+2] = 1 (33), or xj-[n+2]Tvr[n+2] = -1 (34), or ei[n+2]Tvr[n+2] = 0 (35), which can be represented in the following matrix form:

Br[n+2]Tvr[n+2] = δr[n+2]
(36)
where Br[n+2] is the nonsingular matrix (basis) with columns constituted by n + 2 linearly independent feature vectors xj+[n+2], xj-[n+2] (28) or unit vectors ei[n+2], and δr[n+2] is the margin vector with components δri equal to 1, -1 or 0, according to the type of vector which constitutes the i-th column of the matrix Br[n+2] (xj+[n+2], xj-[n+2] or ei[n+2]). The vertex vr[n+2] can be computed from the equation (36) in accordance with the following formula:

vr[n+2] = (Br[n+2]T)-1 δr[n+2]
(37)
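Computing a vertex from (36)-(37) is a single linear solve; a sketch (the function name is illustrative):

```python
import numpy as np

def vertex(basis_cols, margins):
    """Solve B_r^T v_r = delta_r (36): basis_cols are the n+2 linearly
    independent columns of B_r (augmented vectors x_j^+ / x_j^- or unit
    vectors e_i); margins are the matching components 1, -1 or 0."""
    B = np.column_stack([np.asarray(c, dtype=float) for c in basis_cols])
    return np.linalg.solve(B.T, np.asarray(margins, dtype=float))
```

For instance, with the basis [x1-[4], e2[4], e3[4], e4[4]] and the margin vector [-1, 0, 0, 1]T of Example 3 below, this returns the vertex [-1, 0, 0, 1]T.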
It can be proved that the minimum (26) of the modified CPL criterion function Ψλ(v[n + 2]) (24) defined on the feature vectors xj+[n + 2] and xj-[n + 2] (10) can be located in one of the vertices vr[n + 2] (36): (∃v r *[n +2]) (∀v[n + 2]) Ψλ(v[n + 2]) ≥ Ψλ(vr*[n + 2])
(38)
The minimization (26) of the modified CPL criterion function Ψλ(v[n+2]) (24) also allows finding the basis Br*[n+2] (36) related to the optimal vertex vr*[n+2] (38).
Remark 4: If the i-th (i = 1,…, n) unit vector ei[n+2] = [0,…,1,…,0]T constitutes one of the columns of the basis Br*[n+2] (36) related to the optimal vertex vr*[n+2] = [wr*[n]T, θr*, βr*]T (38), where wr*[n] = [wr,1*,…, wr,n*]T (22), then the weight wr,i* linked to the i-th feature xi is equal to zero (wr,i* = 0). In accordance with the implication (23), the i-th feature xi can be reduced (neglected) in this case. Remark 4 can be summarized as follows by using the implication (23):

(The i-th unit vector ei[n+2] (i = 1,…, n) is in the basis Br*[n+2]) ⇒ (The i-th feature xi can be reduced)
(39)
Remark 5: A sufficiently large increase of the cost level λ (λ ≥ 0) in the CPL criterion function Ψλ(v[n+2]) (24) leads to an increase of the number n0 of unit vectors ei[n+2] in the basis Br*[n+2] (36) related to the optimal vertex vr*[n+2] (38). As a result, n0 features xi can be reduced from the feature space F[n]. The dimensionality n of the feature space F[n] can be reduced arbitrarily by an adequate increase of the parameter λ in the criterion function Ψλ(v[n+2]) (24). For example, the value λ = 0 means that the optimal vertex vr*[n+2] (38) also constitutes the minimum of the perceptron criterion function Φ(v[n+2]) (20) defined in the full feature space F[n]. On the other hand, a sufficiently large value of the parameter λ results in the optimal vertex vr*[n+2] (38) equal to zero (vr*[n+2] = 0). Such a solution is not constructive, because it means that all the features xi have been reduced (23) and the separating hyperplane H(vr*[n+2]) (14) cannot be defined. The basis exchange algorithms allow finding efficiently the parameters (vertex) vr*[n+2] constituting the minimum of the CPL criterion function, even in the case of large sets R+ and R- (13) of high dimensional vectors xj+[n+2] and xj-[n+2] (10) [Bobrowski 1991].
7 Examples
Example 3: Let us consider the case of a two-dimensional feature space F[2] with only one feature vector x1[2] = [x11, x12]T = [3, 2]T ∈ F[2] and with only one constraint (3): the upper bound y1+ = -2. In this case, the linear model (1) y(x[2]) = w[2]Tx[2] + θ should fulfil only one inequality (4): 3w1 + 2w2 + θ < -2. The augmented vector xj-[nk+2] = [xj[nk]T, 1, -yj+]T (28) is equal in this case to x1-[4] = [3, 2, 1, 2]T. The augmented inequality (17) v[n+2]Txj-[n+2] ≤ -1 takes the form v[4]Tx1-[4] ≤ -1, where v[4] = [v1, v2, v3, v4]T = [w1, w2, θ, β]T (11). The feature vector x1-[4] = [3, 2, 1, 2]T (10) defines the following hyperplane h1- (34) in the parameter space V[4]:
h1- = {v[4]: x1-[4]Tv[4] = -1} = {v[4]: 3v1+ 2v2 + v3+ 2 v4 = -1}
(40)
The unit vectors ei[4] define the zero basis B0[4] = [e1[4], e2[4], e3[4], e4[4]] (36) and four hyperplanes hi0 with the margin equal to zero in the parameter space V[4]:

(∀i ∈ {1, 2, 3, 4}) hi0 = {v[4]: ei[4]Tv[4] = 0}    (41)

The unit vector e4[4] = [0, 0, 0, 1]T also defines the hyperplane h41 with the margin equal to one:

h41 = {v[4]: e4[4]Tv[4] = 1} = {v[4]: v4 = 1}
(43)
where v[4] = [v1, v2, v3, v4]T = [w1, w2, θ, β]T (11). The nonsingular matrix (basis) Br[4] is constituted by the vector x1-[4] = [3, 2, 1, 2]T and by three unit vectors ei[4]. The margin vector δr[4] has components equal to -1, 0 or 1, according to the hyperplanes h1- (40), hi0 (41) or h41 (42). The solution of the equation (43) is given by:

vr[4] = (Br[4]T)-1 δr[4]    (44)

The following rules can be obtained from the equation (44):
• If the basis Br[4] (43) is equal to B1[4] = [x1-[4], e2[4], e3[4], e4[4]], then δ1[4] = [-1, 0, 0, 1]T and the vertex vr[4] is equal to v1[4] = [-1, 0, 0, 1]T.
• If the basis Br[4] (43) is equal to B2[4] = [e1[4], x1-[4], e3[4], e4[4]], then δ2[4] = [0, -1, 0, 1]T and the vertex vr[4] is equal to v2[4] = [0, -3/2, 0, 1]T.
• If the basis Br[4] (43) is equal to B4[4] = [e1[4], e2[4], x1-[4], e4[4]], then δ4[4] = [0, 0, -1, 1]T and the vertex vr[4] is equal to v4[4] = [0, 0, -3, 1]T.
We can remark that in this case, the values Φ(vr[4]) of the perceptron criterion function Φ(v[4]) (20) are equal to zero for each of these points vr[4]: Φ(v1[4]) = Φ(v2[4]) = Φ(v4[4]) = 0
(45)
The values Ψ1(vr[4]) of the modified criterion function Ψ1(v[4]) (24) with λ = 1 and γ1= γ2 = γ3 = 1 are equal to: Ψ1(v1[4]) = 1.0, Ψ1(v2[4]) = 1.5, Ψ1(v4[4]) = 3.0
(46)
The modified criterion function Ψ1(v[4]) (24) has the lowest value in the vertex v1[4] (Ψ1(v1[4]) = 1.0). In accordance with the relation (26), the optimal vertex v1*[4] is equal in this case to v1[4] = [-1, 0, 0, 1]T (v1*[4] = v1[4]) and Ψ1* = 1.0.
Example 4: Let us consider, similarly as in Example 3, the two-dimensional feature space F[2]. We will take into consideration three feature vectors x1[2] = [x11, x12]T = [3, 2]T, x2[2] = [x21, x22]T = [2, -1]T, and x3[2] = [x31, x32]T = [1, -1]T. The upper bounds y1+ = -2 and y2+ = 2 have been related to the vectors x1[2] and x2[2]. The lower bound y3- = 1 has been related to the vector x3[2]. These constraints are described by the following inequalities:

3w1 + 2w2 + θ < -2
2w1 - w2 + θ < 2
w1 - w2 + θ > 1
(47)
An augmented vector (10) can be linked to each of these inequalities:

x1-[4] = [3, 2, 1, 2]T
x2-[4] = [2, -1, 1, -2]T
x3+[4] = [1, -1, 1, -1]T
(48)
In this case, the margin vector δr[4] (36) has the following components δri: δr[4] = [δr1, δr2, δr3, δr4]T = [-1, -1, 1, 1]T
(49)
The augmented vectors (48) can be used in the matrix (basis) Br[4] (36): Br[4] = [x1-[4], x2-[4], x3+[4], e4[4]]
(50)
The inverse matrix (Br[4]T)-1 (37) is equal to:

(Br[4]T)-1 = [r1[4], r2[4], r3[4], r4[4]]    (51)

where:

r1[4] = [0, 1/3, 1/3, 0]T
r2[4] = [1, -2/3, -5/3, 0]T
r3[4] = [-1, 1/3, 7/3, 0]T
r4[4] = [1, -5/3, -5/3, 1]T    (52)
We can compute the vertex vr[4] (37) by taking into account (49) and (52): vr[4] = [-1, -1, 2, 1]T
(53)
The values of the scalar products vr[4]Txj[4] for the augmented feature vectors (48) are equal to:

vr[4]Tx1-[4] = [-1, -1, 2, 1][3, 2, 1, 2]T = -1
vr[4]Tx2-[4] = [-1, -1, 2, 1][2, -1, 1, -2]T = -1
vr[4]Tx3+[4] = [-1, -1, 2, 1][1, -1, 1, -1]T = 1
(54)
The values of the penalty functions φj+(v[4]) (18) and φj-(v[4]) (19) are equal to zero in the point vr[4]. As a result, the value Φ(vr[4]) of the perceptron criterion function Φ(v[4]) (20) in the point vr[4] is also equal to zero (Φ(vr[4]) = 0). The vector of parameters vr[4] = [w1, w2, θ, β]T = [-1, -1, 2, 1]T (53) defines the regression model (1):

y(x) = w1x1 + w2x2 + θ = -x1 - x2 + 2
(55)
This model fulfils all the constraints (47): y(x1) < y1+ (-3 < -2), y(x2) < y2+ (1 < 2), and y(x3) > y3- (2 > 1).
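Example 4 can be checked numerically end-to-end; a sketch using numpy for the linear solve in (37):

```python
import numpy as np

# Augmented vectors (48), basis B_r (50) and margin vector (49) of Example 4
x1m = np.array([3., 2., 1., 2.])    # x1^-, from the upper bound y1+ = -2
x2m = np.array([2., -1., 1., -2.])  # x2^-, from the upper bound y2+ = 2
x3p = np.array([1., -1., 1., -1.])  # x3^+, from the lower bound y3- = 1
e4 = np.array([0., 0., 0., 1.])     # enforces beta = 1
B = np.column_stack([x1m, x2m, x3p, e4])
delta = np.array([-1., -1., 1., 1.])

v = np.linalg.solve(B.T, delta)     # vertex (37): v = [w1, w2, theta, beta]
w1, w2, theta, beta = v             # comes out as [-1, -1, 2, 1]

def y(x1, x2):                      # regression model (55)
    return w1 * x1 + w2 * x2 + theta
```

The scalar products (54) come out as -1, -1 and 1, and the model values y(3, 2) = -3, y(2, -1) = 1, y(1, -1) = 2 satisfy the three constraints (47).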
8 Concluding Remarks
The problem of designing prognostic linear models (1) on the basis of data sets Cm (3) with uncertainty of the target variable in the form of intervals has been analyzed in the paper. In accordance with the proposed approach, this problem has been transformed into the problem of the linear separability (15) of the sets R+ and R- (13). The problem of linear separability (15) means here the search for a hyperplane H(v*[n+2]) (14) which separates the sets R+ and R- (13) in the best possible way. The parameters v*[n+2] of the optimal hyperplane H(v*[n+2]) (14) can be found through the minimization (21) of the convex and piecewise linear (CPL) criterion function Φ(v[n+2]) (20) or of the modified criterion function Ψλ(v[n+2]) (24). The basis exchange algorithms, similarly to linear programming, allow finding the minimum of each of these functions efficiently [12]. The modified CPL criterion function Ψλ(v[n+2]) (24), which takes into account the feature costs γi, allows combining the design of interval regression models with the feature selection process. As a result, the most influential subsets of features (risk patterns) xi can be identified in accordance with the relaxed linear separability (RLS) method [11]. The described approach to designing interval prognostic models also allows taking into account censored data sets. Important and widely used examples of censored data sets can be found in survival analysis applications [7]. The interval censored data represented by intervals [yj-, yj+] (3) can be treated as a kind of generalization of survival analysis data; even the case when the data set Cs (5) contains only censored survival times tj can be analyzed in this manner. Such a possibility opens the way for applying interval regression modeling to many important problems where the dependent quantity cannot be measured exactly.
This approach allows for designing prognostic models on the basis of imprecise measurements of the dependent variable. Such circumstances are commonly met in practice.
Acknowledgment
This work was supported by the NCBiR project N R13 0014 04, partially financed by the project S/WI/2/2011 from the Białystok University of Technology, and by the project 16/St/2011 from the Institute of Biocybernetics and Biomedical Engineering PAS.
References
[Bobrowski 1991] Bobrowski, L.: Design of piecewise linear classifiers from formal neurons by some basis exchange technique. Pattern Recognition 24(9), 863–870 (1991)
[Bobrowski 2005] Bobrowski, L.: Eksploracja danych oparta na wypukłych i odcinkowo-liniowych funkcjach kryterialnych (Data mining based on convex and piecewise linear (CPL) criterion functions). Technical University Białystok (2005) (in Polish)
[Bobrowski 2009] Bobrowski, L.: Ranked linear models and sequential patterns recognition. Pattern Analysis & Applications 12(1), 1–7 (2009)
[Bobrowski and Łukaszuk 2009] Bobrowski, L., Łukaszuk, T.: Feature selection based on relaxed linear separability. Biocybernetics and Biomedical Engineering 29(2), 43–59 (2009)
[Bobrowski 2010] Bobrowski, L.: Linear prognostic models based on interval regression with CPL functions. Symulacja w Badaniach i Rozwoju 1, 109–117 (2010) (in Polish)
[Buckley and James 1979] Buckley, J., James, I.: Linear regression with censored data. Biometrika 66, 429–436 (1979)
[Duda et al. 2001] Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2001)
[Gomez et al. 2003] Gomez, G., Espinal, A., Lagakos, S.: Inference for a linear regression model with an interval-censored covariate. Statistics in Medicine 22, 409–425 (2003)
[Johnson and Wichern 1991] Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis. Prentice-Hall Inc., Englewood Cliffs (1991)
[Klein and Moeschberger 1997] Klein, J.P., Moeschberger, M.L.: Survival Analysis, Techniques for Censored and Truncated Data. Springer, NY (1997)
Neural Network Training with Second Order Algorithms

H. Yu and B.M. Wilamowski
Department of Electrical and Computer Engineering, Auburn University, Auburn, AL, USA
[email protected], [email protected]
Abstract. Second order algorithms are very efficient for neural network training because of their fast convergence. In traditional implementations of second order algorithms [Hagan and Menhaj 1994], the Jacobian matrix is calculated and stored, which may cause memory limitation problems when training with large numbers of patterns. In this paper, an improved computation is introduced to solve the memory limitation problem in second order algorithms. The proposed method calculates the gradient vector and the Hessian matrix directly, without Jacobian matrix storage and multiplication. The memory cost for training is significantly reduced by replacing matrix operations with vector operations. At the same time, the training speed is also improved due to the memory reduction. The proposed implementation of second order algorithms can be applied to train basically an unlimited number of patterns.
1 Introduction
As an efficient way of modeling the linear/nonlinear relationships between stimuli and responses, artificial neural networks are broadly used in industry, for example in nonlinear control, data classification and system diagnosis. The error back propagation (EBP) algorithm [Rumelhart et al. 1986] dispersed the dark clouds over the field of artificial neural networks and can be regarded as one of the most significant breakthroughs in neural network training. The EBP algorithm is still widely used today; however, it is also known as an inefficient algorithm because of its slow convergence. Many improvements have been made to overcome the disadvantages of the EBP algorithm, and some of them, such as momentum and the RPROP algorithm, work relatively well. But as long as first order algorithms are used, the improvements are not dramatic. Second order algorithms, such as the Newton algorithm and the Levenberg Marquardt (LM) algorithm, use the Hessian matrix to perform better estimations of both step sizes and directions, so that they can converge much faster than first order algorithms. By combining the training speed of the Newton algorithm and the stability of the EBP algorithm, the LM algorithm is regarded as one of the most efficient algorithms for training small and medium sized patterns.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 463–476. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
464
H. Yu and B.M. Wilamowski
Table 1 shows the training statistics of the two-spiral problem using both the EBP algorithm and the LM algorithm. In both cases, fully connected cascade (FCC) networks were used for training and the desired sum square error was 0.01. For the EBP algorithm, the learning constant was 0.005 (largest possible avoiding oscillation), momentum was 0.5 and the iteration limit was 1,000,000; for the LM algorithm, the maximum number of iterations was 1,000. One may notice that the EBP algorithm not only requires much more time than the LM algorithm, but also is not able to solve the problem unless an excessive number of neurons is used. The EBP algorithm requires at least 12 neurons, while the LM algorithm can solve the problem with only 8 neurons.

Table 1 Training results of two-spiral problem

Neurons | Success Rate    | Average Iteration  | Average Time (s)
        | EBP      LM     | EBP        LM      | EBP       LM
8       | 0%       13%    | /          287.7   | /         0.88
9       | 0%       24%    | /          261.4   | /         0.98
10      | 0%       40%    | /          243.9   | /         1.57
11      | 0%       69%    | /          231.8   | /         1.62
12      | 63%      80%    | 410,254    175.1   | 633.91    1.70
13      | 85%      89%    | 335,531    159.7   | 620.30    2.09
14      | 92%      92%    | 266,237    137.3   | 605.32    2.40
15      | 96%      96%    | 216,064    127.7   | 601.08    2.89
16      | 98%      99%    | 194,041    112.0   | 585.74    3.82
Even with such a powerful training ability, the LM algorithm is not welcomed by engineers because of its complex computation and several limitations:

1. Network architecture limitation. The traditional implementation of the LM algorithm by Hagan and Menhaj was developed only for multilayer perceptron (MLP) neural networks. Therefore, more powerful neural networks [Hohil et al. 1999; Wilamowski 2009], such as fully connected cascade (FCC) or bridged multilayer perceptron (BMLP) architectures, cannot be trained.
2. Network size limitation. The LM algorithm requires the inversion of the Hessian matrix (size: nw × nw) in every iteration, where nw is the number of weights. Because of this necessity of matrix inversion in every iteration, the speed advantage of the LM algorithm over the EBP algorithm becomes less evident as the network size increases.
3. Memory limitation. The LM algorithm cannot be used for problems with many training patterns because the Jacobian matrix becomes prohibitively large.

Fortunately, the network architecture limitation was solved by the recently developed neuron-by-neuron (NBN) computation in papers [Wilamowski et al. 2008;
Wilamowski et al. 2010]. The NBN algorithm can be applied to train arbitrarily connected neural networks. The network size limitation still remains unsolved, so the LM algorithm can be used only for small and medium size neural networks. In this paper, the memory limitation problem of the traditional LM algorithm is addressed, and the proposed method of computation solves this problem by removing Jacobian matrix storage and multiplication. In this case, second order algorithms can be applied to train very large numbers of patterns [Wilamowski and Yu 2010]. The paper is organized as follows: Section 2 introduces the computational fundamentals of the LM algorithm and addresses the memory limitation problem. Section 3 describes the improved computation of both the quasi Hessian matrix and the gradient vector in detail. Section 4 implements the proposed computation on a simple parity-3 problem. Section 5 gives experimental results on memory and training speed, comparing the traditional Hagan and Menhaj LM algorithm with the improved LM algorithm.
2 Computational Fundamentals
Before the derivation, let us introduce some indices which will be used in the paper:

• p is the index of patterns, from 1 to np, where np is the number of training patterns;
• m is the index of outputs, from 1 to no, where no is the number of outputs;
• i and j are the indices of weights, from 1 to nw, where nw is the number of weights;
• k is the index of iterations and n is the index of neurons.

Other indices will be explained in related places. The sum square error (SSE) E is defined to evaluate the training process. For all patterns and outputs, it is calculated as:

E = (1/2) Σ_{p=1}^{np} Σ_{m=1}^{no} e_pm^2
(1)
where e_pm is the error at output m when training pattern p, defined as:

e_pm = o_pm - d_pm
(2)
where d_pm and o_pm are the desired output and the actual output, respectively, at output m for training pattern p. The update rule of the LM algorithm is:

Δw_k = (H_k + μ I)^{-1} g_k
(3)
where: μ is the combination coefficient, I is the identity matrix, g is the gradient vector and H is the Hessian matrix.
The gradient vector g and the Hessian matrix H are defined as:

g = [∂E/∂w1, ∂E/∂w2, …, ∂E/∂wnw]T    (4)

H = [ ∂²E/∂w1²      ∂²E/∂w1∂w2   …  ∂²E/∂w1∂wnw
      ∂²E/∂w2∂w1    ∂²E/∂w2²     …  ∂²E/∂w2∂wnw
      …             …            …  …
      ∂²E/∂wnw∂w1   ∂²E/∂wnw∂w2  …  ∂²E/∂wnw² ]    (5)
As one may notice, in order to perform the update rule (3), the second order derivatives of E in (5) have to be calculated, which makes the computation very complex. In the Hagan and Menhaj implementation of the LM algorithm, the Jacobian matrix J was introduced to avoid the calculation of second order derivatives. The Jacobian matrix has the format:

J = [ ∂e11/∂w1     ∂e11/∂w2     …  ∂e11/∂wnw
      ∂e12/∂w1     ∂e12/∂w2     …  ∂e12/∂wnw
      …            …            …  …
      ∂e1no/∂w1    ∂e1no/∂w2    …  ∂e1no/∂wnw
      …            …            …  …
      ∂enp1/∂w1    ∂enp1/∂w2    …  ∂enp1/∂wnw
      ∂enp2/∂w1    ∂enp2/∂w2    …  ∂enp2/∂wnw
      …            …            …  …
      ∂enpno/∂w1   ∂enpno/∂w2   …  ∂enpno/∂wnw ]    (6)
By combining (1) and (4), the elements of the gradient vector can be calculated as:

∂E/∂wi = Σ_{p=1}^{np} Σ_{m=1}^{no} e_pm (∂e_pm/∂wi)
(7)
So the relationship between gradient vector and Jacobian matrix can be presented by
g = JTe
(8)
By combining (1) and (5), the elements of the Hessian matrix can be calculated as:

∂²E/∂wi∂wj = Σ_{p=1}^{np} Σ_{m=1}^{no} [ (∂e_pm/∂wi)(∂e_pm/∂wj) + e_pm ∂²e_pm/∂wi∂wj ] ≈ Σ_{p=1}^{np} Σ_{m=1}^{no} (∂e_pm/∂wi)(∂e_pm/∂wj)
(9)
The relationship between Hessian matrix and Jacobian matrix can be described by H ≈ JT J = Q
(10)
where the matrix Q is the approximated Hessian matrix, called the quasi Hessian matrix. By integrating equations (3), (8) and (10), the implementation of the LM update rule becomes:

Δw_k = (J_k^T J_k + μ I)^{-1} J_k^T e_k
(11)
where e is the error vector. Equation (11) is used as the traditional implementation of the LM algorithm: the Jacobian matrix J has to be calculated and stored first; then the matrix multiplications (8) and (10) are performed for further weight updating. According to the definition of the Jacobian matrix J in (6), np × no × nw elements need to be stored. This may work smoothly for problems with small and medium numbers of training patterns; however, for large numbers of patterns, the memory limitation problem can be triggered. For example, the MNIST pattern recognition problem [Cao et al. 2006] consists of 60,000 training patterns, 784 inputs and 10 outputs. Using the simplest possible neural network (one neuron per output), the memory cost for storing the entire Jacobian matrix is nearly 35 gigabytes, which would be quite an expensive memory cost in real programming.
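A minimal sketch of the traditional update (11) with an explicitly stored Jacobian, plus the back-of-the-envelope memory estimate for the MNIST-scale case described above (784 inputs plus a bias per output neuron and 8-byte doubles are assumed):

```python
import numpy as np

def lm_step(J, e, mu):
    """Traditional LM update (11): dw = (J^T J + mu*I)^(-1) J^T e.
    Requires the full (np*no) x nw Jacobian J to be stored."""
    nw = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(nw), J.T @ e)

# Jacobian storage for 60,000 patterns, 10 outputs, one neuron per output
# (784 inputs + bias => nw = 10 * 785 = 7850 weights), 8 bytes per element:
jac_gib = 60000 * 10 * 7850 * 8 / 2**30   # roughly 35 GB, as noted in the text
```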
3 Improved Computation

The key issue leading to the memory limitation in the traditional computation is that the entire Jacobian matrix has to be stored for the subsequent matrix multiplications. If both the gradient vector and the quasi-Hessian matrix could be obtained directly, without the Jacobian matrix multiplications, there would be no need to store all the elements of the Jacobian matrix, and the problem would be solved.

3.1 Matrix Algebra for Jacobian Matrix Elimination

There are two ways of performing matrix multiplication. If a row of the first matrix is multiplied by a column of the second matrix, a scalar is obtained, as shown in Fig. 1a. If a column of the first matrix is multiplied by a row of the second matrix, a partial matrix q is obtained, as shown in Fig. 1b. The number of scalars is nw×nw, while the number of partial matrices q, which later have to be summed, is np×no.
468
H. Yu and B.M. Wilamowski
Fig. 1 Two ways of matrix multiplication: (a) row-column multiplication results in a scalar; (b) column-row multiplication results in a partial matrix q
When JT is multiplied by J using the routine of Fig. 1b, partial matrices q (size: nw×nw) need to be calculated np×no times, and then all of the np×no matrices q must be summed together. This routine seems complicated; therefore, almost all matrix multiplication implementations use the routine of Fig. 1a, where only one element of the resulting matrix is calculated and stored at a time. Although the routine of Fig. 1b appears more complicated than that of Fig. 1a, a detailed analysis (Table 2) shows that the computation costs of the two methods are basically the same.

Table 2 Computation analysis between the two methods of matrix multiplication

Multiplication Methods     Addition                 Multiplication
Row-column (Fig. 1a)       (np × no) × nw × nw      (np × no) × nw × nw
Column-row (Fig. 1b)       nw × nw × (np × no)      nw × nw × (np × no)
In the specific case of neural network training, only one row of the Jacobian matrix J (one column of JT) is known for each training pattern, and there is no relationship among training patterns. So if the routine of Fig. 1b is used, the creation of the quasi-Hessian matrix can start immediately, without the necessity of computing and storing the entire Jacobian matrix for all patterns and all outputs. Table 3 estimates the memory cost of the two multiplication methods.
Table 3 Memory cost analysis between two methods of matrix multiplication

Multiplication Methods     Elements for storage
Row-column (Fig. 1a)       (np × no) × nw + nw × nw + nw
Column-row (Fig. 1b)       nw × nw + nw
Difference                 (np × no) × nw
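The equivalence of the two routines of Fig. 1 can be checked numerically. The sketch below builds Q = JᵀJ both ways for a small random Jacobian; the sizes are illustrative only:

```python
import numpy as np

# Row-column (Fig. 1a) vs. column-row (Fig. 1b) computation of Q = J^T J.
rng = np.random.default_rng(0)
n_rows, nw = 12, 5                        # np*no Jacobian rows, nw weights
J = rng.standard_normal((n_rows, nw))

# Fig. 1a: one scalar of Q at a time (row of J^T times column of J).
Q_scalar = np.zeros((nw, nw))
for a in range(nw):
    for b in range(nw):
        Q_scalar[a, b] = J[:, a] @ J[:, b]

# Fig. 1b: one partial matrix q per Jacobian row, summed over all rows.
Q_partial = np.zeros((nw, nw))
for j_pm in J:                            # only one row needed at a time
    Q_partial += np.outer(j_pm, j_pm)     # partial matrix q (size nw x nw)

assert np.allclose(Q_scalar, Q_partial)   # both routines give the same Q
```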
Notice that the column-row multiplication (Fig. 1b) can save a lot of memory.

3.2 Improved Gradient Vector Computation

Let us introduce the gradient sub vector η_pm (size: nw×1):

$$
\eta_{pm} = \begin{bmatrix}
\frac{\partial e_{pm}}{\partial w_1}\, e_{pm} \\
\frac{\partial e_{pm}}{\partial w_2}\, e_{pm} \\
\vdots \\
\frac{\partial e_{pm}}{\partial w_{nw}}\, e_{pm}
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial e_{pm}}{\partial w_1} \\
\frac{\partial e_{pm}}{\partial w_2} \\
\vdots \\
\frac{\partial e_{pm}}{\partial w_{nw}}
\end{bmatrix} \times e_{pm}
\qquad (12)
$$
By combining (7), (8) and (12), the gradient vector g can be calculated as the sum of the gradient sub vectors η_pm:

$$
g = \sum_{p=1}^{np}\sum_{m=1}^{no} \eta_{pm}
\qquad (13)
$$
By introducing the vector j_pm (size: 1×nw)

$$
j_{pm} = \begin{bmatrix}
\frac{\partial e_{pm}}{\partial w_1} & \frac{\partial e_{pm}}{\partial w_2} & \cdots & \frac{\partial e_{pm}}{\partial w_{nw}}
\end{bmatrix}
\qquad (14)
$$

the sub vectors η_pm in (12) can also be written in the vector form

$$
\eta_{pm} = j_{pm}^T e_{pm}
\qquad (15)
$$
One may notice that for the computation of the sub vector η_pm, only the nw elements of vector j_pm need to be calculated and stored. All the sub vectors can be calculated for each pattern p and output m separately and summed together to obtain the gradient vector g. Since all training patterns and outputs are independent, there is no need to store all the sub vectors η_pm: each one can be added to a temporary vector right after its computation. Therefore, during the direct computation of the gradient vector g using (13), only memory for j_pm (nw elements) and e_pm (1 element) is required, instead of the whole Jacobian matrix (np×no×nw elements) and the error vector (np×no elements).
3.3 Improved Quasi-Hessian Matrix Computation

The quasi-Hessian sub matrix q_pm (size: nw×nw) is introduced as

$$
q_{pm} = \begin{bmatrix}
\left(\frac{\partial e_{pm}}{\partial w_1}\right)^2 & \frac{\partial e_{pm}}{\partial w_1}\frac{\partial e_{pm}}{\partial w_2} & \cdots & \frac{\partial e_{pm}}{\partial w_1}\frac{\partial e_{pm}}{\partial w_{nw}} \\
\frac{\partial e_{pm}}{\partial w_2}\frac{\partial e_{pm}}{\partial w_1} & \left(\frac{\partial e_{pm}}{\partial w_2}\right)^2 & \cdots & \frac{\partial e_{pm}}{\partial w_2}\frac{\partial e_{pm}}{\partial w_{nw}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial e_{pm}}{\partial w_{nw}}\frac{\partial e_{pm}}{\partial w_1} & \frac{\partial e_{pm}}{\partial w_{nw}}\frac{\partial e_{pm}}{\partial w_2} & \cdots & \left(\frac{\partial e_{pm}}{\partial w_{nw}}\right)^2
\end{bmatrix}
\qquad (16)
$$
By combining (9), (10) and (16), the quasi-Hessian matrix Q can be calculated as the sum of the quasi-Hessian sub matrices q_pm:

$$
Q = \sum_{p=1}^{np}\sum_{m=1}^{no} q_{pm}
\qquad (17)
$$
Using the same vector j_pm defined in (14), the quasi-Hessian sub matrix can be calculated as

$$
q_{pm} = j_{pm}^T j_{pm}
\qquad (18)
$$
Similarly, each quasi-Hessian sub matrix q_pm can be calculated for each pattern and output separately and summed to a temporary matrix. Since the same vector j_pm is already calculated during the gradient vector computation above, no extra memory is required. With the improved computation, both the gradient vector g and the quasi-Hessian matrix Q can be computed directly, without Jacobian matrix storage and multiplication. During this process, only a temporary vector j_pm with nw elements needs to be stored; in other words, the memory cost for Jacobian matrix storage is reduced by a factor of np×no. In the MNIST problem mentioned in Section 2, the memory cost for the storage of Jacobian elements can be reduced from more than 35 gigabytes to nearly 30.7 kilobytes.

From (16), one may also notice that all the sub matrices q_pm are symmetrical. With this property, only the upper or lower triangular elements of those sub matrices need to be calculated. Therefore, during the improved quasi-Hessian matrix Q computation, both the multiplication operations in (18) and the sum operations in (17) can be reduced approximately by half.

3.4 Simplified ∂e_pm/∂w_i Computation

For the improved computation of the gradient vector g and the quasi-Hessian matrix Q above, the key point is to calculate the vector j_pm (defined in (14)) for each training pattern and each output. This vector is equivalent to one row of the Jacobian matrix J.
By combining (2) and (14), the elements of vector j_pm can be computed by

$$
\frac{\partial e_{pm}}{\partial w_i} = \frac{\partial (o_{pm} - d_{pm})}{\partial w_i} = \frac{\partial o_{pm}}{\partial net_{pn}}\,\frac{\partial net_{pn}}{\partial w_i}
\qquad (19)
$$

where net_pn is the sum of weighted inputs at neuron n, calculated by

$$
net_{pn} = \sum_i x_{pi} w_i
\qquad (20)
$$
where x_pi and w_i are the inputs and related weights, respectively, at neuron n. Inserting (19) and (20) into (14), the vector j_pm can be calculated by

$$
j_{pm} = \begin{bmatrix}
\frac{\partial o_{pm}}{\partial net_{p1}}\left[x_{p11} \cdots x_{p1i} \cdots\right] & \cdots &
\frac{\partial o_{pm}}{\partial net_{pn}}\left[x_{pn1} \cdots x_{pni} \cdots\right] & \cdots
\end{bmatrix}
\qquad (21)
$$
where x_pni is the i-th input of neuron n when training pattern p is applied. Using the neuron-by-neuron (NBN) computation, x_pni in (21) can be calculated in the forward computation, while ∂o_pm/∂net_pn is obtained in the backward computation. Again, since only one vector j_pm needs to be stored for each pattern and output in the improved computation, the memory cost for all those temporary parameters is reduced by a factor of np×no. All matrix operations are simplified to vector operations.
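The improved routine of Sections 3.2-3.3 can be sketched as a streaming accumulation. The Jacobian rows and errors below are random stand-ins for the values the forward/backward NBN computation would produce, and only the upper triangle of Q is updated, per the symmetry argument:

```python
import numpy as np

# Accumulate g (Eqs. 13, 15) and Q (Eqs. 17, 18) one Jacobian row at a time,
# exploiting the symmetry of every q_pm by updating only the upper triangle.
rng = np.random.default_rng(1)
n_rows, nw = 8, 4
rows = rng.standard_normal((n_rows, nw))   # stand-ins for the rows j_pm
errs = rng.standard_normal(n_rows)         # stand-ins for the errors e_pm

g = np.zeros(nw)
Q = np.zeros((nw, nw))
iu = np.triu_indices(nw)                   # upper-triangle index pairs
for j_pm, e_pm in zip(rows, errs):
    g += j_pm * e_pm                       # eta_pm = j_pm^T e_pm, summed (13)
    Q[iu] += np.outer(j_pm, j_pm)[iu]      # q_pm = j_pm^T j_pm, upper half (17)

Q = Q + np.triu(Q, 1).T                    # mirror to recover the full matrix

assert np.allclose(g, rows.T @ errs)       # same result as g = J^T e
assert np.allclose(Q, rows.T @ rows)       # same result as Q = J^T J
```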
4 Implementation

For a better illustration of the improved computation, let us use the parity-3 problem as an example. The parity-3 problem has 8 patterns, each of which is made up of 3 inputs and 1 output, as shown in Fig. 2.

Fig. 2 Parity-3 problem: 8 patterns, 3 inputs and 1 output

A fully connected cascade (FCC) network with 2 neurons (Fig. 3) is used to train the parity-3 patterns.
Fig. 3 Two neurons in fully connected cascade network
In Fig. 3, all weights are initialized as w={w1,w2,w3,w4,w5,w6,w7,w8,w9}, and all elements of both the gradient vector and the quasi-Hessian matrix are set to "0". Applying the first training pattern (-1, -1, -1, -1), the forward computation is organized from inputs to output as:

1. net11 = 1×w1 + (-1)×w2 + (-1)×w3 + (-1)×w4
2. o11 = f(net11), where f() is the activation function of the neurons
3. net12 = 1×w5 + (-1)×w6 + (-1)×w7 + (-1)×w8 + o11×w9
4. o12 = f(net12)
5. e11 = -1 - o12
Then the backward computation, from output to inputs, calculates ∂e11/∂net11 and ∂e11/∂net12 in the following steps:

6. Using the results of steps 4 and 5, one obtains

$$
\frac{\partial e_{11}}{\partial net_{12}} = \frac{\partial (-1 - o_{12})}{\partial net_{12}} = -\frac{\partial f(net_{12})}{\partial net_{12}}
\qquad (22)
$$

7. Using the results of steps 1, 2 and 3 and the chain rule of differentiation, one obtains

$$
\frac{\partial e_{11}}{\partial net_{11}} = \frac{\partial (-1 - o_{12})}{\partial net_{11}} = -\frac{\partial f(net_{12})}{\partial net_{12}} \times w_9 \times \frac{\partial f(net_{11})}{\partial net_{11}}
\qquad (23)
$$
Using equation (21), the elements of j11 can be calculated as

$$
j_{11} = \begin{bmatrix}
\frac{\partial o_{11}}{\partial net_{11}}\left[1\;\; {-1}\;\; {-1}\;\; {-1}\right] &
\frac{\partial o_{11}}{\partial net_{12}}\left[1\;\; {-1}\;\; {-1}\;\; {-1}\;\; o_{11}\right]
\end{bmatrix}
\qquad (24)
$$

By combining equations (15) and (24), the first sub vector η11 can be obtained as

$$
\eta_{11} = \begin{bmatrix} s_1 & -s_1 & -s_1 & -s_1 & s_2 & -s_2 & -s_2 & -s_2 & s_2 o_{11} \end{bmatrix}^T \times e_{11}
\qquad (25)
$$

where s1 = ∂e11/∂net11 and s2 = ∂e11/∂net12. By combining equations (18) and (24), the first quasi-Hessian sub matrix q11 can be calculated as

$$
q_{11} = \begin{bmatrix}
s_1^2 & -s_1^2 & -s_1^2 & -s_1^2 & s_1 s_2 & -s_1 s_2 & -s_1 s_2 & -s_1 s_2 & s_1 s_2 o_{11} \\
 & s_1^2 & s_1^2 & s_1^2 & -s_1 s_2 & s_1 s_2 & s_1 s_2 & s_1 s_2 & -s_1 s_2 o_{11} \\
 & & s_1^2 & s_1^2 & -s_1 s_2 & s_1 s_2 & s_1 s_2 & s_1 s_2 & -s_1 s_2 o_{11} \\
 & & & s_1^2 & -s_1 s_2 & s_1 s_2 & s_1 s_2 & s_1 s_2 & -s_1 s_2 o_{11} \\
 & & & & s_2^2 & -s_2^2 & -s_2^2 & -s_2^2 & s_2^2 o_{11} \\
 & & & & & s_2^2 & s_2^2 & s_2^2 & -s_2^2 o_{11} \\
 & & & & & & s_2^2 & s_2^2 & -s_2^2 o_{11} \\
 & & & & & & & s_2^2 & -s_2^2 o_{11} \\
 & & & & & & & & s_2^2 o_{11}^2
\end{bmatrix}
\qquad (26)
$$
One may notice that in (26) only the upper triangular elements of the sub matrix q11 are calculated, since all quasi-Hessian sub matrices are symmetrical (as analyzed in Section 3.3). This further simplifies the computation.
So far, the first gradient sub vector η11 and the first quasi-Hessian sub matrix q11 have been calculated, as in equations (25) and (26), respectively. The last step of training the pattern (-1, -1, -1, -1) is to add the vector η11 and the matrix q11 to the gradient vector g and the quasi-Hessian matrix Q, respectively. After this sum operation, all memory used in the computation, such as j11, η11 and q11, can be released.
    % Initialization
    Q = 0; g = 0
    % Improved computation
    for p = 1:np                        % number of patterns
        % Forward computation
        ...
        for m = 1:no                    % number of outputs
            % Backward computation
            ...
            calculate vector j_pm;      % Eq. (21)
            calculate sub vector η_pm;  % Eq. (15)
            calculate sub matrix q_pm;  % Eq. (18)
            g = g + η_pm;               % Eq. (13)
            Q = Q + q_pm;               % Eq. (17)
        end;
    end;

Fig. 4 Pseudo code of the improved computation
The computation above covers only the first pattern of the parity-3 problem. For the other 7 patterns, the computation process is almost the same, except that different input and output values are applied. During the whole computation process, there is no Jacobian matrix storage or multiplication; only the derivatives and outputs of the activation functions need to be computed. All the temporary parameters are stored in vectors whose sizes do not depend on the number of patterns or outputs. Generally, for a problem with np training patterns and no outputs, the improved computation can be organized as the pseudo code shown in Fig. 4.
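A runnable sketch of the Fig. 4 routine for this parity-3 example follows. The chapter leaves the activation function f() unspecified, so tanh is assumed here, and the ±1 target coding (product of the inputs) is an assumption consistent with the first pattern (-1, -1, -1, -1). The error sign convention e = o - d differs from e11 = -1 - o12 in the text, but g and Q are unaffected because j_pm flips sign together with e_pm:

```python
import math
import itertools

def improved_lm_terms(w, patterns):
    """Accumulate gradient g and quasi Hessian Q for the 2-neuron FCC network
    of Fig. 3 without ever storing the Jacobian (the Fig. 4 routine)."""
    nw = 9
    g = [0.0] * nw
    Q = [[0.0] * nw for _ in range(nw)]
    for x1, x2, x3, d in patterns:
        # forward computation (steps 1-5); tanh activation is an assumption
        net1 = w[0] + x1 * w[1] + x2 * w[2] + x3 * w[3]
        o1 = math.tanh(net1)
        net2 = w[4] + x1 * w[5] + x2 * w[6] + x3 * w[7] + o1 * w[8]
        o2 = math.tanh(net2)
        e = o2 - d
        # backward computation (cf. Eqs. 22-23): s2 = do2/dnet2, s1 = do2/dnet1
        s2 = 1.0 - o2 * o2
        s1 = s2 * w[8] * (1.0 - o1 * o1)
        # one Jacobian row (cf. Eq. 24); discarded after this iteration
        j = [s1, s1 * x1, s1 * x2, s1 * x3,
             s2, s2 * x1, s2 * x2, s2 * x3, s2 * o1]
        # accumulate Eq. (13) and Eq. (17); only the upper triangle is computed
        for a in range(nw):
            g[a] += j[a] * e
            for b in range(a, nw):
                Q[a][b] += j[a] * j[b]
                Q[b][a] = Q[a][b]       # mirror the symmetric half
    return g, Q

# parity-3 patterns with +/-1 coding; target = product of the three inputs,
# which matches the first pattern (-1, -1, -1) -> -1 used in the text
patterns = [(x1, x2, x3, x1 * x2 * x3)
            for x1, x2, x3 in itertools.product((-1.0, 1.0), repeat=3)]
w0 = [0.1 * (i + 1) for i in range(9)]   # arbitrary initial weights
g, Q = improved_lm_terms(w0, patterns)
```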
5 Experimental Results

The experiments are designed to test the memory and training-time efficiency of the improved computation, compared with the traditional computation. They are divided into two parts: a memory comparison and a time comparison.

5.1 Memory Comparison

Three problems, each with a huge number of patterns, are selected to test the memory cost of both the traditional and the improved computation. The LM algorithm is used for training, and the test results are shown in the tables below. The actual memory costs are measured with the Windows Task Manager.
Table 4 Memory comparison for parity-14 and parity-16 problems

Problems                  Parity-14     Parity-16
Patterns                  16,384        65,536
Structures*               15 neurons    17 neurons
Jacobian matrix sizes     20.6 Mb       106.3 Mb
Weight vector sizes       1.3 Kb        1.7 Kb
Average iterations        99.2          166.4
Success rate              13%           9%
Algorithms                Actual memory cost
Traditional LM            87.6 Mb       396.47 Mb
Improved LM               11.8 Mb       15.90 Mb

*All neurons are in fully connected neural networks.
From the test results in Tables 4 and 5, it is clear that the memory cost for training is significantly reduced by the improved computation. Notice that, in the MNIST pattern recognition problem, even higher memory efficiency is obtained by the improved computation if the memory cost for storing the training patterns themselves is excluded.

Table 5 Memory comparison for the MNIST pattern recognition problem

Problems                  MNIST Problem
Patterns                  60,000
Structures                784-1 single layer network*
Jacobian matrix sizes     179.7 Mb
Weight vector sizes       3.07 Kb
Algorithms                Actual memory cost
Traditional LM            572.8 Mb
Improved LM               202.8 Mb

*In order to perform efficient matrix inversion during training, only one digit is classified at a time.
5.2 Time Comparison

The parity-9, parity-11 and parity-13 problems are trained to compare the training time of the traditional and the improved computation, using the LM algorithm. In all cases, fully connected cascade networks are used for testing, and for each case the initial weights and training parameters are exactly the same.

Table 6 Time comparison for parity-9, parity-11 and parity-13 problems

Problems                  Parity-9    Parity-11   Parity-13
Patterns                  512         2,048       8,192
Neurons                   8           10          15
Weights                   108         165         315
Average iterations        35.1        58.1        88.2
Success rate              38%         17%         21%
Algorithms                Averaged training time (ms)
Traditional LM            2,226       73,563      2,868,344
Improved LM               1,078       19,990      331,531
From Table 6, one may notice that the improved computation not only handles much larger problems but also computes much faster than the traditional one, especially for training with large pattern sets; the larger the pattern set, the more time-efficient the improved computation becomes. As analyzed above, both the simplified quasi-Hessian matrix computation and the reduced memory requirements contribute to the significantly improved training speed presented in Table 6. From the comparisons above, one may conclude that the improved computation is much more efficient than the traditional computation for training with the Levenberg-Marquardt algorithm, both in memory requirements and in training time.
6 Conclusion

In this paper, an improved computation is introduced to increase the training efficiency of the Levenberg-Marquardt algorithm. Instead of storing the entire Jacobian matrix for further computation, the proposed method uses only one row of the Jacobian matrix at a time to build both the gradient vector and the quasi-Hessian matrix gradually. The corresponding memory requirement is thereby decreased approximately by a factor of np×no, where np is the number of training patterns and no is the number of outputs, and the memory limitation problem in Levenberg-Marquardt training is eliminated. Based on the proposed method, the computation of the quasi-Hessian matrix is further simplified using its symmetry. Therefore, the training speed of the improved Levenberg-Marquardt algorithm becomes much faster than that of the traditional one, by reducing both the memory cost and the number of multiplication operations in the quasi-Hessian matrix computation. From the experimental results presented in Section 5, one can conclude that the improved computation is much more efficient than the traditional computation, both in memory requirements and in training time. The method was implemented in the neural network trainer NBN 2.10 [Yu and Wilamowski 2009; Yu et al. 2009], and the software can be downloaded from http://www.eng.auburn.edu/users/wilambm/nnt/
References

[Cao et al. 2006] Cao, L.J., Keerthi, S.S., Ong, C.J., Zhang, J.Q., Periyathamby, U., Fu, X.J., Lee, H.P.: Parallel sequential minimal optimization for the training of support vector machines. IEEE Trans. on Neural Networks 17(4), 1039–1049 (2006)

[Hagan and Menhaj 1994] Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Trans. on Neural Networks 5(6), 989–993 (1994)

[Hohil et al. 1999] Hohil, M.E., Liu, D., Smith, S.H.: Solving the N-bit parity problem using neural networks. Neural Networks 12, 1321–1323 (1999)

[Rumelhart et al. 1986] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)

[Wilamowski 2009] Wilamowski, B.M.: Neural network architectures and learning algorithms: How not to be frustrated with neural networks. IEEE Industrial Electronics Magazine 3(4), 56–63 (2009)
[Wilamowski et al. 2008] Wilamowski, B.M., Cotton, N.J., Kaynak, O., Dundar, G.: Computing gradient vector and Jacobian matrix in arbitrarily connected neural networks. IEEE Trans. on Industrial Electronics 55(10), 3784–3790 (2008)

[Wilamowski et al. 2010] Yu, H., Wilamowski, B.M.: Neural network learning without backpropagation. IEEE Trans. on Neural Networks 21(11) (2010)

[Wilamowski and Yu 2010] Yu, H., Wilamowski, B.M.: Improved computation for Levenberg-Marquardt training. IEEE Trans. on Neural Networks 21(6), 930–937 (2010)

[Yu and Wilamowski 2009] Yu, H., Wilamowski, B.M.: Efficient and reliable training of neural networks. In: Proc. 2nd IEEE Human System Interaction Conf., HSI 2009, Catania, Italy, pp. 109–115 (2009)

[Yu et al. 2009] Yu, H., Wilamowski, B.M.: C++ implementation of neural networks trainer. In: Proc. 13th Int. Conf. on Intelligent Engineering Systems, INES 2009, Barbados (2009)
Complex Neural Models of Dynamic Complex Systems: Study of the Global Quality Criterion and Results G. Drałus Department of Electrical Engineering Fundamentals, Rzeszow University of Technology, Rzeszow, Poland [email protected]
Abstract. In this paper, dynamic global models of input-output complex systems are discussed. A dynamic complex system consisting of two nonlinear discrete-time sub-systems is considered. Multilayer neural networks in a dynamic structure are used as the global model. The global model is composed of two sub-models, corresponding to the structure of the complex system. The quality criterion of the global model contains coefficients which define the participation of the sub-models in the global model. The main contribution of this work is the study of the influence of these coefficients on the global model quality. This influence is examined for different backpropagation learning algorithms for complex neural networks.
1 Introduction

"Complex system" is a broad term which can refer to systems of different natures (technical, economic, biological). Typically, we take into consideration interactions between units of the system, large dimensionality, a large number of interacting entities, or unusual behavior. In this paper, a complex system means a dynamic input-output complex system of a technical nature. In such a complex system, elementary processes or elementary objects having inputs and outputs can be distinguished, as well as the connections between these processes or objects. A connection means that the outputs of some objects are the inputs of another object. Many examples of such complex systems can be found in the chemical industry, for example the chemical process of sulphuric acid production [Osowski 2007] or the ammonium nitrite production process [Drałus and Świątek 2009]. Many mathematical methods allow us to model simple plants. However, the modeling of dynamic complex systems is a very important and difficult problem that so far has not been solved well enough. In complex systems, we do not have to
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 477–495. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
deal with particular simple plants, but with static or dynamic simple plants which are interconnected into a complex system. Additionally, there exist numerous interactions between the units of complex systems. One of the basic problems is taking into consideration the quality of the system model as a whole while providing a suitable approximation quality for the particular subsystems. Neural networks are investigated here in application to the identification and modeling of complex processes exhibiting nonlinearities and typical disturbances. Neural networks offer a flexible structure that can map arbitrary nonlinear functions, making them ideally suited for the modeling and control of complex, nonlinear systems [Hunt et al. 1992]. They are particularly appropriate for multivariable applications, where they can readily characterize the interactions between different inputs and outputs. A further benefit is that the neural architecture is inherently parallel and can be applied in real-time implementations. Nowadays neural networks have many applications in science, particularly in the modeling and control of systems. Neural networks can be used for the modeling and identification of simple static and dynamic plants owing to their ability to approximate nonlinear functions [Hornik 1989]. However, modeling only the simple objects that are part of a complex system is inadequate for modern science: accurate and detailed models of whole complex systems are strongly expected today. One of the tools suitable for modeling complex systems is precisely the neural network [Narendra and Parthasarathy 1990; Dahleh and Venkatesh 1997; Drapała and Świątek 2006; Drałus and Świątek 2009]. In this paper, the modeling of dynamic complex systems by multilayer neural networks is discussed. To model an input-output complex system, an adequate complex neural model was built. The complex neural model is a non-typical multilayer feedforward neural network whose structure corresponds to the complex system. The complex model is the global model. In the global model, parts of the model can be indicated which correspond to the simple plants in the complex system. These parts of the model are called dynamic sub-models. The influence of the quality of the sub-models on the global model quality is discussed.
2 Models of Dynamic Complex Systems

2.1 Description of Dynamic Complex Systems

Naturally, there are many structures of complex systems. A very important case is the cascade complex system (a series connection of the units of the complex system), in which each unit is a dynamic simple plant. A known method of modeling a dynamic simple plant is the series-parallel model of identification [Narendra and Parthasarathy 1990]. In this paper, the idea of the series-parallel model of identification of simple dynamic plants is developed for the identification and modeling of complex dynamic systems.
In the series-parallel model, the present and past input and output signals are transmitted via tapped delay lines (TDL). In this case, when the multilayer neural networks are in the series-parallel stage (learning mode), there is no feedback from the network, only from the plant. Because at this stage the model is a multilayer feedforward neural network, the networks can be trained with static backpropagation learning algorithms. After the learning stage, if the modeling errors are sufficiently small, the series-parallel model is replaced by the parallel model with a feedback loop from the output of the model (work mode). In this way, dynamic models are obtained. Let us consider a complex system which consists of R simple plants (see Fig. 1). The global model of the complex system also consists of R sub-models, denoted M1..MR, composed in a way that matches the complex system: in that model, the output of the previous sub-model is the input to the next sub-model.
Fig. 1 Dynamic complex system and their dynamic global model using TDL (series-parallel model)
In such a structure, the output of the previous sub-model is the input to the next sub-model. The output of the r-th sub-model is calculated by:
$$
\hat{y}^{(r)}(k+r) = f_r\left(\hat{y}^{(r-1)}(k+r-1),\, w^{(r)}\right)
= f_r\left(f_{r-1}\left(\cdots f_1\left(u(k), w^{(1)}\right), \cdots, w^{(r-1)}\right), w^{(r)}\right)
= \hat{f}_r\left(u(k),\, w^{(1)}, \ldots, w^{(r)}\right)
\qquad (1)
$$
where u(k) is the external input of the global model and of the complex system, and w^(r) are the parameters (weights) of the r-th sub-model. The error at the k-th step of the input signal for the r-th sub-model is the difference between the output ŷ^(r)(k+r) of the r-th sub-model (Mr) and the output y^(r)(k+r) of the simple plant (Pr) of the complex system, defined as follows:
$$
e^{(r)}(k+r) = \hat{y}^{(r)}(k+r) - y^{(r)}(k+r)
\qquad (2)
$$
The performance index for each dynamic sub-model is defined as the sum of squared errors over all K discrete-time samples:

$$
Q_d^{(r)}\left(w^{(r)}\right)
= \frac{1}{2}\sum_{k=1}^{K} \left(e^{(r)}(k+r)\right)^T e^{(r)}(k+r)
= \frac{1}{2}\sum_{k=1}^{K}\sum_{j=1}^{J_r}\left(\hat{y}_j^{(r)}(k+r) - y_j^{(r)}(k+r)\right)^2
\qquad (3)
$$

where K is the number of discrete-time samples, J_r is the number of outputs of the r-th plant, w^(r) is the set of parameters of the r-th sub-model, ŷ^(r) is the output of the r-th sub-model, and y^(r) is the output of the r-th simple plant.
The global quality assessment criterion for the global dynamic model is the weighted sum of all dynamic sub-model performance indices (3):

$$
Q_d(W) = \sum_{r=1}^{R} \beta_r\, Q_d^{(r)}\left(w^{(r)}\right)
= \frac{1}{2}\sum_{k=1}^{K}\sum_{r=1}^{R} \beta_r \sum_{j=1}^{J_r}\left(\hat{y}_j^{(r)}(k+r) - y_j^{(r)}(k+r)\right)^2
\qquad (4)
$$

where W = [w^(1), w^(2), …, w^(R)] is the set of parameters (weights) of the global model divided into subsets for the sub-models, and β_r ∈ [0,1] are weight coefficients with Σ_{r=1}^{R} β_r = 1.
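As a minimal numerical sketch, the criterion (4) can be evaluated as a weighted sum of the sub-model indices (3); the error sequences below are illustrative stand-ins for ŷ − y:

```python
# Global quality criterion (Eq. 4) as a weighted sum of sub-model
# performance indices (Eq. 3).
def sub_model_index(errors):
    """Q_d^(r): half the sum of squared output errors over all samples."""
    return 0.5 * sum(e * e for step in errors for e in step)

def global_criterion(errors_per_submodel, betas):
    """Q_d(W) = sum_r beta_r * Q_d^(r); the betas must sum to 1."""
    assert abs(sum(betas) - 1.0) < 1e-9
    return sum(b * sub_model_index(err)
               for b, err in zip(betas, errors_per_submodel))

# two sub-models (R = 2), three time samples, one output each (illustrative)
err1 = [[0.2], [-0.1], [0.3]]
err2 = [[0.5], [0.0], [-0.4]]
q_d = global_criterion([err1, err2], betas=(0.25, 0.75))
print(q_d)   # 0.25 * 0.07 + 0.75 * 0.205
```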
The coefficients β determine the impact of the particular sub-models on the global model quality. The influence of these coefficients on the global model quality will be investigated. In the particular case considered in this paper, the complex system consists of two simple plants (R=2); thus the influence of the two weight coefficients β1 and β2 on the quality of the global model will be investigated.

2.2 Motivation
The global quality criterion Qd (formula (4)) will be used to develop learning algorithms for complex neural models. The learning algorithms allow us to obtain the desired parameters of the complex neural model, i.e. the weights of the neural sub-models. The global quality criterion Qd contains the β coefficients, which determine the influence of the sub-model quality indices (3) on the global quality criterion. The remaining question is how to select the β coefficients properly. When the complex system consists of two simple plants, there are only two coefficients β1 and β2 in the global criterion (4). Should we select, e.g., β1=0.5 and β2=0.5, which would mean an equal influence of both sub-models on the global model? Or might it be better to select, e.g., β1=0.25 and β2=0.75, which means that the second sub-model has a greater impact on the global model than the first sub-model?
The basic question is what influence these coefficients have on the global model quality, i.e. how the sub-model quality influences the global model quality, and also what influence they have on the quality of the sub-models themselves. The study is carried out in such a way that, for a fixed neural architecture of the global model, under the same initial conditions and with a constant number of learning steps, in each particular case we perform model parameter selection while changing the participation of β1 and β2 in the global criterion (4), ranging from 0.001 to 0.999 (subject to β1+β2=1). Many results were obtained during the simulations; they show clearly the influence of the β coefficients on the global model quality as well as on the sub-model quality. Knowing how the β coefficients influence the quality of the model, the β1 and β2 values can be selected consciously so as to obtain an optimal model. With such a conscious selection we can better control the learning process, and thus achieve the required quality of the global model faster and more reliably.

2.3 The Complex Gradient Backpropagation Learning Algorithm for Dynamic Complex Models
To develop the learning algorithm for the multilayer neural networks of which the dynamic global model is constructed, it was necessary to modify and adapt the common gradient backpropagation algorithm [Gupta et al. 2003]. This modification must account for the fact that the complex model consists of static neural networks with tapped delay lines, with a feedback loop from the complex system during the learning stage and from the complex model itself when the model works. The preparation of the learning data using past samples of the input and output signals must be done using TDL. To minimize the global quality criterion (4), the complex backpropagation learning algorithm for multilayer neural networks was developed [Drałus 2004]. The changes of the global model parameters, i.e. the weights of the neural networks, are obtained by gradient computation:

$$
\Delta w_{ji} = -\eta \frac{\partial Q_d(W)}{\partial w_{ji}}
\qquad (5)
$$
The changes of the weights in the particular layers, after the gradient calculation of formula (5), are computed as follows:
• for the output layer:

$$
\Delta w_{ji}^{(R),M} = -\eta \sum_{k=1}^{K} f'\left(z_j^{M,k+R}\right) \beta_R \left(\hat{y}_j^{(R)}(k+R) - y_j^{(R)}(k+R)\right) u_i^{M-1}(k+R-1)
\qquad (6)
$$

• for the hidden layers:

$$
\Delta w_{ji}^{(r),m} = -\eta \sum_{k=1}^{K} f'\left(z_j^{m,k+r}\right) \sum_{l=1}^{I_{m+1}} \delta_l^{(r),m+1,k+r}\, w_{lj}^{(r),m+1}\, u_i^{m-1}(k+r-1)
\qquad (7)
$$

• for the "binding" hidden layers (i.e. the output layers of the sub-models in the global complex model):

$$
\Delta w_{ji}^{(r),m} = -\eta \sum_{k=1}^{K} f'\left(z_j^{m,k}\right) \left[\sum_{l=1}^{I_{m+1}} \delta_l^{(r+1),m+1,k}\, w_{lj}^{(r+1),m+1} + \beta_r \left(\hat{y}_j^{(r)}(k+r) - y_j^{(r)}(k+r)\right)\right] u_i^{m-1}(k+r-1)
\qquad (8)
$$

The global model parameters W are adapted, for a constant learning rate η and the particular layers, according to:

$$
w_{ji}^{(r),m}(k+1) = w_{ji}^{(r),m}(k) + \eta\, \Delta w_{ji}^{(r),m}(k)
\qquad (9)
$$
The formulas (6-9), which minimize the quality criterion (4), are called the complex gradient backpropagation learning algorithm. This algorithm is very slow; however, it is the basis for other learning algorithms which may be used for adjusting the parameters of the neural global model.

2.4 The Complex Delta-Bar-Delta Learning Algorithm
Learning algorithms should be convergent and fast, and fast learning algorithms should also be developed for complex neural networks. There are many fast learning algorithms for simple neural networks that can be used to develop algorithms for complex networks. One of them is the Delta-Bar-Delta learning algorithm [Jacobs 1988]. On the basis of the Delta-Bar-Delta (DBD) algorithm and the global quality criterion Qd, a new complex Delta-Bar-Delta (complex DBD) learning algorithm for complex neural models was developed [Drałus 2010]. In this algorithm the learning rate is adaptive, so learning is fast.
In the complex DBD algorithm, the w parameters in the m-th layer of the neural model at learning step k+1 are given by the following formula:

$$
w_{ji}^m(k+1) = (1-\mu)\,\eta_{ji}^m(k)\,\Delta w_{ji}^{(r),m}(k) + \mu\, w_{ji}^m(k)
\qquad (10)
$$

where µ is the momentum term over the interval 0-1, and the weight changes Δw_ji^(r),m are calculated according to formulas (6-8). The adaptive learning rate η for the m-th layer at step k+1 of the learning is calculated as:

$$
\eta_{ji}^m(k+1) = \eta_{ji}^m(k) + \Delta\eta_{ji}^m(k)
\qquad (11)
$$

The change of the learning rate Δη is given by:

$$
\Delta\eta_{ji}^m(k) = \begin{cases}
a & \text{if } S_{ji}^m(k-1)\, D_{ji}^m(k) > 0 \\
-b\,\eta_{ji}^m(k-1) & \text{if } S_{ji}^m(k-1)\, D_{ji}^m(k) < 0 \\
0 & \text{if } S_{ji}^m(k-1)\, D_{ji}^m(k) = 0
\end{cases}
\qquad (12)
$$

The component S_ji^m(k) in formula (12) is calculated as:

$$
S_{ji}^m(k) = (1-\gamma)\, D_{ji}^m(k) + \gamma\, S_{ji}^m(k-1)
\qquad (13)
$$

where D_ji^m = ∂Q_d(W(k))/∂w_ji. The coefficient γ in formula (13) takes a value in the interval 0-1 (γ=0.75 in the simulations), and the coefficients a and b in formula (12) are a=0.002 and b=0.2, respectively. In this algorithm the learning rate is adaptive, and learning is much faster than with the complex gradient algorithm. This allows us to find the proper parameters W of the global complex neural model much faster than the complex gradient algorithm does.
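The learning-rate adaptation (11)-(13) for a single weight can be sketched as below, with the coefficient values quoted in the text (a=0.002, b=0.2, γ=0.75); using the current η in the decrease branch instead of η(k-1) is a simplification:

```python
# Delta-Bar-Delta learning-rate adaptation (Eqs. 11-13) for one weight.
A, B, GAMMA = 0.002, 0.2, 0.75   # a, b, gamma from the text

def dbd_step(eta, s_prev, d):
    """One adaptation step: returns (new learning rate, new trace S).

    eta    -- current learning rate eta(k)
    s_prev -- exponential gradient trace S(k-1), Eq. (13)
    d      -- current gradient component D(k) = dQd/dw
    """
    prod = s_prev * d
    if prod > 0:                 # gradient sign agrees with the trace: grow
        delta_eta = A
    elif prod < 0:               # sign flip: shrink geometrically, Eq. (12)
        delta_eta = -B * eta
    else:
        delta_eta = 0.0
    s = (1.0 - GAMMA) * d + GAMMA * s_prev       # Eq. (13)
    return eta + delta_eta, s                    # Eq. (11)

eta, s = 0.01, 0.0
for d in (0.5, 0.4, 0.6, -0.3):  # illustrative gradient components
    eta, s = dbd_step(eta, s, d)
print(eta)                       # grows twice linearly, then shrinks by 20%
```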
3 Simulation Study

Let us consider a dynamic nonlinear complex system which consists of two dynamic nonlinear simple plants connected in series. Both simple plants of the complex system (denoted P1 and P2, see Fig. 2) are described by second-order nonlinear difference equations.
Fig. 2 Dynamic discrete time complex system
The output of the first simple plant P1 (see Fig. 2) is described by the following difference equation [Narendra and Parthasarathy 1990]:

$$ y^{(1)}(k+1) = f_1\big(y^{(1)}(k),\, y^{(1)}(k-1),\, y^{(1)}(k-2),\, u(k),\, u(k-1)\big) \qquad (14) $$

The output of the second simple plant P2 is described by the following formula:

$$ y^{(2)}(k+2) = f_2\big(y^{(2)}(k+1),\, y^{(2)}(k),\, y^{(2)}(k-1),\, y^{(1)}(k+1),\, y^{(1)}(k)\big) \qquad (15) $$

The nonlinear function f1 of the first simple plant P1 is given by the following formula:

$$ f_1(x_1, x_2, x_3, x_4, x_5) = \frac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_2^2 + x_3^2} \qquad (16) $$

The second nonlinear function f2 is given by the following formula:

$$ f_2(v_1, v_2, v_3, v_4, v_5) = \frac{v_2 v_3 v_5 (v_1 - 1) + v_4}{1 + 2 v_1^2} \qquad (17) $$
As a global model of the considered complex system a 6-layer feedforward neural network was used (see Fig. 3). The global model, a complex neural network, has the following structure: one external input and 5 input neurons; 20 neurons (with the hyperbolic tangent transfer function) in the first hidden layer; 10 neurons in the second hidden layer; 1 linear neuron in the third layer, called the "binding" layer; 20 and 10 neurons in the fourth and fifth layers, respectively; and 1 linear neuron in the sixth (output) layer of the complex model (shortly, 1(5)-20-10-1(5)-20-10-1). The left part of the global model is the first sub-model (see Fig. 3) and the right part is the second sub-model. The complex neural model contains non-typical hidden layers called "binding" hidden layers. A "binding" layer connects sub-models within the complex model, and its output corresponds to the output of the respective simple plant; in this model the "binding" hidden layer is the third layer. The architecture of the model in learning mode allows us to use the complex learning algorithms developed above for multilayer neural networks.
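As a rough sketch, the layer stack of this global model can be wired as below. Feeding the single binding output straight into sub-model 2 is a simplification of this sketch (in the paper the sub-models also receive delayed signal values), and the plain uniform weight initialization is an assumption (the paper uses the Nguyen-Widrow rule); all names are illustrative.

```python
import numpy as np

# Layer widths of the global model 1(5)-20-10-1(5)-20-10-1: sub-model 1 is
# 5 inputs -> 20 tanh -> 10 tanh -> 1 linear "binding" neuron; sub-model 2 is
# 20 tanh -> 10 tanh -> 1 linear output neuron.
SIZES = [5, 20, 10, 1, 20, 10, 1]
LINEAR = {2, 5}  # zero-based indices of the binding layer and the output layer

def forward(params, x):
    """Forward pass; returns (global output, binding-layer output)."""
    binding = None
    for i, (W, b) in enumerate(params):
        z = W @ x + b
        x = z if i in LINEAR else np.tanh(z)
        if i == 2:
            binding = x  # sub-model 1 output, compared with plant P1 in Qd(1)
    return x, binding

def init_params(rng):
    """Small random weights, zero biases (uniform init is an assumption here)."""
    return [(rng.uniform(-0.5, 0.5, (o, n)), np.zeros(o))
            for n, o in zip(SIZES[:-1], SIZES[1:])]
```

Exposing the binding-layer output alongside the global output is what lets a global criterion weight both sub-model errors during training.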
Fig. 3 The neural network as the global model of the complex system (series-parallel model in learning mode)
Fig. 4 The neural network as the global model of the complex system (parallel model in work mode)
After the learning stage (after adjusting the model parameters) the global model is switched to a work mode (the parallel model, see Fig. 4). This model uses one delay element in the external input line and three delay elements in the output line. In such an architecture the model inputs depend on delayed values of the neural model outputs, which allows the model to approximate the true dynamics of the complex system.
Learning data, containing 500 data points, were randomly uniformly distributed in the interval [-1, 1]. Initial weights were randomly generated according to the Nguyen-Widrow rule [Gupta et al. 2003]. Three learning algorithms were used in the simulation study: the complex gradient, the complex DBD, and DCRprop [Drałus and Świątek 2009]. DCRprop is a heuristic algorithm and the fastest of the three. Three complex networks of the same architecture were trained: for 1000 epochs with the complex DBD and DCRprop algorithms and for 2000 epochs with the complex gradient algorithm. All algorithms started from the same initial model parameters, i.e. from the same initial weights of the complex network. For the system and the model inputs the testing signal is given by the following formula:
$$ u(k) = \begin{cases} \sin(2\pi k/250) & \text{for } k \le 250 \\ 0.8\sin(2\pi k/250) + 0.2\sin(2\pi k/25) & \text{for } 250 < k \le 500 \end{cases} \qquad (18) $$
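The testing input (18) can be generated, for instance, as follows (the function name is illustrative):

```python
import numpy as np

def test_signal(K=500):
    """Testing input of formula (18): a single sinusoid for the first half,
    a two-sinusoid mixture for the second half of the 500 samples."""
    k = np.arange(1, K + 1)
    return np.where(
        k <= 250,
        np.sin(2 * np.pi * k / 250),
        0.8 * np.sin(2 * np.pi * k / 250) + 0.2 * np.sin(2 * np.pi * k / 25),
    )
```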
The weights in the neural network were adjusted once per learning step, after presentation of all 500 discrete-time samples. The momentum μ in formula (10) is equal to 0.02 and η = 0.002 in formula (9). All simulations were carried out with a self-developed neural simulation tool. The results of the simulations for the three learning algorithms are presented in numerical (tables) and graphical (figures) form for both the learning and the working mode of the neural models. The values of the performance indices Qd(1) and Qd(2) and of the global quality criterion Qd are shown as functions of the coefficient β1 (where β1 + β2 = 1). The performance index Qd depends directly on the β1 and β2 coefficients, whereas the β coefficients influence the indices Qd(1) and Qd(2) only indirectly, through the model parameters W.

3.1 Simulation Results
The global criterion Qd and the performance indices Qd(1) and Qd(2) after 2000 epochs of learning with the complex gradient learning algorithm are shown in Table 1 for learning data (the series-parallel model, with feedback from the simple plants) and for testing data (the parallel model, with feedback from the sub-models). The values of the quality criteria for the complex DBD learning algorithm after 1000 learning epochs are shown in Table 2. The global criterion Qd and the performance indices Qd(1) and Qd(2) after 1000 epochs for the DCRprop algorithm are shown in Table 3. All quality indicators for learning data were calculated with the model in learning mode, whereas the indicators for testing data were calculated with the model in work mode.
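For reference, the global criterion of formula (4) is a weighted sum of the two sub-model indices; a one-line sketch (the function name is an assumption) that reproduces, for example, the β1 = 0.5 learning-data row of Table 1:

```python
def global_criterion(q1, q2, beta1):
    """Global quality criterion of formula (4): Qd = beta1*Qd(1) + beta2*Qd(2),
    with beta2 = 1 - beta1."""
    return beta1 * q1 + (1.0 - beta1) * q2

# Table 1, beta1 = 0.5, learning data: Qd(1) = 2.306, Qd(2) = 2.438 -> Qd = 2.372
```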
Table 1 Values of performance indices for learning and testing data after 2000 epochs for the complex gradient learning algorithm (left: data for learning; right: data for testing)

| β1   | Qd(1) | Qd(2) | Qd    | Qd(1) | Qd(2) | Qd    |
|------|-------|-------|-------|-------|-------|-------|
| 0.01 | 16.96 | 1.796 | 1.921 | 35.77 | 2.077 | 2.414 |
| 0.1  | 3.428 | 1.802 | 1.965 | 3.786 | 2.479 | 2.610 |
| 0.3  | 2.835 | 2.156 | 2.360 | 2.086 | 1.633 | 1.769 |
| 0.5  | 2.306 | 2.438 | 2.372 | 2.675 | 1.234 | 1.954 |
| 0.7  | 1.592 | 2.125 | 1.752 | 4.012 | 0.896 | 3.077 |
| 0.9  | 9.080 | 7.580 | 8.937 | 14.99 | 3.591 | 13.85 |
| 0.99 | 1871  | 7194  | 1854  | 1770  | 107   | 1760  |
Table 2 Values of performance indices for learning and testing data after 1000 epochs for the complex DBD learning algorithm (left: data for learning; right: data for testing)

| β1    | Qd(1)  | Qd(2)  | Qd     | Qd(1) | Qd(2) | Qd    |
|-------|--------|--------|--------|-------|-------|-------|
| 0.001 | 126.0  | 0.0504 | 0.1770 | 812.0 | 5.820 | 6.630 |
| 0.01  | 67.69  | 0.2260 | 0.9010 | 565.0 | 1.886 | 7.530 |
| 0.1   | 0.2480 | 0.0517 | 0.0714 | 4.365 | 0.298 | 0.704 |
| 0.2   | 0.1590 | 0.0739 | 0.0909 | 2.595 | 0.695 | 1.075 |
| 0.3   | 0.1180 | 0.0714 | 0.0853 | 2.750 | 0.366 | 1.080 |
| 0.4   | 0.0631 | 0.0315 | 0.0442 | 2.800 | 0.388 | 1.353 |
| 0.5   | 0.0406 | 0.0228 | 0.0317 | 3.520 | 0.365 | 1.942 |
| 0.6   | 0.0349 | 0.0262 | 0.0314 | 3.807 | 0.362 | 2.429 |
| 0.7   | 0.0347 | 0.0536 | 0.0404 | 3.001 | 0.673 | 2.302 |
| 0.8   | 0.0290 | 0.0637 | 0.0400 | 3.456 | 0.920 | 2.949 |
| 0.9   | 0.0250 | 0.1020 | 0.0326 | 1.485 | 0.920 | 1.429 |
| 0.99  | 0.0175 | 0.3260 | 0.0206 | 4.282 | 1.013 | 4.249 |
| 0.999 | 0.0170 | 0.6300 | 0.0177 | 4.904 | 2.689 | 4.902 |
Table 3 Values of performance indices for learning and testing data after 1000 epochs for the DCRprop learning algorithm (left: data for learning; right: data for testing)

| β1    | Qd(1) | Qd(2)  | Qd     | Qd(1) | Qd(2)  | Qd    |
|-------|-------|--------|--------|-------|--------|-------|
| 0.001 | 299   | 0.155  | 0.451  | 433   | 9.22   | 9.56  |
| 0.01  | 142   | 0.302  | 1.722  | 247   | 51.71  | 53.6  |
| 0.1   | 0.386 | 0.0627 | 0.0951 | 0.803 | 0.0850 | 0.156 |
| 0.2   | 0.214 | 0.0613 | 0.0919 | 1.06  | 0.0999 | 0.293 |
| 0.3   | 0.196 | 0.0738 | 0.111  | 1.021 | 0.113  | 0.385 |
| 0.4   | 0.174 | 0.0847 | 0.120  | 0.316 | 0.0266 | 0.142 |
| 0.5   | 0.150 | 0.0910 | 0.120  | 0.488 | 0.0591 | 0.247 |
| 0.6   | 0.132 | 0.0866 | 0.114  | 0.520 | 0.0512 | 0.332 |
| 0.7   | 0.114 | 0.0842 | 0.105  | 0.919 | 0.0968 | 0.672 |
| 0.8   | 0.142 | 0.0903 | 0.132  | 0.788 | 0.0859 | 0.648 |
| 0.9   | 0.109 | 0.0835 | 0.106  | 0.398 | 0.0340 | 0.362 |
| 0.99  | 0.091 | 0.0735 | 0.091  | 0.395 | 0.0211 | 0.392 |
| 0.999 | 0.142 | 0.0925 | 0.142  | 0.577 | 0.1067 | 0.576 |
A relative percentage error (RPE) is introduced as an additional performance index of modeling for each output of the r-th sub-model:

$$ RPE^{(r)} = \frac{\sum_{k=1}^{K} \left| \hat{y}^{(r)}(k) - y^{(r)}(k) \right|}{\sum_{k=1}^{K} \left| y^{(r)}(k) \right|} \cdot 100\% \qquad (19) $$
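Formula (19) is a ratio of summed absolute output errors to the summed absolute outputs; a direct transcription (the function name is illustrative):

```python
import numpy as np

def rpe(y_hat, y):
    """Relative percentage error of formula (19) for one sub-model output."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(np.abs(y_hat - y)) / np.sum(np.abs(y)) * 100.0
```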
Values of RPE errors after 2000 epochs of learning for the model learnt by the complex gradient algorithm are presented in Table 4. Values of RPE errors after 1000 epochs for the models learnt by the complex DBD and the DCRprop learning algorithms are presented in Table 5 and Table 6, respectively.

Table 4 RPE errors in the global complex model for learning and testing data after 2000 epochs for the complex gradient learning algorithm (left: data for learning; right: data for testing)

| β1   | RPE(1) [%] | RPE(2) [%] | RPE(1) [%] | RPE(2) [%] |
|------|------------|------------|------------|------------|
| 0.01 | 38.0       | 16.2       | 49.0       | 17.0       |
| 0.1  | 16.8       | 16.1       | 14.4       | 19.0       |
| 0.3  | 16.2       | 18.9       | 12.3       | 15.2       |
| 0.5  | 14.5       | 18.1       | 13.3       | 12.5       |
| 0.7  | 11.6       | 17.5       | 14.0       | 12.3       |
| 0.9  | 29.5       | 33.3       | 34.2       | 23.3       |
| 0.99 | 2000       | 234        | 1200       | 130        |
Table 5 RPE errors in the global complex model for learning and testing data after 1000 epochs for the complex DBD learning algorithm (left: data for learning; right: data for testing)

| β1    | RPE(1) [%] | RPE(2) [%] | RPE(1) [%] | RPE(2) [%] |
|-------|------------|------------|------------|------------|
| 0.001 | 257        | 5.70       | 260        | 21.0       |
| 0.01  | 188        | 11.5       | 221        | 18.2       |
| 0.1   | 10.5       | 5.66       | 14.2       | 7.05       |
| 0.2   | 7.80       | 6.51       | 10.3       | 10.6       |
| 0.3   | 6.75       | 6.00       | 11.5       | 8.05       |
| 0.4   | 5.20       | 4.27       | 12.5       | 8.25       |
| 0.5   | 4.15       | 3.80       | 13.6       | 7.57       |
| 0.6   | 3.82       | 3.89       | 16.8       | 13.4       |
| 0.7   | 3.80       | 5.75       | 12.3       | 10.1       |
| 0.8   | 3.60       | 6.08       | 13.8       | 12.0       |
| 0.9   | 3.31       | 7.62       | 9.60       | 11.2       |
| 0.99  | 2.73       | 14.1       | 14.7       | 12.4       |
| 0.999 | 2.69       | 33.2       | 15.3       | 18.0       |
Table 6 RPE errors in the global complex model for learning and testing data after 1000 epochs for the DCRprop learning algorithm (left: data for learning; right: data for testing)

| β1    | RPE(1) [%] | RPE(2) [%] | RPE(1) [%] | RPE(2) [%] |
|-------|------------|------------|------------|------------|
| 0.001 | 171        | 4.50       | 177        | 38         |
| 0.01  | 121        | 6.45       | 140        | 95         |
| 0.1   | 5.50       | 2.85       | 4.70       | 2.55       |
| 0.2   | 4.20       | 2.80       | 4.64       | 2.47       |
| 0.3   | 3.92       | 3.02       | 5.32       | 2.93       |
| 0.4   | 3.80       | 3.25       | 3.47       | 2.04       |
| 0.5   | 3.53       | 3.35       | 3.80       | 2.80       |
| 0.6   | 3.35       | 3.36       | 3.77       | 2.66       |
| 0.7   | 2.97       | 3.15       | 4.60       | 3.40       |
| 0.8   | 3.40       | 3.40       | 4.07       | 3.33       |
| 0.9   | 2.98       | 3.24       | 3.04       | 2.19       |
| 0.99  | 2.80       | 3.11       | 3.19       | 1.67       |
| 0.999 | 3.36       | 3.43       | 4.28       | 5.16       |
Fig. 5 Values of performance indices Qd(1), Qd(2) and Qd for the complex gradient learning algorithm a) for learning data; b) for testing data
Fig. 6 Values of performance indices Qd(1), Qd(2) and Qd for the complex DBD learning algorithm a) for learning data; b) for testing data
Fig. 7 Values of performance indices Qd(1), Qd(2) and Qd for DCRprop learning algorithm a) for learning data; b) for testing data
Fig. 8 Values of RPE(1) and RPE(2) errors for the complex gradient learning algorithm a) for learning data; b) for testing data
Fig. 9 Values of RPE(1) and RPE(2) errors for the complex DBD learning algorithm a) for learning data; b) for testing data

Fig. 10 Values of RPE(1) and RPE(2) errors for the DCRprop learning algorithm a) for learning data; b) for testing data
Fig. 11a shows the signals at the outputs of the first sub-model and of the first simple plant for testing data. The output signals of the first sub-model and of the first simple plant are the input signals of the second sub-model and of the second simple plant, respectively. Fig. 11b shows the signals at the outputs of the second sub-model and of the second simple plant, also for testing data. The output signals of the second sub-model are at the same time the output signals of the global model of the complex system.
Fig. 11 Outputs of the first simple plant and the first sub-model (a) and of the second simple plant and the second sub-model (b) for testing data, in the complex model learnt by the DCRprop learning algorithm
3.2 Analysis of the Impact of β Coefficients on the Complex Neural Model Quality for the Complex Gradient Learning Algorithm
The lowest values of the Qd(1) index are achieved over a wide middle part of the β1 range; towards the ends of the range the Qd(1) index increases rapidly for both learning and testing data (see Fig. 5 and Table 1). For learning data the quality index Qd(2) is lowest at the beginning of the range and increases as β1 grows; for testing data, however, Qd(2) has its minimum at β1 = 0.7. The Qd index achieves its lowest values at the beginning of the β1 range and grows very large at its end (β1 = 0.99). The shapes of the RPE error curves are similar to the shapes of the performance indices for both testing and learning data (see Fig. 8). Values of the RPE errors and of the performance indices are much higher for the complex gradient learning algorithm than for the other algorithms: its learning rate η is constant, so the algorithm converges very slowly.

3.3 Analysis of the Impact of β Coefficients on the Complex Neural Model Quality for the DBD Learning Algorithm
The increase of the β1 coefficient (i.e. the decrease of β2) causes a monotonic decrease of the quality index Qd(1) for learning data. The smallest value of the Qd(1) index is achieved for the maximum value of β1 (β1 = 0.999, see Fig. 6a and Table 2). The quality index Qd(2) has its minimum in the middle of the range of β1 (i.e. when β1 = β2 = 0.5). As β1 increases from 0.6 to 0.999, the quality index Qd(2) also increases, up to its maximum value.
The global quality index Qd has its minima when β1 lies in the middle of the range (β1 = 0.5-0.6), and it reaches its largest values at the ends of the β1 range. For very small values of β1 all quality indices grow very large, so it is not possible to adjust the parameters of the first sub-model, and a good global model is not reached either. Thus the quality of the sub-models influences the quality of the global model.

The plots of all performance indices have different shapes for testing and learning data (see Fig. 6). For testing data the quality indices Qd(2) and Qd have their minima at β1 = 0.1 (β2 = 0.9), while the quality index Qd(1) has its minimum at β1 = 0.9 (β2 = 0.1, see Table 2).

Fig. 9 shows the values of the RPE errors for learning and testing data. In the middle range of the β coefficients the RPE errors for both sub-models are on average about 5 percent for learning data and about 10 percent for testing data. The RPE errors for learning data are calculated with the model in the series-parallel mode, but those for testing data are calculated with the model in work mode; therefore the RPE errors are about twice as high for testing data as for learning data.

3.4 Analysis of the Impact of β Coefficients on the Complex Neural Model Quality for the DCRprop Learning Algorithm
For learning data the quality index Qd(1) is largest for the lowest value of β1 (0.001) and lowest for the largest value, β1 = 0.999. The Qd(2) and Qd indices have their minima at β1 = 0.2 and β1 = 0.99, respectively (see Table 3 and Fig. 7). For testing data, however, the Qd(1) and Qd indices reach their global minima in the middle of the range (β1 = 0.4), while the Qd(1) index also has a local minimum at β1 = 0.9 (i.e. β2 = 0.1). The plots of the RPE errors show the degree of quality of the sub-models. The average levels of the RPE errors for learning and testing data are similar (about 3 percent, see Fig. 10). At the beginning of the range of the β coefficients the RPE errors of both sub-models are larger than in the middle and near the end of the range; the errors increase again at the very end (β1 = 0.999).
4 Summary

The three learning algorithms used in the simulations gave different results. The worst results were obtained with the complex gradient algorithm: the average level of the RPE errors is about 15 percent, and the best values obtained are about 11.6 percent for learning data and about 12.4 percent for testing data (see Fig. 8). For the complex DBD algorithm, with its adaptive learning rate, the results were much better: the average RPE error was around 6 percent for learning data and around 10 percent for testing data. For testing data, the lowest RPE(1) error of the first sub-model is 9.6 percent and the lowest RPE(2) error of the second sub-model is 7.05 percent (see Fig. 9).
However, the best results were obtained for the DCRprop learning algorithm. The average RPE error was approximately 3 percent for both learning and testing data. The lowest value of the RPE(1) error is 2.80 percent and the lowest RPE(2) error is also 2.80 percent, but for different values of the β1 coefficient (for learning data, Fig. 10a). Similarly, for testing data the lowest RPE(1) error is 3.04 percent, while the lowest RPE(2) error is 1.67 percent (Fig. 10b).

For the DCRprop algorithm, the best values of the quality index Qd(1) were obtained for high values of β1 for learning data, and for β1 below one half (β1 = 0.4) for testing data. Similarly, the quality index of the second sub-model is low for large values of β2 (small β1) for learning data, and low for small values of β2 for testing data. The minimum values of the Qd(2) index for learning and testing data are similar, and the shapes of the Qd(2) curves are also similar; the shape and minimum values of the quality index Qd(1) likewise agree well for learning and testing data. The overall global quality index Qd is similar to the Qd(2) index for small β1, but for large β1 it is similar to Qd(1), in accordance with formula (4).

For the complex DBD algorithm, in contrast, the course of the Qd(1) index for learning data differs significantly from its course for testing data, whereas the course of the Qd(2) index is similar for both. The levels of the quality indices are significantly higher for testing data than for learning data: the model trained with the complex DBD algorithm is sensitive to the type of data and to the change of the model configuration. For the model trained with the complex gradient algorithm (a model that is not fully trained), the quality indicators and their shapes for learning data are quite similar to those for testing data; unfortunately, this algorithm is not very efficient. Thus, with respect to both learning speed and model quality, the most effective algorithm is the DCRprop learning algorithm.
As is apparent from the obtained results, the quality of a neural model depends not only on the β weight coefficients but also on the applied learning algorithm. Of course, the quality of the model also depends on the number of training steps and on how representative the training set is, but these issues are not dealt with in this article.
5 Conclusions

A global model of a dynamic input-output complex system was presented. The dynamic complex system considered consists of two nonlinear discrete-time subsystems. Multilayer neural networks with a non-typical structure were introduced as models of complex systems. The influence of the β coefficients of the global performance index on the quality of the global model was investigated, and the results were shown for three learning algorithms and for both learning and testing data. The β coefficients directly influence the global quality criterion according to formula (4). The obtained results show that there is an impact of the
β coefficients on the quality of the sub-models of the simple plants. After these simulations the courses of the quality criteria as functions of the β weight factors are known. The quality of the global model depends on the performance indices of the sub-models and on the β weight coefficients according to formula (4), which is a weighted sum of the sub-model quality indices with the β weighting factors; the obtained results confirm this formula. Knowledge of the courses of the quality criteria allows us to choose the values of the β coefficients so as to obtain the optimal global model as well as better sub-model quality; alternatively, we can favour the quality of one or two sub-models at the expense of the global model quality. The presented algorithms and the architecture of the global neural models allow us to adjust the parameters of the global model effectively. The presented modeling approach is useful for computer control systems of complex systems.
References

[Dahleh and Venkatesh 1997] Dahleh, M.A., Venkatesh, S.: System identification of complex systems; problem formulation and results. In: Proc. of the 36th Conf. on Decision & Control, San Diego, CA, pp. 2441–2446 (1997)
[Drałus 2004] Drałus, G.: Modeling of dynamic nonlinear complex systems using neural networks. In: Proc. of the 15th International Conference on Systems Science, Wroclaw, Poland, vol. III, pp. 87–96 (2004)
[Drałus and Świątek 2009] Drałus, G., Świątek, J.: Static and dynamic complex models: comparison and application to chemical systems. Kybernetes: The Int. J. of Systems & Cybernetics 38(7/8) (2009)
[Drałus 2010] Drałus, G.: Study on quality of complex models of dynamic complex systems. In: 3rd Conference on Human System Interactions, pp. 169–174 (2010), doi:10.1109/HSI.2010.5514570
[Drapała and Światek 2006] Drapała, J., Światek, J.: Modeling of dynamic complex systems by neural networks. In: Proc. of the 18th Int. Conf. on Systems Engineering, Coventry University, UK, pp. 109–112 (2006)
[Gupta et al. 2003] Gupta, M.M., Jin, L., Homma, N.: Static and Dynamic Neural Networks – From Fundamentals to Advanced Theory. John Wiley & Sons, Inc., Chichester (2003)
[Hornik 1989] Hornik, K.: Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989)
[Hunt et al. 1992] Hunt, K.J., Sbarbaro, D., Zbikowski, R., Gawthrop, P.J.: Neural networks for control systems – a survey. Automatica 28(8), 1083–1112 (1992)
[Jacobs 1988] Jacobs, R.A.: Increased rates of convergence through learning rate adaptation. Neural Networks 1, 295–307 (1988)
[Narendra and Parthasarathy 1990] Narendra, K.S., Parthasarathy, K.: Identification and control of dynamic systems using neural networks. IEEE Trans. on Neural Networks 1(1), 4–27 (1990)
[Osowski 2007] Osowski, S.: Modeling and Simulation of Dynamic Systems and Processes. Warsaw University of Technology Publishing House (2007)
Author Index

Adamczyk, K. 205
Balestra, A. 159
Barkana, D. Erol 75
Bieda, R. 131
Bobrowski, L. 443
Byczuk, M. 3
Casals, A. 15
Chojnacki, S. 429
Cudek, P. 125
David, R.C. 223
Deligiannidis, L. 277
Di Iorio, A. 359
Dragoş, C.A. 223, 261
Drałus, G. 477
Dwulit, M.P. 345
Fryc, B. 417
Gomuła, J. 191
Grzymała-Busse, J.W. 125, 147
Hanada, H. 57
Hara, T. 57
Hippe, Z.S. 125, 147
Jaszuk, M. 175
Jurczak, P. 147
Kaszuba, K. 295
Kitani, M. 57
Kłopotek, M.A. 429
Kostek, B. 295
Kozielski, S. 395
Machnicka, Z. 417
Małysiak-Mrozek, B. 395
Materka, A. 3
Milik, A. 325
Mroczek, T. 147
Mrozek, D. 395
Muñoz, L.M. 15
Musetti, A. 359
Nauth, P. 41
Nowak, L. 111
Noyes, E. 277
Ogorzałek, M. 111
Orio, S. 159
Ota, Y. 31
Paja, W. 191
Pałasiński, M. 417
Pancerz, K. 191
Pazzaglia, R. 159
Peroni, S. 359
Petriu, E.M. 223, 261
Ponsa, P. 15
Poryzała, P. 3
Precup, R.E. 223, 261
Preitl, S. 223, 261
Przystalski, K. 111
Pułka, A. 325
Pyzik, L. 237
Rădac, M.B. 223
Rakytyanska, H.B. 375
Ravarelli, A. 159
Rotshtein, A.P. 375
Sakaino, S. 91
Sato, T. 91
Sawada, H. 57
Spătaru, S.V. 223
Surówka, G. 111
Świtoński, A. 131
Szkoła, J. 191
Szostek, G. 175
Szymański, Z. 345
Vitali, F. 359
Walczak, A. 175, 205
Wilamowski, B.M. 313, 463
Wojciechowski, K. 131
Xie, T.T. 313
Yakoh, T. 91
Yu, H. 313, 463
Zanetti, M.A. 159
Subject Index

ABCD rule 125
Analyze medical texts 175
Antilock braking system 223
Approximate reasoning modeling 417
Artificial neural networks 111, 477
Asymmetry 739
Back propagation learning algorithms 477
BCI performance 3
Biofeedback method 295
Bipartite graphs 429
BlackBoard platform 237
Boolean recommenders 429
Brain computer interface 3
Brain stroke 147
Cancer detection 131
Colonoscopy diagnostician 131
Complex neural networks 477
Computer assisted robotic systems 75
Convex and piecewise linear (CPL) 443
Copernicus system 191
CPL models 443
Data mining system 148
Dermatoscopic images 111
Digital images 125, 205
DWT (discrete wavelet transform) 295
Dwulit's hull 345
Dynamic complex systems 477
Edge-directed interpolation 205
Education in control engineering 261
EEG (electroencephalography) 295
Effective learning 295
E-learning platforms 237
Endoscopy diagnosis 131
Entrepreneurship education 277
EOG (electrooculogram) 295
Ergonomic haptic interaction 15
Eye tracking 159
FDL model 325
Flash cards 295
Force-sensorless bilateral control 91
FPGA Xilinx Virtex5 device 325
Fuzzy control systems 223
Fuzzy default logic (FDL) 325
Fuzzy If-Then rules 377
Fuzzy model 223
Fuzzy relational calculus 375
Fuzzy relational equations 378
Fuzzy relational identification 375
Fuzzy relational matrix 377
Fuzzy systems 313
Gastroscopy diagnostician 131
Generalized rules 417
Genetic algorithm 379
Glasgow outcome scale (GOS) 147
Goal understanding 41
Gradient vector 463
Haptic interface 15
Healthcare support 31
Hemispherical synchronization 295
Hessian matrix 463
Human computer interaction 277
Human machine interaction 159
Human robot systems 15
Hyperlexia profile 159
ILIAS platform 237
Image interpolation 205
Intelligent control architecture 75
Intelligent robot 41
Interpolation method 205
Interval regression approach 443
Interval uncertainty 443
Inverse problem 376
JSEG image segmentation 111
kNN algorithm 345
Knowledge extraction 375
Language impairments 159
Linear quadratic regulator (LQR) 223
Magnetic levitation system 261
Malignant melanoma 111
Manufacturing support 31
Medical diagnostic knowledge 175
Melanocytic skin lesions 125
Melanoma diagnosis 125
Mental diseases 191
Mimicking human vocalization 57
MIMO object identification 376
Mind map 295
Minnesota multiphasic personality inventory (MMPI) 191
ML2EM laboratory equipment 261
Modeling of approximate reasoning 417
Modified Rankin scale (mRS) 147
Moodle platform 237
Multilayer neural networks 477
Multispectral imaging 131
Multispectral objects detection 131
Nearest neighbor (NN) classification 345
Neural models 477
Neural networks 111, 477
Neural networks training 375
Neural systems 313
Neuro-fuzzy systems 313
NGTS 148
Non-linear control surfaces 313
Non-linear notes 295
Not-natively-semantic wikis 359
Null solution 380
Nursing and healthcare support 31
Ontology design 175
Ontology-led creation 359
Optimization problem 379
Orthopedic robotic system 75
Orthoroby 75
oWiki 359
Partner robots 31
Photodynamic diagnosis 131
Pigment skin lesions 111
Production deficits 159
Protein structure similarity searching 395
Real-life characteristics 429
Real-time experiments 261
Remote robots 91
Remote teaching 237
Rule-based analysis 191
Second order algorithms 463
Self generating will 41
Semantic analysis 111
Semantic data 359
Semantic model 175
Server-side query language 395
Short-distance personal mobility 31
Singing robot 57
Skin cancer images 111
Solution set 380
Solving fuzzy relational equations 378
Spectral pixel signatures 131
Spectrum estimation 131
Speech recognition 41
SQL language 395
Steady-state visual evoked potentials (SSVEPs) 3
Stimulus parameters 3
Stolz strategy 125
Support vector machines 111
Takagi-Sugeno (T-S) fuzzy models 223
Talking robot 57
Telerobotic applications 15
Text comprehension 159
Thrust wires 91
Two-degree-of-freedom (two-DOF) 91
Two electromagnets (MLS2EM) 261
VEPs spectral 3
Vocal cords 57
Vocal tract 57
ZSZN platform 237
2D anisotropic wavelet edge extractors 205
2-D visualizations 277
3-D visualizations 277