e-Learning Understanding Information Retrieval Medical
SERIES ON SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING
Series Editor-in-Chief: S K CHANG (University of Pittsburgh, USA)

Vol. 1 Knowledge-Based Software Development for Real-Time Distributed Systems
Jeffrey J.-P. Tsai and Thomas J. Weigert (Univ. Illinois at Chicago)

Vol. 2 Advances in Software Engineering and Knowledge Engineering
edited by Vincenzo Ambriola (Univ. Pisa) and Genoveffa Tortora (Univ. Salerno)

Vol. 3 The Impact of CASE Technology on Software Processes
edited by Daniel E. Cooke (Univ. Texas)

Vol. 4 Software Engineering and Knowledge Engineering: Trends for the Next Decade
edited by W. D. Hurley (Univ. Pittsburgh)

Vol. 5 Intelligent Image Database Systems
edited by S. K. Chang (Univ. Pittsburgh), E. Jungert (Swedish Defence Res. Establishment) and G. Tortora (Univ. Salerno)

Vol. 6 Object-Oriented Software: Design and Maintenance
edited by Luiz F. Capretz and Miriam A. M. Capretz (Univ. Aizu, Japan)

Vol. 7 Software Visualisation
edited by P. Eades (Univ. Newcastle) and K. Zhang (Macquarie Univ.)

Vol. 8 Image Databases and Multi-Media Search
edited by Arnold W. M. Smeulders (Univ. Amsterdam) and Ramesh Jain (Univ. California)

Vol. 9 Advances in Distributed Multimedia Systems
edited by S. K. Chang, T. F. Znati (Univ. Pittsburgh) and S. T. Vuong (Univ. British Columbia)

Vol. 10 Hybrid Parallel Execution Model for Logic-Based Specification Languages
Jeffrey J.-P. Tsai and Sing Li (Univ. Illinois at Chicago)

Vol. 11 Graph Drawing and Applications for Software and Knowledge Engineers
Kozo Sugiyama (Japan Adv. Inst. Science and Technology)

Vol. 12 Lecture Notes on Empirical Software Engineering
edited by N. Juristo & A. M. Moreno (Universidad Politécnica de Madrid, Spain)

Vol. 13 Data Structures and Algorithms
edited by S. K. Chang (Univ. Pittsburgh, USA)

Vol. 14 Acquisition of Software Engineering Knowledge - SWEEP: An Automatic Programming System Based on Genetic Programming and Cultural Algorithms
edited by George S. Cowan and Robert G. Reynolds (Wayne State Univ.)

Vol. 15 Image: E-Learning, Understanding, Information Retrieval and Medical
Proceedings of the First International Workshop
edited by S. Vitulano (Università di Cagliari, Italy)
Series on Software Engineering and Knowledge Engineering
Proceedings of the First International Workshop
Cagliari, Italy
e-Learning Understanding Information Retrieval Medical
edited by Sergio Vitulano, Università degli Studi di Cagliari, Italy
World Scientific
New Jersey · London · Singapore · Hong Kong
9 - 10 June 2003
Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
IMAGE: E-LEARNING, UNDERSTANDING, INFORMATION RETRIEVAL AND MEDICAL
Proceedings of the First International Workshop

Copyright © 2003 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-238-587-8
Printed in Singapore by Mainland Press
PREFACE

The role played by images in several human activities, ranging from entertainment to study and covering all phases of the learning process, is ever more relevant and irreplaceable. The computer age may be interpreted as a transformation of our social life in its working and leisure aspects. In our opinion this change is so relevant that it could be compared with the invention of printing, of the steam engine or the discovery of radio waves. While for a long time images could only be captured by photography, we are now able to capture, manipulate and evaluate images with the computer.

Since the original image processing literature is spread over many disciplines, we can understand the need to gather into a specific science all the knowledge in this field. This new science takes into account image elaboration, transmission, understanding, ordering and finally the role of the image in knowledge as a general matter. This book aims at highlighting some of the subjects listed above.

First of all we wish to emphasize the importance of images in the learning process and in the transmission of knowledge (e-Learning section). How much and what kind of information content do we need for image comprehension? We try to give an answer, even if partial, in the Understanding section of this book. The huge number of images used on Internet sites raises several problems: their organization and the transmission of their content is the typical field of interest of information retrieval, which studies and provides solutions to this specific problem. In the last two decades the number and the role of images in the medical field have become ever more important. At the same time physicians require methodologies typical of computer science for the analysis, the organization and for CAD (Computer-Aided Design) purposes applied to the treatment of medical images. The Medical section of this volume gives examples of the interaction between computer science and medical diagnosis.

This book tries to offer a new contribution to computer science that will inspire the reader to discover the power of images and to apply the new knowledge of this science adequately and successfully to his or her research area and to everyday life.

Sergio Vitulano
CONTENTS

Preface vii

Medical Session (Chairman: M. Tegolo)

An Introduction to Biometrics and Face Recognition 1
F. Perronnin, J.-L. Dugelay

The Use of Image Analysis in the Early Diagnosis of Oral Cancer 21
R. Serpico, M. Petruzzi, M. De Benedittis

Lung Edge Detection in Postero-Anterior Chest Radiographs 27
P. Campadelli, E. Casiraghi

Discrete Tomography from Noisy Projections 38
C. Valenti

An Integrated Approach to 3D Facial Reconstruction from Ancient Skull 46
A. F. Abate, M. Nappi, S. Ricciardi, G. Tortora

e-Learning Session (Chairman: M. Nappi)

The e-Learning Myth and the New University 60
V. Cantoni, M. Porta, M. G. Semenza

e-Learning - The Next Big Wave: How e-Learning Will Enable the Transformation of Education 69
R. Straub, C. Milani

Information Retrieval Session (Chairman: V. Cantoni)

Query Morphing for Information Fusion 86
S.-K. Chang

Image Representation and Retrieval with Topological Trees 112
C. Grana, G. Pellacani, S. Seidenari, R. Cucchiara

An Integrated Environment for Control and Management of Pictorial Information Systems 123
A. F. Abate, R. Cassino, M. Tucci

A Low Level Image Analysis Approach to Starfish Detection 132
V. Di Gesù, D. Tegolo, F. Isgrò, E. Trucco

A Comparison among Different Methods in Information Retrieval 140
F. Cannavale, V. Savona, C. Scintu

HER Application on Information Retrieval 150
A. Casanova, M. Fraschini

Understanding Session (Chairman: J.-L. Dugelay)

Issues in Image Understanding 159
V. Di Gesù

Information System in the Clinical-Health Area 178
G. Madonna

A Wireless-Based System for an Interactive Approach to Medical Parameters Exchange 200
G. Fenu, A. Crisponi, S. Cugia, M. Picconi
AN INTRODUCTION TO BIOMETRICS AND FACE RECOGNITION
F. PERRONNIN* AND J.-L. DUGELAY
Eurecom Institute, Multimedia Communications Department
2229, route des Crêtes - B.P. 193
06904 Sophia-Antipolis Cedex - France
E-mail: {perronni,dugelay}@eurecom.fr
We present in this paper a brief introduction to biometrics, which refers to the problem of identifying a person based on his/her physical or behavioral characteristics. We will also provide a short review of the literature on face recognition with a special emphasis on frontal face recognition, which represents the bulk of the published work in this field. While biometrics have mostly been studied separately, we also briefly introduce the notion of multimodality, a topic related to decision fusion which has recently gained interest in the biometric community.
1. Introduction to Biometrics
The ability to verify automatically and with great accuracy the identity of a person has become crucial in our society. Even though we may not notice it, our identity is challenged daily when we use our credit card or try to gain access to a facility or a network, for instance. The two traditional approaches to automatic person identification, namely the knowledge-based approach, which relies on something that you know such as a password, and the token-based approach, which relies on something that you have such as a badge, have obvious shortcomings: passwords might be forgotten or guessed by a malicious person while badges might be lost or stolen 1. Biometric person recognition, which deals with the problem of identifying a person based on his/her physical or behavioral characteristics, is an alternative to these traditional approaches, as a biometric attribute is inherent to each person and thus cannot be forgotten or lost and might be difficult to forge. The face, the fingerprint, the hand geometry, the iris,
etc. are examples of physical characteristics, while the signature, the gait, the keystroke, etc. are examples of behavioral characteristics. It should be underlined that a biometric such as the voice is both physical and behavioral. Ideally a biometric should have the following properties: it should be universal, unique, permanent and easily collectible 2. In the next three sections of this introductory part, we will briefly describe the architecture of a typical biometric system, the measures to evaluate its performance and the possible applications of biometrics.
1.1. Architecture
A biometric system is a particular case of a pattern recognition system 3. Given a set of observations (captures of a given biometric) and a set of possible classes (for instance the set of persons that can possibly be identified), the goal is to associate to each observation one unique class. Hence, the main task of pattern recognition is to distinguish between the intra-class and inter-class variabilities. Face recognition, which is the main focus of this article, is a very challenging problem as faces of the same person are subject to variations due to facial expressions, pose, illumination conditions, presence/absence of glasses and facial hair, aging, etc.

A biometric system is composed of at least two mandatory modules, the enrollment and recognition modules, and an optional one, the adaptation module. During enrollment, the biometric is first measured through a sensing device. Generally, before the feature extraction step, a series of pre-processing operations, such as detection, segmentation, etc. should be applied. The extracted features should be a compact but accurate representation of the biometric. Based on these features, a model is built and stored, for instance in a database or on a smart card. During the recognition phase, the biometric characteristic is measured and features are extracted as done during the enrollment phase. These features are then compared with one or many models stored in the database, depending on the operational mode (see the next section on performance evaluation). During the enrollment phase, a user-friendly system generally captures only a few instances of the biometric, which may be insufficient to describe with great accuracy the characteristics of this attribute. Moreover, this biometric can vary over time in the case where it is non-permanent (e.g. face, voice). Adaptation maintains or even improves the performance of the system over time by updating the model after each access to the system.
Figure 1. Architecture of a biometric system.
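As an illustration only, the module decomposition just described (enrollment, recognition and optional adaptation) can be sketched as follows. The class, its methods and the simple averaging model are assumptions made for this sketch and do not correspond to any particular deployed system.

```python
# A minimal sketch of a biometric system with enrollment, verification,
# identification and adaptation modules. All names are illustrative.
import numpy as np

class BiometricSystem:
    def __init__(self, extract_features, match_score, threshold):
        self.extract = extract_features   # sensing + preprocessing + feature extraction
        self.score = match_score          # similarity between features and a stored model
        self.threshold = threshold
        self.models = {}                  # user_id -> stored model (here a feature template)

    def enroll(self, user_id, captures):
        # Build a simple model by averaging the features of a few captures.
        feats = np.stack([self.extract(c) for c in captures])
        self.models[user_id] = feats.mean(axis=0)

    def verify(self, claimed_id, capture):
        # 1:1 comparison against the claimed identity's model (open-set).
        s = self.score(self.extract(capture), self.models[claimed_id])
        return s >= self.threshold

    def identify(self, capture):
        # 1:N search: return the most likely enrolled identity (closed-set).
        f = self.extract(capture)
        return max(self.models, key=lambda uid: self.score(f, self.models[uid]))

    def adapt(self, user_id, capture, rate=0.1):
        # Optional adaptation: slowly update the model after a successful access.
        self.models[user_id] = (1 - rate) * self.models[user_id] + rate * self.extract(capture)
```

A real system would plug in a concrete feature extraction chain and a richer model, for instance the eigenspace projections discussed in section 2.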
1.2. Performance Evaluation

Generally, a biometric system can work under two different operational modes: identification or verification. During identification, the system should guess the identity of a person among a set of N possible identities (1:N problem). A closed-set is generally assumed, which means that all the trials will come from people who have a model in the database, and the goal is hence to find the most likely person. During verification, the user claims an identity and the system should compare this identity with the stored model (1:1 problem). This is referred to as an open-set problem, as persons who are not in the database may try to fool the system. One can sometimes read claims that identification is a more challenging problem than verification or vice-versa. Actually, identification and verification are simply two different problems. As it may not be enough to know whether the top match is the correct one for an identification system, one can measure its performance through the cumulative match score, which measures the percentage of correct answers among the top N matches. One could also use recall-precision curves, as is done for instance to measure the performance of database retrieval systems. The FERET face database is the most commonly used database for assessing the performance of a system in the identification mode. A verification system can make two kinds of mistakes: it can reject a rightful user, often called a client, or accept an impostor. Hence, the performance of a verification system is measured in terms of its false rejection rate (FRR) and false acceptance rate (FAR). A threshold is set on the scores obtained during the verification phase and one can vary this threshold to
obtain the best possible compromise for a particular application depending on the required security level. By varying this threshold, one obtains the receiver operating curve (ROC), i.e. the FRR as a function of the FAR. To summarize the performance of the system with one unique figure, one often uses the equal error rate (EER), which corresponds to the point FAR=FRR. The M2VTS database and its extension, the XM2VTSDB 5, are the most commonly used databases for assessing the performance of a system in the verification mode. The interested reader can also refer to 6 for an introduction to evaluating biometric systems.
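As a hedged illustration of these measures, the following sketch computes FAR, FRR and an approximate EER from two hypothetical score lists (genuine client accesses and impostor accesses); the synthetic scores at the end are assumptions used only to make the snippet runnable.

```python
# Computing FAR, FRR and an approximate EER from verification scores.
import numpy as np

def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates at a given threshold."""
    far = np.mean(np.asarray(impostor) >= threshold)   # impostors accepted
    frr = np.mean(np.asarray(genuine) < threshold)     # clients rejected
    return far, frr

def equal_error_rate(genuine, impostor):
    """Scan thresholds and return the operating point where FAR and FRR are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2.0

# Toy usage with synthetic score distributions (higher score = better match).
rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 1000)
impostor = rng.normal(0.0, 1.0, 1000)
print("approximate EER:", equal_error_rate(genuine, impostor))
```

Plotting FRR against FAR for every threshold in the same scan yields the ROC curve described above.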
1.3. Applications
There are mainly four areas of applications for biometrics: access control, transaction authentication, law enforcement and personalization. Access control can be subdivided into two categories: physical and virtual access control. The former controls the access to a secured location. An example is the Immigration and Naturalization Service's Passenger Accelerated Service System (INSPASS) 7 deployed in major US airports, which enables frequent travelers to use an automated immigration system that authenticates their identity through their hand geometry. The latter enables the access to a resource or a service such as a computer or a network. An example of such a system is the voice recognition system used in Mac OS 9. Transaction authentication represents a huge market as it includes transactions at an automatic teller machine (ATM), electronic fund transfers, credit card and smart card transactions, transactions on the phone or on the Internet, etc. Mastercard estimates that a smart credit card incorporating finger verification could eliminate 80% of fraudulent charges 8. For transactions on the phone, biometric systems have already been deployed. For instance, the speaker recognition technology of Nuance 9 is used by the clients of the Home Shopping Network or Charles Schwab. Law enforcement has been one of the first applications of biometrics. Fingerprint recognition has been accepted for more than a century as a means of identifying a person. Automatic face recognition can also be very useful for searching through large mugshot databases. Finally, personalization through person authentication is very appealing in the consumer product area. For instance, Siemens allows one to personalize one's vehicle accessories, such as mirrors, radio station selections, seating positions, etc. through fingerprint recognition 10.
In the following subsections, we will provide the reader with a brief review of the literature on face recognition. This review will be split into two parts: we will devote the next section to frontal face recognition, which represents the bulk of the literature, while the "other modalities", corresponding to different acquisition scenarios such as profile, range images, facial thermogram or video, will be discussed in section 3. The interested reader can refer to 11 for a full review of the literature on face recognition before 1995. We should underline that specific parts of the face (or the head) such as the eyes, the ears, the lips, etc. contain a lot of relevant information for identifying people. However, this is out of the scope of this paper and the interested reader can refer to 12 for iris recognition, to 13 for ear recognition and to 14 for lip dynamics recognition. We will also not review a very important part of any face recognition system: face detection. For a recent review on the topic, the reader can refer to 15.
2. Frontal Face Recognition
It should be underlined that the expression "frontal face recognition" is used in opposition to "profile recognition". A face recognition system that would work only under perfect frontal conditions would be of limited interest, and even "frontal" algorithms should have some view tolerance. As a full review, even of the restricted topic of frontal face recognition, is out of the scope of this paper, we will focus our attention on two very successful classes of algorithms: the projection-based approaches, i.e. the Eigenfaces and related approaches, and the ones based on deformable models such as Elastic Graph Matching. It should be underlined that the three top performers at the 96 FERET performance evaluation belong to one of these two classes 4.

2.1. Eigenfaces and Related Approaches

In this section, we will first review the basic eigenface algorithm and then consider its extensions: multiple spaces, eigenfeatures, linear discriminant analysis and probabilistic matching.

2.1.1. Eigenfaces

Eigenfaces are based on the notion of dimensionality reduction. Kirby and Sirovich 16 first outlined that the dimensionality of the face space, i.e. the space of variation
between images of human faces, is much smaller than the dimensionality of a single face considered as an arbitrary image. As a useful approximation, one may consider an individual face image to be a linear combination of a small number of face components or eigenfaces derived from a set of reference face images. The idea of Principal Component Analysis (PCA) 17, also known as the Karhunen-Loève Transform (KLT), is to find the subspace which best accounts for the distribution of face images within the whole space. Let $\{O_i\}_{i \in [1,N]}$ be the set of reference or training faces, $\bar{O}$ be the average face and $\tilde{O}_i = O_i - \bar{O}$. $\tilde{O}_i$ is sometimes called a caricature image. Finally, if $\tilde{\Theta} = [\tilde{O}_1, \tilde{O}_2, \ldots, \tilde{O}_N]$, the scatter matrix $S$ is defined as:
$$S = \sum_{i=1}^{N} \tilde{O}_i \tilde{O}_i^T = \tilde{\Theta}\,\tilde{\Theta}^T \qquad (1)$$

The optimal subspace $P_{PCA}$ is chosen to maximize the scatter of the projected faces:

$$P_{PCA} = \arg\max_P \; |P S P^T| \qquad (2)$$
where $|\cdot|$ is the determinant operator. The solution to problem (2) is the subspace spanned by the eigenvectors $[e_1, e_2, \ldots, e_K]$, also called eigenfaces, corresponding to the $K$ largest eigenvalues of the scatter matrix $S$. It should be underlined that eigenfaces are not themselves usually plausible faces but only directions of variation between face images (see Figure 2). Each face image is represented by a point $P_{PCA}\tilde{O}_i = [w_1^i, w_2^i, \ldots, w_K^i]$ in the $K$-dimensional space.
Figure 2. (a) Eigenface 0 (average face) and (b)-(f) eigenfaces 1 to 5 as estimated on a subset of the FERET face database.
The weights $w_k^i$ are the projections of the face image on the $k$-th eigenface $e_k$ and thus represent the contribution of each eigenface to the input face image.
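A minimal sketch of the eigenface computation, the projection onto the $K$-dimensional space, and the nearest-neighbour matching described just below, assuming a hypothetical array `faces` of vectorized training images; it illustrates equations (1)-(2) via an SVD rather than an explicit scatter matrix and is not the implementation evaluated in the cited works.

```python
# Eigenfaces: training, projection and Euclidean nearest-neighbour matching.
import numpy as np

def train_eigenfaces(faces, K):
    """faces: (N, H*W) array of vectorized training images; keep K eigenfaces."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face                 # caricature images O_i - O_bar
    # The rows of Vt are the eigenvectors of the scatter matrix (eigenfaces).
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, Vt[:K]                     # (H*W,), (K, H*W)

def project(face, mean_face, eigenfaces):
    # Weights w_k = projection of the caricature image on each eigenface e_k.
    return eigenfaces @ (face - mean_face)

def best_match(query, gallery_weights, mean_face, eigenfaces):
    """gallery_weights: dict person_id -> precomputed weight vector."""
    w = project(query, mean_face, eigenfaces)
    dists = {pid: np.linalg.norm(w - wg) for pid, wg in gallery_weights.items()}
    return min(dists, key=dists.get)             # smallest Euclidean distance
```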
To find the best match for an image of a person's face in a set of stored facial images, one may calculate the Euclidean distances between the vector representing the new face and each of the vectors representing the stored faces, and then choose the image yielding the smallest distance 18.

2.1.2. Multiple Spaces Approaches
When one has a large amount of training data, one can either pool all the data to train one unique eigenspace, which is known as the parametric approach, or split the data into multiple training sets and train multiple eigenspaces, which is known as the view-based approach. The latter approach has been designed especially to compensate for different head poses. One of the first attempts to train multiple eigenspaces was made in 19. This method consists in building a separate eigenspace for each possible view. For each new target image, its orientation is first estimated by projecting it on each eigenspace and choosing the one that yields the smallest distance from face to space. The performance of the parametric and view-based approaches were compared in 19 and the latter one seems to perform better. The problem with the view-based approach is that it requires large amounts of labeled training data to train each separate eigenspace. More recently, Mixtures of Principal Components (MPC) were proposed to extend the traditional PCA 20,21. An iterative procedure based on the Expectation-Maximization algorithm was derived in both cases to train the MPC automatically. However, while 20 represents a face by the best set of features corresponding to the closest set of eigenfaces, in 21 a face image is projected on each component eigenspace and these individual projections are then linearly combined. Hence, compared to the former approach, a face image is not assigned in a hard manner to one eigenspace component but in a soft manner to all the eigenspace components. 21 tested MPC on a database of face images that exhibit large variabilities in poses and illumination conditions. Each eigenspace converges automatically to varying poses and the first few eigenvectors of each component eigenspace seem to capture lighting variations.
2.1.3. Eigenfeatures

An eigenface-based recognition system can be easily fooled by gross variations of the image such as the presence or absence of facial hair 19. This shortcoming is inherent to the eigenface approach, which encodes a global representation of the face. To address this issue, 19 proposed a modular or
layered approach where the global representation of the face is augmented by local prominent features such as the eyes, the nose or the mouth. Such an approach is of particular interest when a part of the face is occluded and only a subset of the facial features can be used for recognition. A similar approach was also developed in 22. The main difference is in the encoding of the features: the notion of eigenface is extended to eigeneyes, eigennose and eigenmouth, as was done for instance in 23 for image coding. For a small number of eigenvectors, the eigenfeatures approach outperformed the eigenface approach, and the combination of eigenfaces and eigenfeatures outperformed each algorithm taken separately.

2.1.4. Linear Discriminant Approaches
While PCA is optimal with respect to data compression 16, in general it is sub-optimal for a recognition task. Actually, PCA confounds intra-personal and extra-personal sources of variability in the total scatter matrix $S$. Thus eigenfaces can be contaminated by non-pertinent information. For a classification task, a dimension reduction technique such as Linear Discriminant Analysis (LDA) should be preferred to PCA 24,25,26. The idea of LDA is to select a subspace that maximizes the ratio of the inter-class variability and the intra-class variability. Whereas PCA is an unsupervised feature extraction method, discriminant analysis uses the category information associated with each training observation and is thus categorized as supervised. Let $O_{i,k}$ be the $k$-th picture of training person $i$, $N_i$ be the number of training images for person $i$ and $\bar{O}_i$ be the average of person $i$. Then $S_B$ and $S_W$, respectively the between- and within-class scatter matrices, are given by:
$$S_B = \sum_{i=1}^{C} N_i\,(\bar{O}_i - \bar{O})(\bar{O}_i - \bar{O})^T \qquad (3)$$

$$S_W = \sum_{i=1}^{C} \sum_{k=1}^{N_i} (O_{i,k} - \bar{O}_i)(O_{i,k} - \bar{O}_i)^T \qquad (4)$$
The optimal subspace $P_{LDA}$ is chosen to maximize the between-scatter of the projected face images while minimizing the within-scatter of the projected faces:
$$P_{LDA} = \arg\max_P \; \frac{|P S_B P^T|}{|P S_W P^T|} \qquad (5)$$
The solution to equation (5) is the subspace spanned by $[e_1, e_2, \ldots, e_K]$, the generalized eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem:

$$S_B e_k = \lambda_k S_W e_k, \qquad k = 1, \ldots, K \qquad (6)$$
However, due to the high dimensionality of the feature space, $S_W$ is generally singular and this principle cannot be applied in a straightforward manner. To overcome this issue, one generally first applies PCA to reduce the dimension of the feature space and then performs the standard LDA 24,26. The eigenvectors that form the discriminant subspace are often referred to as Fisherfaces 24. In 26, the space spanned by the first few Fisherfaces is called the most discriminant features (MDF) classification space while PCA features are referred to as most expressive features (MEF). It should be
Figure 3. (a) Fisherface 0 (average face) and (b)-(f) Fisherfaces 1 to 5 as estimated on a subset of the FERET face database.
underlined that LDA induces non-orthogonal projection axes, a property which has great relevance in biological sensory systems 27. Other solutions to equation (5) were suggested in 27,28,29.
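The PCA-then-LDA recipe described above can be sketched as follows; `faces` and `labels` are hypothetical inputs and this is only an illustration of equations (3)-(6), not the exact procedure evaluated in 24 or 26.

```python
# Fisherfaces: PCA to make S_W non-singular, then LDA in the reduced space.
import numpy as np

def fisherfaces(faces, labels, n_pca, n_lda):
    """faces: (N, D) vectorized images; labels: length-N array of person IDs."""
    mean = faces.mean(axis=0)
    _, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
    P_pca = Vt[:n_pca]                                   # (n_pca, D)
    X = (faces - mean) @ P_pca.T                         # training data in PCA space

    classes = np.unique(labels)
    global_mean = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        diff = (mc - global_mean)[:, None]
        S_B += len(Xc) * diff @ diff.T                   # between-class scatter
        S_W += (Xc - mc).T @ (Xc - mc)                   # within-class scatter

    # Generalized eigenvalue problem S_B e = lambda S_W e, solved via S_W^-1 S_B.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-eigvals.real)[:n_lda]
    P_lda = eigvecs[:, order].real.T                     # (n_lda, n_pca)

    return mean, P_lda @ P_pca                           # combined (n_lda, D) projection
```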
2.1.5. Probabilistic Matching

While most face recognition algorithms, especially those based on eigenfaces, generally use simple metrics such as the Euclidean distance, 30 suggests a probabilistic similarity based on a discriminative Bayesian analysis of image differences. One considers the two mutually exclusive classes of variation between two facial images: the intra-personal and extra-personal variations, whose associated spaces are denoted respectively $\Omega_I$ and $\Omega_E$. Given two face images $O_1$ and $O_2$ and the image difference $\Delta = O_1 - O_2$, the similarity measure is given by $P(\Omega_I|\Delta)$. Using Bayes' rule, it can be trans-
formed into:

$$P(\Omega_I|\Delta) = \frac{P(\Delta|\Omega_I)\,P(\Omega_I)}{P(\Delta|\Omega_I)\,P(\Omega_I) + P(\Delta|\Omega_E)\,P(\Omega_E)} \qquad (7)$$
The high-dimensional probability functions $P(\Delta|\Omega_I)$ and $P(\Delta|\Omega_E)$ are estimated using an eigenspace density estimation technique 31. It was observed that the denominator in equation (7) had a limited impact on the performance of the system and that the similarity measure could be reduced to $P(\Delta|\Omega_I)$ with little loss in performance, thus reducing the computational requirements of the algorithm by a factor of two.
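A sketch of the similarity of equation (7), in which single Gaussians fitted in a PCA subspace stand in for the eigenspace density estimation of 31; the priors and the number of components are assumptions made for this illustration.

```python
# Bayesian intra/extra-personal similarity for image differences.
import numpy as np
from scipy.stats import multivariate_normal

def fit_difference_density(deltas, n_components):
    """Fit a Gaussian to image differences projected on their leading PCs."""
    mean = deltas.mean(axis=0)
    _, _, Vt = np.linalg.svd(deltas - mean, full_matrices=False)
    P = Vt[:n_components]
    Y = (deltas - mean) @ P.T
    gauss = multivariate_normal(Y.mean(axis=0),
                                np.cov(Y.T) + 1e-6 * np.eye(n_components))
    return P, mean, gauss

def similarity(o1, o2, intra, extra, prior_intra=0.5):
    """P(Omega_I | Delta) for the difference Delta = o1 - o2 (equation (7))."""
    delta = o1 - o2
    def lik(model):
        P, mean, gauss = model
        return gauss.pdf(P @ (delta - mean))
    num = lik(intra) * prior_intra
    return num / (num + lik(extra) * (1 - prior_intra))
```

Here `intra` and `extra` would be fitted on difference images of the same person and of different persons respectively, e.g. `intra = fit_difference_density(intra_deltas, 20)`.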
2.2. Deformable Models

As noted in 32, since most face recognition algorithms are minimum distance pattern classifiers, special attention should be paid to the definition of distance. The distance which is generally used is the Euclidean distance. While it is easy to compute, it may not be optimal as, for instance, it does not compensate for the deformations incurred from different facial expressions. Face recognition algorithms based on deformable models can cope with this kind of variation.
2.2.1. Elastic Graph Matching

The Elastic Graph Matching (EGM) algorithm has roots in the neural network community 33. Given a template image $\mathcal{F}_T$, one first derives a face model from this image. A grid is placed on the face image and the face model is a vector field $O = \{o_{i,j}\}$ where $o_{i,j}$ is the feature vector extracted at position $(i,j)$ of the grid, which summarizes local properties of the face (c.f. Figure 4(a)). Gabor coefficients are generally used but other features, like morphological feature vectors, have also been considered and successfully applied to the EGM problem 34. Given a query image $\mathcal{F}_Q$, one also derives a vector field $X = \{x_{i,j}\}$ but on a coarser grid than the template face (c.f. Figure 4(b)). In the EGM approach, the distance between the template and query images is defined as a best mapping $\mathcal{M}^*$ among the set of all possible mappings $\{\mathcal{M}\}$ between the two vector fields $O$ and $X$. The optimal mapping depends on the definition of the cost function $C$. Such a function should keep a proper balance between the local matching of features and the requirement to preserve spatial distance. Therefore, a proper cost function should be of
Figure 4. (a) Template image and (b) query image with their associated grids. (c) Grid after deformation using the probabilistic deformable model of face mapping (c.f. section 2.2.3). Images extracted from the FERET face database.
the form:

$$C(\mathcal{M}) = C_v(\mathcal{M}) + \rho\, C_e(\mathcal{M}) \qquad (8)$$

where $C_v$ is the cost of local matchings, $C_e$ the cost of local deformations and $\rho$ is a parameter which controls the rigidity of the elastic matching and has to be hand-tuned. As the number of possible mappings is extremely large, even for lattices of moderate size, an exhaustive search is out of the question and an approximate solution has to be found. Toward this end, a two-step procedure was designed:

- rigid matching: the whole template graph is shifted around the query graph. This corresponds to $\rho \to \infty$. We obtain an initial mapping $\mathcal{M}^0$.
- deformable matching: the nodes of the template lattice are then stretched through random local perturbations to reduce further the cost function until the process converges to a locally optimal mapping $\mathcal{M}^*$, i.e. once a predefined number of trials have failed to improve the mapping cost.
The previous matching algorithm was later improved. For instance, in 34 the authors argue that the two-stage coarse-to-fine optimization is sub-optimal as the deformable matching relies too much on the success of the rigid matching. The two stage optimization procedure is replaced with a probabilistic hill-climbing algorithm which attempts to find at each
iteration both the optimal global translation and the set of optimal local perturbations. In 35, the same authors further drop the $C_e$ term in equation (8). However, to avoid unreasonable deformations, local translations are restricted to a neighborhood.
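The two-step matching procedure described above can be sketched as follows; the Gabor feature extraction is abstracted behind a hypothetical `features(img, x, y)` callback, and the shift range, perturbation size and stopping criterion are illustrative assumptions rather than the settings of 33.

```python
# Two-step elastic graph matching: rigid scan, then random local perturbations.
import numpy as np

def graph_cost(template_feats, query_img, positions, rest_positions, features, rho):
    # C_v: mismatch between template features and features at the current nodes.
    c_v = sum(np.linalg.norm(f - features(query_img, x, y))
              for f, (x, y) in zip(template_feats, positions))
    # C_e: deviation of each node from the rigidly matched grid, weighted by rho.
    c_e = sum(np.hypot(x - x0, y - y0)
              for (x, y), (x0, y0) in zip(positions, rest_positions))
    return c_v + rho * c_e

def elastic_match(template_feats, grid, query_img, features,
                  rho=1.0, shift_range=20, step=4, max_failures=50, seed=0):
    rng = np.random.default_rng(seed)

    # Step 1: rigid matching (rho -> infinity): shift the whole grid and keep
    # only the feature-matching cost, since the graph is not deformed.
    best_cost, best_pos = np.inf, list(grid)
    for dx in range(-shift_range, shift_range + 1, step):
        for dy in range(-shift_range, shift_range + 1, step):
            pos = [(x + dx, y + dy) for x, y in grid]
            c = sum(np.linalg.norm(f - features(query_img, x, y))
                    for f, (x, y) in zip(template_feats, pos))
            if c < best_cost:
                best_cost, best_pos = c, pos

    # Step 2: deformable matching: accept random single-node perturbations
    # only if they lower the cost, until max_failures trials in a row fail.
    positions = list(best_pos)
    cost = graph_cost(template_feats, query_img, positions, best_pos, features, rho)
    failures = 0
    while failures < max_failures:
        i = int(rng.integers(len(positions)))
        trial = list(positions)
        trial[i] = (trial[i][0] + int(rng.integers(-2, 3)),
                    trial[i][1] + int(rng.integers(-2, 3)))
        c = graph_cost(template_feats, query_img, trial, best_pos, features, rho)
        if c < cost:
            positions, cost, failures = trial, c, 0
        else:
            failures += 1
    return cost, positions
```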
2.2.2. Elastic Bunch Graph Matching
Wiskott et al. 36 elaborated on the basic idea of EGM with the Elastic Bunch Graph Matching (EBGM) through three major extensions:

- While the cost of local matchings $C_v$ only makes use of the magnitude of the complex Gabor coefficients in the EGM approach, the phase information is used to disambiguate features which have a similar magnitude, but also to estimate local distortions.
- The features are no longer extracted on a rectangular graph but now refer to specific facial landmarks called fiducial points.
- A new data structure called the bunch graph, which serves as a general representation of the face, is introduced. Such a structure is obtained by combining the graphs of a set of reference individuals.
It should be noted that the idea of extracting features at positions which correspond to facial landmarks appeared in earlier work. In 37 feature points are detected using a Gabor wavelet decomposition. Typically, 35 to 50 points are obtained in this manner and form the face graph. To compare two face graphs, a two-stage matching similar to the one suggested in 33 is developed. One first compensates for a global translation of the graphs and then performs local deformations for further optimization. However, another difference with 33 is that the cost of local deformations (also called the topology cost) is only computed after the features are matched, which results in a very fast algorithm. One advantage of 36 over 37 is in the use of the bunch graph, which provides a supervised way to extract salient features. An obvious shortcoming of EGM and EBGM is that $C_v$, the cost of local matchings, is simply a sum of all local matchings. This contradicts the fact that certain parts of the face contain more discriminant information and that this distribution of the information across the face may vary from one person to another. Hence, the cost of local matchings at each node should be weighted according to their discriminatory power 38,39,34,35.
2.2.3. Probabilistic Deformable Model of Face Mapping
A novel probabilistic deformable model of face mapping 40, whose philosophy is similar to EGM 33, was recently introduced. Given a template face $\mathcal{F}_T$, a query face $\mathcal{F}_Q$ and a deformable model of the face $\mathcal{M}$, for a face identification task the goal is to estimate $P(\mathcal{F}_T|\mathcal{F}_Q,\mathcal{M})$. The two major differences between EGM and the approach presented in 40 are:

- the use of the HMM framework, which provides efficient formulas to compute $P(\mathcal{F}_T|\mathcal{F}_Q,\mathcal{M})$ and to train automatically all the parameters of $\mathcal{M}$. This enables for instance to model the elastic properties of the different parts of the face;
- the use of a shared deformable model of the face $\mathcal{M}$ for all individuals, which is particularly useful when little enrollment data is available.
3. Other “Modalities” for Face Recognition
In this section we will very briefly review what we called the "other modalities", which basically encompass the remainder of the literature on face recognition: profile recognition, recognition based on range data, thermal imagery and finally video-based face recognition.

3.1. Profile Recognition
The research on profile face recognition has been mainly motivated by the requirements of law enforcement agencies with their so-called mug shot databases. However, it has been the focus of a relatively restricted number of papers. It should be underlined that frontal and profile face recognition are complementary as they do not provide the same information. A typical profile recognition algorithm first locates on the contour image points of interest such as the nose tip, the mouth, the chin, etc., also called fiducial points, and then extracts information such as distances, angles, etc. for the matching (see 41 for an example of an automatic system based on this principle). An obvious problem with such an approach is the fact that it relies on an accurate feature extraction. Alternative approaches which alleviate this problem include (but are not limited to) the use of Fourier descriptors for the description of closed curves 42, the application of Eigenfaces to profiles 19 and, more recently, an algorithm based on string matching 43.
3.2. Range Data
While a 2-D intensity image does not have direct access to the 3-D structure of an object, a range image contains the depth information and is not sensitive to lighting conditions (it can even work in the dark), which makes range data appealing for a face recognition system. The sensing device can be a rotating laser scanner which provides a very accurate and complete representation of the face, as used for instance in 44,45. However, such a scanner is highly expensive and the scanning process is very slow. In 46 the authors suggested the use of the coded light approach for acquiring range images. A sequence of stripe patterns is projected onto the face and for each projection an image is taken with a camera. However, for shadow regions as well as regions that do not reflect the projected light, no 3-D data can be estimated, which results in range images with a lot of missing data. Therefore, the authors decided to switch to a multi-sensor system with two range sensors acquiring the face under two different views. These two sets of range data are then merged. Although these sensing approaches reduce both the acquisition time and cost, the user of such a system should be cooperative, which restricts its use. This may explain the fact that little literature is available on this topic. In 44, the authors present a face recognition system based on range data template matching. The range data is segmented into four surface regions which are then normalized using the location of the eyes, nose and mouth. The volume between two surfaces is used as distance measure. In 45 the face recognition system uses features extracted from range and curvature data. Examples of features are the left and right eye width, the head width, etc., but also the maximum Gaussian curvature on the nose ridge, the average minimum curvature on the nose ridge, etc. In 46, the authors apply and extend traditional 2-D face recognition algorithms (Eigenfaces and HMM-based face recognition 47) to range data. More recently, in 48, point signatures are used as features for 3-D face recognition. These feature points are projected into a subspace using PCA.
3.3. Facial Thermogram
The facial heat emission patterns can be used to characterize a person. These patterns depend on nine factors including the location of major blood vessels, the skeleton thickness, and the amount of tissue, muscle and fat 49. IR face images have the potential for a good biometric as this signature is unique (even identical twins do not share the same facial thermogram)
and it is supposed to be relatively stable over time. Moreover, it cannot be altered through plastic surgery. The acquisition is done with an infrared (IR) camera. Hence, it does not depend on the lighting conditions, which is a great advantage over traditional facial recognition. However, IR imagery is dependent on the temperature and IR is opaque to glass. A preliminary study 50 compared the performance of visible and IR imagery for face recognition and it was shown that there was little difference in performance. However, the authors in 50 did not address the issue of significant variations in illumination for visible images and changes in temperature for IR images.
3.4. Video-Based Recognition
Although it has not been a very active research topic (at least compared to frontal face recognition), video-based face recognition can offer many advantages compared to recognition based on still images:

- Abundant data is available at both enrollment and test time. Actually one could use video at enrollment time and still images at test time or vice versa (although the latter scenario would perhaps make less sense). However, it might not be necessary to process all this data and one of the tasks of the recognition system will be the selection of an optimal subset of the whole set of images which contains the maximum amount of information.
- With sequences of images, the recognition system has access to dynamic features which provide valuable information on the behavior of the user. For instance, the BioID system 14 makes use of the lip movement for the purpose of person identification (in conjunction with face and voice recognition). Also, dynamic features are generally more secure against fraud than static features as they are harder to replicate.
- Finally, the system can try to build a model of the face by estimating the 3-D depth of points on objects from a sequence of 2-D images, which is known as structure from motion 11.
Video-based recognition might be extremely useful for covert surveillance, for instance in airports. However, this is a highly challenging problem as the system should work in a non-cooperative scenario and the quality of surveillance video is generally poor and the resolution is low.
4. Multimodality
Reliable biometric-based person authentication systems, based for instance on iris or retina recognition, already exist, but the user acceptance for such systems is generally low and they should be used only in high security scenarios. Systems based on voice or face recognition generally have a high user acceptance but their performance is not satisfying enough. Multimodality is a way to improve the performance of a system by combining different biometrics. However, one should be extremely careful about which modalities should be combined (especially, it might not be useful to combine systems which have radically different performances) and how to combine them. In the following, we will briefly describe the possible multimodality scenarios and the different ways to fuse the information.

4.1. Different Multimodality Scenarios
We use here the exhaustive classification introduced in 51:
(1) multiple biometric systems: consists in using different biometric attributes, such as the face, voice and lip movement 14. This is the most commonly used sense of the term multimodality.
(2) multiple sensors: e.g. a camera and an infrared camera for face recognition.
(3) multiple units of the same biometric: e.g. fusing the result of the recognition of both irises.
(4) multiple instances of the same biometric: e.g. in video-based face recognition, fusing the recognition results of each image.
(5) multiple algorithms on the same biometric capture.
We can compare these scenarios in terms of the expected increase of performance of the system over the monomodal systems versus the increase of the cost of the system, which can be split into additional software and hardware costs. In terms of the additional amount of information and thus in the expected increase of the performance of the system, the first scenario is the richest and scenarios (4) and (5) are the poorest ones. The amount of information brought by scenario (2) is highly dependent on the difference between the two sensors. Scenario (3) can bring a large amount of information as, for instance, the two irises or the ten fingerprints of the same person are different. However, if the quality of a fingerprint is low for a person, e.g. because of a manual activity, then the quality of the other
fingerprints is likely to be low. The first two scenarios clearly introduce an additional cost as many sensors are necessary to perform the acquisitions. For scenario (3) there is no need for an extra sensor if captures are done sequentially. However, this lengthens the acquisition time, which makes the system less user-friendly. Finally, scenarios (1) and (5) induce an additional software cost as different algorithms are necessary for the different systems.

4.2. Information Fusion
As stated at the beginning of this section, multimodality improves the performance of a biometric system. The word performance includes both accuracy and efficiency. The assumption which is made is that different biometric systems make different types of errors and thus that it is possible to use the complementary nature of these systems. This is a traditional problem of decision fusion 53. Fusion can be done at three different levels 52 (by increasing order of available information):

- At the abstract level, the output of each classifier is a label, such as the ID of the most likely person in the identification case or a binary answer such as accept/reject in the verification case.
- At the rank level, the output labels are sorted by confidence.
- At the measurement level, a confidence measure is associated to each label.
Commonly used classification schemes such as the product rule, sum rule, min rule, max rule and median rule are derived from a common theoretical framework using different approximations 54. In 55, the authors evaluated different classification schemes, namely support vector machine (SVM), multi-layer perceptron (MLP), decision tree, Fisher's linear discriminant (FLD) and Bayesian classifier, and showed that the SVM- and Bayesian-based classifiers had a similar performance and outperformed the other classifiers when fusing face and voice biometrics. In the identification mode, one can use the complementary nature of different biometrics to speed up the search process. Identification is generally performed in a sequential mode. For instance, in 56 identification is a two-step process: face recognition, which is fast but unreliable, is used to obtain an N-best list of the most likely persons, and fingerprint recognition, which is slower but more accurate, is then performed on this subset.
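A minimal sketch of measurement-level fusion with the fixed rules mentioned above; the modality names, scores and threshold are assumptions, and in practice the scores of the monomodal systems would first have to be normalized to a common range.

```python
# Fixed combination rules for measurement-level fusion of biometric scores.
import numpy as np

RULES = {
    "sum":     np.sum,
    "product": np.prod,
    "min":     np.min,
    "max":     np.max,
    "median":  np.median,
}

def fuse(scores, rule="sum"):
    """Combine per-modality confidences (dict modality -> score) into one score."""
    return RULES[rule](np.array(list(scores.values())))

def accept(scores, threshold, rule="sum"):
    """Verification decision after fusion."""
    return fuse(scores, rule) >= threshold

# Toy usage: face and voice confidences for one verification attempt.
print(accept({"face": 0.71, "voice": 0.64}, threshold=1.2, rule="sum"))
```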
5. Summary
We introduced in this paper biometrics, which deals with the problem of identifying a person based on his/her physical and behavioral characteristics. Face recognition, which is one of the most actively researched topics in biometrics, was briefly reviewed. Although huge progress has been made in this field over the past twenty years, research has mainly focused on frontal face recognition from still images. We also introduced the notion of multimodality as a way of exploiting the complementary nature of monomodal biometric systems.
References
1. S. Liu and M. Silverman, "A practical guide to biometric security technology", IT Professional, vol. 3, no. 1, pp. 27-32, Jan/Feb 2001.
2. A. Jain, R. Bolle and S. Pankanti, "Biometrics: personal identification in networked society", Boston, MA: Kluwer Academic, 1999.
3. R. O. Duda, P. E. Hart and D. G. Stork, "Pattern classification", 2nd edition, John Wiley & Sons, Inc.
4. P. J. Phillips, H. Moon, S. Rizvi and P. Rauss, "The FERET evaluation methodology for face recognition algorithms", IEEE Trans. on PAMI, 2000, vol. 22, no. 10, October.
5. K. Messer, J. Matas, J. Kittler and K. Jonsson, "XM2VTSDB: the extended M2VTS database", AVBPA'99, 1999, pp. 72-77.
6. P. J. Phillips, A. Martin, C. L. Wilson and M. Przybocki, "An introduction to evaluating biometric systems", Computer, 2000, vol. 33, no. 2, pp. 56-63.
7. INSPASS, http://www.immigration.gov/graphics/howdoi/inspass.htm
8. O. O'Sullivan, "Biometrics comes to life", Banking Journal, 1997, January.
9. Nuance, http://www.nuance.com
10. Siemens Automotive, http://media.siemensauto.com
11. R. Chellappa, C. L. Wilson and S. Sirohey, "Human and machine recognition of faces: a survey", Proc. of the IEEE, 1995, vol. 83, no. 5, May.
12. J. Daugman, "How iris recognition works", ICIP, 2002, vol. 1, pp. 33-36.
13. B. Moreno, A. Sanchez and J. F. Velez, "On the use of outer ear images for personal identification in security applications", IEEE 3rd Conf. on Security Technology, pp. 469-476.
14. R. W. Frischholz and U. Dieckmann, "BioID: a multimodal biometric identification system", Computer, 2000, vol. 33, no. 2, pp. 64-68, Feb.
15. E. Hjelmas and B. K. Low, "Face detection: a survey", Computer Vision and Image Understanding, 2001, vol. 83, pp. 236-274.
16. M. Kirby and L. Sirovich, "Application of the Karhunen-Loève procedure for the characterization of human faces", IEEE Trans. on PAMI, vol. 12, pp. 103-108, 1990.
17. I. T. Joliffe, "Principal Component Analysis", Springer-Verlag, 1986.
18. M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces", IEEE Conf. on CVPR, 1991, pp. 586-591.
19. A. Pentland, B. Moghaddam and T. Starner, "View-based and modular eigenspaces for face recognition", IEEE Conf. on CVPR, pp. 84-91, June 1994.
20. H.-C. Kim, D. Kim and S. Y. Bang, "Face recognition using the mixture-of-eigenfaces method", Pattern Recognition Letters, vol. 23, no. 13, pp. 1549-1558, Nov. 2002.
21. D. S. Turaga and T. Chen, "Face recognition using mixtures of principal components", IEEE Int. Conf. on IP, vol. 2, pp. 101-104, 2002.
22. R. Brunelli and T. Poggio, "Face recognition: features versus templates", IEEE Trans. on PAMI, 1993, vol. 15, no. 10, pp. 1042-1052, Oct.
23. W. J. Welsh and D. Shah, "Facial feature image coding using principal components", Electronic Letters, vol. 28, no. 22, pp. 2066-2067, October 1992.
24. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection", IEEE Trans. on PAMI, vol. 19, pp. 711-720, Jul 1997.
25. K. Etemad and R. Chellappa, "Face recognition using discriminant eigenvectors", ICASSP, vol. 4, pp. 2148-2151, May 1996.
26. D. L. Swets and J. Weng, "Using discriminant eigenfeatures for image retrieval", IEEE Trans. on PAMI, vol. 18, no. 8, pp. 831-836, August 1996.
27. C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition", IEEE Trans. on IP, vol. 11, no. 4, pp. 467-476, Apr 2002.
28. L.-F. Chen, H.-Y. M. Liao, M.-T. Ko, J.-C. Lin and G.-J. Yu, "A new LDA-based face recognition system which can solve the small sample size problem", Pattern Recognition, vol. 33, no. 10, pp. 1713-1726, October 2000.
29. J. Yang and J.-Y. Yang, "Why can LDA be performed in PCA transformed space?", Pattern Recognition, vol. 36, no. 2, pp. 563-566, February 2003.
30. B. Moghaddam, W. Wahid and A. Pentland, "Beyond eigenfaces: probabilistic matching for face recognition", IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 30-35, April 1998.
31. B. Moghaddam and A. Pentland, "Probabilistic visual learning for object recognition", Int. Conf. on Computer Vision, 1995.
32. J. Zhang, Y. Yan and M. Lades, "Face recognition: eigenface, elastic matching, and neural nets", Proc. of the IEEE, vol. 85, no. 9, Sep 1997.
33. M. Lades, J. C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R. Würtz and W. Konen, "Distortion invariant object recognition in the dynamic link architecture", IEEE Trans. on Computers, 1993, vol. 42, no. 3.
34. C. L. Kotropoulos, A. Tefas and I. Pitas, "Frontal face authentication using discriminant grids with morphological feature vectors", IEEE Trans. on Multimedia, vol. 2, no. 1, pp. 14-26, March 2000.
35. A. Tefas, C. Kotropoulos and I. Pitas, "Using support vector machines to enhance the performance of elastic graph matching for frontal face recognition", IEEE Trans. on PAMI, vol. 23, no. 7, pp. 735-746, Jul 2001.
36. L. Wiskott, J. M. Fellous, N. Krüger and C. von der Malsburg, "Face recognition by elastic bunch graph matching", IEEE Trans. on PAMI, vol. 19, no. 7, pp. 775-779, July 1997.
37. B. S. Manjunath, R. Chellappa and C. von der Malsburg, "A feature based approach to face recognition", Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp. 373-378, 1992.
38. N. Krüger, "An algorithm for the learning of weights in discrimination functions using a priori constraints", IEEE Trans. on PAMI, vol. 19, no. 7, Jul 1997.
39. B. Duc, S. Fischer and J. Bigün, "Face authentication with Gabor information on deformable graphs", IEEE Trans. on IP, vol. 8, no. 4, Apr 1999.
40. F. Perronnin, J.-L. Dugelay and K. Rose, "Deformable face mapping for person identification", ICIP, 2003.
41. C. Wu and J. Huang, "Human face profile recognition by computer", Pattern Recognition, vol. 23, pp. 255-259, 1990.
42. T. Aibara, K. Ohue and Y. Matsuoka, "Human face recognition of P-type Fourier descriptors", SPIE Proc., vol. 1606: Visual Communication and Image Processing, 1991, pp. 198-203.
43. Y. Gao and M. Leung, "Human face profile recognition using attributed string", Pattern Recognition, vol. 35, pp. 353-360.
44. G. Gordon, "Face recognition based on depth maps and surface curvature", SPIE Proc., vol. 1570, pp. 234-247, 1991.
45. G. Gordon, "Face recognition based on depth and curvature features", IEEE Conf. on CVPR, 1992, pp. 808-810, 15-18 Jun.
46. B. Achermann, X. Jiang and H. Bunke, "Face recognition using range images", VSMM, 1997, pp. 129-136, 10-12 Sep.
47. F. S. Samaria, "Face recognition using hidden Markov models", Ph.D. thesis, University of Cambridge, 1994.
48. Y. Wang, C.-S. Chua and Y.-K. Ho, "Facial feature detection and face recognition from 2D and 3D images", Pattern Recognition Letters, 2002, vol. 23, pp. 1191-1202.
49. M. Lawlor, "Thermal pattern recognition systems faces security challenges head on", Signal Magazine, 1997, November.
50. J. Wilder, P. J. Phillips, C. Jiang and S. Wiener, "Comparison of visible and infra-red imagery for face recognition", Int. Conf. on Automatic Face and Gesture Recognition, 1996, pp. 182-187, 14-16 Oct.
51. S. Prabhakar and A. Jain, "Decision-level fusion in biometric verification", Pattern Recognition, 2002, vol. 35, no. 4, pp. 861-874.
52. R. Brunelli and D. Falavigna, "Person identification using multiple cues", IEEE Trans. on PAMI, 1995, vol. 17, no. 10, pp. 955-966, Oct.
53. B. V. Dasarathy, "Decision fusion", IEEE Computer Society Press, 1994.
54. J. Kittler, M. Hatef, R. Duin and J. Matas, "On combining classifiers", IEEE Trans. on PAMI, 1998, vol. 20, no. 3, pp. 226-239.
55. S. Ben-Yacoub, Y. Abdeljaoued and E. Mayoraz, "Fusion of face and speech data for person identity verification", IEEE Trans. on NN, 1999, vol. 10, no. 5, Sept.
56. L. Hong and A. Jain, "Integrating faces and fingerprints for personal identification", IEEE Trans. on PAMI, 1998, vol. 20, no. 12, pp. 1295-1307.
THE USE OF IMAGE ANALYSIS IN THE EARLY DIAGNOSIS OF ORAL CANCER

R. SERPICO, M. PETRUZZI AND M. DE BENEDITTIS
Department of Odontostomatology and Surgery, University of Bari
P.zza G. Cesare 11 - Bari - ITALY
E-mail: r.serpico@doc.uniba.it
Oral squamous cell carcinoma (OSCC) is a malignant neoplasm with a poor prognosis. Regardless of the site where the disease arises, there are several cases where OSCC is not detected early by clinicians, and the diagnostic delay worsens the prognosis. In the literature, several image analysis tools with variable specificity and sensitivity have been proposed to detect OSCC. Autofluorescence analysis of OSCC lesions has proved effective, although different methods have been used to evoke the fluorescence. On the other hand, vital staining, such as toluidine blue, requires only a clinical assessment of the degree of staining to detect the lesions. No studies have yet been performed using a computerized analysis of OSCC images or neural networks. A screening tool for early OSCC detection should be inexpensive, easy to use and reliable. We hope that developments in information technology will be applied to the analysis of OSCC lesions so as to make diagnosis earlier and improve the prognosis.
1. Definition and epidemiology of oral carcinoma

Recently, it has been estimated that oral squamous cell carcinoma (OSCC) represents 3% of all malignant neoplasms. OSCC usually affects more men than women, so it is considered the 6th most frequent malignant tumour in males and the 12th in females. In the U.S.A. about 21,000 new cases of OSCC are diagnosed every year and 6,000 people die because of this disease. In the last decade the incidence of OSCC has kept growing, which has caused a worrying increase in the number of individuals under 30 affected by oral carcinoma. A serious concern is the prognosis of these patients. If the neoplasm is detected at its 1st or 2nd stage, the probability of surviving five years is 76%. This value drops to 41% if the malignant tumour is diagnosed at its 3rd stage, and only 9% of patients are still alive five years after an OSCC diagnosis made at its 4th stage. The diagnostic delay has several causes:

- the way the carcinoma develops: at its onset OSCC does not cause any particular symptom or pain, so the patient tends to ignore the lesion and rarely goes to the dentist to ask for a precise diagnosis;
- the polymorphism that oral lesions often show: for example, an ulcer can appear similar to a traumatic lesion, an aphtha major or a carcinoma;
- the general practitioners, who are not used to examining the oral cavity during routine check-ups: recent research has shown that a person suffering from mucosal lesions of the oral cavity first consults his family doctor, who advises a dermatological visit.

Usually, the carcinoma is detected about 80 days after its first symptoms, and this delay is also responsible for the short OSCC prognosis.

2. Fluorescence methodologies
Tissue autofluorescence optical spectroscopy is a sensitive, non-invasive methodology, easy to use and capable of detecting possible alterations of the tissue. Autofluorescence results from the presence of porphyrins connected with neoplasm growth. The fluorescence emitted by healthy tissue has a colour different from that observed on tissue affected by carcinoma. This autofluorescence can also be stimulated by irradiation with lasers, xenon light or halogen lamps.
Fig. 1. Fluorescence of OSCC localized at the border of the tongue. (Oral Oncology 39 (2003) 150-156.)

Recently, a program has been presented that allows digitalized images of fluorescing lesions to be read. This system uses the following operating algorithm:

1. RGB fluorescence image
2. Contrast enhancement
3. Hue extraction
4. Histogram thresholding
5. Segmentation
6. Quantitative parameters extraction
7. Diagnostic algorithm
8. Comparison with the gold standard
9. Tissue classification

These methodologies show a high sensitivity (about 95%) but a specificity of only 51-60%. The scientific literature reports some studies on the use of neural networks able to judge the autofluorescence produced by dubious lesions. Using these neural networks, it is possible to distinguish healthy tissue from neoplastic tissue with a sensitivity of 86% and a specificity of 100%. In reality, it has been shown that these methodologies are ineffective because they are not able to identify the various mucosal areas with their different dysplasia levels.
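As an illustration of steps 2-5 of the operating algorithm listed above, the following OpenCV-based sketch enhances contrast, extracts the hue channel, thresholds its histogram and segments the result; it is not the program evaluated in the cited study, and the quantitative parameters returned are assumptions for this example.

```python
# Sketch of contrast enhancement, hue extraction, thresholding and segmentation.
import cv2
import numpy as np

def segment_fluorescence(bgr_image):
    # Step 2: contrast enhancement on the intensity channel.
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hsv[:, :, 2] = cv2.equalizeHist(hsv[:, :, 2])
    # Step 3: hue extraction.
    hue = hsv[:, :, 0]
    # Step 4: histogram thresholding (Otsu chooses the threshold automatically).
    _, mask = cv2.threshold(hue, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Step 5: segmentation into connected regions.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    # Example quantitative parameters per region (step 6): area and mean hue.
    regions = [{"area": int(stats[i, cv2.CC_STAT_AREA]),
                "mean_hue": float(hue[labels == i].mean())}
               for i in range(1, n)]
    return mask, regions
```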
Fig. 2. Example of mean neural network input curves grouped according to the clinical diagnosis. (Oral Oncology 36 (2000) 286-293)
Onizawa and his collaborators have tested fluorescence methodologies on 55 patients suffering from OSCC. In their study, 90% of the cases analysed were positive to fluorescence, and they found that both the sensitivity and the specificity of the methodology increase with the lesion stage.
3. Toluidine blue
Toluidine blue is a metachromatic vital stain. Years ago it was employed by gynaecologists, but today it is considered a good methodology for diagnosing OSCC.
Because the dye has a high affinity for acidic components, it binds directly to the genetic material (DNA, RNA) of cells that keep reproducing, so the increased DNA and RNA synthesis of expanding neoplastic clones can be observed where the neoplasm grows. This methodology is easy, inexpensive and does not cause any physical discomfort. The patient must only rinse his oral cavity with acetic acid (1%) in order to remove cellular residues and whatever covers the lesion. Subsequently, toluidine blue (1%) is applied to the lesion for 30 seconds.
Fig. 3. Example of neoplastic lesion stained by using toluidine blue. Areas with more active mitosis stain more with toluidine blue.
The patient then rinses the lesion again with acetic acid to remove the excess, unfixed colour. At this point the clinician can assess the lesion according to its colour, even though the OSCC diagnosis depends largely on the histology report. The stained lesion can therefore be classified as follows (a simple way of turning these outcomes into sensitivity and specificity figures is sketched after the list):
a) TRUE POSITIVE: the lesion has absorbed the colour and is an OSCC from a histological point of view;
b) FALSE POSITIVE: the lesion has absorbed the colour but is not an OSCC from a histological point of view;
c) TRUE NEGATIVE: the lesion does not absorb the colour and is not an OSCC from a histological point of view;
d) FALSE NEGATIVE: the lesion does not absorb the colour but is an OSCC from a histological point of view.
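Counting the four outcomes against the histological gold standard yields the sensitivity and specificity figures quoted throughout this paper. A minimal sketch, with purely hypothetical counts, is shown below.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Hypothetical counts from a toluidine blue screening series (placeholder numbers).
print(sensitivity_specificity(tp=42, fp=18, tn=35, fn=5))
```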
Fig. 4. Example of traumatic lesion: even though stained by toluidine blue, the lesion is not a carcinoma (false positive).
In reality, this methodology is sensitive but not particularly specific: the number of stained lesions that are nevertheless not cancerous is large. The scientific literature reports several studies on the reliability of this methodology. The case histories reveal encouraging data about the diagnostic power of toluidine blue, but no study has yet considered a digital reading of the lesion. Employing digital methodologies could make this test more reliable, for example by exploiting the different gradations of blue that are invisible to the naked eye. Reading lesions stained with toluidine blue in this way aims to offer dentists another diagnostic tool: it is inexpensive, easy to use and non-invasive, so it can be routinely used as a screening test for patients who regularly go to the dentist. Moreover, this methodology makes it possible to send the digital images on-line to specialized centres for further consultation. At present there is no screening methodology with a sensitivity and specificity of 100%; however, the use of data processing systems improves the reliability of diagnostic methodologies and offers an objective analysis.
4. Conclusions
The scientific literature has not reported trials comparing the efficacy of the different image analysis methodologies used in OSCC diagnosis. We hope that a univocal, reliable and inexpensive methodology for reading the lesion will come into use. Information technology should support the clinical-medical diagnosis and could be the ideal way to obtain an early diagnosis. This would improve the prognosis and make the relationship between medicine and computer science extraordinarily fruitful.
Acknowledgments
The authors are grateful to Annalisa Chiala for reviewing this paper.
References
1. Benjamin S, Aguirre A and Drinnan A, Dent. Today. 21(11):116 (2002).
2. Llewellyn CD, Johnson NW and Warnakulasuriya KA, Oral. Oncol. 37(5):401 (2001).
3. Neville, Damm, Allen, Bouquot: Oral & Maxillofacial Pathology. Saunders Press, 2nd Edition, USA (2002).
4. Onizawa K, Okamura N, Saginoya H and Yoshida H. Oral. Oncol. 39(2):150 (2003).
5. Onofre MA, Sposto MR and Navarro CM. Oral. Surg. Oral. Med. Oral. Pathol. Oral. Radiol. Endod. 91(5):535 (2001).
6. Porter SR and Scully C. Br. Dent. J. 185(2):72 (1998).
7. Reichart PA. Clin. Oral. Investig. 5(4):207 (2001).
8. van Staveren HJ, van Veen RL, Speelman OC, Witjes MJ, Star WM and Roodenburg JL. Oral. Oncol. 36(3):286 (2000).
9. Zheng W, Soo KC, Sivanandan R and Olivo M. Int. J. Oncol. 21(4):763 (2002).
LUNG EDGE DETECTION IN POSTERO-ANTERIOR CHEST RADIOGRAPHS
PAOLA CAMPADELLI
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
E-mail: campadelli@dsi.unimi.it
ELENA CASIRAGHI
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
E-mail: casiraghi@dsi.unimi.it
The use of image processing techniques and Computer Aided Diagnosis (CAD) systems has proved to be effective for the improvement of radiologists' diagnosis, especially in the case of lung nodule detection. The first step in the development of such systems is the automatic segmentation of the chest radiograph in order to extract the area of the lungs. In this paper we describe our segmentation method, whose result is a closed contour which strictly encloses the lung area.
1. Introduction
In the field of medical diagnosis a wide variety of imaging techniques is currently available, such as radiography, computed tomography (CT) and magnetic resonance imaging (MRI). Although the last two are more precise and more sensitive techniques, chest radiography is still by far the most common procedure for the initial detection and diagnosis of lung cancer, thanks to its non-invasiveness, low radiation dose and low cost. Studies such as [20] and [11] explain why the chest radiograph is one of the most challenging radiographs to produce technically and to interpret diagnostically. When radiologists rate the severity of abnormal findings, large interobserver and intraobserver differences occur. Moreover, several studies in the last two decades, for example [8] and [2], calculated an average miss rate of 30% for the radiographic detection of early lung nodules by humans. In a large lung cancer screening program, 90% of peripheral lung cancers were found to be visible in radiographs produced earlier than the date of the cancer's discovery by the radiologist. These results showed the potential for improved early diagnosis, suggesting the use of computer programs for radiograph analysis. Moreover, the advent of digital thorax units and digital radiology departments with Picture Archiving and Communication Systems (PACS) makes it possible to use computerized methods for the analysis of chest radiographs on a routine basis. The use of image processing techniques and Computer Aided Diagnosis (CAD) systems has proved to be effective for the improvement of radiologists' detection accuracy for lung nodules in chest radiographs, as reported in [15]. The first step of an automatic system for lung nodule detection, and in general for any further analysis of chest radiographs, is the segmentation of the lung field, so that all the algorithms for the identification of lung nodules are applied just to the lung area. The segmentation algorithms proposed in the literature to identify the lung field can be grouped into: rule based systems ([1], [21], [22], [7], [4], [14], [5], [3]), pixel classification methods including neural networks ([13], [12], [9], [16]) and Markov random fields ([18] and [19]), active shape models ([6]) and their extensions ([17]). In this paper we describe an automatic segmentation method which identifies the lung area in postero-anterior (PA) digital radiographs. Since the method is thought of as the first step of an automatic lung nodule detection algorithm, we choose to include in the area of interest also the bottom of the chest and the region behind the heart, which are usually excluded by the methods presented in the literature. Besides, we tried to avoid all kinds of assumptions about the position and orientation of the thorax: we work with images where the chest is not always located in the central part of the image, may be tilted and may have structural abnormalities. The method consists of two steps. First, the lungs are localized using simple techniques (section 4), then their borders are more accurately defined and fitted with curves and lines in order to obtain a simple closed contour (section 5).
2. Materials
Our database currently contains 111 radiographs of patients with no disease and 13 of patients with lung nodules. They have been acquired in the Department of Radiology of the Niguarda Hospital in Milan. The images were digitized with a 0.160 mm pixel size, a maximum matrix size of 2128 by 2584, and 4096 grey levels. Before processing they have been downsampled to a dimension of 300 by 364 pixels and filtered with a median filter of 3 pixel size. In the following sections we will refer to these images as the original images.
3. Coarse lung border detection
3.1. Iterative thresholding
Since both the background of the image and the central part of the lungs are characterized by the highest grey values, while the tissues between them are very dark, we use an iterative thresholding technique to obtain a first classification of the pixels as belonging to lung, body or background regions. Before applying the thresholding procedure, we enhance the image contrast by means of a non-linear extreme value sharpening technique:
\[
G_N(x,y) =
\begin{cases}
\max & \text{if } |\max - G(x,y)| \le |\min - G(x,y)| \\
\min & \text{otherwise}
\end{cases}
\tag{1}
\]
where min and max are the minimum and maximum grey values computed on a window Win(x, y) centered in (x, y). The window size used is 5 pixels. We chose this operator because it has the effect of increasing the contrast where the boundaries between objects are characterized by gradual changes in the grey levels; in chest radiographs we often find this situation in the peripheral area of the lung and sometimes in the top regions and at the costophrenic angles. We then perform a linear transformation on the enhanced image with 4096 grey levels, to get an image with 256 grey levels, and start the iterative thresholding at an initial high threshold value of 235. At each step we lower the threshold by 1 and classify the regions formed by the pixels with grey value higher than the threshold into background and lung regions. We consider background regions those attached to the borders of the image or those at a distance of 1 pixel from other border regions; the others are identified as lung. The algorithm stops when two regions classified differently at the previous step fuse.
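A direct transcription of the sharpening operator of Eq. (1), with the 5-pixel window stated above, could read as follows; the use of SciPy rank filters is our implementation choice, not the authors'.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def extreme_value_sharpen(img, size=5):
    """Replace each pixel by the local max or min of its window, whichever is closer (Eq. 1)."""
    local_max = maximum_filter(img, size=size)
    local_min = minimum_filter(img, size=size)
    take_max = np.abs(local_max - img) <= np.abs(local_min - img)
    return np.where(take_max, local_max, local_min)
```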
To obtain a finer approximation of the lung region we repeat the described iterative procedure three times; each time the input is the original 8-bit image where the lung pixels found at the previous iteration are set to 0. In [Fig. 1] (left) a lung mask image is shown: the background is coloured red, the body part is black and the lung regions are blue.
3.2. Edge detection
At this stage we look for rough lung borders. To obtain an initial edge image (see [Fig. 1] (center)) we use the simple but efficient Sobel operator, select the 18% of the pixels with the highest gradient and delete those corresponding to the background. We then maintain only the connected regions of edge pixels which intersect the lung region previously identified. To delete, or to separate from the lung borders, edge pixels belonging to other structures such as collarbones, neck or clavicles we use a morphological opening operator. The regions disconnected either from the lung mask border or from the selected edges are eliminated if their position satisfies one of the following conditions: they are attached to the borders of the image or to background regions; their bottommost pixel is located above the topmost pixel of the lung regions; they are totally located in the space between the two lung areas. If the area covered by the remaining edge pixels is less extended than the one occupied by the lung mask, we look for new edge pixels in the lung regions. This is done by considering in the initial edge image a bigger percentage of pixels with the highest grey value and adding them until either the edge pixels cover the whole lung area or the percentage reaches a value of 40%. In [Fig. 1] we show an example of the initial edge image (center) and of the extracted lung edge image, E (right). As can be seen, further processing is necessary since some lung borders may still be missing (the top or bottom parts, the costophrenic angles, ...) and wrong edge pixels (belonging to the neck or collarbones) can still be present. To solve this problem we search for the axis of the thorax. We can thus delete, if they are present, the edges belonging to the neck or collarbones and establish whether the thorax has a non-vertical position.
Figure 1. Lung mask image, initial edge image and edge image.
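The gradient-based selection of section 3.2 (Sobel operator, keeping the 18% of pixels with the highest gradient) can be sketched as below; the percentile-based thresholding is our own simplification of the described step.

```python
import numpy as np
from scipy.ndimage import sobel

def initial_edge_map(img, keep_fraction=0.18):
    """Sobel gradient magnitude, keeping the given fraction of strongest pixels."""
    gx = sobel(img, axis=1)
    gy = sobel(img, axis=0)
    grad = np.hypot(gx, gy)
    threshold = np.percentile(grad, 100 * (1 - keep_fraction))
    return grad >= threshold
```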
3.3. Axis finder
To find the axis of the chest we use a binary image obtained by an OR operation between the lung edge image, E, and the lung mask image. For each horizontal line of this new image, we find the pixel in the center of the segment connecting the leftmost and rightmost pixels and mark it if the extremes of the segment do not belong to the same lung region. Moreover, we consider the inclination of the line connecting one central pixel (x0, y0) to the following one (x1, y1) and discard the latter if the value (y1 - y0)/(x1 - x0) is less than 1.5; a lower value means that (x1, y1) has probably been computed from two outermost pixels that are not symmetric with respect to the real axis. The Hough transform for lines, and a polynomial fitting method that minimizes the chi-square error statistic, are used to find two possible axes of the image; the one that fits the central pixels better is then chosen as the chest axis. In [Fig. 2] (left) the central points used to find the axis and the corresponding lateral points are marked in blue and red respectively; on the right the dilated axis is shown.
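A compact sketch of the axis search just described is given below; it keeps only a least-squares line fit, whereas the paper also uses a Hough transform and selects the better of the two candidate axes, so this is a simplification under our own assumptions.

```python
import numpy as np

def chest_axis(mask):
    """Estimate a near-vertical chest axis x = a*y + b from a binary (mask OR edge) image.

    Central points are the midpoints between the leftmost and rightmost pixels of
    each row; points reached with a slope below 1.5 are discarded as asymmetric.
    """
    centres = []
    for y in range(mask.shape[0]):
        xs = np.flatnonzero(mask[y])
        if xs.size >= 2:
            centres.append((0.5 * (xs[0] + xs[-1]), y))
    kept = []
    for (x0, y0), (x1, y1) in zip(centres, centres[1:]):
        if x1 == x0 or abs((y1 - y0) / (x1 - x0)) >= 1.5:  # keep near-vertical steps only
            kept.append((x1, y1))
    xs = np.array([p[0] for p in kept])
    ys = np.array([p[1] for p in kept])
    a, b = np.polyfit(ys, xs, 1)   # fit x as a function of y, since the axis is near vertical
    return a, b
```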
3.4. Edge refinement
The axis found is usually located in the center of the dorsal column. This fact allows us to delete edges in E that belong to the dorsal column or to the neck. They are typically little edge regions (with less than 200 pixels) crossing the axis itself or, more often, located in a region around it. We defined this region as a stripe whose width is 1/25 of the width of the original image (see [Fig. 2] on the right). We then delete all the regions with less than 200 pixels that cross this stripe. If some lung edge is wrongly cancelled, it will be recovered in the next steps.
Figure 2. Axis points and neck stripe.
It can happen that the top parts of the lungs are detected by the Sobel operator but are not included in the lung edge image E because in the lung mask they are not labelled as lung regions. The axis can help to verify this condition, since the apex point of the lung should be located close to it. Consider the left lung (in the image): let (xp, yp) be the coordinates of the leftmost edge pixel with the lowest y coordinate, and let (xa, ya) be the coordinates of the axis in the same row; if |xp - xa| is bigger than 1/4 of the total image width, we add those pixels of the initial edge image that are contained in a stripe extending from xp to xa, with a height of yp/10. The same operation is done for the right lung. We can also verify a symmetry condition between the two lung top pixels; if more than one pixel with the lowest y coordinate is found on each side, the central one is taken. We evaluate the euclidean distance between one top pixel and the reflection of the other with respect to the axis; if this distance is greater than 20 we are allowed to think that there is no symmetry between the lung edges found, and that the wrong top pixel is the one with the higher vertical coordinate. We therefore use this top pixel and the reflection of the other one as vertices of a rectangular search area in the initial edge image, and add the edge pixels found there to E. The bottom parts of the lungs are often characterized by very low contrast, and therefore in this region too we look for edge pixels to be added to E. In this case we use more accurate edge detectors, such as directional gaussian filters. We limit the processing to a stripe centered around the bottommost edge pixel and with a height fixed at 1/8 of the vertical dimension of the original image. We work separately on the left and right lung sub-images, applying the locally adaptive scaling operator described in [10], followed by histogram equalisation. On these enhanced data we search in the left lung for edges oriented at 90° and 45°, and in the right
lung for those oriented at 90° and 135°. We filter the image with a gaussian filter at scale σ, related to the stripe dimension, take the vertical derivative and maintain the 5% of the pixels with the highest gradient value. These edge pixels, which often belong to the lung borders, are added to the edge image. Since the costophrenic angle can still be missing, we filter the image at a finer scale σ/2, take the derivative at 135° or 45° (depending on the side) and maintain the 10% of the edge pixels. A binary image that may represent the costophrenic angles is obtained by combining this information with the 10% of the pixels with the highest value in the vertical direction. The regions in the binary image just created are added to the lung edge image E if they touch, or are attached to, some edge pixels in it. At this stage most of the edge pixels belonging to the lung borders should have been determined; the image can hence be reduced by defining a rectangular bounding box slightly greater than the lung area defined by the lung edge image E.
4. Lung area delineation
4.1. Final contour refinement
To obtain more precise and continuous contours we process the reduced image, but with 4096 grey levels. We enhance it with a locally adaptive scaling algorithm and apply histogram equalization to the result. On the grey level enhanced image we identify the pixels that in the lung edge image E constitute the lung extremes; for each side they are the leftmost and rightmost pixels in each row and the topmost and bottommost pixels in each column (they are coloured red in [Fig. 3] (left)). These are the seeds of the following region growing procedure: for each seed with grey value G(x, y), we select in its 8-neighborhood, and add to E, all the pixels in the range [G(x, y) - 10, G(x, y) + 10]. If their number is greater than 4 we select the pixel whose grey value is closest to G(x, y) and iterate the procedure, unless a background pixel is identified, the selected element is another seed, or 20 iteration steps have been done. This procedure creates thick contours that now reach the external border of the lung, often much better defined especially at the top and bottom; however, the lateral lung contours are often still discontinuous, especially in the right lung (see also [Fig. 3] (center)). We improve their definition by calculating the horizontal derivative of the enhanced image and keeping 15% of the pixels with the maximum value for the right lung, and 10% for the left. We then delete those pixels internal to the lung or background regions; the regions in this image intersecting edge pixels are added to the lung edge image (the result of this addition is shown in [Fig. 3] (right)).
Figure 3. Enhanced image with the seed points, edge image after growing, edge image after the last regions added.
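The seeded region growing of section 4.1 could be sketched as follows; the 8-neighbourhood scan and the ±10 grey-level tolerance follow the description above, while the handling of the stopping conditions is simplified and the function signature is our own.

```python
import numpy as np
from collections import deque

def grow_from_seeds(img, seeds, edge, background, tol=10, max_steps=20):
    """Grow the boolean edge map from seed pixels, accepting 8-neighbours within +/- tol grey levels.

    img        : 2D grey-level array
    seeds      : list of (row, col) seed positions taken from the edge extremes
    edge       : boolean array of current edge pixels (copied, not modified in place)
    background : boolean array marking background pixels that stop the growth
    """
    grown = edge.copy()
    for sy, sx in seeds:
        ref = int(img[sy, sx])
        frontier = deque([(sy, sx, 0)])
        while frontier:
            y, x, step = frontier.popleft()
            if step >= max_steps:
                continue
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]):
                        continue
                    if grown[ny, nx] or background[ny, nx]:
                        continue
                    if abs(int(img[ny, nx]) - ref) <= tol:
                        grown[ny, nx] = True
                        frontier.append((ny, nx, step + 1))
    return grown
```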
At this point we can define the closed contour of the area containing the lungs, fitting the borders found with curves and lines. We describe the operation on the left lung only, referring to the binary image of its edges as the left edge image El. We noticed that the shape of the top part of the lung can be well fitted by a second order polynomial function. To find it we use the Hough transform for parabolas, applied to the topmost points of each column in El. The fitted parabola is stopped, on the right side of its vertex, at the point where it crosses a line parallel to the axis and passing through the rightmost pixel; on the left side it is stopped where it crosses the left edge image; if more than one point is found we select the one with the lowest y coordinate. To find a closed contour approximating the lateral borders we consider the set U composed by selecting, for each row in El, the leftmost pixel if it is located to the left of the top one. Since we noticed that the orientation of the left border can change from the top to the bottom, we extracted from U three subsets u1, u2, u3 with an equal number of elements, containing the points located respectively in the upper, central and bottom part of the image. These subsets are fitted separately with different functions. We use one parabola to fit the points in u1: this allows us to recover errors in case the parabola used to fit the top points was too narrow (the central image in [Fig. 4] shows an example of this). A line is then used to fit the points in u2. The set u3 often contains the lateral points of both the lateral border of the lung and the lateral border of the costophrenic angles; we noticed that in some cases the contours of these borders have different inclinations. We therefore fit the points in the upper and bottom part of u3 with two different lines. We define as boundary of the bottom part the horizontal line that crosses the bottommost pixel of the edge image.
5. Results
We detected small errors in 4 of the 124 images in our database, where we consider as an error the fact that a part of the lung has not been included within the defined lung contours; the part missed by the algorithm is the border of the costophrenic angle. The algorithm nevertheless proves robust to structural abnormalities of the chest ([Fig. 4]). The algorithm has been implemented in IDL, an interpreted language, and, when executed on a Pentium IV with 256 MB of RAM, it takes from 12 seconds (for images of patients with small lungs that can be cut as described in section 4.4) to 20 seconds (for images of large lungs).
Figure 4. Resulting images.
References
1. S.G. Armato, M. Giger, and H. MacMahon. Automated lung segmentation in digitized posteroanterior chest radiographs. Academic Radiology, 5:245-255, 1998.
2. J.H.M. Austin, B.M. Romeny, and L.S. Goldsmith. Missed bronchogenic carcinoma: radiographic findings in 27 patients with a potentially resectable lesion evident in retrospect. Radiology, 182:115-122, 1992.
3. M.S. Brown, L.S. Wilson, B.D. Doust, R.W. Gill, and C. Sun. Knowledge-based method for segmentation and analysis of lung boundaries in chest x-ray images. Computerized Medical Imaging and Graphics, 22:463-477, 1998.
4. F.M. Carrascal, J.M. Carreira, M. Souto, P.G. Tahoces, L. Gomez, and J.J. Vidal. Automatic calculation of total lung capacity from automatically traced lung boundaries in postero-anterior and lateral digital chest radiographs. Medical Physics, 25:1118-1131, 1998.
5. D. Cheng and M. Goldberg. An algorithm for segmenting chest radiographs. Proc. SPIE, pages 261-268, 1988.
6. T. Cootes, C. Taylor, D. Cooper, and J. Graham. Active shape models: their training and application. Computer Vision and Image Understanding, 61:38-59, 1995.
7. J. Duryea and J.M. Boone. A fully automatic algorithm for the segmentation of lung fields in digital chest radiographic images. Medical Physics, 22:183-191, 1995.
8. J. Forrest and P. Friedman. Radiologic errors in patients with lung cancer. West. J. Med., 134:485-490, 1981.
9. A. Hasegawa, S.-C. Lo, M.T. Freedman, and S.K. Mun. Convolution neural network based detection of lung structure. Proc. SPIE 2167, pages 654-662, 1994.
10. R. Klette and P. Zamperoni. Handbook of Image Processing Operators. Wiley, 1994.
11. H. MacMahon and K. Doi. Digital chest radiography. Clin. Chest Med., 12:19-32, 1991.
12. M.F. McNitt-Gray, H.K. Huang, and J.W. Sayre. Feature selection in the pattern classification problem of digital chest radiographs segmentation. IEEE Trans. on Med. Imaging, 14:537-547, 1995.
13. M.F. McNitt-Gray, J.W. Sayre, H.K. Huang, and M. Razavi. A pattern classification approach to segmentation of chest radiographs. Proc. SPIE 1898, pages 160-170, 1993.
14. E. Pietka. Lung segmentation in digital chest radiographs. Journal of Digital Imaging, 7:79-84, 1994.
15. T. Kobayashi, X.-W. Xu, H. MacMahon, C. Metz, and K. Doi. Effect of a computer-aided diagnosis scheme on radiologists' performance in detection of lung nodules on radiographs. Radiology, 199:843-848, 1996.
16. O. Tsuji, M.T. Freedman, and S.K. Mun. Automated segmentation of anatomic regions in chest radiographs using an adaptive-sized hybrid neural network. Med. Phys., 25:998-1007, 1998.
17. B. van Ginneken. Computer-aided diagnosis in chest radiographs. Ph.D. dissertation, Utrecht Univ., Utrecht, The Netherlands, 2001.
18. N.F. Vittitoe, R. Vargas-Voracek, and C.E. Floyd Jr. Identification of lung regions in chest radiographs using Markov random field modeling. Med. Phys., 25:976-985, 1998.
19. N.F. Vittitoe, R. Vargas-Voracek, and C.E. Floyd Jr. Markov random field modeling in posteroanterior chest radiograph segmentation. Med. Phys., 26:1670-1677, 1999.
20. C.J. Vyborny. The AAPM/RSNA physics tutorial for residents: image quality and the clinical radiographic examination. Radiographics, 17:479-498, 1997.
21. X.-W. Xu and K. Doi. Image feature analysis for computer aided diagnosis: accurate determination of ribcage boundaries in chest radiographs. Medical Physics, 22:617-626, 1995.
22. X.-W. Xu and K. Doi. Image feature analysis for computer aided diagnosis: accurate determination of right and left hemidiaphragm edges and delineation of lung field in chest radiographs. Medical Physics, 23:1616-1624, 1996.
DISCRETE TOMOGRAPHY FROM NOISY PROJECTIONS
C. VALENTI
Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Via Archirafi 34, 90123 Palermo, Italy
E-mail:
[email protected]
The new field of research of discrete tomography will be described in this paper. It differs from standard computerized tomography in the reduced number of projections. It needs ad hoc algorithms, which are usually based on the definition of a model of the object to reconstruct. The main problems will be introduced, and an experimental simulation will prove the robustness of a slightly modified version of a well known method for the reconstruction of binary planar convex sets, even in the case of projections affected by quantization error. To the best of our knowledge this is the first experimental study of the stability problem with a statistical approach. Prospective applications include crystallography, quality control and reverse engineering, while biomedical tests, due to their important role, still require further research.
1. Introduction
Computerized tomography is an example of inverse problem solving: it consists of the recovery of a 3D object from its projections [1]. Usually this object is made of materials with different densities and therefore it is necessary to take a number of projections ranging between 500 and 1000. When the object is made of just one homogeneous material, it is possible to reduce the number of projections to no more than four, defining the so-called discrete tomography [2]. In such a case we define a model of the body, assuming its shape. For example, we may know the types of atoms to analyze, the probability of finding holes inside the object and its topology (e.g. successive slices are similar to each other, or some configurations of pixels are energetically unstable) [3]. Though these assumptions may be useful when considering applications such as nondestructive reverse engineering, industrial quality control, electron microscopy, X-ray crystallography, data coding and compression, they become almost unacceptable when the data to analyze come from biomedical tests. Nevertheless the constraints imposed by present technology are too restrictive for real tasks, and state-of-the-art algorithms mainly allow the reconstruction of simulated images of special shapes. The aim of this work is the description of an extensive simulation to verify the robustness of a modified version of a well known method for the reconstruction of binary planar convex sets. In particular, we will face the stability problem under noisy projections due to quantization error. Section 2 introduces formal notations and basic problems. Section 3 gives a brief description of the algorithm. Section 4 concludes with experimental results and remarks.
2. Basic notations and issues
Discrete tomography differs from computerized tomography in the small variety of density distributions of the object to analyze and in the very few angles of the projections to take. From a mathematical point of view we reformulate this reconstruction problem in terms of linear feasibility (Figure 1):
\[
A\,\underline{x} = \underline{p}, \qquad A \in \{0,1\}^{r \times n},\ \ \underline{x} \in \{0,1\}^{n},\ \ \underline{p} \in \mathbb{N}^{r}
\]
where the binary matrix A represents the geometric relation between points in Z² and the integer valued vector p represents their projections.
Figure 1. A subset of Z² and its corresponding linear equation system. The black disks and the small dots represent the points of the object and of the discrete lattice, respectively.
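A small sketch may make the linear system concrete: for horizontal and vertical projections of a binary image, the matrix A simply sums the pixels of each row and of each column. The row ordering of A is our own convention.

```python
import numpy as np

def projection_system(shape):
    """Build the 0/1 matrix A whose rows sum the pixels of each image row and column."""
    rows, cols = shape
    A = np.zeros((rows + cols, rows * cols), dtype=int)
    for r in range(rows):
        A[r, r * cols:(r + 1) * cols] = 1          # horizontal projection of row r
    for c in range(cols):
        A[rows + c, c::cols] = 1                   # vertical projection of column c
    return A

obj = np.array([[0, 1, 1],
                [1, 1, 0],
                [0, 1, 0]])
A = projection_system(obj.shape)
p = A @ obj.ravel()          # projections: row sums followed by column sums
print(p)                     # -> [2 2 1 1 3 1]
```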
The main issues in discrete tomography arise from this dearth of input data. In 1957 a polynomial time method was presented to solve the consistency problem, i.e. the ability to state whether there exists any x compatible with a given p [4].
The uniqueness problem derives from the fact that different x's can satisfy the same p. For example, two sets with the same horizontal and vertical projections can be transformed into each other by a finite sequence of switching operations (Figure 2). Moreover, there is an exponential number of hv-convex polyominoes (i.e. 4-connected sets with 4-connected rows and columns) with the same horizontal and vertical projections [5].
Figure 2. Three switches let us obtain these tomographically equivalent objects.
Lastly, the stability problem concerns how the shape of an object changes when its projections are perturbed. In computerized tomography the variation in the final image due to the fluctuation of one projection sample is generally disregarded, since that sample contributes independently, as one of many, to the result and its effect is therefore distributed broadly across the reconstructed image [6]. This is not true in the discrete case, and the first theoretical analysis of the reconstruction of binary objects of arbitrary shape has proved that this task is unstable and that it is very hard to obtain a reasonably good reconstruction from noisy projections [7]. Here we will describe how our experimental results show that it is possible to recover convex binary bodies from their perturbed projections while still maintaining a low reconstruction error.
3. Reconstruction algorithm
In order to verify the correctness of the algorithm we have generated 1900 convex sets with 10 × 10, 15 × 15, ..., 100 × 100 pixels. A further 100 convex sets with both width and height randomly ranging between 10 and 100 have been considered too. Their projections have been perturbed 1000 times by incrementing or decrementing by 1 the value of some of their samples, randomly chosen. This is to estimate the effect of errors with absolute value 0 ≤ ε ≤ 1, so simulating a quantization error. The number of samples to modify has been decided in a random way, but if we want the area of the reconstructed body to be constant, we add and subtract the same amount of pixels in all projections.
The reference reconstruction algorithm recovers hv-convex polyominoes in polynomial time, starting from a set of pixels, called the spine, that surely belong to the object to be reconstructed. This method makes a rough assumption about the shape of the object and then adds pixels to this core through an iterative procedure based on partial sums of the projection values. Usually the spine covers just a small part of the object and it is therefore necessary to expand it by applying the filling operations (Figure 3). The underlying idea is the recursive enforcement of the convexity constraint on each line and along each direction until the core of pixels satisfies the projections (Figure 4). Should this not happen, then no convex polyomino is compatible with those projections.
Figure 3. The first two filling operations are not based on the projection value. The circles represent pixels not yet assigned to the core.
We have generalized this algorithm by weakening the convexity constraint. This means that as soon as it is not possible to apply a certain filling operation, due to an inconsistency between the value of the projection and the number of pixels already in the considered line of the core, we skip that line and process the rest of the projection, thus reaching a solution that we call non-convex. It may happen that the ambiguity is reduced when processing the core along other directions. Besides the horizontal and vertical directions, we have also considered the following ones, d = ((1,0), (0,-1), (1,-2), (2,1), (-1,-1), (1,1)), in a number of projections chosen between 2 and 4, according to the sets {{d1,d2}, {d3,d4}, {d5,d6}, {d1,d2,d5}, {d1,d2,d3,d4}, {d1,d2,d5,d6}, {d3,d4,d5,d6}}. The particular directions used are indicated in the upper right corner of each of the following figures. Since we are dealing with corrupt projections, most of the ambiguity zones are not due to complete switching components. Just in the case of complete switches we link the processing of the remaining not yet assigned pixels to the evaluation of a corresponding boolean 2-CNF formula (i.e. the 'and' of zero or more clauses, each of which is the 'or' of exactly two literals) [10].
Figure 4. Convex recovery through {d1,d2,d5}. The spine is shown in the first two steps, the filling operations in the remaining ones. The grey pixels are not yet assigned.
This complete search has exponential time complexity, but it has been proved that these formulas are very small and occur rarely, especially for big images [11]. In order to measure the difference between the input image taken from the database and the obtained one, we have used the Hamming distance (i.e. we have counted the differing homologous pixels), normalized according to the size of the image. Most of the time we have obtained non-convex solutions, for which the boolean evaluation involves a bigger average error. For this reason, we have preferred not to apply the evaluation to the ambiguous zones when they were not due to switching components. We want to emphasize that these pixels take part in the error computation only when compared with those of the object; that is, we treat these uncertain pixels, if any, as belonging to the background of the image.
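The normalized Hamming distance used as the error measure is simply the fraction of differing homologous pixels; a minimal sketch follows.

```python
import numpy as np

def normalized_hamming(a, b):
    """Fraction of homologous pixels that differ between two binary images of equal size."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return np.count_nonzero(a != b) / a.size
```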
Figure 5. Non-convex recovery (upper right) from a binarized real bone marrow scintigraphy (left) with 1 pixel added/subtracted along {d3,d4} and without spine. The final reconstructed image (lower right) is obtained by deleting all remaining grey pixels. The input image is utilized and reproduced with permission from the MIR Nuclear Medicine digital teaching file collection at Washington University School of Medicine. MIR and Washington University are not otherwise involved in this research project.
4. Experimental results
This final section summarizes the most important results we obtained, together with a brief explanation. The average error rate increases when the number of modified samples increases: obviously, the more we change the projections, the harder it is for the algorithm to reconstruct the object (Figure 6a). Many non-convex sets suffer from a number of wrong pixels lower than the average error: even when the algorithm could not exactly reconstruct the convex set, the forced non-convex solutions still keep the shape of the original object. For example, about 66.11% of the non-convex solutions, marked in grey, with fixed 100 × 100 size and 1 pixel added/subtracted along directions {d3,d4,d5,d6}, have an error smaller than the 0.34% average error (Figure 6b). In the case of convex solutions, the spine construction reduces the number of undetermined cells for the successive filling phase. In the case of non-convex solutions, the spine usually assumes an initial object shape that produces solutions very different from the input polyomino; an example of a non-convex set obtained without spine preprocessing is shown in Figure 5. The choice of the horizontal and vertical directions {d1,d2} is not always the best one: for example, {d3,d4} and {d5,d6} recover more non-convex solutions with a smaller error. This is due to the higher density of the scan lines, which corresponds to a better resolution. More than two directions improve the correctness of the solutions, thanks to the reduced degree of freedom of the undetermined cells. The following tables concisely report all these results, obtained for objects with 100 × 100 pixels, with or without the spine construction, along different directions and by varying the number of perturbed samples. To the best of our knowledge this is the first experimental study of the stability problem with a statistical approach. Our results give a quantitative estimate of both the probability of finding solutions and of introducing errors at a given rate. We believe that a more realistic instrumental noise should be introduced, considering also that the probability of finding an error with magnitude greater than 1 usually grows in correspondence with the samples with maximum values. Moreover, though the convexity constraint is interesting from a mathematical point of view, at present we are also dealing with other models of objects to reconstruct, suitable for real microscopy or crystallography tools.
Acknowledgements
The author wishes to thank Professor Jerold Wallis [12] for his kind contribution in providing the input image of Figure 5.
Figure 6. a: Average, minimum and maximum error versus number of modified samples, for non-convex solutions with fixed 100 × 100 size, directions {d1,d2} and spine preprocessing. Linear least-square fits are superimposed. b: Number of non-convex solutions versus error, for fixed 100 × 100 size and 1 pixel added/subtracted along directions {d3,d4,d5,d6} without spine. The dashed line indicates the average error.
Table 1. +1/-1 samples (constant area).
Directions      Spine   Average error   Number of solutions
                no      0.34%           66.11%
                no      0.35%           68.06%
                no      0.54%           64.71%
                no      0.64%           71.01%
                no      0.71%           77.40%
                no      0.79%           72.91%
                no      1.57%           73.29%
                yes     4.81%           38.53%
                yes     4.83%           38.03%
                yes     5.03%           39.11%
                yes     5.44%           37.47%

Table 2. Random samples (non constant area).

Directions      Spine   Average error   Number of solutions
                no      5.43%           67.48%
                no      5.66%           69.85%
                no      5.71%           69.51%
                no      5.86%           58.34%
                no      6.24%           62.93%
                no      8.53%           58.75%
                no      9.84%           75.11%
                yes     10.67%          28.42%
                yes     10.78%          29.92%
                yes     10.87%          28.67%
                yes     11.94%          28.32%
References
1. KAK A.C. AND SLANEY M., Principles of Computerized Tomographic Imaging. IEEE Press, New York, 1988.
2. SHEPP L., DIMACS Mini-Symposium on Discrete Tomography. Rutgers University, September 19, 1994.
3. SCHWANDER P., Application of Discrete Tomography to Electron Microscopy of Crystals. Discrete Tomography Workshop, Szeged, Hungary, 1997.
4. RYSER H.J., Combinatorial properties of matrices of zeros and ones. Canad. J. Math., 9:371-377, 1957.
5. DAURAT A., Convexity in Digital Plane (in French). PhD thesis, Université Paris 7 - Denis Diderot, UFR d'Informatique, 1999.
6. SVALBE I. AND VAN DER SPEK D., Reconstruction of tomographic images using analog projections and the digital Radon transform. Linear Algebra and its Applications, 339:125-145, 2001.
7. ALPERS A., GRITZMANN P., AND THORENS L., Stability and Instability in Discrete Tomography. Digital and Image Geometry, LNCS, 2243:175-186, 2001.
8. BRUNETTI S., DEL LUNGO A., DEL RISTORO F., KUBA A., AND NIVAT M., Reconstruction of 8- and 4-connected convex discrete sets from row and column projections. Linear Algebra and its Applications, 339:37-57, 2001.
9. KUBA A., Reconstruction in different classes of 2D discrete sets. Lecture Notes in Computer Science, 1568:153-163, 1999.
10. BARCUCCI E., DEL LUNGO A., NIVAT M., AND PINZANI R., Reconstructing convex polyominoes from horizontal and vertical projections. Theoretical Computer Science, 155:321-347, 1996.
11. BALOGH E., KUBA A., DÉVÉNYI C., AND DEL LUNGO A., Comparison of algorithms for reconstructing hv-convex discrete sets. Linear Algebra and its Applications, 339:23-35, 2001.
12. Mallinckrodt Institute of Radiology, Washington University School of Medicine, http://gamma.wustl.edu/home.html.
AN INTEGRATED APPROACH TO 3D FACIAL RECONSTRUCTION FROM ANCIENT SKULL
A. F. ABATE, M. NAPPI, S. RICCIARDI, G. TORTORA
Dipartimento di Matematica e Informatica, Università di Salerno, 84081 Baronissi, Italy
E-mail:
[email protected]
Powerful techniques for modelling and rendering tridimensional organic shapes, like the human body, are today available for applications in many fields such as special effects, ergonomic simulation or medical visualization, just to name a few. These techniques are proving to be very useful also to archaeologists and anthropologists committed to reconstructing the appearance of the inhabitants of historically relevant sites like Pompei. This paper shows how, starting from radiological analysis of an ancient skull and a database of modern individuals of the same area/gender/age, it is possible to produce a tridimensional facial model compatible with the anthropological and craniometrical features of the original skull.
1. Introduction
In the last years computer generated imaging (CGI) has often been used for forensic reconstruction [19], as an aid for the identification of cadavers, as well as for medical visualization [3,16], for example in the planning of maxillo-facial surgery [14]. In fact, the 3D modelling, rendering and animation environments available today have greatly increased their power to quickly and effectively produce realistic images of humans [8]. Nevertheless the typical approach usually adopted for modelling a face is often still too artistic, and it relies mainly on the anatomic and physiognomic knowledge of the modeller. In other terms, computer technology is simply replacing the old process of creating an identikit by hand drawn sketches or by sculpting clay, adding superior editing and simulative capabilities, but often with the same limits in terms of reliability of the results. The recent finding of five skulls [see Figure 1] and several bones from a group of sixteen individuals in Murecine (near Pompei) offers the opportunity to use CGI and craniographic methods [5] to reconstruct the appearance of the victims of this tremendous event. This paper starts by assuming that, unfortunately, what is lost in the finding of ancient human remains is lost forever. This means that there is no way to exactly reproduce a face simply from its skull, because there are many ways in which soft tissues may cover the same skull, leading to different final aspects.
The problem is even more complicated in the (frequent) case of partial findings, because the missing elements (mandible or teeth for example) cannot be derived from the remaining bones [7].
Figure 1. One of the skulls found in the archaeological site of Murecine, near Pompei.
Nevertheless it is true that the underlying skeleton directly affects the overall aspect of an individual, and many fundamental physiognomic characteristics are strongly affected by the skull. One of the main purposes of this study is therefore to correlate ancient skulls with the skulls of living individuals, trying in this way to replace lost information (for example missing bones and soft tissues) with new compatible data. Additionally, the physiognomically relevant elements that are too aleatory to be derived from a single compatible living individual are selected through a search in a facial database (built from classical art reproductions of typical Pompeians) and then integrated in the previous reconstruction. This paper is organized as follows. In Section 2 related works are presented. In Section 3 the proposed reconstruction approach is presented in detail. In Section 4 the results of the proposed method are presented and discussed. The paper concludes showing directions for future research in Section 5.
2. Related Works
Facial reconstruction from the skull has a long history, beginning around the end of the nineteenth century. The reconstructive methodologies developed over more than a century [20] basically come from two main approaches:
• the study of human facial anatomy and of the relationships between soft tissues (skin, fat, muscles) and hard tissues (cranial bones);
• the collection of statistical facial data about individuals belonging to different races, sexes and ages;
and they can be summarized as follows:
• 2D artistic drawing [6], in which the contours fitting a set of markers positioned on the skull act as a reference for the hand drawing phase, which involves the anatomic knowledge of the artist;
• photo or video overlay of facial images on a skull image [10], aimed at comparing a face to a skull to highlight matching features;
• 3D reconstruction, either with manual clay sculpting or with digital modelling. In the manual approach the artist starts from a clay copy of a skull, applies the usual depth markers (typically referred to as landmarks) and then models in clay a face fitting the landmarks. In digital modelling the first step is to produce a 3D reconstruction of the skull [15], typically starting from CT data [17]; then a facial surface model is created from 3D primitives using the landmarks as a reference for the contouring curves. It is also possible to generate a solid reconstruction of the modelled face by stereolithographic techniques [9,11];
• warping of a 3D digital facial model [18, 21], which tries to deform (warp) a standard "reference" facial model to fit the landmarks previously assigned on the digital model of the skull.
Many of the methods mentioned above rely on a large survey of facial soft tissue depth, measured in a set of anatomically relevant points. First developed on cadavers, this measurement protocol has been improved [4] with data from other races, various body builds, and even from living individuals by means of radiological and ultrasound diagnostic techniques.
3. The proposed method
The whole reconstructive process is detailed below in sections 3.1 to 3.11. Two reference databases are used: the Craniometrical Database (CD) and the Pictorial Physiognomic Database (PPD). In sections 3.5 and 3.10 these databases are discussed in detail.
3.1. The skull
We start by selecting one dry skull among the five found in Murecine. This skull belonged to a young male; it was found without the mandible and with many teeth missing, but its overall state of conservation is fine. Unfortunately the absence of the mandible makes the reconstruction of the lower portion of the face more complicated and less reliable, because in this case there is no original bone tissue to guide the process. The skull is photographed and then scanned via CT on the axial plane with a step of 1 millimetre and a slice thickness of 2 millimetres, so every slice overlaps by 1 millimetre with the following one. This high resolution scanning produces a set of about 250 images, as well as a 3D reconstruction of the skull. Additionally, three radiological images of the skull are taken from three orthogonal planes, corresponding to front, side and bottom views. The 3D mesh output by the CT will be used as a reference to visually verify the compatibility of the reconstructed soft tissues with the dry skull.
3.2. The set of landmarks
The next step is to define on each radiological image a corresponding set of anatomically and physiognomically relevant points, named landmarks, each one with a unique name and number in each view [see Figure 2].
Figure 2. Landmarks located on front and side view of skull and craniometrical tracing.
Because the landmarks are chosen according to their craniometrical relevance, they may not correspond to the points for soft tissue thickness measurement indicated by Moore [4]. In this study we use a set of 19 landmarks, but this number could be extended if necessary. Alternatively, it is possible to assign the landmarks directly on the 3D skull mesh produced by CT; in this case the following step (3.3) is not necessary because the landmarks already have tridimensional coordinates. A complete list of the landmarks used is shown in Table 1.
Table 1. List of landmarks (landmark # and location, front view; landmark # and location, side view).
3.3. Adding a third dimension to the set of landmarks
Now we have the same set of points assigned to each of the three views corresponding to the planes XY, XZ and YZ. So it is easy to assign to each landmark Li its tridimensional coordinates (Lxi, Lyi, Lzi), simply measuring them on the appropriate plane with respect to a common axis origin. We can easily visualize the landmark set in the tridimensional space of our modeling environment and make any kind of linear or angular measurement between two or more landmarks.
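A minimal sketch of this step is given below, assuming each landmark has been measured in the front, side and bottom views with a common origin; averaging the coordinate shared by two views is our own choice, and the numeric values are placeholders.

```python
def landmark_3d(front_xy, side_zy, bottom_xz):
    """Combine 2D landmark measurements from three orthogonal views into (x, y, z).

    front_xy = (x, y), side_zy = (z, y), bottom_xz = (x, z); coordinates shared by
    two views are averaged to absorb small measurement differences.
    """
    x = 0.5 * (front_xy[0] + bottom_xz[0])
    y = 0.5 * (front_xy[1] + side_zy[1])
    z = 0.5 * (side_zy[0] + bottom_xz[1])
    return (x, y, z)

# Hypothetical measurements (in mm) of one landmark on the three radiographs:
print(landmark_3d(front_xy=(84.0, 112.5), side_zy=(63.2, 112.1), bottom_xz=(84.3, 63.0)))
```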
3.4. Extraction of craniometrical features
Starting from the landmarks previously assigned we define the n-tuple of features (F1*, F2*, ..., Fn*) which are peculiar to this skull and result from its craniometrical tracing [see Figure 2]. These features are consistent with the features present in the CD; they include angles and lengths measured on the front or side view and are listed in Table 2.
Table 2. List of features (front and side view).
Because each feature has a different relevance from a physiognomic and craniometrical point of view, a different weight is assigned to each of them. The resulting n-tuple (w1, w2, ..., wn), with 0 ≤ wj ≤ 1 and 1 ≤ j ≤ n, contains the weights relative to (F1*, F2*, ..., Fn*). These weights are not meant to depend on a particular set of features, and if Fj* = 0 then wj = 0.
3.5. Searching for similarities in CD
The CD is built on data collected from a radiological survey [see Figure 3] conducted on thousands of subjects of different ages and sexes, all coming from the same geographical area in which the remains were found: Pompei and its surroundings.
Figure 3. Samples of records used to build the CD.
Each individual represents a record in the database, and each craniometrical feature, extracted with the same procedure shown before, is stored in a numeric field, as well as the 3D coordinates. Additionally, we stored three photographic facial images of each subject, shot from the same position and during the same session as the radiological images. This precise alignment of photo camera and radio-diagnostic device is necessary to allow a spatial correlation between the two different kinds of images. If digital CT equipment or even a 3D scanner/digitizer were available, an optional field could point to a facial 3D model of each subject, thus avoiding the need for steps 3.6 and 3.7. Once the database is built, it is possible to search through it to find the record (the modern Pompeian individual) whose craniometrical features are most similar to those of the unknown subject given in input. This task is accomplished by evaluating for each record i the Craniometrical Similarity Score (CSS).
In the CSS formula, Fij is the j-th component of the n-tuple of features (Fi1, Fi2, ..., Fin) relative to record i, wj represents its weight, and Dj is the j-th component of an array (D1, D2, ..., Dn) containing the maximum allowed difference between Fij and Fj* for each j. If any feature is not present in the input skull, due to missing elements for example, then the corresponding term in the CSS formula becomes zero. So 0 ≤ CSS ≤ 1, and a CSS of 1 means a perfect match (an almost impossible case) is found. Ideally the CSS should be not less than 80% to use the face as a valid reference for the reconstruction.
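The printed CSS formula is not reproduced here; a weighted score with the stated properties (terms vanish for missing features, 0 ≤ CSS ≤ 1, and 1 only for a perfect match) could be realised as in the following sketch, which is our own assumption rather than the authors' exact definition.

```python
def css(features_record, features_skull, weights, max_diff):
    """A plausible Craniometrical Similarity Score: weighted agreement of features.

    features_record : F_ij for record i; features_skull : F*_j (None if missing);
    weights : w_j; max_diff : D_j. Terms for missing features contribute zero,
    so the score only reaches 1 when every feature matches exactly.
    """
    num, den = 0.0, 0.0
    for f_ij, f_star, w_j, d_j in zip(features_record, features_skull, weights, max_diff):
        den += w_j
        if f_star is None or w_j == 0:
            continue                                    # missing feature: term is zero
        agreement = max(0.0, 1.0 - abs(f_ij - f_star) / d_j)
        num += w_j * agreement
    return num / den if den else 0.0
```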
3.6. Augmenting the set of landmarks
The aim of the craniometrical database search is to augment the set of landmarks with new landmarks, relative to soft tissues, coming from the individual with the highest CSS. In fact the radiological and photographic images of a living individual contain useful information about the local thickness and shape of soft tissues, which can replace data missing in the dry skull. To retrieve these data we first normalize the photographic images to match the radiological images, and then we blend each pair of images to highlight the facial contours over the underlying skull, thus revealing the soft tissue thickness in many relevant points of the head for each plane.
3.7. Modelling the facial surface
The augmented set of landmarks and the set of photographic images can be used to guide the 3D modelling of the "best match" face. The simplest modelling technique is to visualize the landmarks as 3D points inside the modelling environment, mapping the three photo images on three orthogonal planes so that for each view all the landmarks are properly positioned. Using these visual references we can draw a sequence of cross sections whose interpolation results in a surface model of the head; B-patches as well as NURBS can be used for this purpose. An interesting alternative to this manual modelling technique is the possibility of generating the model from a set of stereoscopic images of the head as in [13]; in this case, for each record the CD should also contain three pairs of images acquired from slightly different angles. Whatever the technique adopted, the final result is the 3D model [see Figure 4] of the head with the maximum CSS.
Figure 4. Rough face model.
3.8. Warping the rough face model to fit the original set of landmarks
If the CSS of the reconstructed head is not equal to 1 (and this will probably always be true), then we would like to modify the shape of this model to better fit the craniometrical features of the found skull. This kind of tridimensional deformation of a mesh, based on vertex relocation by a specific transformation of coordinates, is usually referred to as "warping". More precisely, we want to move every bone landmark Li of the "best match" case for which |Li - Li*| ≠ 0 (where Li* is the corresponding landmark on the dry skull) to a new position that corresponds to the coordinates of Li*. The purpose is to affect the polygonal surface local to Li, using the landmark as a handle to guide the transformation. Many different algorithms are available to accomplish this task, but we chose a free form deformation, which simply works by assigning to the input mesh a lattice with n control vertices (our landmarks Li) and, by moving them (to Li*), smoothly deforming the surrounding surface. After the warping is applied, the face model fits the dry skull better, and this match can be easily verified by visualizing at the same time the skull mesh (from the CT 3D reconstruction) and the face model with partial transparency.
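The paper uses a lattice-based free form deformation; as a much simpler stand-in with the same intent (vertices near a landmark follow its displacement towards the dry-skull position), one could use a Gaussian-weighted displacement field as sketched below. The radius parameter and the function interface are our own assumptions.

```python
import numpy as np

def warp_to_landmarks(vertices, src_landmarks, dst_landmarks, radius=15.0):
    """Move mesh vertices so that the source landmarks slide toward the target landmarks.

    Each vertex is displaced by a Gaussian-weighted blend of the landmark
    displacements; vertices far from every landmark barely move.
    """
    vertices = np.asarray(vertices, dtype=float)        # (V, 3) mesh vertices
    src = np.asarray(src_landmarks, dtype=float)        # (L, 3), e.g. best-match landmarks Li
    dst = np.asarray(dst_landmarks, dtype=float)        # (L, 3), dry-skull landmarks Li*
    displacement = dst - src
    warped = vertices.copy()
    for k in range(len(src)):
        d2 = np.sum((vertices - src[k]) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * radius ** 2))            # influence of landmark k on each vertex
        warped += w[:, None] * displacement[k]
    return warped
```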
3.9. Texturing and shading
At this point we can apply material shaders to the head model to enhance the realism of the reconstruction. We define a material for skin, with a texture assigned to the diffuse channel and a shininess map to simulate the different reflectivity levels present on actual facial skin; both textures are mapped spherically on the mesh. For the diffuse texture we can use the photographic images relative to the best match case present in the CD, simply editing them with photo-retouching software. To fine tune the assignment of mapping coordinates to mesh vertices we found it very useful to unwrap the mesh; in this way it has been possible to interactively edit a planar version of the facial mesh, thus simplifying this task.
3.10. Searching for missing elements in the physiognomic database
The result of the previous nine steps is the creation of a 3D model of a bald head whose craniometrical features are compatible with the ones belonging to the found skull, and whose soft tissue thickness comes from a living individual, probably with similar anthropological features. We now want to integrate this tridimensional identikit of an unknown Pompeian with physiognomic elements such as eyes, lips, nose and hair coming from the only reliable source we have: the paintings and sculptures made by artists contemporary with the Vesuvius eruption, who are supposed to have been inspired, in their works, by typical local subjects. So we introduce the PPD, built as a collection of images reproducing Pompeian classical art. This database is based on the work in [1] and it allows, via a query by pictorial example, the retrieval of images [see Figure 5] whose physiognomic features are compatible with given craniometrical features. As a result of a search through the PPD, we obtain a set of physiognomic elements which can guide the refinement of the reconstruction.
Figure 5. Samples of PPD records.
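The actual query-by-pictorial-example engine is the one described in [1]; purely to illustrate the idea of ranking PPD records by visual similarity, here is a toy content-based retrieval sketch using grey-level histogram intersection. The function names and the random toy database are hypothetical.

```python
import numpy as np

def grey_histogram(image, bins=64):
    """Index an image by a normalised grey-level histogram (toy descriptor)."""
    h, _ = np.histogram(np.asarray(image).ravel(), bins=bins, range=(0, 255))
    return h / max(h.sum(), 1)

def rank_ppd(query_img, database):
    """Rank (name, image) pairs by histogram intersection with the query."""
    q = grey_histogram(query_img)
    scores = [(name, np.minimum(q, grey_histogram(img)).sum()) for name, img in database]
    return sorted(scores, key=lambda s: -s[1])

# toy usage with random stand-in images
rng = np.random.default_rng(0)
db = [("record-%d" % i, rng.integers(0, 256, size=(64, 64))) for i in range(5)]
print(rank_ppd(db[0][1], db)[:3])
```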
3.11. Final reconstruction and rendering
The final step is to locally modify the mesh produced in step 3.9, trying to integrate the facial features of ancient Pompeians resulting from the previous search. We used a non-linear free-form deformation, applied to a vertex selection based on distance from the landmarks, to properly deform the areas corresponding to eyes, lips and nose. We also generated a digital reconstruction of haircut and facial hair because these physiognomic elements, although entirely conjectural, help to visualize the subject as he or she most probably appeared. Haircut and facial hair have been applied and oriented using a specific modelling tool. After the modelling phase is over we can fine tune the material properties and then produce the final renderings [see Figure 6] of the reconstructed head at high resolution.
Figure 6. Final rendering of the reconstructed face.
4. Discussion
The methodology presented above integrates some of the features typical of the classic reconstructive approaches listed in section 2, trying to maximize their results especially for archaeological applications. In fact the warping technique is common to other computerized methods, as is the use of a "reference" facial mesh to be deformed to fit the found skull, or the positioning of a set of landmarks on the bone remains to guide the warping. Nevertheless this methodology differs substantially from the others in the following fundamental aspects:
- the building of a custom craniometrical database (CD) based on the anthropological hypothesis that individuals with similar physiognomic and craniometrical features may still be present in the area in which the remains were found;
- the selection of a reference candidate through a search for craniometrical similarities in the CD, and not just on the basis of generic race/gender criteria;
- the modelling of a 3D facial mesh from actual (photo, CT or 3D scan) data of the selected (living) reference candidate, rather than from average soft tissue depths collected following generic race/gender criteria and applied to the dry skull;
- the warping technique applied to the mesh of the highest-CSS subject only to improve the reconstruction, instead of being used as the main tool to conform a generic facial mesh to the found skull;
- the use of the PPD to refine the reconstruction by adding compatible physiognomic elements (nose, eyes, lips) that are often left undefined by other approaches.
These peculiarities lead to a precise applicative range for the proposed method, with advantages and limits with respect to the other methods presented. The proposed method works best on a complete skull, but even in the case of a missing mandible it can still produce interesting results, using the remaining craniometrical measurements to search for a similar subject in the CD, thus replacing the lost information (even if greater uncertainty would arise). Another critical point about the "warping methods" mentioned in section 2 is the reference face mesh to be warped, because its physiognomic features affect the final result independently of the correctness of the soft tissue depths at the discrete set of landmarks involved in the process. The basic classification by race (Caucasian, African, Asian, etc.), sex and build (fat, normal or thin) is often too generic to accurately reproduce the aspect of specific ethnic groups. The proposed method, based on the custom-built CD containing records of anthropologically compatible individuals, uses as a reference mesh the 3D face model of the most similar subject in the database, thus minimizing the amount of interpolation between the landmarks and leading to a more accurate reconstruction. Finally, after the landmark-based mesh warping is applied, the resulting reconstructed face does not include elements such as nose, lips, eyes, ears or hair, which cannot be derived from soft tissue statistics; so, as proposed in [19], it is necessary to manually draw them onto a rendered front or side view of the head to obtain a complete identikit. The proposed method instead relies on the PPD to search for anthropologically compatible facial features and to apply them to the reconstructed face by local deformation of mesh control points. Even if these added elements remain conjectural, they can be very useful to visualize the possible aspect(s) of the found subject. On the other hand, the use of the CD and PPD could limit the applicability of this technique, or the reliability of its results, if an appropriate radiological/photographic survey of a population anthropologically similar to the subject to be reconstructed is not available.
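One way to picture the CD search with incomplete remains (e.g. a missing mandible) is a masked comparison that scores only the measurements actually available on the found skull. The sketch below is purely illustrative: the real CSS is defined earlier in the paper and may be computed quite differently, and the normalised-difference score used here is an assumption.

```python
import numpy as np

def masked_similarity(query, record, weights=None):
    """Compare only the craniometrical measurements present on the found skull
    (NaN marks a missing value, e.g. a lost mandible).  Returns a toy score in
    [0, 1]; the authors' actual CSS may differ."""
    q, r = np.asarray(query, float), np.asarray(record, float)
    mask = ~np.isnan(q)                       # usable measurements only
    if weights is None:
        weights = np.ones_like(q)
    diff = np.abs(q[mask] - r[mask]) / np.maximum(np.abs(r[mask]), 1e-9)
    return 1.0 - np.average(np.clip(diff, 0, 1), weights=weights[mask])
```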
5. Conclusion
Facial reconstruction techniques have a long tradition in both the forensic and archaeological fields but, even though anthropological studies and information technology help us to better identify and visualize a feasible reconstruction of an individual given its skull, we have to remark that there is no way to exactly replace lost data. The approach presented in this paper can considerably enhance the likeness of the reconstructed face to the anthropological features of the ethnic group to which the found skull belonged, but it requires a correctly built CD and PPD to achieve optimal results. Future developments of this method will try to use as reference not only the record with the highest CSS found by searching through the CD, but a set of records whose CSS is above or equal to a previously defined threshold. By averaging the meshes of the selected records, the resulting face could be a better candidate for the next steps of the reconstruction, with probably a lower influence of random physiognomic features than in the case of a single best match.
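A small sketch of this proposed future development, under the strong assumption that all stored face meshes share the same vertex count and ordering so that averaging vertex positions is meaningful; the record format `(css, vertices)` and the threshold handling are illustrative choices, not part of the published method.

```python
import numpy as np

def average_reference_mesh(records, threshold):
    """Average the face meshes of all CD records whose craniometrical
    similarity score (CSS) meets the threshold.  `records` is a list of
    (css, vertices) pairs with vertex-wise correspondence assumed."""
    selected = [np.asarray(v, float) for css, v in records if css >= threshold]
    if not selected:
        raise ValueError("no record reaches the CSS threshold")
    return np.mean(np.stack(selected), axis=0)
```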
References
[1] A. F. Abate, G. Sasso, A. C. Donadio, F. Sasso, The riddles of Murecine: the role of anthropological research by images and visual computing, MDIC 2001, LNCS 2184, pp. 33-41, Springer-Verlag.
[2] J.P. Moss, A.D. Linney, S.R. Grindrod, C.A. Mosse, A laser scanning system for the measurement of facial surface morphology, Optics Lasers Eng. 10 (1989) 179-190.
[3] A.C. Tan, R. Richards, A.D. Linney, 3-D medical graphics - using the T800 transputer, in: Proceedings of the 8th OCCAM User Group Technical Meeting, 1988, pp. 83-89.
[4] J.S. Rhine, C.E. Moore, Facial reproduction tables of facial tissue thickness of American Caucasoids in forensic anthropology, in: Maxwell Museum Technical Series 1, Maxwell Museum, Albuquerque, New Mexico, 1982.
[5] R.M. George, The lateral craniographic method of facial reconstruction, J. Forensic Sci. 32 (1987) 1305-1330.
[6] R.M. George, Anatomical and artistic guidelines for forensic facial reconstruction, in: M.H. Iscan, R.P. Helmer (Eds.), Forensic Analysis of the Skull, Wiley-Liss, New York, 1993, pp. 215-227, Chapter 16.
[7] H. Peck, S. Peck, A concept of facial aesthetics, Angle Orthodont. 40 (1970) 284-318.
[8] K. Waters, D. Terzopoulos, Modelling and animating faces using scanned data, J. Visual. Graphics Image.
[9] H. Hjalgrim, N. Lynnerup, M. Liversage, A. Rosenklint, Stereolithography: potential applications in anthropological studies, Am. J. Phys. Anthropol. 97 (1995) 329-333.
[10] A.W. Sharom, P. Vanezis, R.C. Chapman, A. Gonzales, C. Blenkinsop, M.L. Rossi, Techniques in facial identification: computer-aided facial reconstruction using a laser scanner and video superimposition, Int. J. Legal Med. 108 (1996) 194-200.
[11] N. Lynnerup, R. Neave, M. Vanezis, P. Vanezis, H. Hjalgrim, Skull reconstruction by stereolithography, in: J.G. Clement, D.L. Thomas (Eds.), Let's Face It! Proceedings of the 7th Scientific Meeting of the International Association for Craniofacial Identification, Local Organising Committee of the IACI, Melbourne, 1997, pp. 11-14.
[12] Gonzalez-Figueroa, An Evaluation of the Optical Laser Scanning System for Facial Reconstruction, Ph.D. thesis, University of Glasgow, 1998.
[13] R. Enciso, J. Li, D.A. Fidaleo, T-Y Kim, J-Y Noh, U. Neumann, Synthesis of 3D Faces, Integrated Media Systems Center, University of Southern California, Los Angeles.
[14] M.W. Vannier, J.L. Marsh, J.O. Warren, Three dimensional CT reconstruction images for craniofacial surgical planning and evaluation, Radiology 150 (1984) 179-184.
[15] S. Arridge, J.P. Moss, A.D. Linney, D.R. James, Three-dimensional digitisation of the face and skull, J. Max.-fac. Surg. 13 (1985) 136-143.
[16] S.R. Arridge, Manipulation of volume data for surgical simulation, in: K.H. Hohne, H. Fuchs, S.M. Pizer (Eds.), 3D Imaging in Medicine, NATO ASI Series F 60, Springer-Verlag, Berlin, 1990, pp. 289-300.
[17] J.P. Moss, A.D. Linney, S.R. Grinrod, S.R. Arridge, J.S. Clifton, Three dimensional visualization of the face and skull using computerized tomography and laser scanning techniques, Eur. J. Orthodont. 9 (1987) 247-253.
[18] J.P. Moss, A.D. Linney, S.R. Grinrod, S.R. Arridge, D. James, A computer system for the interactive planning and prediction of maxillo-facial surgery, Am. J. Orthodont. Dental-facial Orthopaed. 94 (1988) 469-474.
[19] P. Vanezis, R.W. Blowes, A.D. Linney, A.C. Tan, R. Richards, R. Neave, Application of 3-D computer graphics for facial reconstruction and comparison with sculpting techniques, Forensic Sci. Int. 42 (1989) 69-84.
[20] A.J. Tyrell, M.P. Evison, A.T. Chamberlain, M.A. Green, Forensic three-dimensional facial reconstruction: historical review and contemporary developments, J. Forensic Sci. 42 (1997) 653-661.
[21] G. Quatrehomme, S. Cotin, G. Subsol, H. Delingette, Y. Garidel, G. Grevin, M. Fidrich, P. Bailet, A. Ollier, A fully three-dimensional method for facial reconstruction based on deformable models, J. Forensic Sci. 42 (1997) 649-652.
THE E-LEARNING MYTH AND THE NEW UNIVERSITY
VIRGINIO CANTONI, MARCO PORTA AND MARIAGRAZIA SEMENZA
Dipartimento di Informatica e Sistemistica, Università di Pavia, Via A. Ferrata 1, I-27100 Pavia, Italy
E-mail: [email protected], [email protected], mariagrazia.semenza@unipv.it
The role of Information and Communication Technologies (ICTs) in educational development is underlined and established as a priority, in order "to reinforce academic development, to widen access, to attain universal scope and to extend knowledge, as well as to facilitate education throughout life". In fact, the development of ICTs has had a significant impact on traditional higher education systems: the former dual system has been modified and the gap is closing as the university of the 21st century takes shape.
1. Introduction
Technological advances offer new paradigms for university training. In particular, multimedia has strengthened the distance learning approach, so much so that, in a first phase, a clear dichotomy emerged between the traditional in-presence modality and the more detached distance modality. With effective metaphors, the terms "brick university" and "click university" have been used to indicate this separation. Initially, the two paradigms were presented with opposing traits:
i) while the in-presence modality is characterized by the class (often engaged full-time), the distance modality is personalized for the student;
ii) while the first is characterized by the teacher and is centered on him or her (who chooses topics and operational rules), the second is focused on the student and is directly controlled by him or her;
iii) while the first has predefined schedules and time extents, the second occurs only when required and lasts only as long as strictly necessary;
iv) while one is based on the topic, which is discussed by voice, the other is centered on the project, in which one learns by doing;
v) while the first is communicated through the technology (based on the teacher's competence), the second is conveyed by means of the technology (based on the acquired knowledge), through a "query and discovery" process by the student;
vi) to conclude, we can say that while in the in-presence paradigm the student plays a reactive role, in the distance modality the student assumes a proactive role.
The traditional university, as an institution offering on-site courses, needs to know how to make the most of the opportunities offered by new technologies in order to maintain its prestigious position. The challenge is to rethink its higher education environment in the light of new technologies so as to meet the challenges of a global context. For this reason, several countries are promoting technological development measures for education policy, either through government or through university associations. This implies the establishment of strategic lines for the development of a more open education.
2. The e-learning myth
After a first period in which several purely virtual universities were created (e.g. the British Open University, which today has 100,000 students around the world and employs 7,000 teachers distributed across Great Britain; the Globewide Network Academy in Denmark; the World Lecture Hall of the University of Texas; the Athena University, etc.), some prestigious institutions have joined their efforts to build non-profit alliances aimed at creating distance-learning programs. A significant example is the agreement among the Universities of Stanford, Princeton, Yale and Oxford in October 2000. Subsequently, on-line education entrepreneurs and for-profit associations, with or without traditional university partners, have appeared: today there are more than 700 university institutions of this kind (with initiatives distributed across all the continents), as well as more than 2,000 corporate universities. At the end of April 2001, MIT announced that, within a ten-year program, its almost 2,000 courses would be put online, available for free to everybody. In addition, technological advances have increased the demand for permanent education, which is becoming more and more frequent. This way, permanent links can be established between institutions and their graduates. The life cycles of new technologies not only require new teaching paradigms, but also recurrent refresher courses. According to Christopher Galvin, President and CEO of Motorola, "Motorola no longer wants to hire engineers with a four year degree. Instead, we want our employees to have a 40 year degree". Thus, besides institutions providing certified courses with a final diploma, there is a growing number of university consortia, organizations, publishers and industries aimed at developing and distributing on-line permanent instruction programs. This lays the foundations for the development of open higher education, whose main objective is to develop human capital in the new technology age. However, beyond the adoption of institutional measures for the technological development of education, the expansion of open universities, some of which have already become macro-universities capable of overshadowing the classical university model, has transformed the traditional university, while at the same time
increasing the diversification and development of higher education models, particularly at the third cycle: postgraduate courses, master's degrees, vocational training and skills retraining. The activity of the Information Technology industry in the multimedia instructional sector has been very intense in the last few years. Currently, more than 100 different Learning Management Systems on the market administer libraries for course storage and production, provide related information, and control course distribution and students' interactive access. As with all technologies approaching maturity, standardization activity is very intense in this phase, to ensure interoperability and ease of update and reuse of multimedia instructional products. Major changes are therefore also taking place in classical higher education institutions and universities, owing to the impact of new technologies and to the pressure of newcomers in the field; some universities have become pioneers in adapting to this new reality by introducing new technologies as a complement to on-site courses.
3. Advantages and benefits of e-learning
Since they can customize the learning material to their own needs, students have more control over the learning process and can better understand the material, leading to a faster learning curve compared to instructor-led training. The delivery of content in smaller units contributes further to a more lasting learning effect. Students taking an online course enter a risk-free environment in which they can try new things and make mistakes without exposing themselves. This characteristic is particularly valuable when trying to learn soft skills, such as leadership and decision-making. A good learning program shows the consequences of students' actions and where/why they went wrong. E-learning builds on existing delivery methods to incorporate connectivity, whether through internal networks or the Internet. It removes the isolation that limited its predecessors to a market of enthusiasts and innovators. E-learning is used to drive strategic organizational goals. Unlike so much training in the past, it is tightly integrated into what the organization must achieve, not what individuals feel is good for them. Among the several benefits of e-learning, we can list the following:
- it is usually less expensive to produce;
- it is self-paced (most e-learning programs can be taken when needed);
- it moves faster (the individualized approach allows learners to skip material they already know);
- it provides a consistent message (e-learning eliminates the problems associated with different instructors teaching slightly different material on the same subject);
- it can work from any location and at any time (e-learners can go through training sessions from anywhere, usually at any time);
- it can be updated easily and quickly (online e-learning sessions are especially easy to keep up-to-date because the updated materials are simply uploaded to a server);
- it can lead to increased retention and a stronger grasp of the subject (because of the many elements combined in e-learning to reinforce the message, such as video, audio, quizzes and interaction);
- it can be easily managed for large groups of students.
E-learning can improve retention by varying the types of content (images, sounds and text working together), creating interaction that engages the attention (games, quizzes, etc.), providing immediate feedback (e-learning courses can build in immediate feedback to correct misunderstood material), and encouraging interaction with other e-learners and e-instructors (chat rooms, discussion boards, instant messaging and e-mail all offer effective interaction for e-learners).
4. E-learning in Europe
The use of e-learning to enhance quality and improve accessibility of education and training is generally seen as one of the keystones for building the European knowledge society. At the Member State level, most countries have their own Action Plan for encouraging the use of ICT in education and training, often involving direct support for local pilots of e-learning in schools and higher education. Evidence that true e-learning is being used in Europe is not easy to find, as it is typical at this stage of an embryonic technology market for organizations to work through a series of internal pilots. Compared to the USA, in some ways Europe is following a different path: greater government involvement, more emphasis on creative and immersive approaches to learning, more blending of e-learning with other forms, greater use of learning communities (mainly by southern European users), and (particularly in Scandinavia) a strong emphasis on simulation and mobile communications. E-learning standards are recognized as being useful, even essential, to encourage the reuse and interoperability of learning materials. It is important to sustain the exchange of experience within Europe on the use of ICT for learning and to develop a common understanding of what constitutes good or best practice. We think that e-learning standards can only be established as the result of a profitable collaboration among varied entities, operating in different contexts, with different objectives. Only by sharing problems, solutions and evaluations of the various outcomes can the real extent of potential drawbacks and advantages be assessed.
5. The new role of teachers
The technological revolution taking place in higher education is changing the classical models of on-site training and education. Educators cannot turn their
backs on information technologies when giving classes: students need to learn new technologies and, rather than accumulating knowledge, it is increasingly important to know where to find information. What is more, the university, as an institution offering on-site courses, needs to know how to make the most of the opportunities offered by new technologies, in order to broaden its market on the basis of this new provision. The teacher plays a new, different role in e-learning. First of all, while devising a course, the teacher becomes the designer of experiences, processes and contexts for the learning activity; besides identifying the contents, he or she has to focus on motivation and active learning processes. The teacher probably has to devote greater attention to the creation of what has never been done before than to the analysis of previous experiences. Rather than a scientist applying analytical skills, the teacher seems to act like an artist. Even more important are the strategies for teaching at a distance. In a few words, what is different:
- classroom teachers rely on a number of visual and unobtrusive cues from their students. A quick glance, for example, reveals who is taking notes, pondering a difficult concept, or preparing to make a comment. The student who is confused, tired, or bored is equally evident. The attentive teacher receives and analyzes these visual cues and adjusts the delivery to meet the needs of the class during the lesson.
- the distance teacher has no visual cues: without a real-time visual medium the teacher receives no visual information from the distant sites. If such cues do exist at all, they are filtered through technology, and it is difficult to carry on a stimulating teacher-class discussion when spontaneity is altered by technical requirements and distance. The teacher might never really know, for example, whether students are paying attention, talking among themselves, or even still in the room. Separation by distance also affects the class environment: living in different communities deprives the teacher and students of a common community link.
Therefore, even with on-site teaching, but much more deeply in the distance case, during the course teachers are engaged as mentors in motivating students, in highlighting pros and cons and in detecting the causes of failures. The most advanced multimedia technology is not the one that artificially replaces reality or intelligence, but rather the one that increases our skills, adapting itself to the context and evolving while being used. Technology must fit the user, not the contrary: it is really effective when it is ergonomic, intuitive and transparent. Paraphrasing Wayne Hodgins, to be really efficient and effective in multimedia teaching it is therefore necessary to choose "just the right CONTENT, to just the right PERSON, at just the right TIME, on just the right DEVICE, in just the right CONTEXT, and in just the right WAY". Even if the so-called "digital divide" problem (that is, the marginalization of computer illiterates) is not as strong as it was in the past (especially among the younger generations), there are still many people who have not approached the
new potentialities of technology and remain in the "cybercave", where they can see only the shadows of technology. Even those who are completely tempted by "hi-tech" can make a big mistake by forcing contents onto technology: true effectiveness is only obtained by adapting technology to contents!
6. The university with a long tradition and the e-learning opportunities
After centuries of stable evolution, the academic system has entered a period of significant change, revolutionary in certain aspects. Market forces are increasingly interested in advanced education (mostly abroad, but recently in Italy as well), academic competition has increased, and technology is demonstrating a significant innovative impact. Students today are more active and aware of the capabilities of technology; they are used to interaction and to plug-and-play experiences: they now constitute the digital generation and have easy access to the whole academic world (different universities are just a click away from one another). This phase of rapid transformation presents broad perspectives and novel opportunities, but many challenges and risks as well. It is primarily essential to develop change capabilities, to help our institutions react to the rapid changes required by society. Secondly, actions cannot simply be extracted from even a glorious tradition, but need to be examined in the wide field of different future perspectives. The program adopted by the e-learning research unit of Pavia University includes several activities. In particular, the research unit will focus on the following themes. The definition of a common project methodology for modeling, classifying and archiving educational resources that guarantees an adequate level of interoperability and reusability, through the adoption of standards and tools for metadata description and packaging of 'learning objects'. With explicit reference to the SCORM standard, the focus will be on the analysis and development of multi-resolution mechanisms for managing data from the atomic level (of individual assets and 'learning objects') to the higher levels of 'semantic resolution'. The resulting process should be applied both for classification and retrieval, through the management of learning object metadata, and for generating new educational units, through the assessment of techniques for the aggregation and recombination of basic constituents at lower semantic levels, down to the atomic level of learning objects. This hierarchical metaphor for knowledge management will be analyzed in order to develop a methodology, if not a theory, on the principles and procedures for the generative design of educational units at the higher levels. Learning objects will in fact be the fundamental elements in the new overall learning model that originates from the object-oriented programming paradigm. Following this approach, the efforts will also aim at the creation of basic components, i.e. actual learning objects, which will be reusable in many different
contexts. In fact, the very notion of learning objects relies on this fundamental principle: when developing a new educational unit, the objective should be the construction of several basic components about the subject considered, and these components should be reusable in other contexts and with different learning strategies. Eventually, all the educational units will be accessible through the Internet, i.e. they will be edited and used by many users simultaneously. It has to be remarked that the objective of the analysis and development presented here is not to achieve yet another 'Content Management System', but rather to define the guidelines for selecting the tools and educational resources that will eventually become the shared infrastructure. A final, not negligible, objective of the Pavia e-learning project is to enlarge as much as possible the base of potential teachers by stimulating and promoting the adoption of the new activity, up to achieving a course portfolio for the entire traditional university background, ranging from the humanities to the sciences and all the applied sciences.
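As a purely hypothetical illustration of the learning-object approach described above, the sketch below models a minimal metadata record and the aggregation of lower-level objects into a higher 'semantic resolution' unit. The field names are loosely inspired by IEEE LOM / SCORM-style metadata but are not the schema actually adopted by the Pavia research unit.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LearningObject:
    # Hypothetical minimal metadata record; not the project's actual schema.
    identifier: str
    title: str
    granularity: str                                     # e.g. "asset", "learning object", "course"
    keywords: List[str] = field(default_factory=list)
    components: List[str] = field(default_factory=list)  # identifiers of lower-level objects

def aggregate(title: str, parts: List[LearningObject]) -> LearningObject:
    """Toy illustration of moving up one level of 'semantic resolution' by
    recombining existing objects into a new educational unit."""
    return LearningObject(
        identifier="unit:" + title.lower().replace(" ", "-"),
        title=title,
        granularity="course",
        components=[p.identifier for p in parts],
    )

# usage sketch
asset = LearningObject("asset:fourier-demo", "Fourier demo applet", "asset")
lo = LearningObject("lo:fourier-basics", "Fourier basics", "learning object",
                    keywords=["signal processing"], components=[asset.identifier])
print(aggregate("Signals and Systems", [lo]))
```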
E-LEARNING - THE NEXT BIG WAVE: HOW E-LEARNING WILL ENABLE THE TRANSFORMATION OF EDUCATION
DR. RICHARD STRAUB, IBM EMEA, Learning Solutions Director
CARLA MILANI, IBM EMEA, South Region, Learning Solutions
After having analyzed the motivations that lead to a revision of teaching and learning models, bearing in mind the European Union initiatives, we now analyze different possible views of e-learning, both from a technological perspective and through a more global and integrated approach. IBM has decided to take part in this transformation challenge and, after an accurate analysis of all aspects of the phenomenon and the implementation of wide-ranging e-learning strategies inside the company, is ready to act in the education world as a partner able to handle complex projects, both in academic and corporate environments. We present the IBM role in the EU initiatives and in the public-private partnerships started by the Commission. We also outline the IBM education model, useful for building learning projects with a "blended" methodology.
The provision of learning through our education systems is set to undergo a fundamental transformation - this evolution is currently in its early stages. The advent of the knowledge society makes education and learning a primary concern for governments and the population at large - learning is increasingly recognised as a lifelong process with the foundations being laid during the period of formal primary, secondary and tertiary education. Today e-learning is considered a key enabler for this transformation. There is a broad need for basic Information and Communication Technology (ICT) skills in our society - ICT skills are becoming a new literacy skill. These skills are a basic prerequisite for leveraging the potential of e-learning. Yet, the real benefits of e-learning will flow from the enhancement it can bring to the process of teaching and learning: improving the speed and depth of knowledge and skills acquisition, increasing flexibility for the learner, personalising the learning path, innovating cross-discipline approaches through networked content, and
supporting new collaborative learning methods mediated by technology. Overall, e-learning provides a broad array of new options to shape, design and deliver learning resources and processes. The change of the education systems is one of the most important challenges for our society - the mobilisation and participation of all players is required. Education institutions have been conservative by culture and tradition - hence the challenge of the change process is second to none. This social transformation can only be successful if managed in a proactive and holistic way, taking into account all critical success factors; in particular, the role of e-learning as the driver of change must be recognised and understood.
IBM has made a strategic decision to engage in this transformation process. IBM has a broad set of capabilities which positions it as a potential partner to governments, education institutions and other stakeholders.
1. What is e-learning?
The European Commission defines e-learning in the context of its e-learning initiative as, “the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to resources and services as well as remote exchanges and collaboration”. E-learning in a broad sense embraces all these views and meanings. As such it can be conceived as a complex, integrated process, where the Internet enables social inclusion and social cohesion - enabling us to involve and connect people, pedagogy, processes, content and technology. E-learning is supporting the development, delivery, evaluation, management and commerce of learning in an integrated way. Understanding the complex nature of this new learning paradigm has led IBM to adopt a broad definition of e-learning, based on a total systems perspective. It is related to our notion of e-business, which is about transforming core business processes by leveraging the net. Typical core business processes are customer relationship management (CRM), Supply Chain Management and e-commerce. Since e-learning affects the core business processes and the business model relating to learning provision we define it as follows:
'E-learning is the application of e-business technology and services to teaching and learning. It provides digital content and collaboration to support remote learning and to augment class-based learning. It includes infrastructure, e-learning delivery platforms, content development and management.
It provides the collaborative framework to enable knowledge sharing and peer-to-peer learning hubs that can be further supported by mentors or coaches, thus supporting informal collaboration, sharing of knowledge and experiential learning.'
2. E-learning provides a new learning environment
From this perspective e-learning creates a new learning universe - a learning environment for educators and learners with major sets of key elements:
- People-centric human elements, such as pedagogical and didactic approaches, personalised learner support, teacher quality and capabilities, learner preferences, cultural factors, new roles (for example, virtual tutors and community facilitators), and social elements like group interaction, collaboration and knowledge sharing
- Content-centric elements, such as rich media content, authoring tools, learning object management, flexible credits linked to learning objects, content repositories, and user-friendly content management with rich categorisation and search
- Process-centric elements, such as security and privacy, learning management systems including enrolment facilities, online testing, reporting capabilities, and efficiency and effectiveness measurements
- Technology-centric elements, such as hardware and software infrastructure including servers, routers, end-user equipment, network bandwidth, databases, delivery platforms, mobile technologies and networking.
Using this broader perspective for e-learning avoids known pitfalls: deploying the latest technology does not solve a learning need if there is no sound pedagogical approach associated with it. The best multimedia programme will not produce the desired effects if the bandwidth of the network does not allow for sufficient transmission capacity towards the end-user. New learning programmes will not achieve acceptance in the user community if they are not compatible with the existing cultural environment, experiences and values.
All elements must be balanced in order to achieve desired results.
Figure: A holistic view of e-learning - creating a new learning environment (process: tracking/reporting, skills planning and assessment, support/help, implementation; content: instructional, interactional; IT infrastructure: content repositories, portals, learning management system, LCMS, authoring tools).
It is clear from the above that technology is one of the necessary conditions to make e-learning work, though not a sufficient one. However, without a sound technology strategy there is no way even to get started with an e-learning deployment.
3. A technology view of e-learning
The base layer of the enabling technology is the network infrastructure. Our customers tell us they need a network infrastructure that is robust, reliable, scalable, secure and flexible - based on open standards. Availability, interoperability and manageability are also key requirements. The network infrastructure must allow for access by multiple devices ranging from laptop computers to mobile phones. The network infrastructure sets the basic capabilities and limitations as to what type of e-learning programmes can be provided. Another key technology element is the software, which underpins the e-learning environment and ensures that the different application components function seamlessly together, such as the enrolment and billing systems or rich
media object repositories and authoring tools. Software brings flexibility and innovation for teachers in their course context and enables teachers and learners to collaborate synchronously or asynchronously and to establish work processes. Software also integrates and secures your existing environment with e-learning. Learning portals integrate the view of, and access to, the learning environment from the user perspective and eventually enable the user to create a personalised 'my.University' or 'mySchool' based on the profiles of students, educators, administration staff, alumni and external stakeholders.
4. IBM's capabilities as a technology partner
The e-learning provider marketplace is very fragmented today. Industry observers expect a shakeout during the next 12-18 months. There are highly specialised niche players providing learning management systems, learning content management systems and delivery platforms (in most cases proprietary solutions), consulting companies acting as integrators, information technology (IT) players coming from the infrastructure side, and telecommunication carriers. Recently major content providers (media companies and publishers) have zeroed in on e-learning as a key future market. We strongly believe that e-learning will drive increased co-operation between education institutions and industry, not only within the boundaries of a country but across Europe and even across continents. Students and lifelong learners will have to access the institutions and their resources anytime, from anywhere. IBM is committed to open standards to ensure long-term vendor independence, scalability, interoperability and flexibility of solutions. The least desirable state in an e-learning environment is islands of incompatible implementations; the need for economies of scale in infrastructure, content and support places a heavy financial and productivity penalty on such an approach. IBM provides solutions for all technology layers of e-learning - be it with IBM's own products and services or with partners. Given the importance that we attribute to emerging e-learning requirements for education, we have created an IBM institute to focus on this arena from a strategy and research perspective - the IBM Institute of Advanced Learning. It is a global virtual organisation with the leadership based in our Zurich laboratory in Rueschlikon. The Institute of Advanced Learning focuses on the technology side as well as the human factors of e-learning.
Figure: A technology view of e-learning.
5. Key benefits that IBM can provide as a technology partner
We have capabilities to provide hardware, software and services in all technology domains relevant for e-learning, which we can leverage to the benefit of our customers and we feel strongly that we are best positioned as an integrator and total solutions partner in this marketplace.
- As the premier IT infrastructure provider we can help our customers to plan, build and run the network infrastructure. We have developed offerings directed specifically at the education market. We are working with a number of partners in this arena, in particular with Cisco, with whom we have a strategic alliance that has now been extended to the education sector
- IBM has unique capabilities and experience as a systems integrator to make IBM and non-IBM components work together and to shield the customer from the complexity of multi-vendor systems
- With our consulting capabilities we can support customers in the development of a vision for an e-learning environment and devise implementation and project management plans
- IBM has long-standing experience with integrating access devices into an e-learning environment. In a concept known as 'ThinkPad University' we work with universities to implement an integrated programme for the deployment and support of mobile computing for students and faculty
- IBM has a strong commitment to open source and open standards and has been strongly engaged in Linux, Extensible Markup Language (XML), Java 2 Platform, Enterprise Edition (J2EE), Web services and Java. With the emergence of new standards in the field of e-learning, IBM takes an active role in international standardisation work such as the development of standards for learning object metadata
- IBM has an unrivalled software portfolio applicable to e-learning, covered by our four major brands:
  - data management software for the development of a learning objects strategy, managing unstructured and structured information and knowledge
  - Lotus software to develop collaboration and knowledge management in e-learning organisations, a major requirement for e-learning success and learning quality improvement
  - Tivoli software to help users manage, measure and secure a large, distributed and heterogeneous learning environment, making sure learners have the right level of system and use the appropriate software while reducing the costs and risks of software and hardware administration
  - WebSphere software to enable users to integrate existing applications and to develop interactive Web applications and portals.
With Lotus Learning Space we have a learning delivery platform which allows for collaboration between learners and takes e-learning to a high level of effectiveness. Combined with instant teamroom applications such as Lotus QuickPlace, realtime conferencing with Lotus Sametime and other solutions, teachers can keep innovation and control of their pedagogy in a flexible approach to e-learning. To sum it up: with IBM you can develop e-learning for everyone, not just a small group, and get a better return on investment (ROI).
6. IBM's capabilities beyond technology
Technology is only one of the critical success factors for e-learning implementation. Successful deployment of an e-learning environment requires a thorough understanding of the interplay of technology, culture and human elements. One of the most common pitfalls with e-learning is to undertake the digitisation of content without addressing the human element. E-learning requires a complete redesign and management of content and its embedding in a meaningful and motivating pedagogical pathway. This can only be achieved with adequate skills for instructional design, with a sound understanding of how various technologies can be used to support specific learning objectives within a framework for creating and maintaining the learner's motivation and interest and for measuring learning effectiveness. Also, education in the 21st century should address the role of preparing students to operate in an uncertain and ever-changing environment. What is needed is a toolkit to last through life, comprising such intra-personal elements as a values framework, self-knowledge, capacity for critical analysis and ability to learn, as well as communication and social skills. The use of new technologies in schools will free educators' time for concentrating on these new core competencies and for 'identifying the strengths of individuals, to focus on them and to lead students to achievement'. The use of technology for e-learning also forces teachers to develop a new relationship with students - one in which teachers act as facilitators and mentors to the self-directed, independent and collaborative learning activities of students. The learning journey is becoming an interactive process, which at times demands self-direction by the learner, at times depends on feedback from peers and tutors, and at others is simply a function of instructor-defined outcomes. It will be necessary to place comparable emphasis on technology and face-to-face interaction, to balance a teacher-directed and a facilitated, collaborative approach, and to place equal importance on teacher delivery and learner exploration. All of this argues for what has been called a 'high tech - high touch' approach. Such paradigm shifts in teaching and learning require radical changes in the competencies of teachers and the attitudes of learners.
7. Key benefits that IBM can provide
Our value proposition for education customers builds on four main sources:
1. IBM has invested $70m, since 1994, into its 'Reinventing Education' partnership programme, with the objective of improving the quality of primary and secondary education. From the 28 installations around the world and the ensuing research projects, IBM has gained significant intellectual capital and has developed solutions for schools. More details about IBM's Reinventing Education Programme can be found at ibm.com/ibm/ibmgives.
2. We are one of the most significant providers of enterprise-wide Learning Solutions. Our offerings in this arena span the learning value chain from planning the learning intervention, through design, development and implementation, to measurement - including the measurement of learning effectiveness. Most of the know-how of IBM's learning consultants (in our Learning Solutions organisation), gained from such engagements, is applicable to the education sector.
3. IBM is one of the most significant developers and consumers of e-learning for internal purposes. Today, some 40 per cent of IBM's internal education is delivered through e-learning. We have been repeatedly recognised as one of the most innovative e-learning companies in the world, with numerous awards from the American Society for Training and Development (ASTD), the Corporate University Exchange (CUX), the Deutscher Industrie- und Handelstag and so on.
4. A key element of IBM's investment in the development of e-business skills is a new programme for partnering with educational institutions. The IBM Scholars programme is wide-ranging, from the provision of software and educational resources to research, academic collaboration and curriculum development. ibm.com/software/info/university/scholarsprogram/
Some examples of specific solutions and intellectual assets derived from these activities and initiatives follow. Through the Re-inventing Education Programme IBM has supported a series of projects investigating effective practices in the use of technology in education. With eight projects in various countries around the world (including three in Europe) it has been possible to study an array of approaches to using technology to improve instructional practice across a variety of contexts. From online teacher professional development to online lesson planning, online teaching interventions and online authentic assessment, the projects have explored ways in which ICT can enable transformed teaching and learning. Our
Learning Village solution incorporates the experiences and the findings of these projects. In our internal education programmes we have developed a conceptual model, the IBM 4-Tier Learning Model, which helps us to better align learning technologies with learning programmes to reach desired learning outcomes. This 'IBM learning model' has become a widely acclaimed and applied framework for e-learning inside and outside IBM. The model has been proven in areas of soft-skills training, such as management development and sales training, where it has helped us to design continuous learning processes with a new, innovative 'blend' of technology and face-to-face learning.
Figure: Blending classroom with e-learning - the IBM four-tier learning model (learning methods vs. technology, from 'try it, play it, experience it' to 'get together').
8. The political dimension - public private partnerships
A successful transition to a new model in education requires, as a starting point, a shared vision of how to design tomorrow’s education and training and shared commitment from the stakeholders involved. However, the initial priority may be to change perceptions and develop new mindsets.
Since the early days of the European Commission's white papers 'Growth, Competitiveness and Employment' (1994) and 'Teaching and Learning' (1995), the European institutions have played an important role in actively tackling the challenges of the 21st century and challenging prevailing mindsets. These white papers set out the framework for subsequent Commission documents, including the eEurope Initiative, the eEurope Action Plan, the e-learning initiative and the e-learning Summit and Action Plan (May 2001), the Memorandum on Lifelong Learning and the Report on the Concrete Future Objectives of Education Systems. Attaining all the goals defined at the Lisbon European Council in March 2000 presupposes the committed involvement of all the players involved in education and training. "The fact is that in the future a society's economic and social performance will increasingly be determined by the extent to which its citizens and its economic and social forces can use the potential of these new technologies, how efficiently they incorporate them into the economy and build up a knowledge-based society". (Communication from the Commission: eLearning - Designing tomorrow's Education, 2000)
9. IBM's role in pan-European public private partnerships
IBM is engaged in a leading role in two major European initiatives relating to ICT skills and e-learning: the Career-Space Consortium and the E-learning Industry Group (eLIG).
10. The Career-Space project
Career-Space was founded in late 1998 with support and sponsorship from the European Commission. Seven major ICT companies were founding members - BT, IBM, Microsoft, Nokia, Philips, Siemens and Thales (formerly Thomson CSF). This initiative was triggered by the structural shortage of qualified ICT personnel in Europe, which will impact the future prosperity of the continent if not addressed adequately. As a first compelling issue to be addressed in the context of the ICT skills gap, Career-Space has given a response from the industry perspective as to what generic skills will be needed in the future and should be built by the 'suppliers'. Following the publication of 13 'generic ICT skills profiles' by the end of 1999, new members joined the group - Cisco, Intel, Nortel Networks and Telefonica. In addition, the European ICT Industry Association (EICTA) and CEN/ISSS (the European standardisation organisation for ICT) joined the Steering Committee along with EUREL (the convention of
national societies of electrical engineers), which was given the status of associate member. With this membership the consortium now has significant credibility to articulate requirements from an industry perspective and to provide recommendations for action. Following the publication of the skills profiles, Career-Space has focused on the logical next step, that is, it has moved from the demand side to the supply side: what should be the changes in future ICT curricula, in content and structure, to support the demands of the knowledge society? The recommendations have been produced in co-operation with over 20 universities and technical educational institutions across Europe. The output of this work, the new 'Curriculum Development Guidelines - New ICT Curricula for the 21st Century, Designing Tomorrow's Education', contains fundamental recommendations for change. Besides the obvious need to provide solid foundation skills from the engineering and informatics domains, with a particular emphasis on a broad systems perspective, the guidelines point to the need to include non-technical skills in the curricula, such as business skills and personal skills. There is no way to enforce implementation of these guidelines; however, a number of universities have already started to implement them on a pilot basis.
11. The European E-learning Summit 2001 and the E-learning Industry Group
Following the recognition of the importance of public private partnerships as a key element in the transformation process of the education system, the European Commission invited IBM, Cisco Systems, Nokia, Sanoma WSOY and SmartForce to collaborate with a wide range of industry partners in organising the summit. The e-learning summit explored the challenges outlined in the European Commission's e-learning action plan and presented an initial set of recommendations. The Summit took into account a broad systemic view of e-learning, along the lines set out earlier in this article. As a consequence, the recommendations issued by the Summit touch on elements from infrastructure to digital content, pedagogy and the professional development of teachers, to name a few. Particular focus is also given to financial incentives and funding schemes to advance the actual implementation and diffusion of e-learning. The structural funds and loans from the European Investment Bank are major examples of sources to be broadly leveraged. Overall, the summit working groups recommended pragmatic steps to move the e-learning agenda forward at a European level.
Following the e-learning summit, a core group of companies has proposed the establishment of a standing body to provide advice to the European Commission and national governments across Europe. The objective is to accelerate the deployment of e-learning in line with the European Commission's e-learning action plan. The founding members of this e-learning industry group are 3COM, Accenture, Apple, BT, Cisco, Digitalbrain, IBM, Intel, Line Communications, NIIT, Nokia, Online Courseware Factory, Sanoma WSOY, Sun Microsystems and Vivendi Universal Publishing. Dr. Richard Straub, Director of Learning Solutions, IBM Europe, Middle East and Africa, has been elected Chairman of the group. In a meeting with the European Commissioner for Education and Culture, Viviane Reding, four initial projects were proposed by the group and have been reviewed and welcomed by the Commissioner. These projects will lead to specific recommendations for action and, where feasible, will support pilot implementations:
- Connecting everyone and everything from everywhere - removing the barriers to access to interactive e-learning environments
- Adopting and participating in the development of open standards for e-learning
- Creating the conditions to sustain a commercial market for e-learning content and development
- Increasing investment in the continuous professional development of teachers and trainers, enhancing their status and helping them develop and understand the principles of e-learning.
The E-learning Industry Group is an open group and welcomes the involvement of other industry players. Other interested parties, such as associations and government agencies, can participate in the 'Consultation Group', which represents the wider circle of the Industry Group.
12. Our vision for the future
It is our vision that new value-nets will emerge, with government bodies, education institutions, corporations, technology providers, media companies and publishers joining forces to provide learning on demand - as a 'utility'. E-learning utilities will serve schools, universities, small and medium enterprises, larger corporations and individuals to meet their learning needs, shield them from the complexities of the underlying infrastructures and systems, and provide ubiquitous access to learning experiences via education portals. The e-learning utility will also provide a managed environment for content producers to deliver their content to education and learning environments. The e-learning utility will be part of an overall learning environment, where the human elements
such as tutors, mentors and social interaction between groups will continue to play a vital role - yet in an effective blend with technology. The e-learning utility will shield education institutions from the complexity of the IT solution and from the burden of building, running and maintaining it. It will allow them to focus on what is essential for the institutions - for instance, the provision of well-orchestrated learning opportunities, strong curricula, adequate pedagogy and meeting the needs of a diverse audience. The knowledge society enforces a new way of thinking and acting about learning. We are at the very beginning of this journey, but it has definitely started. The winners will be those who get on the learning curve early.
(Figure: the e-utility for learning - hosting/bandwidth.)
13. The IBM e-learning model
In order to acquire and master a competency, listening to someone speak may not be sufficient: we also need to experiment. Certainly, competencies can be acquired through individual work, but people learn better when they study in a team, as many research studies have shown.
The same concept applies to an e-learning environment. E-learning models centered on the teacher, such as web lectures, are very effective in transferring information. However, to completely master a competency, the model used must allow the student to take control of his or her own learning and to practice. An effective learning model, one that allows students to put the acquired competency to practical use, also requires interaction. Simple interaction with the computer may not be sufficient: interaction and collaboration among several students, or between student and teacher, or both, is also advisable. Finally, in order to really master a competency, the student needs to use it in a real situation. Hence, as we proceed along the education chain, the level of collaboration must increase. Even in an e-learning model, the level of interaction and collaboration must increase further when students need to master specific competencies. Interaction and collaboration are the most meaningful aspects underpinning the so-called IBM e-Learning model, which the company uses both for its own internal education and for the management of large e-learning projects for its customers. It is a four-tier model that starts from a lower tier based on information sharing and extends to a level of complete knowledge mastery: it is not a 100% e-learning-based model. As a matter of fact, it is not even plausible that e-learning could completely replace traditional, classroom-based education: it will always be necessary, at some stage of knowledge building and development, to put students in front of an expert. In addition, the model allows courses to be developed both horizontally and vertically. In other words, some courses can be based on a single tier, using only one e-learning methodology, while others need more levels and different methodologies. The latter solutions are called blended solutions. Briefly, the four tiers of the IBM model are:
Tier 1: learning through information. Reading, watching, listening. This is basic knowledge transfer, ideal for the launch of new initiatives, for announcing new company strategies, for publicizing new rules, etc. The tools used in this tier are simple web lectures and web sites where students can quickly and easily find the information they need.
Tier 2: learning through interaction. Testing, experimenting. Basic knowledge of new applications or simple procedural activities can be treated at this level. Examples include CBT-based or WBT-based courses with application simulation.
Tier 3: learning through collaboration. Discussing, practicing with others. Collaboration techniques such as chat lines, team rooms and online interaction with teachers allow students to learn within a team and to share experiences. At this level, students can prepare group exercises, or they can use more sophisticated technologies where application sharing is possible.
Tier 4: learning together. Finally, it is possible to use the traditional classroom with a mentoring activity. However, in an e-learning model, we use this tier only to acquire very advanced competencies and not to transfer or acquire basic knowledge. The result is a reduction of the time students must spend outside the work schedule and an optimization of teachers' precious time as well as of expensive resources.
Web links: Information about our Learning Solutions for schools, higher education and government can be found at this Web site: ibm.com/learning
IBM's Reinventing Education Programme: ibm.com/ibm/ibmgives
IBM Scholars Programme: ibm.com/software/info/university/scholarsprogram
QUERY MORPHING FOR INFORMATION FUSION
SHI-KUO CHANG
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
E-mail: chang@cs.pitt.edu
An evolutionary query is a query that changes in time and/or space. For example, when an emergency management worker moves around in a disaster area, an evolutionary query can be executed repeatedly to evaluate the surrounding area in order to locate objects of threat. Depending upon the position of the query originator, the time of day and other factors such as feedback from sensors, the query can be modified. Incremental query modification leads to a query similar to the original query. Non-incremental query modification, on the other hand, may lead to a substantially different query. Query morphing includes both incremental and non-incremental query modification. In sensor-based evolutionary query processing, through query morphing one or more sensors can provide feedback to the other sensors. The sensor dependency graph is used to facilitate query optimization, because most sensors can generate large quantities of temporal/spatial information within short periods of time. Applications to multi-sensor information fusion in emergency management, pervasive computing and situated computing are discussed.
1. Evolutionary Queries
There is an important class of queries for information fusion applications in emergency management, pervasive computing, situated computing [20], etc., which require novel query processing and information visualization techniques. We call this class of queries evolutionary queries. An evolutionary query is a query that changes in time and/or space. For example, when an emergency rescue worker moves around in a disaster area, an evolutionary query can be executed repeatedly to evaluate the surrounding area in order to locate objects of threat, determine routing for rescue vehicles, etc. Depending upon the position of the person or agent, the time of day and other factors such as feedback from sensors, the query can be different. The person or agent who issues the query is called the query originator. Depending upon the spatial/temporal coordinates of the query originator and feedback from sensors, an evolutionary query can be modified accordingly.
Under normal circumstances the modified query is quite similar to the original query, differing mainly in its spatial constraints. As explained above, incremental query modification, where only the constraints are changed, leads to a query similar to the original query. However, non-incremental query modification may lead to a substantially different query. For example, if the query originator is an aircraft that has entered a cloudy region or is flying at night, the query should be modified to consider only time-sequenced laser radar images, because the video sequence will yield little or no information. There are cases where even more substantial changes to the query are necessary. Query morphing includes both incremental and non-incremental query modification.
In this paper we investigate query morphing for sensor-based evolutionary query processing, where one or more sensors may provide feedback to the other sensors through query morphing. Status information such as position, time and certainty can be incorporated both in the multi-level views and in the morphed query. In order to accomplish sensor data independence, an ontological knowledge base is employed. The results of query processing are visualized so that the user can also manually modify the query. Further extensions of the query morphing approach are discussed.
2. Background and Related Research
Information fusion is the integration of information from multiple sources and databases, in multiple modalities and located in multiple spatial and temporal domains. The fusion of multimedia information from multiple real-time sources and databases has become increasingly important because of its practical significance in many application areas such as telemedicine, community networks for crime prevention, health care, emergency management, e-learning and situated computing. The objectives of information fusion are: a) to detect certain significant events [24, 26] and b) to verify the consistency of detected events [11, 16, 21]. In sensor-based query processing, the queries are applied to both static databases and dynamic real-time sources that include different types of sensors. Since most sensors can generate large quantities of spatial information within short periods of time, novel sensor-based query processing techniques to retrieve and fuse information from multiple sources are needed.
In our previous research, a spatial/temporal query language called ΣQL was developed to support the retrieval and fusion of multimedia information from
real-time sources and databases [5, 6, 9, 15]. ΣQL allows a user to specify powerful spatial/temporal queries for both multimedia data sources and multimedia databases, thus eliminating the need to write separate queries for each. ΣQL can be seen as a tool for handling spatial/temporal information for sensor-based information fusion, because most sensors generate spatial information in a temporal, sequential manner [14]. A powerful visual user interface called the Sentient Map allows the user to formulate spatial/temporal σ-queries using gestures [7, 8].
For an empirical study we collaborated with the Swedish Defense Research Agency, which has collected information from different types of sensors, including laser radar, infrared video (similar to video but generated at 60 frames/sec) and CCD digital camera. When we applied ΣQL to the fusion of the above-described sensor data, we discovered that in the fusion process data from a single sensor yield poor results in object recognition. For instance, the target object may be partially hidden by an occluding object such as a tree, rendering certain types of sensors ineffective.
Object recognition can be significantly improved if a modified query is generated to obtain information from another type of sensor, while allowing the target to be partially hidden. In other words, one (or more) sensor may serve as a guide to the other sensors by providing status information such as position, time and certainty, which can be incorporated in multiple views and formulated as constraints in the modified query. In the modified query, the source(s) can be changed, and additional constraints can be included in the where-clause of the σ-query. This approach yields better object recognition because the modified query can improve the results obtained from the various sensor data, which in turn leads to a better result in the fusion process. A modified query may also send a request for new data and thus lead to a feedback process.
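As an informal illustration (not the ΣQL implementation itself), the following Python sketch models a query as a record of sources and where-clause constraints and shows how sensor feedback might drive an incremental modification; the field names and the modify_for_occlusion rule are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    """A simplified stand-in for a sigma-query: target, sources, constraints."""
    target: str
    sources: tuple
    constraints: tuple = ()

def modify_for_occlusion(q: Query, feedback: dict) -> Query:
    """Incrementally modify q when a sensor reports a partially occluded target."""
    new_sources = q.sources
    if feedback.get("occluded") and "LR" not in q.sources:
        # Add a sensor type that copes better with occlusion (hypothetical rule).
        new_sources = q.sources + ("LR",)
    new_constraints = q.constraints + (
        "near(" + str(feedback["position"]) + ")",  # constrain the space of interest
        "allow_partial_occlusion",                  # relax the visibility requirement
    )
    return replace(q, sources=new_sources, constraints=new_constraints)

# Example: CCD reports an occluded object; the query is morphed incrementally.
q0 = Query(target="truck", sources=("CCD",), constraints=("type = 'truck'",))
q1 = modify_for_occlusion(q0, {"occluded": True, "position": (12.4, 7.9)})
print(q1.sources, q1.constraints)
```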
In early research on query modification, queries are modified to deal with integrity constraints [22]. In query augmentation, queries are augmented by adding constraints to speed up query processing [12]. In query refinement [23], multiple-term queries are refined by dynamically combining pre-computed suggestions for single-term queries. Recently, query refinement techniques have been applied to content-based retrieval from multimedia databases [3]. In our approach, the modified queries are created to deal with the lack of information from a certain source or sources, and therefore not only the constraints but also the source(s) can be changed. This approach has not been considered previously in database query optimization because usually the sources are
assumed to provide the complete information needed by the queries. Almost all previous approaches fall under the category of incremental query modification. For information fusion we must consider non-incremental query modification, where not only the constraints but also the sources, and even the query structure, are modified. It is for this reason that we introduce the notion of query morphing.
In addition to the related approaches in query augmentation, there is also recent research work on agent-based techniques relevant to our approach. Many mobile agent systems have been developed [1, 2, 18], and recently mobile agent technology has begun to be applied to information retrieval from multimedia databases [17]. It is conceivable that sensors can be handled by different agents that exchange information and cooperate with each other to achieve information fusion. However, mobile agents are highly domain-specific and depend on ad hoc, 'hardwired' programs to implement them. In contrast, our approach offers a theoretical framework for query optimization and is empirically applicable to different types of sensors, thus achieving sensor data independence.
3. The Sensor Dependency Graph
The sensor dependency graph is proposed to facilitate sensor-based evolutionary query processing and optimization, because most sensors can generate large quantities of spatial information within short periods of time. In database theory, query optimization is usually formulated with respect to a query execution plan, where the nodes represent the various database operations to be performed [13]. The query execution plan can then be transformed in various ways to optimize query processing with respect to certain cost functions. In sensor-based query processing, a concept similar to the query execution plan is introduced. It is called the sensor dependency graph, a graph in which each node P_i has the following parameters:
obj-type_i is the object type to be recognized
source_i is either the information source or an operator for fusion, union or combination
recog-alg_i is the object recognition/fusion algorithm to be applied
time_i is the estimated computation time of the recognition/fusion algorithm, in seconds
recog-cr_i is the certainty range [min, max] for the recognition of an object
norecog-cr_i is the certainty range [min, max] for the non-recognition of an object
sqo_i is the spatial coordinates of the query originator
tqo_i is the temporal coordinates of the query originator
soi_i is the space-of-interest for object recognition/fusion (usually an area-of-interest)
toi_i is the time-of-interest for object recognition/fusion (usually a time-interval-of-interest)
These parameters provide detailed information on a computation step to be carried out in sensor-based evolutionary query processing. As mentioned earlier, the query originator is the person/agent who issues a query. For evolutionary queries, the spatial/temporal coordinates of the query originator are required; for other types of queries, these parameters are optional. If the computation results of a node P1 are the required input to another node P2, there is a directed arc from P1 to P2. Usually we are dealing with sensor dependency trees, where the directed arcs originate from the leaf nodes and terminate at the root node. The leaf nodes of the tree are the information sources such as laser radar, infrared camera, CCD camera and so on. They have parameters such as (none, LR, NONE, 0, (1,1), (1,1), sqo_i, tqo_i, soi_all, toi_all). Sometimes we represent such leaf nodes by their symbolic names such as LR, IR, CCD, etc. The intermediate nodes of the tree are the objects to be recognized. For example, suppose the object type is 'truck'. An intermediate node may have parameters (truck, LR, recog315, 10, (0.3, 0.5), (1,1), sqo_i, tqo_i, soi_all, toi_all). The root node of the tree is the result of information fusion, for example a node with parameters (truck, ALL, fusion7, 2000, (0,1), (0,1), sqo_i, tqo_i, soi_all, toi_all), where the parameter ALL indicates that information is drawn from all the sources. In what follows, some parameters, such as the spatial/temporal coordinates sqo_i and tqo_i of the query originator, the all-inclusive space-of-interest soi_all and the all-inclusive time-of-interest toi_all, will be omitted for the sake of clarity.
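The node parameters translate naturally into a small data structure. The following Python sketch (our own, not from the paper) encodes a node and rebuilds the LR branch of the example, omitting the query-originator coordinates and the all-inclusive space/time of interest as in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CertaintyRange = Tuple[float, float]

@dataclass
class SDGNode:
    """One node of a sensor dependency graph (parameters as listed above)."""
    obj_type: Optional[str]           # object type to recognize, or None for a source
    source: str                       # information source, or ALL for fusion
    recog_alg: Optional[str]          # recognition/fusion algorithm, or None
    time: float                       # estimated computation time in seconds
    recog_cr: CertaintyRange          # certainty range for recognition
    norecog_cr: CertaintyRange        # certainty range for non-recognition
    children: List["SDGNode"] = None  # nodes whose results this node depends on

# The LR branch of the example tree and the fusion root; the IR and CCD
# branches are analogous.
lr_source = SDGNode(None, "LR", None, 0, (1, 1), (1, 1), [])
lr_truck = SDGNode("truck", "LR", "recog315", 20, (0.3, 0.5), (0.4, 0.6), [lr_source])
root = SDGNode("truck", "ALL", "fusion7", 2000, (0, 1), (0, 1), [lr_truck])
```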
Query processing is accomplished by the repeated computation and update of the sensor dependency graph. During each iteration one or more nodes are selected for computation. The selected nodes must not depend on any other nodes. After the computation, one or more nodes are removed from the sensor dependency graph. The process then iterates. As an example, the query originator is an aircraft, and the evolutionary query is a query to find moving trucks. By analyzing the initial query, the following sensor dependency graph T1 is constructed, where the sources are laser radar (LR), infrared (IR) and charge-coupled device camera (CCD):
(none, LR, NONE, 0, (1,1), (1,1))   (none, IR, NONE, 0, (1,1), (1,1))   (none, CCD, NONE, 0, (1,1), (1,1))
(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6))   (truck, IR, recog144, ...)   (truck, CCD, recog11, 100, (0.6,0.8), (0.1,0.3))
(truck, ALL, fusion7, 2000, (0,1), (0,1))
This means the information comes from the three sources - laser radar, infrared camera and CCD camera - and the information will be fused to recognize the object type 'truck'. Next, we select some of the nodes to compute. For instance, all three source nodes can be selected, meaning information will be gathered from all three sources. After this computation, the processed nodes are dropped and the following updated sensor dependency graph T2 is obtained:
(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6))   (truck, IR, recog144, ...)   (truck, CCD, recog11, 100, (0.6,0.8), (0.1,0.3))
(truck, ALL, fusion7, 2000, (0,1), (0,1))
We can then select the next node(s) to compute. Since IR has the smallest estimated computation time, it is selected and recognition algorithm 144 is applied. The sensor dependency graph T3 is:
(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6))   (truck, CCD, recog11, 100, (0.6,0.8), (0.1,0.3))
(truck, ALL, fusion7, 2000, (0,1), (0,1))
In the updated graph, the IR node has been removed. We now select the CCD node, because it has a much higher certainty range than LR, and after its processing we select the LR node. The sensor dependency graph T4 is:
(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6))
(truck, ALL, fusion7, 2000, (0,1), (0,1))
Finally the fusion node is selected. The graph T5 has only a single node:
(truck, ALL, fusion7, 2000, (0,1), (0,1))
After the fusion operation, there are no unprocessed (i.e., unselected) nodes, and query processing terminates.
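A compact sketch of this iteration in Python, reusing the node structure from the previous sketch (any object exposing children and time attributes will do); compute_node is a caller-supplied stand-in for running the recognition or fusion algorithm, and for simplicity one node is processed per iteration rather than several.

```python
def collect_nodes(node):
    """Return every node of the tree rooted at node, children before parents."""
    result = []
    for child in (node.children or []):
        result.extend(collect_nodes(child))
    result.append(node)
    return result

def process_sdg(root, compute_node):
    """Iteratively process a sensor dependency tree bottom-up.

    Nodes are selected only once all their dependencies (children) have been
    processed, smallest estimated computation time first, as in the example.
    """
    pending = collect_nodes(root)
    done = set()
    while pending:
        ready = [n for n in pending
                 if all(id(c) in done for c in (n.children or []))]
        node = min(ready, key=lambda n: n.time)
        compute_node(node)                      # run recognition/fusion (stand-in)
        done.add(id(node))
        pending = [n for n in pending if n is not node]
```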
4. Query Morphing by Incremental Query Modification
In the previous section a straightforward approach to sensor-based evolutionary query processing was described. This straightforward approach misses the opportunity to utilize incomplete and imprecise knowledge gained during query processing. Let us re-examine the above scenario. After IR is selected and recognition algorithm 144 applied, suppose the result of recognition is not very good, and only some partially occluded large objects are recognized. If we follow the original approach, the reduced sensor dependency graph becomes T3 as shown in Section 3. But this misses the opportunity to utilize the incomplete and imprecise knowledge gained by recognition algorithm 144. If the query is to find un-occluded objects and the sensor reports only an occluded object, then the query processor is unable to continue unless we modify the query to find occluded objects. Therefore a better approach is to modify the original query, so that recognition algorithm 144 is first applied to detect objects in a space-of-interest soi_all (i.e., the entire area). Although algorithm 144 cannot detect an object, it is able to reduce the space of interest to a much smaller soi23, and T4 becomes T4':
(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6), soi23)
(truck, ALL, fusion7, 2000, (0,1), (0,1), soi23)
The recognition algorithm recog315 can now be applied to recognize objects of the type 'truck' in this smaller space-of-interest. Finally, the fusion algorithm fusion7 is applied. The query modification approach is outlined below, where italic words indicate operations for the second (and subsequent) iterations.
Step 1. Analyze the user query to generate/update the sensor dependency graph, based upon the ontological knowledge base (see Section 6) and the multi-level view database (see Section 5) that contains up-to-date contextual information in the object view, local view and global view, respectively.
Step 2. If the sensor dependency graph is reduced to a single node, perform the fusion operation (if multiple sensors have been used) and then terminate query processing. Otherwise build/modify the σ-query based upon the user query, the sensor dependency graph and the multi-level view database.
Step 3. Execute the portion of the σ-query that is executable according to the sensor dependency graph.
Step 4. Update the multi-level view database and go back to Step 1.
As mentioned above, if in the original query we are interested only in finding un-occluded objects, then the query processor must report failure when only an occluded object is found. If, however, the query is modified to "find both un-occluded and occluded objects", then the query processor can still continue. Evolutionary queries and query processing are also affected by the spatial/temporal relations among the query originator, the sensors and the sensed objects. Therefore in query processing the spatial/temporal relations must be taken into consideration in the construction/update of the sensor dependency graph. The temporal relations include "followed by", "preceded by", and so on. The spatial relations include the usual spatial relations as well as special ones such as "occluded by" [19]. A sketch of the Step 1-4 loop is given below.
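The following Python sketch mirrors Steps 1-4 as a processing loop; the five callables are placeholders for the components described in Sections 5, 6 and 8 and are not part of the prototype's actual API.

```python
def evolutionary_query_loop(user_query, views, knowledge_base,
                            analyze_query, build_sigma_query,
                            execute, update_views, fuse):
    """Steps 1-4 of the incremental query modification approach (sketch).

    The callables are supplied by the caller: analyze_query builds/updates
    the sensor dependency graph, build_sigma_query and execute handle the
    sigma-query, update_views maintains the multi-level view database and
    fuse performs the final fusion step.
    """
    while True:
        # Step 1: generate/update the sensor dependency graph.
        sdg = analyze_query(user_query, knowledge_base, views)

        # Step 2: a single remaining node means only the fusion step is left.
        if len(sdg) == 1:
            return fuse(sdg, views)
        sigma_query = build_sigma_query(user_query, sdg, views)

        # Step 3: execute the executable portion of the sigma-query.
        partial_result = execute(sigma_query, sdg)

        # Step 4: fold the new results into the object/local/global views.
        update_views(views, partial_result)
```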
5. Multi-Level View Database
A multi-level view database (MLVD) is proposed to support sensor-based query processing. The status information obtained from the sensors includes object type, position, orientation, time, certainty and so on. The positions of the query originator and the sensors may also change. This information is processed and integrated into the multi-level view database. Whenever the query processor needs some information, it asks the view manager. The view manager also shields the rest of the system from the details of managing sensory data, thus achieving sensory data independence.
The multiple views may include the following three views in a resolution pyramid structure: the global view, the local view and the object view. The global view describes where the target object is situated in relation to other objects, e.g. a road from a map. This enables the sensor analysis program to find the location of the target object with greater accuracy and thus make a better analysis. The local view provides information such as whether the target object is partially hidden. The local view can be described, for example, in terms of Symbolic Projection [4], or other representations. Finally, there is also a need for a symbolic object description. The views may include information about the query originator and can be used later in other important tasks such as situation analysis. The multi-level views are managed by the view manager, which can be regarded as an agent or as middleware, depending upon the system architecture. The global view is obtained primarily from the geographic information system (GIS).
The local view and object view are more detailed descriptions of local areas and objects. The results of query processing, and the movements of the query originator, may both lead to the updating of all three views.
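Purely as an illustration (the paper does not prescribe an implementation), the three-level view pyramid and the mediating view manager might be organized as follows; all class and method names are our own.

```python
class MultiLevelViewDatabase:
    """Resolution pyramid of views: global, local and object level (sketch)."""

    def __init__(self):
        self.global_view = {}   # map/GIS context: roads, areas
        self.local_view = {}    # per-area facts, e.g. "target partially hidden"
        self.object_view = {}   # symbolic descriptions of individual objects

    def update(self, sensor_report):
        """Integrate status information (object type, position, time, certainty)."""
        obj_id = sensor_report["object_id"]
        self.object_view[obj_id] = sensor_report
        area = sensor_report.get("area")
        if area is not None:
            self.local_view.setdefault(area, []).append(obj_id)

class ViewManager:
    """Mediates between the query processor and the views, shielding the rest
    of the system from the details of sensory data management."""

    def __init__(self, mlvd):
        self.mlvd = mlvd

    def lookup(self, level, key):
        view = getattr(self.mlvd, level + "_view")
        return view.get(key)
```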
6. The Ontological Knowledge Base
For any single sensor, the sensed data usually do not fully describe an object; otherwise there would be no need to utilize other sensors. In the general case the system should be able to detect that some sensors are not giving a complete view of the scene and automatically select those sensors that can help the most in providing more information to describe the whole scene. In order to do so the system should have a collection of facts and conditions, which constitute the working knowledge about the real world and the sensors. We propose to store this knowledge in the ontological knowledge base, whose content includes the object knowledge structure, and sensor and sensor data control knowledge. The ontological knowledge base consists of three parts: the sensor part, describing the sensors, recognition algorithms and so on; the external conditions part, providing a description of external conditions such as weather and light conditions; and the sensed objects part, describing the objects to be sensed. Given the external condition and the object to be sensed, we can determine which sensor(s) and recognition algorithm(s) may be applied. For example, IR and laser can be used at night (time condition), while CCD cannot. IR can probably be used in foggy weather, but laser and CCD cannot (weather condition). However, such determinations are often uncertain. Therefore certainty factors should be associated with items in the ontological knowledge base to deal with the uncertainty.
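A toy sketch of such a knowledge base as a rule table with certainty factors; the sensor names follow the paper, while the table contents, certainty values and function names are invented for illustration.

```python
# Applicability of each sensor under an external condition, with a certainty
# factor in [0, 1]. The structure follows the three-part knowledge base above;
# the numeric values are illustrative only.
SENSOR_KB = {
    ("night", "LR"):  0.9,
    ("night", "IR"):  0.9,
    ("night", "CCD"): 0.0,
    ("fog",   "IR"):  0.6,
    ("fog",   "LR"):  0.1,
    ("fog",   "CCD"): 0.1,
}

def applicable_sensors(condition, threshold=0.5):
    """Return the sensors considered usable under the given external condition."""
    return [sensor for (cond, sensor), cf in SENSOR_KB.items()
            if cond == condition and cf >= threshold]

# Example: at night, CCD is excluded while LR and IR remain candidates.
print(applicable_sensors("night"))   # ['LR', 'IR']
```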
7. Query Optimization
In the previous sections we explained the evolutionary query processing steps and proposed the major components of the system. In this section the optimization problems related to evolutionary query processing are formulated.
Suppose that we have a sensor dependency graph such as T1 of Section 3. For the recognition algorithm recog315, we have the following certainty ranges: P(recog315 = yes | X = truck, Y = LR) ∈ (0.3, 0.5), and P(recog315 = no | X ≠ truck, Y = LR) ∈ (0.4, 0.6), where X = truck, Y = LR means that there is a truck in the frame obtained by LR. If the input data has certainty range (a, b) and the recognition algorithm has certainty range (c, d), then the output has
certainty range (min(a,c), min(b,d)). The optimization problem can be stated as follows: given the sensor dependency graph, we want to recognize the object 'truck' with a certainty value above a threshold, and our goal is to minimize the total processing time. In other words, the optimization problem is:

Minimize   Σ_{i=1..N} Σ_{j=1..N} δ_ij T_i

where
δ_ij = 1 if algorithm i runs at the j-th order, and 0 if it does not;
T_i = processing time of algorithm i,

subject to

Σ_{i=1..N} δ_ij ≤ 1   (for the j-th order, at most one algorithm can run),
Σ_{j=1..N} δ_ij ≤ 1   (every algorithm can be in at most one order),
max_j Σ_{i=1..N} c(ALG_i, priori certainty) δ_ij ≥ θ   (θ is the certainty threshold),

where C = ( c(ALG_1, priori certainty), ..., c(ALG_N, priori certainty) ).
Given the sensor dependency graph, a dual problem is to recognize the object 'truck' within a processing time limit. Our goal is to maximize the certainty value for the object 'truck' under the condition that the total processing time is below the time limit. The problem is as follows:

Maximize   max_j Σ_{i=1..N} c(ALG_i, priori certainty) δ_ij

where
δ_ij = 1 if algorithm i runs at the j-th order, and 0 if it does not;
C = ( c(ALG_1, priori certainty), ..., c(ALG_N, priori certainty) ),

subject to

Σ_{i=1..N} δ_ij ≤ 1   (for the j-th order, at most one algorithm can run),
Σ_{j=1..N} δ_ij ≤ 1   (every algorithm can be in at most one order),
Σ_{i=1..N} Σ_{j=1..N} δ_ij T_i ≤ T   (T is the maximum time that we can bear),

where T_i = processing time of algorithm i.
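For a small number of algorithms both problems can be solved by exhaustive enumeration. The sketch below is illustrative rather than the paper's method: it assigns each algorithm a nominal certainty and processing time (the recog144 values are invented) and searches ordered selections; ordering only starts to matter once the space-of-interest extension discussed next is taken into account.

```python
from itertools import permutations

# Illustrative per-algorithm data: (processing time in seconds, nominal certainty).
ALGS = {
    "recog144": (10, 0.4),
    "recog315": (20, 0.5),
    "recog11": (100, 0.8),
}

def min_time_schedule(theta):
    """Primal problem: cheapest ordered selection whose best certainty >= theta."""
    best = None
    for r in range(1, len(ALGS) + 1):
        for order in permutations(ALGS, r):
            total_time = sum(ALGS[a][0] for a in order)
            best_certainty = max(ALGS[a][1] for a in order)
            if best_certainty >= theta and (best is None or total_time < best[0]):
                best = (total_time, order)
    return best

def max_certainty_schedule(time_limit):
    """Dual problem: most certain ordered selection within the time limit."""
    best = None
    for r in range(1, len(ALGS) + 1):
        for order in permutations(ALGS, r):
            total_time = sum(ALGS[a][0] for a in order)
            best_certainty = max(ALGS[a][1] for a in order)
            if total_time <= time_limit and (best is None or best_certainty > best[0]):
                best = (best_certainty, order)
    return best

print(min_time_schedule(theta=0.6))            # (100, ('recog11',))
print(max_certainty_schedule(time_limit=30))   # (0.5, ('recog315',))
```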
In the above optimization problems we did not consider the space of interest (soi) when formalizing the problem. If it is taken into account, the formulation becomes more complicated. Define

F = ( T(ALG_1, initial soi), ..., T(ALG_N, initial soi) )
A = ( a(ALG_1, initial soi), ..., a(ALG_N, initial soi) )

where T(ALG_i, soi) is the running time of the i-th algorithm on soi, and a(ALG_i, soi) is the output soi after applying algorithm i to the input soi. The output soi of an algorithm becomes the input soi of the algorithm that runs next, so F_k, the running time of the algorithm run at the k-th order, depends on the soi produced by the preceding algorithms, and Σ_{k=1..N} F_k is the total running time. The goal function therefore becomes

Minimize   Σ_{k=1..N} F_k.
8. An Experimental Prototype
An experimental prototype for query processing, fusion and visualization has been implemented. As shown in Figure 1, after a recognition algorithm is applied and some objects are identified, these objects can be displayed. Each object has seven attributes: object id, object color, object type, source, recognition algorithm, estimated processing time and certainty range for object recognition. There are also hidden attributes, including the parameters of the minimum enclosing rectangle of the object, the spatial and temporal coordinates of the query originator, the space of interest and time of interest, and the certainty range for non-recognition of the object.
As shown in Figure 2, the recognition algorithms can be applied dynamically to an area in the image, and the recognized objects are displayed. Figures 3 and 4 illustrate the construction of a ΣQL query and the results of processing the query, respectively.
Figure 1. Objects recognized by the recognition algorithms have seven attributes.
Figure 2. Recognition algorithms can be applied dynamically.
Figure 3. Visual construction of a query. The resultant query is shown in the upper right window.
Figure 4. Visualization of query processing. The result of the query is shown in the lower right window.
Figure 5. The dependency tree (left) and a selected node (right). We can trace the Query processing step by step.
Figure 6. The next step of query processing after the step shown in Figure 5. Both the dependency tree and the query may change at this step.
Figure 7. The optimization information shown in a popup window.
The main window in Figure 3 illustrates the visual construction of a query. The user drags and drops objects and enters their attributes, and the constructed query is shown in the upper right window. The objects in the dependency tree are shown as an object stream in the middle right window. In Figure 4 the lower right window shows the query results. When an evolutionary query is being executed, its dependency tree will change dynamically. Figure 5 displays the same information as that of the object stream, but in a format more familiar to
end users. It shows the dependency tree on the left side of the screen, and the selected node with its attributes on the right side of the screen. In the next step, both the dependency tree and the query may be changed, as illustrated in Figure 6. As shown in Figure 7, the optimization information can be shown in a popup window. The ΣQL query shown in the upper right window of Figure 3 is as follows:
SELECT object
CLUSTER *
ALIAS OBJ1 OBJ2
FROM
  SELECT t
  CLUSTER *
  FROM video-source
WHERE OBJ1.type = 'car' AND OBJ1.color = 'red' AND OBJ2.type = 'truck' AND OBJ1.t < OBJ2.t
The corresponding Object Stream is:
COMBINE OBJ1, OBJ2 OBJ1.t = OBJ2.t
OBJ2 video-source OBJ2.type = 'truck'
OBJ1 video-source OBJ1.type = 'car' OBJ1.color = 'red'
The Result Set is:
Car1, Truck3
Car2, Truck4
i.e., either {Car1, Truck3} or {Car2, Truck4} is the retrieved result. Another example, showing the fusion operation, is illustrated in Figure 8. The ΣQL query for fusion is as follows:
Figure 8. A fusion query.
SELECT object
CLUSTER *
ALIAS OBJ1 OBJ2
FROM
  SELECT t
  CLUSTER *
  FROM LR, CCD, IR
WHERE OBJ1.type = 'car' AND OBJ1.color = 'red' AND OBJ2.type = 'truck' AND OBJ1.t < OBJ2.t
The Object Stream is:
COMBINE OBJ1, OBJ2 OBJ1.t
Figure 9. A fusion query's dependency tree consists of Union/Combination nodes, Fusion nodes and Query nodes. The selected fusion node with its attributes is displayed on the right of the screen.
As already explained, Figures 5 and 6 show the dependency tree, whose dynamic changes illustrate the steps of query processing. Figure 9 also shows the dependency tree. However, in this example of query processing for fusion, several nodes are marked as 'cut-off' nodes, meaning the certainty values are already above a threshold and consequently no further processing of these nodes is needed.
9. Query Morphing by Non-Incremental Query Modification
In the preceding sections we explained our approach based mainly upon incremental query modification. Non-incremental query modification in general remains to be incorporated into the approach. Therefore in this section we describe query morphing by non-incremental query modification. Conceptually, query morphing is somewhat like image morphing: the end user formulates one query, called a query point, and requests the query processor to morph one query point into another query point. Within limits, the two query points are arbitrary, and the query processor is able to figure out automatically how one query point is morphed into the other. Sometimes query morphing is accomplished by modifying the query incrementally; sometimes more substantial query modification is necessary. In incremental query
modification, the two query points are more or less similar. In non-incremental query modification, the two query points are substantially different. We define a distance measure d(q_1, q_n) between two query points q_1 and q_n based upon the number and type of transformation steps needed to transform q_1 into q_n. Depending upon the type of transformation, different weights are assigned to the transformation steps. An infinite weight is assigned to a forbidden type of transformation. Let f_1, f_2, ..., f_{n-1} be the n-1 transformations such that f_1(q_1) = q_2, f_2(q_2) = q_3, ..., f_{n-1}(q_{n-1}) = q_n. The distance between q_1 and q_n is defined as

d(q_1, q_n) = Σ_{j=1..n-1} w_j

where w_j is the weight assigned to transformation step f_j. The following are examples of transformation steps:
- add target attributes (weight 1)
- drop target attributes (weight 1)
- replace sources (weight 3)
- add a conditional clause (weight 2)
Each transformation step is assigned a certain weight; for example, the add transformation step has weight 1. An incremental morphing pair of queries (q_1, q_n) is one whose distance d(q_1, q_n) is below a threshold τ. If q_1 and q_n form an incremental morphing pair, morphing from one into the other by incremental query modification is possible; if they do not, it is impossible. A non-incremental morphing pair of queries (q_1, q_n) is one whose distance d(q_1, q_n) is finite but above the threshold τ. If q_1 and q_n form a non-incremental morphing pair, morphing from one into the other by non-incremental query modification is possible. A non-incremental transformation is one that completely rewrites the query. A morphing pair is either an incremental or a non-incremental morphing pair. If q_1 and q_n do not form a morphing pair, morphing from one into the other by query modification is impossible.
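A small Python sketch of this distance measure and of the resulting classification of a query pair; the transformation-step encoding and the threshold value τ = 4 are illustrative assumptions.

```python
import math

# Weights per transformation type, as listed above; a forbidden
# transformation gets infinite weight.
WEIGHTS = {
    "add_target_attribute": 1,
    "drop_target_attribute": 1,
    "replace_sources": 3,
    "add_conditional_clause": 2,
    "forbidden": math.inf,
}

def distance(transformations):
    """d(q1, qn): sum of the weights of the steps transforming q1 into qn."""
    return sum(WEIGHTS[t] for t in transformations)

def classify_pair(transformations, tau=4):
    """Classify a query pair by its morphing distance (tau is illustrative)."""
    d = distance(transformations)
    if d <= tau:
        return "incremental morphing pair"
    if math.isfinite(d):
        return "non-incremental morphing pair"
    return "not a morphing pair"

# Example classifications with the illustrative threshold tau = 4.
print(classify_pair(["replace_sources"]))                          # incremental (d = 3)
print(classify_pair(["replace_sources", "add_conditional_clause",
                     "add_target_attribute"]))                     # non-incremental (d = 6)
```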
In the preceding sections the application we focused on was the fusion of remote sensing data such as video, infrared images and radar images, and the examples were drawn from that application area. We now use distance learning as another application area in the following example of query morphing. In distance learning, a student searches for information related to a subject matter such as "binary tree" from the sources. The sources initially specified by the student are the textbooks provided by the instructor.
Original query 1:
Select object
From textbook
Where object.topic = "binary tree"
The query processor finds related class notes, reference books and videotaped materials, and consequently the sources in the query are updated by the query processor. This is a typical example of incremental query morphing.
Morphed query 2:
Select object
From textbook, classnotes, reference-book, videotaped-materials
Where object.topic = "binary tree"
In experience-based distance learning, learning-by-doing becomes very important. Information obtained from case studies and work experiences needs to be fused into the knowledge base and made available to the learner. Fusion of information is thus required. The original query is rewritten as the fusion of several media-specific queries, each reformulated according to the characteristics of the respective media. This is an example of a substantial query transformation for non-incremental query morphing.
Morphed query 3:
Select object
From case-studies
Where object.topic = "binary tree"
or
Select object
From life-experiences
Where object.topic = "binary tree"
Last but not least, learning-from-peers is another important aspect in peer-oriented distance learning. A fellow student may possess information items to be shared: textbooks, class notes, case studies and work experiences. The original query is rewritten as the fusion of several peer-oriented queries, each reformulated according to the user profile of the respective peer. This is yet another example of a substantial query transformation for non-incremental query morphing.
Morphed query 4:
Select object
From classnotes of student1
Where object.topic = "binary tree"
or
Select object
From life-experiences of student2
Where object.topic = "binary tree"
It is important to note that both the morphed query and the retrieval results contain information valuable to the user/learner/student. In other words, the questions are just as important as the answers. To this end, adlets [8] are used to generate morphed queries to gather information. Adlets travel from node to node to acquire more information. The query is morphed as the adlets travel along a chosen path.
As illustrated by the above examples, query morphing is often event-driven: when an event occurs, query morphing is invoked. Events are characterized by conditions: "an object is occluded", "the time of the day is 7pm", "the weather has changed from cloudy to raining", "a new textbook on this subject becomes available", "a new work experience on this subject is obtained", "learner Smith has acquired work experience on this subject" and so on. Sometimes the condition is changing along a certain direction and an event becomes predictable: it is 6pm and soon will be night time, the weather is changing from cloudy to raining, and so on. The end user can then be prompted to decide what to do in case such an event occurs. The end user can posit a sequence of events and query points to define a query path for morphing. In some cases the initial query is too restrictive and the end user may wish to enhance the significance of a certain type of objects. If the end user is able to visualize the type of objects that meet the information needs, the end user can add clauses and/or constraints involving that type of object to the evolutionary query. In other words, the end user can repeatedly adjust the query path for morphing to focus on certain types of objects. In that regard, we note the importance of visualization in query morphing: we need to visualize both the query and the retrieval results. Corresponding to an adjustment of the query path, the morphing algorithm revises its strategy to modify the query. For example, in case of cloudy conditions a source such as the CCD sensor should be replaced by another source such as the IR sensor. A roaming query path can be defined, which is materialized into a query path based upon the contents of the ontological knowledge base. By specifying
the appropriate adlet propagation rule, adlet generation rule and adlet modification rule, the interactive morphing algorithm can be designed. A sketch of such event-driven morphing is given below.
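To make the event-driven behaviour concrete, the following Python sketch dispatches morphing actions from event conditions; the events and the CCD-to-IR source swap echo the examples in the text, but the rule table and function names are our own.

```python
# Event condition -> morphing action. Each action takes the current query
# (represented as a dict of sources and constraints) and returns a morphed one.
def swap_ccd_for_ir(query):
    """In cloudy conditions, replace the CCD source with the IR sensor."""
    sources = ["IR" if s == "CCD" else s for s in query["sources"]]
    return {**query, "sources": sources}

def add_occlusion_clause(query):
    """When an object is occluded, also accept occluded objects."""
    return {**query, "constraints": query["constraints"] + ["allow_occluded"]}

MORPH_RULES = {
    "weather_changed_to_cloudy": swap_ccd_for_ir,
    "object_occluded": add_occlusion_clause,
}

def on_event(event, query):
    """Invoke query morphing when a recognized event occurs."""
    rule = MORPH_RULES.get(event)
    return rule(query) if rule else query

q = {"sources": ["CCD", "LR"], "constraints": ["type = 'truck'"]}
q = on_event("weather_changed_to_cloudy", q)
print(q["sources"])   # ['IR', 'LR']
```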
10. Discussion
To study the impact of query morphing on various applications in information fusion, a test bed for evolutionary query visualization and evaluation is being implemented. As mentioned in previous sections, the data from the sensors must be merged to produce a coherent response to user queries. However, given that each sensor contributes incomplete and potentially conflicting information, it is likely that the system's response will still contain an element of uncertainty. This motivates the need for a test bed that can effectively display potentially ambiguous results to the viewer in a meaningful way. In the setting of routing for emergency rescue in catastrophic events, a test bed could be implemented that proceeds through the following stages. First, a query is issued that activates the appropriate sensors to collect information about the environment. A query processor then collects data from the sensors and fuses the information into a coherent statement about the environment. The relevant information is passed to a display that helps the viewer visualize the results. An interaction loop between the viewer and the display allows the viewer to provide feedback and modify the query. At the broadest level, the test bed is fairly simple, consisting of three main interface components: a query mechanism, a visualization display and a feedback mechanism. This general model offers the broadest possible solution and probably describes many visual information systems for fusion. Additional requirements may include:
1) Processing the query requires analysis of partial, ambiguous, redundant and possibly conflicting information. Resolution/fusion of the sensor data entails a level of uncertainty that needs to be expressed in the visualization.
2) The visualization needs to assist the viewer in discriminating relevant information from background noise.
3) The evaluation of the system by the viewer needs to permit the viewer to provide feedback on the accuracy of the sensors (a), as well as on the accuracy of the guidance provided by the visualization (b). To address these issues, a modular approach is adopted, selecting specific technologies that address the needs of each of the components. The foundation
for the Query module can rely on ΣQL, the query refinement fusion algorithms described in [10] and the query morphing approach discussed above. A framework for evolutionary query, visualization and evaluation of dynamic environments is formulated. Closing the loop in the system requires that the viewer be able to provide feedback, evaluating both the query results and the visualization itself. The query mechanism should support two major types of feedback: sensor accuracy and expressiveness of the query. Results from evolutionary query optimization using limited query morphing and from interactive approaches using roaming query paths can then be compared and evaluated. Initially we will invite graduate and undergraduate students to participate in the evaluation study. When the algorithms are well developed and the system more mature, we plan to evaluate the applicability of query morphing techniques to emergency management.
References
1. J. Baumann et al., "Mole - Concepts of a Mobile Agent System", World Wide Web, Vol. 1, No. 3, 1998, pp 123-137.
2. C. Baumer, "Grasshopper - A Universal Agent Platform based on MASIF and FIPA Standards", First International Workshop on Mobile Agents for Telecommunication Applications (MATA'99), Ottawa, Canada, October 1999, World Scientific, pp 1-18.
3. K. Chakrabarti, K. Porkaew and S. Mehrotra, "Efficient Query Refinement in Multimedia Databases", 16th International Conference on Data Engineering, San Diego, California, February 28 - March 3, 2000.
4. S. K. Chang and E. Jungert, Symbolic Projection for Image Information Retrieval and Spatial Reasoning, Academic Press, London, 1996.
5. S. K. Chang and E. Jungert, "A Spatial/Temporal Query Language for Multiple Data Sources in a Heterogeneous Information System Environment", The International Journal of Cooperative Information Systems (IJCIS), Vol. 7, Nos. 2 & 3, 1998, pp 167-186.
6. S. K. Chang, G. Costagliola and E. Jungert, "Querying Multimedia Data Sources and Databases", Proceedings of the 3rd International Conference on Visual Information Systems (Visual'99), Amsterdam, The Netherlands, June 2-4, 1999.
7. S. K. Chang, "The Sentient Map", Journal of Visual Languages and Computing, Vol. 11, No. 4, August 2000, pp 455-474.
8. S. K. Chang and T. Znati, "Adlet: An Active Document Abstraction for Multimedia Information Fusion", IEEE Trans. on Knowledge and Data Engineering, January/February 2001, pp 112-123.
9. S. K. Chang, G. Costagliola and E. Jungert, "Spatial/Temporal Query Processing for Information Fusion Applications", Proceedings of the 4th International Conference on Visual Information Systems (Visual'2000),
Lyon, France, November 2000, Lecture Notes in Computer Science 1929, Robert Laurini (Ed.), Springer, Berlin, pp 127-139.
10. S. K. Chang, E. Jungert and G. Costagliola, "Multi-sensor Information Fusion by Query Refinement", Proc. of 5th Int'l Conference on Visual Information Systems, Hsin Chu, Taiwan, March 2002, pp 1-11.
11. C.-Y. Chong, S. Mori, K.-C. Chang and W. H. Baker, "Architectures and Algorithms for Track Association and Fusion", Proceedings of Fusion'99, Sunnyvale, CA, July 6-8, 1999, pp 239-246.
12. G. Graefe, "Query Evaluation Techniques for Large Databases", ACM Computing Surveys, Vol. 25, No. 2, June 1993.
13. M. Jarke and J. Koch, "Query Optimization in Database Systems", ACM Computing Surveys, Vol. 16, No. 2, 1984.
14. E. Jungert, "An Information Fusion System for Object Classification and Decision Support Using Multiple Heterogeneous Data Sources", Proceedings of the 2nd International Conference on Information Fusion (Fusion'99), Sunnyvale, California, USA, July 6-8, 1999.
15. E. Jungert, "A Data Fusion Concept for a Query Language for Multiple Data Sources", Proceedings of the 3rd International Conference on Information Fusion (FUSION 2000), Paris, France, July 10-13, 2000.
16. L. A. Klein, "A Boolean Algebra Approach to Multiple Sensor Voting Fusion", IEEE Transactions on Aerospace and Electronic Systems, Vol. 29, No. 2, April 1993, pp 317-327.
17. H. Kosch, M. Doller and L. Boszormenyi, "Content-based Indexing and Retrieval supported by Mobile Agent Technology", Multimedia Databases and Image Communication, LNCS 2184, (M. Tucci, ed.), Springer-Verlag, Berlin, 2001, pp 152-166.
18. D. B. Lange and M. Oshima, Programming and Deploying Java Mobile Agents with Aglets, Addison-Wesley, Reading, MA, USA, 1999.
19. S. Y. Lee and F. J. Hsu, "Spatial Reasoning and Similarity Retrieval of Images Using 2D C-string Knowledge Representation", Pattern Recognition, Vol. 25, No. 3, 1992, pp 305-318.
20. Hideyuki Nakashima, "Cyber Assist Project for Situated Human Support", Proceedings of 2002 International Conference on Distributed Multimedia Systems, Hotel Sofitel, San Francisco Bay, September 26-28, 2002.
21. J. R. Parker, "Multiple Sensors, Voting Methods and Target Value Analysis", Proceedings of SPIE Conference on Signal Processing, Sensor Fusion and Target Recognition VI, SPIE Vol. 3720, Orlando, Florida, April 1999, pp 330-335.
22. M. Stonebraker, "Implementation of Integrity Constraints and Views by Query Modification", in SIGMOD, 1975.
23. Bienvenido Vélez, Ron Weiss, Mark A. Sheldon and David K. Gifford, "Fast and Effective Query Refinement", Proceedings of the 20th ACM
Conference on Research and Development in Information Retrieval (SIGIR'97), Philadelphia, Pennsylvania, July 1997.
24. E. Waltz and J. Llinas, Multisensor Data Fusion, Artech House, Boston, 1990.
25. E. Wernert and A. Hanson, "A Framework for Assisted Exploration with Collaboration", Proceedings of Visualization '99, IEEE Computer Society Press, 1999, pp 241-248.
26. F. E. White, "Managing Data Fusion Systems in Joint and Coalition Warfare", Proceedings of EuroFusion98 - International Conference on Data Fusion, October 1998, Great Malvern, United Kingdom, pp 49-52.
IMAGE REPRESENTATION AND RETRIEVAL WITH TOPOLOGICAL TREES
C. GRANA†, G. PELLACANI‡, S. SEIDENARI‡, R. CUCCHIARA†
† Dipartimento di Ingegneria dell'Informazione
‡ Dipartimento di Dermatologia
Università di Modena e Reggio Emilia, Italy
Typical processes of image representation comprise an initial region segmentation followed by a description of the single regions' features and their relationships. A graph model can then be exploited in order to integrate the knowledge of the specific regions (the attributed relational graph's (ARG) nodes) and of the regions' relations (the ARG's edges). In this work we use color features to guide region segmentation, geometric features to characterize regions one by one and topological features (in particular inclusion) to describe regions' relationships. Guided by the inclusion property we define the Topological Tree (TT) as an image representation model that, exploiting the transitive property of inclusion, uses the adjacency and inclusion topological features. We propose an approach based on a recursive version of fuzzy c-means to construct the topological tree directly from the initial image, performing both segmentation and TT construction. The TT can be exploited in many applications of image analysis and image retrieval by similarity in contexts where inclusion is a key feature: we propose an applicative case of analysis of dermatological images to support melanoma diagnosis. In this paper we describe the details of the TT algorithm, including the management of non-idealities, and an approximate measure of tree similarity in order to retrieve skin lesions with a similar TT-based description.
1. Introduction
A fruitful representation of the image content, often exploited in many tasks of understanding, recognition and information retrieval by similarity, is based on region segmentation; a richer description adds to the regions' attributes some relationships between regions, spatial and topological, that describe the way we perceive the mutual relations between parts of the image. To this aim, graph-based description is a powerful formalism to model the knowledge extracted from the images of the regions of interest and their relationships. Moreover, the management of large volumes of digital images has generated additional interest in methods and tools for real-time archiving
and retrieval of images by content3. Several approaches to the problem of content-based image management have been proposed and some have been implemented in research prototypes and commercial systems. In some works, Attributed Relational Graphs (ARGs) have been introduced as a means6,1 to describe the spatial relationships, and indexing techniques have been proposed to speed up the matching based on the edit distance6 approach. In Petrakis' papers1,2 ARGs and edit distance are used for image retrieval in medical image databases. Accordingly, we defined the Topological Tree, a rich description model that can be constructed a posteriori, after the region segmentation step, for each type of image. However, in some applicative contexts in which inclusion is a key feature, the inclusion property can be exploited for segmentation too. Thus we propose an approach called Recursive-FCM (fuzzy c-means) that exploits both color and inclusion to perform segmentation and, at the same time, the TT construction. This algorithm has a general formulation but is meaningful in applications that search for inclusion and color: typical examples are dermatological images of skin lesions, which appear as skin zones darker than the normal skin, with many nuances of skin color. Many techniques have been proposed for color segmentation; among them, many have been adopted for skin lesion segmentation, such as grayscale thresholding and color clustering7. Fuzzy c-means (FCM) color clustering has been successfully adopted in the work of Schmid8, which adds Principal Component Analysis (PCA) to FCM: an FCM segmentation over the first two principal components of the color space is shown to be meaningful and robust for skin lesion images. In a recent work9 we described a recursive extension of that approach and here we show further improvements that take into account non-idealities. Moreover, in the second part of the paper we propose an approximate measure of tree similarity that can be exploited to search for similarities between skin lesions in an image retrieval system.
2. Topological Relations
Given an image space and an 8-connected neighborhood system that for each point x_i defines the neighbor set N_{x_i}, segmentation by color clustering aims to partition the image into a set of regions R = {R_1, ..., R_k} such that ∪ R_i = I and R_i ∩ R_j = ∅ for i ≠ j. To this aim, a clustering process that groups pixels with respect to their color should embed, or be followed by, a pixel connectivity analysis, according to the given neighborhood system. Then a graph-based representation describes spatial and/or topological
relations between regions. An example is the adjacency graph, a graph G(V, E) whose vertices are the image regions (V = R) and whose arcs represent the adjacency property, that is, a neighborhood system at region level. In this context adjacency is defined as follows:
Def. 1: A region R_i is adjacent to R_j ⇔ ∃ x_i ∈ R_i, x_j ∈ R_j : x_j ∈ N_{x_i}.
In addition to connectivity intra-region and adjacency inter-regions, we aim to evaluate the inclusion of a region into another; thus we need to formally define inclusion between regions. First, we consider an "extended" set of image regions R̃ = R ∪ {R_0}, R_0 being a dummy region representing the external boundary of the image. Then, we define the inclusion property as follows:
Def. 2: A region R_i ∈ R is included in R_j ∈ R̃ ⇔ ∄ P = {R_1, ..., R_N} : R_0 ∪ R_i ∪ (∪_{n=1..N} R_n) is a connected region ∧ R_j ∉ P.
This definition means that it is not possible to draw a path of connected points between the region R_i and the end of the image space (R_0) that does not include points of R_j. The transitive property holds for inclusion: if R_i is included in R_j and R_j in R_t, then R_i is included in R_t. Thus a tree model is a natural representation for inclusion. This ideal definition must be relaxed for implementation purposes on real images: we therefore use an FCH-inclusion definition that substitutes the filled convex hull of R_j for R_j itself in Def. 2. In this way, an inclusion that is not exact in a topological sense is also accepted in the description of real images.
3. Construction of the Topological Tree
Using the FCH-inclusion property we developed an algorithm providing both segmentation and tree description. In previous works9 we detailed the color-based segmentation with recursive FCM. Here we add improvements to deal with exceptions found in particular cases. The algorithm for TT construction of Fig. 1 can be summarized as follows: (1) it carries out a color-based segmentation into two clusters, using the PCA and FCM algorithms9; (2) while segmenting, it builds the corresponding tree; (3) it recursively applies the segmentation to the regions of interest created by the previous steps of the algorithm. In particular, the algorithm detects the presence of a region that contains all the others, inserts it in the tree and then continues to apply the algorithm
Figure 1. Flow chart of the RecursiveFCM(R_a, P_a) algorithm: the regions R_k and their corresponding sets P_k are extracted, and RecursiveFCM(R_k, P_k) is called recursively for each region.
to the other regions, obtaining a further partitioning. In Fig. 1 it is possible to see the recursive structure of the Recursive FCM algorithm for the construction of the Topological Tree. Starting from a region R_a (initially equivalent to the whole image I), it is verified whether it is possible to further partition R_a. To obtain regions of interest of uniform color and significant area, the limits for the partitioning conditions are given by the variance of the first two components of the PCA and by the size of the extracted regions. If the partitioning condition is not verified, R_a is inserted in the tree. Otherwise, R_a is clustered into more regions,
The remaining regions are organized in a structure that allows for a correct recursion step. In particular, in the ideal case (which generates a TT with a single child for each node), the FCM algorithm creates two clusters and should create two regions, one including the other. In real images, many regions are often created. If one of them can be chosen as "external", it becomes a new node, parent of the others; all the other regions are further inspected in the recursion. However, some of these regions could also present mutual inclusions and thus prevent a correct tree generation. We call these regions suspended, since they need a specific management.
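The recursive control flow just described can be sketched as follows; partition(), find_external() and includes() stand in for the PCA/FCM segmentation, the external-region search and the FCH-inclusion test of the paper, and the .area attribute is an assumed property of region objects:

class Node:
    def __init__(self, region):
        self.region = region
        self.children = []

def recursive_fcm(region, suspended, node, partition, find_external,
                  includes, min_area):
    """Attach the sub-tree rooted at `region` to `node` (schematic skeleton).

    partition(region)      -> list of sub-regions, or None (condition not met)
    find_external(regions) -> the region FCH-including all others, or None
    includes(a, b)         -> True if region a FCH-includes region b
    """
    parts = partition(region)                 # 2-cluster PCA+FCM segmentation
    if parts is None:                         # partitioning condition not met:
        return                                # `region` remains a leaf
    parts = [r for r in parts if r.area >= min_area] + suspended
    ext = find_external(parts)
    if ext is not None:
        parts.remove(ext)
        child = Node(ext)
        node.children.append(child)
        node = child                          # remaining regions hang below it
    # Regions not included in any other one recurse; the regions they include
    # are passed along as their "suspended" set (cf. Fig. 2).
    not_included = [r for r in parts
                    if not any(includes(o, r) for o in parts if o is not r)]
    for r in not_included:
        pending = [s for s in parts if s is not r and includes(r, s)]
        child = Node(r)
        node.children.append(child)
        recursive_fcm(r, pending, child, partition, find_external,
                      includes, min_area)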
3.1. Search for an "external" region of R

The external region R_EXT of R is the region that FCH-includes all the other regions of R. This is the region that is searched for and added to the tree. In formulae, R_EXT ∈ R ⇔ ∀ R_i ∈ R, R_i ≠ R_EXT ⇒ R_i is included in the filled convex hull of R_EXT (is FCH-included). Since it is much easier and faster to check the inclusion between the extents (or bounding boxes) of two regions (extent-inclusion), it is possible to use the observation that FCH-inclusion implies extent-inclusion to search for R_EXT. This is accomplished by searching for a region R′_EXT such that all the remaining regions are extent-included in it; if such a region exists, it is necessary to check whether all the other regions are also FCH-included in R′_EXT.
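A sketch of this two-stage search, under the assumption that a bounding box is given as (min_row, min_col, max_row, max_col) and that an fch_included() test like the one sketched earlier is available:

def extent_included(bb_a, bb_b):
    """True if bounding box a lies inside bounding box b.
    Boxes are (min_row, min_col, max_row, max_col)."""
    return (bb_b[0] <= bb_a[0] and bb_b[1] <= bb_a[1] and
            bb_a[2] <= bb_b[2] and bb_a[3] <= bb_b[3])

def find_external(regions, bbox, fch_included):
    """Return the region whose filled convex hull includes all the others.

    bbox(r) returns the extent of region r; fch_included(a, b) tests the
    FCH-inclusion of a in b. Both are assumed to be provided elsewhere.
    """
    for cand in regions:
        others = [r for r in regions if r is not cand]
        # Cheap necessary condition: every other region is extent-included.
        if not all(extent_included(bbox(r), bbox(cand)) for r in others):
            continue
        # Full check: every other region must also be FCH-included.
        if all(fch_included(r, cand) for r in others):
            return cand
    return None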
3.2. Use of "low interest" and suspended regions

The decomposition of a region R_a can cause the generation of regions with negligible size. Such regions are considered of low interest for the interpretation of the image. We use a parameter to select the minimum area that a region must have to be interesting. The regions that have to be eliminated are collected, during the tree construction, in a specific structure for later integration in the tree, after its complete construction. After obtaining the set of regions R (in Fig. 1, from R = partitioning(R_a)), eliminating the "low interest" regions and inserting an external region into the tree, it is not possible to call the algorithm directly on all the extracted regions. In fact, the possible presence of inclusions between the regions of R could lead to the construction of a wrong tree, with a loss of inclusion relationships between children produced by different clusters. Because of this, the concept of "suspended" regions has been introduced,
indicating with this term the set of all the regions that cannot be immediately analyzed, but must wait for the region that includes them. We thus consider the set R after the elimination of the low-interest regions and of the possible external one. From R, we distinguish between regions R_k not included in any other region and sets P_k of regions included in R_k. Then, for each region, the algorithm is recursively called along with its set of suspended regions.

R_NI = ∅; P = ∅
∀ R_a ∈ R {
    if (∃ R_k ∈ R_NI : R_a is included in R_k)
        P_k = P_k ∪ {R_a}
    else
        R_NI = R_NI ∪ {R_a}
}

Figure 2. Pseudo-code for the integration of "low interest" regions.
The process for finding all the suspended regions is described in Fig. 2. It is worth noting that a reduction of the search space is obtained by ignoring all the regions of P: in principle the external region should contain not only all the regions of R but also all the suspended regions, but by the transitive property of inclusion this is guaranteed by the fact that they are included in regions of R.

4. Tree matching
The construction of the TT is the basis of a retrieval approach that searches for tree similarities. An interesting non-exact tree matching uses the edit distance to compare two trees: it measures the cost of operations, such as adding or eliminating nodes, needed to transform one tree into another. Unfortunately the
edit-distance based approach has proved to be computationally too expensive to be used without modifications in a similarity-search context. We tested this approach over our databases, using linear assignment to explore the whole search space, but we found unacceptable response times. Moreover, it can be difficult to describe the cost of an operation in order to use it together with an inter-node, feature-based similarity. These reasons led us to produce a quick and sub-optimal algorithm that heavily relies on two assumptions:
(1) we can match only nodes on the same level of the tree; (2) given two sets of nodes, taken from two trees, we match one against the other without solving the associated linear assignment problem, but considering a sorting of the two sets and letting higher-importance nodes have first choice on the other set.
The first assumption strongly limits the search space: it is acceptable in the dermatological context and is motivated by the observation that in our images each level tends to represent a specific feature, such as the skin, the lesion or its colored areas, and an inter-level matching does not always make sense. The second one is a simplification that quickly produces good results, without any assurance of reaching an optimum. An observation that qualitatively justifies this choice is the fact that higher-importance nodes are weighted more in their contribution to the matching function, so guaranteeing that they get a better match leads towards a higher overall match. The algorithm works recursively, comparing two sub-trees according to the following steps:
(1) The roots are compared in a Euclidean feature space by the distance d between the feature vectors, which can include color, area, symmetry, texture and any other information about each region. An equivalence measure is obtained as E = 1 / (1 + d).
(2) Children equivalence is evaluated: (a) let us call T1 the tree with more nodes and T2 the other one; (b) the nodes of T1 are considered in order of importance (evaluated on the feature vector);
(c) each child of T1 is matched against all the not-yet-assigned children of T2; (d) after evaluating the equivalence of all nodes, the not-assigned children of T1 are matched against the null vector and produce a negative match.
(3) The total equivalence is given by

E_tot = (E_root / 2) · ( Σ_i I_i E_i^s / Σ_i I_i + 1 ),
where E_i^s is the signed equivalence of each node (with a matching node or with the null vector), I_i is the importance of the node and E_root is the equivalence of the roots, as previously defined. The equivalence measure E is bounded between 0 and 1, and this guarantees that E_i^s lies in the range [-1, 1], so the weighted sum can give -1 in case of total mismatch of the tree structure or 1 in case of perfect match. This value is mapped by the equation to the interval [0, 1] and used as a reduction factor for the matching value of the roots. The interval shift has the implicit property of reducing the influence of a mismatch at lower levels of the tree. In this way, given an image represented by its TT, we are able to find in an image database other images with a similar TT on the basis of the previous algorithm. Obviously the TT representation cannot be the only approach used to support a query-by-example system and, in melanoma diagnosis, a number of other features¹⁰ on the whole lesion or its parts should be considered in an integrated way. Nevertheless, this is a powerful representation method that, integrated with proven dermatological criteria, can give interesting results in retrieval.
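A compact sketch of the recursive matching just described is given below; the feature extraction and the importance score are passed in as functions, since their exact definitions are not specified here, and importance values are assumed to be positive:

import math

def equivalence(f1, f2):
    """Equivalence of two nodes: E = 1 / (1 + d), d = Euclidean distance."""
    return 1.0 / (1.0 + math.dist(f1, f2))

def match(n1, n2, features, importance):
    """Total equivalence E_tot of two sub-trees, in [0, 1].

    features(node) -> feature vector; importance(node) -> positive weight.
    """
    e_root = equivalence(features(n1), features(n2))
    t1, t2 = (n1, n2) if len(n1.children) >= len(n2.children) else (n2, n1)
    free = list(t2.children)                     # not-yet-assigned children
    signed, weights = [], []
    for c in sorted(t1.children, key=importance, reverse=True):
        if free:
            scores = [(match(c, o, features, importance), o) for o in free]
            best_e, best = max(scores, key=lambda s: s[0])
            free.remove(best)
            signed.append(best_e)                # matched child, in [0, 1]
        else:
            signed.append(-1.0)                  # matched against null vector
        weights.append(importance(c))
    if not signed:                               # both nodes are leaves
        return e_root
    s = sum(w * e for w, e in zip(weights, signed)) / sum(weights)
    return e_root * (s + 1.0) / 2.0              # shift [-1, 1] into [0, 1]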
5. Experimental Results

Experiments have been conducted on synthetic and on real images, first to test the correct response of the adopted algorithm and then to verify its applicability to real-world images.
5.1. Synthetic Images

In Fig. 3 we report an example of a test over synthetic images: we search for the similarity between S1 and the whole set. The images, their TTs and the match scores are indicated. The matching value of 1.000 for S1 obviously means a perfect match.
Figure 3. Synthetic images (from left to right: S1, S2, S3, S4, S5); the match scores shown are 1.0000, 0.9005, 0.9999, 0.9928 and 0.8962.
All other images present only minor variations from the original image. Colors were ignored in this evaluation, and the feature-vector distance is computed using only the distance of the center of mass from that of the parent and the percentage of the parent area occupied by the region. Thus S3 and S4 are very similar to S1, while S2 and S5 lack a node of S1. The results follow a correct evaluation, giving the ability to order the images by their similarity to the first one.

5.2. Dermatological Images
Our application context is the analysis of dermatological images for melanoma diagnosis. This family of images presents a natural partition into color regions included one into the other, because of the usual growth process of the lesions; moreover, the position and size of the inner areas are significant (as a diagnostic feature). In Fig. 4 some results of a query-by-example search are shown, and an overall good retrieval was observed. In particular, ideal non-melanoma skin lesions have a TT described by a list (as the last image in Fig. 4), while melanomas typically present a more complicated structure. Unfortunately we did not yet have the possibility of including a complete diagnostic set of features in the retrieval algorithm, so a quantitative measure is not yet available.

6. Conclusions
We showed a segmentation technique able to extract the inclusion-adjacency structure of the image, accompanied by a low-computational-cost matching technique that enables a flexible feature search over trees instead of over the whole images. Visual comparisons over synthetic and real images have been shown to assess the promising opportunities of this methodology. We would like to thank Fabio Zanella and other students for the code generation and the tests performed.
Figure 4. Experiments on real images.
References
1. E.G.M. Petrakis et al., Image Indexing Based on Spatial Similarity, Technical Report MUSIC-TR-01-99, Multimedia Systems Institute of Crete (MUSIC), 1999.
2. E.G.M. Petrakis et al., Similarity Searching in Medical Image Databases, IEEE Trans. Knowl. Data Eng. 9, 435-447 (1997).
3. A.W.M. Smeulders et al., Content-Based Image Retrieval at the End of the Early Years, IEEE Trans. Pattern Anal. Mach. Intell. 22, 1349-1380 (2000).
4. M. Flickner et al., Query By Image and Video Content: The QBIC System, Computer 28, 23-32 (1995).
5. A. Pentland et al., Photobook: Content Based Manipulation of Image Databases, Int. J. Comput. Vis. 18, 233-254 (1996).
6. B.T. Messmer, Efficient Graph Matching Algorithms, PhD thesis, Univ. of Bern, Switzerland, 1995.
7. S.E. Umbaugh et al., Automatic Color Segmentation Algorithms: With Application to Skin Tumor Feature Identification, IEEE Engineering in Medicine and Biology 12, 75-82 (1993).
8. Ph. Schmid, Segmentation of Digitized Dermatoscopic Images by Two-Dimensional Color Clustering, IEEE Transactions on Medical Imaging 18, 164-171 (1999).
9. R. Cucchiara et al., Exploiting Color and Topological Features for Region Segmentation with Recursive Fuzzy c-means, Machine Graphics and Vision 11, 169-182 (2002).
10. C. Grana et al., A New Algorithm for Border Description of Polarized Light Surface Microscopic Images of Pigmented Skin Lesions, in press on IEEE Transactions on Medical Imaging (2003).
An Integrated Environment for Control and Management of Pictorial Information Systems
A. F. Abate, R. Cassino and M. Tucci
Dipartimento di Matematica e Informatica, Università di Salerno, 84081 Baronissi, Salerno - ITALY
E-mail: {abate, rcassino, mtucci}@unisa.it
Abstract. The paper describes an integrated environment for the control and management of pictorial information systems. We consider the diagnostic radiology field as a case study. A system for filing and processing medical images is a particular pictorial information system that requires managing information of a heterogeneous nature. In this perspective, the developed environment provides the medical user with the tools to manage textual data and images in an integrated way. A Visual Data Definition Language was designed and implemented that allows the administrator of the system to extend the actual database on the basis of new user queries: the insertion of new entities and the creation of new relationships between them take place simply by manipulating the iconic representation of the managed information. A Visual Query Language provides a visual environment in which a user can query the database using iconic operators related to the management of alphanumeric and pictorial information, with the ability to formulate composed queries: an alphanumeric query can retrieve pictorial data contained in the database and vice versa.
1 Introduction
A pictorial information system is a system for the analysis, storage and visualization of information of a heterogeneous nature: images and alphanumeric data. Figure 1 shows the schematic diagram of a pictorial information system [1]. The progress in data transmission has led to the large-scale interconnection of many workstations and to "multimedia communication" involving textual data, images, etc. At first, images are digitized through an image acquisition device (e.g. a tomograph) and then manipulated by software tools. The phases that characterize the elaboration of an image are: - digitization; - coding and data compression; - quality improvement and restoration; - segmentation; - image analysis and description; - pictorial information management.
Planning a pictorial information system requires evaluating the informative content of an image in an objective way, through measures of pictorial information. An image storing system is a support for the electronic filing of image characteristics that allows creating and managing a pictorial database. A pictorial database is the nucleus of a pictorial information system, whose essential feature is the coding of images. The image coding process can be described in three phases: - image partitioning into smaller elements; - contour and geo-morphological feature extraction; - construction of a meta-description of the image useful for indexing and retrieval of pictorial information.
Figure 1. A Pictorial Information System.
In the field of diagnostic radiology the typical information to consider can be divided fundamentally into two categories: data related to the management of clinical records, which give the medical user real support in developing treatment protocols for the examined patients, and data related to the elaboration, filing and retrieval of medical images. The analysis of a medical image, i.e. the report of a radiological examination, involves the execution of complex operations such as the detection of a possible anomaly, the determination of its exact spatial position with respect to the other objects (elements of the human body) contained in the image, and the computation of its geo-morphological characteristics such as area, density, symmetry, etc. In this work, we apply an icon-based methodology to devise an integrated environment providing a set of tools for visual pictorial database definition and manipulation. Section 2 describes the integrated environment in which the Visual Data Definition Language and the Visual Query Language are defined. Section 3 presents the Medical Image Management System and similarity retrieval techniques. Section 4 describes the management of query results. Finally, section 5 presents future extensions of this research.
2 The integrated environment
A system for the acquisition and elaboration of medical images is a pictorial information system that requires managing textual data (related to the management of medical records) and medical images. In the field of medical information systems, the elaboration of a radiological image consists of abnormality detection, determination of its geo-morphological characteristics, and evaluation of the spatial relationships between the pathology and the anatomical organs in which it is located. On the basis of this information, the retrieval of similar images, by means of appropriate retrieval techniques, facilitates the formulation of a diagnosis and of a treatment plan for the examined patients. In this perspective, a Visual Data Definition Language was introduced and implemented; it allows the administrator of the system to extend the actual database on the basis of new user queries: the insertion of new entities and the creation of new relationships between them take place simply by manipulating the iconic representation of the managed information (Figure 2).
Figure 2. Define a new entity.
The realized Visual Query Language provides a visual environment in which the user interrogates the database by means of iconic operators related to the management of alphanumeric and pictorial information, with the ability to formulate composed queries: an alphanumeric query can retrieve pictorial data contained in the database, and vice versa (Figure 3).
Figure 3. Composed query.
It is possible to store the results of a query in the database, in terms of the corresponding iconic representation, allowing an efficient reuse of previously formulated queries. The developed VDBMS is implemented in Java, using JDBC to connect to the underlying RDBMS. In particular, the environment was developed as a web-client application that guarantees portability on any platform and access to any RDBMS (possibly remote) through a platform-independent, user-friendly interface.
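The system itself is implemented in Java with JDBC; purely as an illustration of what a "composed" query does underneath, the following Python/sqlite3 sketch uses a hypothetical schema (patients and images tables) that is not the system's actual one:

import sqlite3

conn = sqlite3.connect("pictorial_db.sqlite")    # hypothetical database
cur = conn.cursor()

# Alphanumeric -> pictorial: images of the patients matching a textual query.
cur.execute("""
    SELECT i.image_id, i.file_path
    FROM images i JOIN patients p ON p.patient_id = i.patient_id
    WHERE p.pathology LIKE ?""", ("%nodule%",))
pictorial_results = cur.fetchall()

# Pictorial -> alphanumeric: clinical data of the patient owning an image,
# e.g. one selected by a similarity-retrieval step.
if pictorial_results:
    image_id = pictorial_results[0][0]
    cur.execute("""
        SELECT p.patient_id, p.name, p.pathology
        FROM patients p JOIN images i ON i.patient_id = p.patient_id
        WHERE i.image_id = ?""", (image_id,))
    alphanumeric_results = cur.fetchall()

conn.close()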
3 Pictorial information management
A medical image is included in the database if it contains an abnormality, whose geo-morphological characteristics are automatically obtained. Such characteristics are determined once the image is segmented [2], [3], [4], [5] and the contour of the identified pathology is extracted [6], [7], [8], [9] using algorithms already available in the literature (Figure 4). Once the contour of the anomaly located in the examined radiological image is extracted, the analysis of the pathology proceeds with the computation of its geo-morphological characteristics [10], [11], [12], [13], [14]. This information is then used for querying the database for similarity retrieval based on the geo-morphological characteristics of other previously examined images. One of the principal problems to consider in pictorial database planning is to establish an appropriate representation of images. An image can be described both on the basis of the alphanumeric information that identifies the characteristics of its form and on the basis of the spatial arrangement of the objects it contains.
Figure 4. Segmentation and contour extraction.
In the environment, a virtual image is linked to each examined image. The virtual image is built using the canonical representative objects of human body components, which describe the content of the real image and the spatial relationships between the contained objects and the identified anomaly in a compact manner (Figure 5).
Figure 5. Virtual image associated with an image.
The geo-morphological characteristics of the examined abnormality and the virtual image are used to interrogate the database and retrieve similar images.
The concept of similarity between two images is expressed in terms of the Euclidean distance, in the space of the characteristics, between the points that represent them. The geo-morphological characteristics we used for the search strategy are: area, density, asymmetry, orientation with respect to the centroid, spreadness and uniformity (Figure 6).
Figure 6. Retrieval by geo-morphological characteristics.
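A minimal sketch of this retrieval step, assuming the six geo-morphological characteristics are already extracted and normalised into comparable ranges:

import numpy as np

def retrieve_similar(query_features, database, k=5):
    """Return the k entries of `database` closest to the query.

    `database` is a list of (image_id, feature_vector) pairs; the features
    (area, density, asymmetry, orientation, spreadness, uniformity) are
    assumed to be normalised to comparable ranges.
    """
    q = np.asarray(query_features, dtype=float)
    ranked = sorted((float(np.linalg.norm(np.asarray(f, float) - q)), img_id)
                    for img_id, f in database)
    return ranked[:k]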
Given a real image im, the virtual image im_v associated with im is a pair (Ob, Rel) where: Ob = {ob_1, ob_2, ..., ob_n} is a set of objects of im; Rel = (Rel_x, Rel_y) is a pair of sets of binary spatial relations over Ob, where Rel_x (resp. Rel_y) contains the mutually disjoint subsets of Ob × Ob that express relations holding between pairs of objects of im along the x-projection (resp. y-projection) [15]. Let Q be the virtual image associated with an image used as a query for similarity retrieval and im_vi the virtual image associated with one of the images examined for possible retrieval. In the case of similarity retrieval by spatial relationships, the similarity degree, denoted by Sim_deg(Q, im_vi), is a value belonging to the interval [0,1] defined by a formula that considers how many objects of Q are contained in im_vi and how many spatial relationships similar to those of Q are found in im_vi [16]. Therefore, if Sim_deg(Q, im_vi) is greater than or equal to the minimum similarity degree specified in the query, the image will be retrieved. The visual management environment realized in this paper provides the relational algebra operators to interrogate the database with regard to the alphanumeric information it contains, and the operators Similarity Retrieval and Similarity Retrieval by Virtual Image, which allow the retrieval of medical images similar to the query image (the one used to perform the search) and related to previously examined clinical cases (Figure 7).
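The exact definition of Sim_deg is given in [16] and is not reproduced here; the following sketch is only one plausible illustration of a similarity degree in [0, 1] that combines object overlap and spatial-relation overlap, not the published formula:

def sim_deg(query_vi, candidate_vi):
    """Illustrative similarity degree in [0, 1] between two virtual images.

    Each virtual image is a pair (objects, relations): `objects` is a set of
    object labels, `relations` a set of (obj_a, obj_b, axis, relation) tuples.
    This is NOT the formula of [16], only a plausible stand-in.
    """
    q_obj, q_rel = query_vi
    c_obj, c_rel = candidate_vi
    if not q_obj:
        return 0.0
    obj_score = len(q_obj & c_obj) / len(q_obj)          # shared objects
    rel_score = len(q_rel & c_rel) / len(q_rel) if q_rel else 1.0
    return 0.5 * (obj_score + rel_score)

# An image is retrieved when sim_deg(Q, im_vi) is greater than or equal to
# the minimum similarity degree specified in the query.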
Figure 7. Operators to perform similarity retrieval: Similarity Retrieval and Similarity Retrieval by Virtual Image.
The first allows choosing the query image according to the practice number, year of filing and form number of the clinical diary to which the image refers; the second allows starting the retrieval from the virtual image associated with the query image and inserted in the list of Analyzed Abnormalities (Figure 8).
Figure 8. “Analyzed Abnormalities” list.
The user can access the alphanumeric data related to an image by clicking on the correlated CT scan and, if needed, can run the retrieval from the extracted image. In addition, the possibility of formulating "combined queries" was implemented: information on the pictorial data contained in the database can be drawn from an alphanumeric query, and vice versa. The visual environment also allows the creation of customizable lists of sites of interest, for real-time consultation by running the browser directly from the interface.
4 Query results management
In the designed integrated environment, the possibility of storing the results of a query in the database, in terms of the corresponding iconic representation, allows their efficient reuse and avoids reformulating queries already made. To save the results of an alphanumeric query (Figure 9), two procedures were implemented.
Figure 9. Saving a query.
If a query is saved as Query formulation, each time it is launched the query is performed on the current data of the examined table; otherwise, if it is saved as Query result, the data present at the date of query storing are saved. The icons related to queries saved as Query formulation are inserted in the Stored Researches; the icons related to queries saved as Query result are inserted in the Old Researches. By clicking on either icon, the results of the associated operation are visualized.
5 Conclusion and further work
The portability of the environment, the formulation of alphanumeric and pictorial queries, and the possibility of storing the results, performing combined queries and consulting the WWW are particularly useful tools for a medical user in formulating diagnoses and developing treatment protocols for the examined patients.
Future developments will concern the study of techniques for the analysis of medical images of different types (mammography, etc.) and the improvement of multi-user management and of remote access to different RDBMSs.
6 References
[1] S.K. Chang, "Principles of Pictorial Information System Design", Prentice Hall, 1989.
[2] S. G. Carlton and R. Mitchell, "Image segmentation using texture and gray level", Proc. IEEE Conf. Pattern Recognition and Image Processing, Troy, New York, pp. 387-391, 6-8 June 1977.
[3] G. B. Coleman, "Image segmentation by clustering", Report 750, University of Southern California Image Processing Institute, July (1977).
[4] S. Vitulano, C. Di Ruberto, M. Nappi, "Different methods to segment biomedical images", Pattern Recognition Letters, vol. 18, (1997).
[5] A. Klinger and C. R. Dyer, "Experiments on picture representation using regular decomposition", Computer Graphics and Image Processing 4, 360-372 (1976).
[6] Alan Bryant, "Recognizing Shapes in Planar Binary Images", Pattern Recognition, vol. 22, pp. 155-164, (1989).
[7] F. Gritzali and G. Papakonstantinou, "A Fast Piecewise Linear Approximation Algorithm", Signal Processing, vol. 5, pp. 221-227, (1983).
[8] James George Dunham, "Optimum Uniform Piecewise Linear Approximation of Planar Curves", IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI vol. 8, no. 1, (1986).
[9] G. Papakonstantinou, "Optimal Polygonal Approximation of Digital Curves", Signal Processing, vol. 8, pp. 131-135, (1985).
[10] T. H. Cormen, C. E. Leiserson, R. L. Rivest, "Introduzione agli algoritmi", chap. 35, pp. 835-864, (1996).
[11] Jia-Guu Leu, "Computing a Shape's Moments from its Boundary", Pattern Recognition, vol. 24, no. 10, pp. 949-957, (1991).
[12] Mark H. Singer, "A General Approach to Moment Calculation for Polygons and Line Segments", Pattern Recognition, vol. 26, no. 7, pp. 1019-1028, (1993).
[13] Bing-Cheng and Jun Shen, "Fast Computation of Moments Invariant", Pattern Recognition, vol. 24, no. 8, pp. 807-813, (1991).
[14] Jin-Jang Leou and Wen-Hsiang Tsai, "Automatic Rotational Symmetry Determination for Shape Analysis", Pattern Recognition, vol. 20, no. 6, pp. 571-582, (1987).
[15] M. Sebillo, G. Tortora, M. Tucci and G. Petraglia, "Virtual Images for Similarity Retrieval in Image Databases", IEEE Trans. on Knowledge and Data Engineering, vol. 13, no. 6, Nov.-Dec. 2001, pp. 951-967.
[16] A. F. Abate, M. Nappi, G. Tortora and M. Tucci, "IME: an image management environment with content-based access", Image and Vision Computing, vol. 17, no. 13, pp. 967-980, 1999.
A LOW LEVEL IMAGE ANALYSIS APPROACH TO STARFISH DETECTION
V. DI GESU, D. TEGOLO
Universita di Palermo, Dipartimento di Matematica ed Applicazioni, via Archirafi 34, 90123 Palermo, Italy
{digesu,tegolo}@math.unipa.it
F. ISGRO, E. TRUCCO
Heriot-Watt University, School of Engineering & Physical Science, Edinburgh EH14 4AS, U.K.
{f.isgro,e.trucco}@hw.ac.uk
This paper introduces a simple and efficient methodology to detect starfish in video sequences from underwater missions. The nature of the input images is characterised by a low signal-to-noise ratio and the presence of a noisy background represented by pebbles; this makes the detection a non-trivial task. The procedure we used is a chain of several steps that, starting from the extraction of the areas of interest, ends with the classification of the starfish. Experiments report a success rate of 96% in the detection.
1. Introduction
Underwater images have been used recently for a variety of inspection tasks, in particular for military purposes such as mine detection, or for the inspection of underwater pipelines, cables or platforms⁸, or the detection of man-made objects⁷. A number of underwater missions are for biological studies, such as the inspection of underwater life. Despite the large number of such missions, and the fact that image analysis techniques are starting to be adopted in the fish-farming field⁶, the inspection of the video footage recorded during the missions is mostly done manually, as research trying to use image analysis techniques for biological missions is relatively new⁴,¹⁰. In this paper we present a simple system for the analysis of underwater video streams for biological studies. In particular our task is the detection of starfish in each frame of the video sequence. The system presented here is the first stage of
a more complex system for determining the amount of starfish in a particular area of the sea-bottom. The problem we tackle in this work is non-trivial for a number of reasons; in particular: the low quality of underwater images, giving a very low signal-to-noise ratio; the different kinds of possible backgrounds, as starfish can be found on various classes of sea-bottoms (e.g., sand, rock). The system we present here is a chain of several modules (see Figure 1) that starts from the extraction of areas of interest in the image and has as its last module a classifier that discriminates the selected areas between the two classes of starfish and non-starfish. Experiments performed on a sample of 1090 candidates report an average success rate for the detection of 96%. The paper is structured as follows. The next section gives an overview of the system. The method adopted for selecting areas of interest is described in section 3. In section 4 we describe the features that we extract from the areas of interest for the classification, and section 5 briefly discusses the classification methodology used for this system. Experimental results are reported and discussed in section 6, and section 7 is left to final remarks and future developments. 2. System overview The system, depicted in Figure 1, works as a pipeline of the following four different modules: (1) Data acquisition: each single frame of the underwater video sequence (live video or recorded off-line) is read by the system for processing; (2) Extraction of areas of interest: candidate starfish are extracted from the current frame (section 3); (3) Computation of shape indicators (features): for each candidate a set of features is computed. The features chosen are a set of shape descriptors (section 4). (4) Classification: this module discriminates the candidate starfish between starfish and non-starfish, using the features extracted by the previous module (section 5).
3. Selection of areas of interest This first module detects areas of the image likely to include starfish. The objective of this module is to select everything that can be a starfish, regardless of the number of false positives that are extracted: it is the classification module that takes care of discarding the false positives.
Figure 1. Schematic representation of the detection system (system modules and adopted algorithms: digitize tape; extraction of connected regions; geometrical, morphological and histogram indicators; classification).
The method adopted is very simple. We first binarise the image using a simple adaptive threshold², which computes local statistics for each pixel (mean value μ and standard deviation σ) in a window of size 7 x 7. From the binary images all the connected components are extracted, and the small-size ones are filtered out using the simple X84 rejection rule³, an efficient outlier rejection method for robust estimation.
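A sketch of this extraction step is given below; since the exact thresholding rule is not spelled out, the binarisation criterion (gray above local mean plus local standard deviation) is an assumption, while the size filter follows the X84 idea of rejecting values far from the median in units of the median absolute deviation:

import numpy as np
from scipy.ndimage import uniform_filter, label

def extract_candidates(gray, k=5.2):
    """Candidate starfish regions as a list of boolean masks."""
    gray = gray.astype(float)
    mu = uniform_filter(gray, size=7)                     # local mean (7x7)
    var = np.maximum(uniform_filter(gray ** 2, size=7) - mu ** 2, 0.0)
    sigma = np.sqrt(var)                                  # local std (7x7)
    binary = gray > mu + sigma                            # assumed threshold rule
    labels, n = label(binary)                             # connected components
    if n == 0:
        return []
    sizes = np.bincount(labels.ravel())[1:]               # component areas
    med = np.median(sizes)
    mad = np.median(np.abs(sizes - med))
    # X84-style rule: drop components whose size is an outlier on the small
    # side, i.e. more than k MADs below the median.
    keep = np.flatnonzero(sizes >= med - k * mad) + 1
    return [labels == lab for lab in keep]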
4. Features extraction

The definition of suitable shape indicators is essential for the classification phase. In our case the shape indicators have been suggested by the morphological structure of the starfish. We identified three indicators that are combined into a feature vector to discriminate the extracted connected components between starfish and noise.
Geometric indicator. The convex hull of the connected component is computed, then a geometric shape indicator, ρ, is defined as

ρ = a_cc / a_ch,

where a_cc is the area of the connected component and a_ch represents the area of the convex hull. Small values of ρ will mostly represent starfish.
Morphological indicator. The morphological shape indicator, θ, is computed by applying the morphological opening operator to the connected component:

θ = a_oc / a_cc,

where a_oc is the area of the result obtained by applying the opening to the connected component. Starfish are likely to return small values for the θ indicator.
Histogram indicator. This indicator, η, is based on the statistics (mean values, μ, and variances, σ) of the histograms by row and by column of the component to be analysed; it combines two terms, η_1 and η_2, computed from the row and the column histogram respectively. Small values of this indicator characterise a uniform distribution of the pixels of the component; therefore starfish components will have small values of η.
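The three indicators can be computed from a binary component mask as in the following sketch; ρ and θ follow the definitions above, whereas the exact expression of η is not reproduced in the text, so the coefficient-of-variation form used here is an assumption:

import numpy as np
from skimage.morphology import convex_hull_image, opening, disk

def shape_indicators(mask, radius=3):
    """rho, theta, eta for a boolean component mask (eta form is assumed)."""
    a_cc = float(mask.sum())                              # component area
    rho = a_cc / float(convex_hull_image(mask).sum())     # geometric indicator
    a_oc = float(opening(mask, disk(radius)).sum())       # area after opening
    theta = a_oc / a_cc                                   # morphological indicator
    rows = mask.sum(axis=1).astype(float)                 # histogram by row
    cols = mask.sum(axis=0).astype(float)                 # histogram by column
    rows, cols = rows[rows > 0], cols[cols > 0]
    eta = 0.5 * (rows.std() / rows.mean()                 # assumed combination
                 + cols.std() / cols.mean())              # of the two terms
    return rho, theta, eta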
Figure 2. Examples of input images.
5. The classifier

For the classification module we adopted a simple Bayesian classifier¹. Let C1 and C2 represent the starfish class and the non-starfish class respectively, and let x be a vector in the feature space. What we want is to compute the a posteriori probability P(C_i|x) that a vector x belongs to the class C_i, and assign the vector x to the class having the largest P(C_i|x). Bayes' formula states that

P(C_i|x) = P(x|C_i) P(C_i) / P(x).
Assuming a Gaussian model for the class-conditional probabilities P(x|C_i) of the two classes of vectors in the feature space (ρ, θ, η), a uniform distribution for P(x) (i.e., P(x) = 1), and assuming that P(C1) = P(C2), we get that the a posteriori probability P(C_i|x) is proportional to the class-conditional probability P(x|C_i). Therefore we can perform the classification by comparing the two class-conditional probabilities.

Figure 3. Examples of the components extracted from the video sequences. The first row shows examples of starfish; the second row shows a selection of elements from the non-starfish class.

It is worth noticing that what we consider as the non-starfish class is not everything that is not a starfish, but only material that can be found on the sea-bottom together with starfish, mainly pebbles. Therefore our non-starfish class is well defined, and it can be seen from Figure 4 that most of its feature vectors fall in a bounded region of the feature space. This can justify the use of a Gaussian distribution to model the non-starfish class, although the cluster formed by the feature vectors of this class is not as well shaped as the one formed by the starfish class.

6. Experimental results

We tested our system on different video sequences obtained as different chunks of a long video from an underwater mission. We manually classified a number of connected components from three different video sequences. A set of 394 components (197 starfish and 197 non-starfish) from the first video sequence was used as training set in order to estimate the two Gaussian distributions. The two clusters of points in the feature space relative to the training set are shown in Figure 4. A second set of 348 components, divided into 174 starfish and 174 non-starfish, and a third set of 742 components, divided into 371 starfish and 371 non-starfish,
have been used as test sets. The two sets were extracted from the second and third video sequence respectively. The results are reported in Table 1. In general we can observe that the success rate in classifying elements of the starfish class is high (in the order of 98%), which is a very good result for such a simple classifier. The error in classifying elements of the non-starfish class is higher (in the order of 7%). This is due to the fact that we included among the non-starfish some components that are small parts of a starfish (such as tentacles), and these have morphological properties similar to the starfish. A way to overcome this problem is to identify a feature discriminating between starfish and this sub-class of starfish and adopt a multi-step classifier, or to add this feature to the feature space if different from the three adopted.

Table 1. Results of the experiments on the two test sets. %E = errors percentage, #E = number of errors, MCS = mis-classified starfish, MCNS = mis-classified non-starfish.
Test set     #Components      %E    #E    MCS %E   MCS #E   MCNS %E   MCNS #E
Test1024b    348 (2 x 174)    3.7   14    1.72     3        6.3       11
Test1550b    742 (2 x 371)    4.8   36    2.1      8        7.5       28
7. Conclusions
This paper presented a system for the detection of starfish in underwater video sequences. The system is composed of a chain of modules which ends with a Bayesian classifier that discriminates whether an area of interest extracted from the input image represents a starfish or not. Experiments performed on a number of images (more than 1000) show that our system has a classification success rate of 96%. The system can be developed and improved in a number of ways, most of which regard the classification module. First, the classification module could implement modern and sophisticated learning techniques (e.g., support vector machines). We might also associate to each classification a confidence level (for instance, a candidate is classified as a starfish with 90% confidence). Moreover we might extend the classification to more classes, discriminating among different species of starfish; we would then need more than the three features described in section 4, and it might be useful to use more than one classifier. So far the system works on single frames. An interesting and useful extension is to count the amount of starfish in a video sequence. To this purpose we need to remember the starfish seen and counted in previous frames. Therefore a tracking module (which tracks starfish in consecutive frames) must be introduced, and
Figure 4. Plot of the distribution of the training set in the feature space. The dark points represent elements in the non-starfish class, the grey crosses elements in the starfish class.
several candidate algorithms have been identified. Starfish counting also requires identifying and handling occlusions between starfish.

Acknowledgements
We thank Dr. Ballaro for useful discussions. This work has been partially supported by the following projects: the EIERO project under grant number EU-Contract HPRI-CT-2001-00173; the international project for universities' scientific cooperation CORI May 2001-EF2001; COST action 283. The test data were provided by Dr. Anthony Grehan (Martin Ryan Marine Science Institute, University College, Galway, IRELAND).

References
1. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. Wiley, 2001.
2. R. C. Gonzalez and R. E. Woods. Digital image processing. Addison Wesley, 1993.
3. F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: the approach based on influence functions. John Wiley & Sons, 1986.
4. D. M. Kocak, N. da Vitoria Lobo, and E. A. Widder. Computer vision techniques for quantifying, tracking, and identifying bioluminescent plankton. IEEE Journal of Oceanic Engineering, 24(1):81-95, 1999.
5. S. Marchand-Maillet and Y. M. Sharaiha. Binary digital image processing. Academic Press, 1982.
6. F. Odone, E. Trucco, and A. Verri. Visual learning of weight from shape using support vector machines. In Proceedings of the British Machine Vision Conference, 1998.
7. A. Olmos and E. Trucco. Detecting man-made objects in unconstrained subsea videos. In Proceedings of the British Machine Vision Conference, 2002.
8. A. Ortiz, M. Simo, and G. Oliver. Image sequence analysis for real-time underwater cable tracking. In Proceedings of the IEEE Workshop on Applications of Computer Vision, pages 230-236, 2000.
9. J. Serra. Image analysis and mathematical morphology. Academic Press, 1982.
10. M. Soriano, S. Marcos, C. Saloma, M. Quibilan and P. Alino. Image classification of coral reef components from underwater color video. In Proceedings of the MTS/IEEE OCEANS Conference, volume 2, pages 1008-1013, 2001.
A COMPARISON AMONG DIFFERENT METHODS IN INFORMATION RETRIEVAL
F. CANNAVALE AND V. SAVONA
Dipartimento di Scienze Mediche, Facoltà di Medicina, v. S. Giorgio 12, 09124, Cagliari, Italy
E-mail: [email protected]
C. SCINTU
Dipartimento Ingegneria del Territorio, Facoltà di Ingegneria, p.zza d'Armi, 09123, Cagliari, Italy
E-mail: [email protected]
In this paper we propose a comparison among algorithms (HER and HEAT, which appeared in the literature in the last three years) and classical elaboration and transformation methods (such as DFT, Wavelet and Euclidean Distance), when applied to information retrieval with several multimedia databases, chosen according to specific experimental criteria. The first database is a collection of Brodatz textures, on which we applied some linear and non-linear transformations; this choice was due to the wide popularity of the above-mentioned textures in the scientific environment and to the ease of giving a visual interpretation of the results obtained with the applied transformations. The second database contains several mammographies, characterized by both benignant and malignant lesions, while the last database is an aerophotogrammetric image of Cagliari's district area. The choice of the last two databases was due to the high degree of difficulty of their image content.
1. Introduction

The problem of image classification and retrieval by content, based only on the actual content of the pictorial scene, is a hard one. As it turns out, human beings are extremely good at recognizing shapes and textures independently of their position and orientation, but much less good at programming a machine to achieve the same task; finding an automated technique to solve the pattern recognition problem by computer is a daunting task and no general solution is yet available, even if scientists have worked out solutions for specific problems in restricted areas. In the scientific literature the proposed techniques fall almost invariably into the category of feature extraction methods, whose key idea is to analyse the pictorial scene in order to obtain n numerical features. In this way an image is mapped from the Image (or pixel) Space into a single point in an n-dimensional Feature Space, where traditional - and exact - spatial access methods may be used to retrieve points (i.e. images) that are close to a query image. This type of user
interaction paradigm is called "query by example", because the user supplies an example (the query image) and the system looks for images that are "near" in some sense. In order to place the present work in perspective, we now briefly review the considered well-known techniques. Given a discrete signal f(t) in the Time Space, we use a linear transformation which associates the amplitude of the signal at the considered time t ∈ [0,T], where T is the total duration of the signal, with its energy value in the Energy Space, where we obtain as many feature-energy points as there are values of t in the [0,T] interval. We define the Euclidean Distance between two signals, both in the Feature-Energy and in the Time Space, as the difference of the areas between the graphs of the signals and the x axis in the considered interval, i.e. the difference of the corresponding definite integrals. We have to notice that the difference between two integrals is not completely acceptable as a comparison of signals, because there are infinitely many signals, with completely different shapes, which are characterized by the same Euclidean Distance both in the Feature and in the Time Space. The second transformation used in this work is the DFT (Discrete Fourier Transform), which, under a certain approximation, may be considered a linear transformation that maps a signal f(t), defined in the Time Space, into the Frequency Space. The comparison between two signals f1(t) and f2(t) defined in the Time Space is realized by calculating the Euclidean Distance between their representations, using a stated harmonic content for each of the considered signals [1]. A further transformation taken into consideration in the present work is the wavelet decomposition; if Fourier analysis consists of breaking up a signal into sine waves of various frequencies, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the original (or mother) wavelet. A wavelet is a waveform of effectively limited duration that has an average value of zero [2]. One major advantage afforded by wavelets is the ability to perform local analysis, i.e. to analyze a localized area of a larger signal. Wavelet analysis is capable of revealing aspects of data that other signal analysis techniques miss, aspects like trends, breakdown points, discontinuities in higher derivatives, and self-similarity. Furthermore, because it affords a different view of data than those presented by traditional techniques, wavelet analysis allows compressing or de-noising a signal without appreciable degradation. Indeed wavelets have already proven
themselves to be a useful tool in the signal processing field and continue to enjoy a burgeoning popularity today. The last methodology we present in this work is the non-linear transformation HER (Hierarchical Entropy-based Representation) [3][4]. Given a signal f(t) in the Time Space, we can represent it by following these steps: first we choose hierarchically the absolute maxima of the considered signal, then for each of the maxima we evaluate the corresponding Gaussian, whose height is the value of the energy related to the considered maximum, and the associated entropy, whose measure is expressed by the following relationship:

S_i = [E_i - σ_i, E_i + σ_i] / E_i,

where E_i is the energy value of the considered absolute maximum and σ_i is its standard deviation. We transform a signal f(t) from the Time Space into the Entropy Space by adopting the following criterion: we consider the entropy values associated with the absolute maxima following their hierarchical extraction order, then we place these entropy values in the Distance/Entropy Space, with the first maximum at the position x = 0. HER is characterized by some interesting properties: it is invariant with respect to translations of the signal (i.e. to the amplitude, to the time-shift and to the initial phase-shift). HEAT introduces a linear transformation that allows us to extend HER from 1-D signals to images. This paper is organized as follows: Section 2 gives and discusses the experimental results; Section 3 concludes our study.
2. Experiments and Results

In the experiments we focused our attention on the results given by the different image processing methods when applied to three diverse databases; the choice of the above-mentioned databases aimed to put in evidence the different characteristics of the proposed methods. For each database the experiments aimed at assessing whether, and to what extent, the retrieval results were in good agreement with objective and/or human similarity judgements.
The first experimental test was focused on studying the behaviour of the methods with respect to linear and non-linear transformations of the considered signals. The first database was a set of 256 signals, obtained from 16 different Brodatz textures [5]. For each of the textures we selected a 32 x 32 pixel area, which produces a 1D signal of 1024 pixels length when HEAT is applied; these steps generate a set of 16 one-dimensional signals.

Figure 1. Selection of tiles obtained from the Brodatz textures, augmented with transformed versions of the 16 original signals (first database).

We applied several linear and non-linear transformations to the set of 1D signals: the former were several amplitude shifts, translations, mirror-reflections and rotations by integer multiples of π/2, the latter different histogram stretchings and noisy versions with different amounts of added Gaussian noise. As easily predictable, the linear transformations produced power spectrum variations of the Fourier transforms, while the non-linear transformations involved several changes in the high-frequency components of the Fourier spectrum. The obtained results confirmed the expected effects of the applied transformations, as shown in Figure 2; it is noteworthy that the performance of the DFT changes when we augment the number of reconstruction components of the query signal: the bigger the number of components, the worse the result.
Figure 2. Behaviour of the DFT⁻¹ for different numbers of reconstruction harmonics (17, 34, 500, 1000): position of the first false alarm and number of matches among the first 15 retrieved.

Figure 3 shows a signal belonging to the first database (bark element) and the same signal reconstructed using 17 harmonics in the inverse Fourier reconstruction.
Figure 3. Original Bark Brodatz signal (top) and the same signal after the DFT⁻¹ obtained with the first 17 harmonics (bottom).
We can notice that the inverse DFT produces a set of almost overlapping signals when applied to the set of 16 relevant matches (i.e. to the transformed versions of the image query, including the query itself) using a restricted number of components. These results are also clearly shown in Table 1, which includes the comparison among the proposed methods.

Table 1. Comparison among methods for the Brodatz Bark query.

                            HER     Euclidean Distance   Fourier   Wavelet
# Correct Tiles             15      11                   12        12
1st False Alarm position    12      5                    4         4
Normalized Recall           0.969   0.927                0.969     0.969
The second database is a collection of mammographies, which includes 49 types of breast cancer (mass/calcification/microcalcification) of both benignant (24 images) and malignant (25 images) nature; the diagnosis of each of the considered radiological images is confirmed by bioptic exams [6]. For each of the mammographies we selected a 32-pixel square area, which produces a set of 49 one-dimensional signals of 1024 pixels length.
Figure 4. Collection of 49 mammographies of benignant and malignant breast cancers (second database).
The graphs of the results due to the different methods are shown in Figure 5a and Figure 5b, where a benignant and a malignant query is adopted respectively.

Figure 5a. Retrieval trend with a benignant query: matches versus retrieved tiles for the theoretical response, HER, Euclidean distance and Fourier, and wavelet.
Figure 5b. Retrieval trend with a malignant query: matches versus retrieved tiles for the theoretical response, HER, Euclidean distance, Fourier and wavelet.
The wide range of variability of the signals belonging to this database and their non-periodic nature are the most influential factors in the qualitative and quantitative response of the considered methods [7][8][9][10]. In Figure 5a and 5b we can observe that HER again gives results very close to the theoretical response, while the DFT and Wavelet behaviours are similar to each other but less performing than HER. The third and last database is a portion of an aerophotogrammetry of Cagliari's district area, acquired at an altitude of 10,000 meters; the image is characterized by rural roads, extended plantations (trees and horticultural) and farms (Figure 6).
Figure 6. Aerophotogrammetry of Cagliari's district area (third database).

The aerophotogrammetric image was divided into 6400 portions (i.e. signals), each of them characterized by a length of 100 pixels. The results due to the application of all the considered methods are shown in Figure 7, where a tree query is adopted; a comparison between the results obtained on the second and on the third database reveals that the trend of the different methods is qualitatively similar.
Figure 7. Results due to the different methods with a tree query.
3. Conclusions

In this work we have faced the problem of image retrieval efficiency; four different methods were compared when applied to three diverse databases. Experimental results show the robustness of both HER and HEAT, as also stated in some previous works. We were also able to show that the above-mentioned methods give results that can be compared to DFT, Wavelet and Euclidean Distance, despite the intrinsic non-linearity of HER and HEAT. Textures, medical and aerophotogrammetric images have been considered as databases for our experiments. The obtained results call for some further considerations; the use of the Brodatz database allowed us to show that the behaviour of HER, DFT and wavelet is quite similar. The comparison among methods gives more interesting results when applied to the other proposed databases: while the behaviour of HER is generally the best one, the behaviour of DFT, wavelet and Euclidean Distance is worse and almost invariant with respect to the considered database; among these last three methods we are not able to establish which works generally better. In any case the Euclidean Distance is the least performing among the considered methods.
The amount of information considered in the process of information retrieval was also taken into account; it is important to put in evidence that HER gives its appreciable results considering only 20% of the whole information content of the signals of each database.

Acknowledgements
The authors would like to thank Marco Cabras and Maria Giuseppina Carta of the Provincia di Cagliari for their help in obtaining the permission to use some portions of the digital images of Cagliari's district area.

References
1. Brandt S., Laaksonen J., Oja E., Statistical Shape Features in Content-based Image Retrieval, Proc. of ICPR, Barcelona, Spain, September 2000.
2. Teolis A., Computational Signal Processing with Wavelets, Birkhauser, 1998.
3. Casanova A., Fraschini M., Vitulano S., Hierarchical Entropy Approach for Image and Signal Retrieval, Proc. FSKD02, Singapore, L. Wang et al. Editors.
4. Distasi R., Nappi M., Tucci M., Vitulano S., CONTEXT: A Technique for Image Retrieval Integrating CoNtour and TEXture Information, Proc. of ICIAP 2001, Palermo, Italy, pp. 224-229, IEEE Comp. Soc.
5. Brodatz P., Textures: A Photographic Album of Artists and Designers, Dover Publications, New York, 1966. Available in a single .tar file: ftp://ftp.cps.msu.edu/pub/prip/textures/
6. Suckling J., Parker J., Dance D.R. et al., The Mammographic Image Analysis Society Digital Mammogram Database, in Digital Mammography, Gale, Astley, Cairns Eds., pp. 375-378, Elsevier, Amsterdam, 1994.
7. Issam El Naqa, Yongyi Yang, et al., Content-based Image Retrieval for Digital Mammography, ICIP 2002.
8. Acharyya M., Kundu M.K., Wavelet-based Texture Segmentation of Remotely Sensed Images, Proc. of ICIAP 2001, Palermo, pp. 69-74, IEEE Computer Society.
9. Wang J.Z., Wiederhold G., Firschein O., Wei S.X., Content-based Image Indexing and Searching Using Daubechies Wavelets, Int. Jour. Digit. Libr., 1997, 1:311-328, Springer Verlag.
10. Chang R.F., Kuo W.J., Tsai H.C., Image Retrieval on Uncompressed and Compressed Domain, ICIP 2000.
HER: APPLICATION ON INFORMATION RETRIEVAL
A. CASANOVA AND M. FRASCHINI
Dipartimento di Scienze Mediche Internistiche, Facoltà di Medicina e Chirurgia, Via San Giorgio 12, Cagliari, 09124, Italia
E-mail: {Casanova,fraschini}@pacs.unica.it

This paper presents an overview and some remarks on an indexing technique (HER) for image retrieval based on contour and texture data, and shows the latest results obtained with Brodatz texture, aerial photograph and medical image datasets. The method encodes 2-dimensional visual signals into a 1-D form in order to obtain an effective technique for content-based image indexing. This representation is well-suited to both pattern recognition and image retrieval tasks. Our experimental results have also shown that the hierarchical entropy-based approach can improve the detection of suspicious areas and the diagnostic accuracy.

1. Introduction
The Information Retrieval field has generated additional interest in methods and tools for multimedia database management, analysis and communication. Multimedia computing systems are widely used for everyday tasks and, in particular, image databases represent the most common type of application; it is important to extend the capabilities of such application fields by developing multimedia database systems based on retrieval by content. Searching for an image in a database is a complex issue, especially if we restrict the queries to approximate or similarity matches. A variety of techniques and working prototypes for content-based image indexing systems exist in the literature. This paper presents an overview and some remarks on an indexing technique, the Hierarchical Entropy-based Representation, for image retrieval based on contour and texture data, and shows the latest results obtained with Brodatz texture, aerial photograph and medical image datasets. Our method has shown to be effective in retrieving images in all the cases under investigation, and it has invariance and robustness properties that make it attractive for incorporation into larger systems. We also think that using this method in medical databases as the basis for a computer-aided detection (CAD) system could be a relatively new and intriguing
idea; our first experimental results have shown that it is effective considering the most objective indexes to estimate the performance of diagnosis results (sensitivity, specificity, positive predictive value, and negative predictive value). The paper is organized as follows: Section 2 shortly reassumes how our method works and some of its properties; Section 3 shows a comparison with Wavelet based method and Section 4 describes the results obtained from experimentation on several image datasets. 2.
2. HER, the Method
The main task of pattern recognition is to compare a measured image in an unknown position to different prototypes. We get a direct brute-force solution to this problem if we compare the prototypes in all possible positions and extract the optimal coincidence. If we use the Euclidean distance for comparison, we end up calculating the maximum of a high-order correlation function, which is a rather time-consuming operation. The time required grows exponentially with the number of parameters describing the coordinate transformations induced by the motion. A more elegant way to solve the problem involves the use of mappings that are able to extract position-invariant intrinsic features of the object. The method of Fourier descriptors is known to work reasonably well for the recognition of object contours independent of position, orientation and size. There are works that show the results of the Fourier approximation of polygons for different numbers of Fourier coefficients. As it turns out, it is possible to achieve a good approximation of a polygon by using 15-30 coefficients. Even with few coefficients, the Fourier series obtains an acceptable approximation to the original curve because the low frequencies contain the most significant information about the object. Other techniques resort to the minimization of the contour's moments with respect to an orthogonal coordinate system centered in the object's center. Generally, only the first two moments are used because the higher-order moments add little information content. However, this approach does not appear to be particularly effective: indeed, it requires a great amount of information and long computing times. HER, the Hierarchical Entropy-based Representation, is a time-series indexing system useful for efficient retrieval by content. This model is employed in order to describe a 1-D signal by means of a few coefficients. The method reconstructs the energy distribution of the given signal along the independent-variable axis, selecting the most relevant local maxima based on the area, and therefore the energy, associated with each maximum.
Considering a signal f(t) in the time space, HER represents the signal in the entropy space following these steps:
- select the first absolute maximum;
- consider the maximum to be the midpoint of a Gaussian distribution;
- compute its relative entropy;
- go back to the first step, until a predefined number M of maxima has been used or the fraction of the total energy remaining in the signal falls below a given threshold.
In the entropy space the signal is represented by means of the sequence of the extracted maxima, each located by its distance from the first (largest) maximum. The distance between two given signals f1(t) and f2(t) is obtained by comparing the corresponding non-linear HER representations. HER is a good candidate for content-based retrieval whenever the information can be accurately represented by a 1-D signal.
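A minimal sketch of this iteration is given below; the Gaussian width, the stopping rule and, in particular, the exact entropy term recorded for each maximum are illustrative assumptions, not the authors' precise definitions.

```python
import numpy as np

def her_sketch(signal, m_max=5, stop_frac=0.2, sigma=3.0):
    """HER-like description of a 1-D signal (illustrative sketch only).

    Repeatedly: take the current absolute maximum, model it as the midpoint of
    a Gaussian bump, record the pair (offset from the first maximum, entropy
    term of that bump), subtract the bump, and stop after m_max maxima or when
    the residual keeps less than stop_frac of the original energy.
    """
    x = np.asarray(signal, dtype=float).copy()
    t = np.arange(len(x))
    total_energy = float(np.sum(x ** 2)) + 1e-12
    pairs, first_pos = [], None
    while len(pairs) < m_max and np.sum(x ** 2) > stop_frac * total_energy:
        pos = int(np.argmax(np.abs(x)))
        bump = x[pos] * np.exp(-0.5 * ((t - pos) / sigma) ** 2)
        p = float(np.sum(bump ** 2)) / total_energy   # energy fraction of the bump
        entropy_term = -p * np.log2(p + 1e-12)        # its entropy contribution (assumed form)
        if first_pos is None:
            first_pos = pos
        pairs.append((pos - first_pos, entropy_term))
        x = x - bump                                  # continue on the residual
    return pairs

# toy usage on a synthetic two-bump signal
sig = np.exp(-0.5 * ((np.arange(200) - 60) / 5.0) ** 2) \
    + 0.6 * np.exp(-0.5 * ((np.arange(200) - 140) / 8.0) ** 2)
print(her_sketch(sig))
```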
2.1. HER for Contours
The proposed method, HER, has been applied to analyze and classify closed contours of objects and regions of a pictorial scene. In order to obtain a 1-D time series from 2-D contour data, the approach is to scan the contour pixel by pixel. One disadvantage is that the representation will have one point for each contour pixel; therefore, the data size can get large for images at high resolution. The advantage is that any contour can be represented in a lossless, reversible way. The contour is scanned clockwise, starting from its top-left pixel, recording the distance between each pixel and the center of mass. The contour is sampled pixel by pixel, and this yields a periodic time series with as many points as there are pixels in the object contour. The frame of reference is a coordinate system centered in the barycentre G of the object, computed as follows:
\[ G = (x_G, y_G), \qquad x_G = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad y_G = \frac{1}{k}\sum_{i=1}^{k} y_i \]

where x_i and y_i are the coordinates of a pixel P_i belonging to the contour with k pixels. After that, the distance between the barycentre G and each of the k pixels of the contour is computed. In this way it is possible to obtain a representation y(s) of the contour in curvilinear coordinates. Such a representation is univocal, and it is possible to reconstruct the original 2-D contour shape without loss of information. Applying the HER method, it is possible to describe the y(s) representation by means of a few coefficients.
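A short sketch of this contour-to-signal step, assuming the contour is already available as an ordered list of pixel coordinates (the sampling conventions are simplified with respect to the description above):

```python
import numpy as np

def contour_to_signal(contour_pixels):
    """Turn a closed contour (ordered list of (x, y) pixels, e.g. scanned
    clockwise from the top-left pixel) into the 1-D signal y(s): the distance
    of every contour pixel from the barycentre G of the contour."""
    pts = np.asarray(contour_pixels, dtype=float)
    g = pts.mean(axis=0)                       # barycentre: mean of the coordinates
    return np.hypot(pts[:, 0] - g[0], pts[:, 1] - g[1])
```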
The discrete form of the Fourier Transform is also often used as a shape descriptor. It has several nice, well-known mathematical properties, most importantly linearity. As shown by Zahn and Roskies, an adequate approximation of a polygon requires 15-30 coefficients. When the object has highly irregular or jagged contours, even 30 coefficients are not enough to characterize the shape adequately for accurate reconstruction.

2.2. HER for Textures
HER has also been applied to analyze and classify 2-D texture information. The main idea of the tool is to transform the image from a 2-D signal into a 1-D signal. In order to obtain a 1-D time series from 2-D texture data, the approach is to follow a spiral path in the texture element. This choice has the advantages of simplicity and instant computability. Applying the HER method, it is possible to describe the resulting 1-D signal by means of a few coefficients. In the case of textures, the spiral method used to obtain a 1-D dataset is sensitive to rotation and reflection, so that exact theoretical invariance - as opposed to practical robustness - is not possible. However, by rotating and reflecting the texture into a canonical form before applying the spiral method, it is possible to make the whole process invariant. This method is invariant to some types of image transformation: contrast scaling, luminance shifting and translation.
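The spiral unrolling can be sketched as follows; the direction of the spiral (outer border towards the centre) is an assumption, since the text does not fix it:

```python
import numpy as np

def spiral_scan(tile):
    """Unroll a texture tile into a 1-D sequence by following a spiral path
    from the outer border towards the centre (direction is an assumption)."""
    a = np.asarray(tile)
    out = []
    while a.size:
        out.extend(a[0, :].tolist())           # top row, left to right
        a = a[1:, :]
        if a.size:
            out.extend(a[:, -1].tolist())      # right column, top to bottom
            a = a[:, :-1]
        if a.size:
            out.extend(a[-1, ::-1].tolist())   # bottom row, right to left
            a = a[:-1, :]
        if a.size:
            out.extend(a[::-1, 0].tolist())    # left column, bottom to top
            a = a[:, 1:]
    return np.array(out)
```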
3. Comparison with a Wavelet-Based Method
As said above, there are several methods available for image retrieval. Those based on the multiresolution formulation of wavelet transforms are among the most reliable and robust. A wavelet is a waveform of effectively limited duration that has an average value of zero. One of the main advantages afforded by wavelets is the ability to perform local analysis. The comparison was aimed at assessing the efficiency and effectiveness of the retrieval. In particular, efficiency is related to the computational requirements and to the index size, while effectiveness has to do with the quality of the answer set. As for the quality of the retrieval, wavelet-based approaches are very robust and tolerate even the addition of Gaussian noise to the query texture without overly negative consequences. In HER, as few as 4 or 5 maxima are usually enough to characterize a texture in an effective way. Indeed, having more maxima in the index does not improve the performance. As a consequence, the typical size of HER indices is rather small. On the other hand, a typical wavelet-based index requires about a hundred coefficients to work with good accuracy.
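For comparison, a simple wavelet-based signature can be sketched with PyWavelets; the sub-band energy index below is only one common way of building such an index and is not necessarily the one used in the works compared here (wavelet family and decomposition level are illustrative):

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_signature(tile, wavelet="db2", level=3):
    """A simple wavelet-based index: the energy of each sub-band of a 2-D
    multiresolution decomposition (one value per sub-band)."""
    coeffs = pywt.wavedec2(np.asarray(tile, dtype=float), wavelet, level=level)
    sig = [float(np.sum(coeffs[0] ** 2))]                  # approximation energy
    for (ch, cv, cd) in coeffs[1:]:                        # detail sub-bands
        sig.extend(float(np.sum(c ** 2)) for c in (ch, cv, cd))
    return np.array(sig)

tile = np.random.rand(64, 64)          # stand-in for a texture tile
print(wavelet_signature(tile).shape)   # 10 sub-band energies, versus hundreds of raw coefficients
```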
Summing up, HER's performance in terms of quality is very close to that of methods based on the wavelet transform, but it is much less costly in terms of computing resources and index size. Additionally, as stated above, this representation can be effectively used for different kinds of data, in particular contours and textures.
4. Experimental Results
Several experiments have been performed in order to assess the validity of the proposed method. For these tests, we focused on an aerial image dataset, the Brodatz set of textures and, furthermore, on one medical case study containing mammograms from the MIAS database. In all these cases texture is significant enough that it can tentatively be used alone for indexing. The testing dataset with aerial images was constructed using aerial photographs acquired in regions near Cagliari. The dataset includes several images of different kinds of soil, vegetation, roads, rivers and buildings. Figure 1 shows a portion of the area under investigation, with a subdivision into tiles (10 x 10 pixels). We tried to investigate the use of texture as a visual primitive to search and retrieve aerial images.
Figure 1. Partitioning process of the aerial photograph.
The results obtained demonstrate (Figure 2) that our method can be used to select a large number of geographically salient features, such as vegetation patterns, parking lots, and building developments.
Figure 2. Distance from the query tile (tree) as a function of rank.
The testing dataset from the Brodatz textures includes several transformed versions in order to test the robustness of the retrieval. Using one of the original textures as the query and looking at the returned results, we found several matches at distance 0 in feature space from the query texture.
Figure 3. Distance from the query tile (Bark.0000) as a function of rank.
Figure 3 shows the distances from the query tile (Bark.0000) to the closest 100 matches in the testing dataset. The first bin represents the first nine matches at distance 0, each one belonging to the same texture type. As for the medical cases, the first database used was the MIAS Mammographic Database, digitised at a 50-micron pixel edge and reduced to a 200-micron pixel edge, so that every image is 1024 x 1024 pixels at 8 bits per pixel. The MIAS database includes 330 images, arranged in pairs of films, where each pair represents the left and right mammograms of a single patient, with the following details: MIAS database reference number, character of background tissue, class of abnormality present, severity of abnormality, image coordinates of the centre of the abnormality, and radius of a circle enclosing the abnormality. The testing data set includes 67 benign and 54 malignant mammograms. The lesions were labelled by the reference included in the database. Table 1 illustrates the results obtained from the experimentation with different query tiles. The table is structured in this way: the "#FA/XX" columns show the number of false alarms in the XX-element answer set; the "1st FA" column contains the answer-set rank of the first false alarm; the last column, "Class", represents the class of abnormality. It is important to note that in all cases we found no false alarms among the first 10 retrieved mammogram tiles.

Table 1. Tile query retrieval.
Tile query   1st FA   #FA/10   #FA/20   #FA/30   Class
104          15       0        1        5        M
25           12       0        5        9        B
99           16       0        3        9        M
34           11       0        3        9        B
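A minimal sketch of how the #FA/k counts and the rank of the first false alarm reported in Table 1 can be computed from a ranked answer set (the toy labels below are illustrative, not the experimental data):

```python
def false_alarm_counts(ranked_labels, cutoffs=(10, 20, 30)):
    """Given the labels of a ranked answer set -- True meaning 'false alarm',
    i.e. a retrieved tile of the opposite class to the query -- return the
    rank of the first false alarm and the #FA/k counts."""
    first_fa = next((i + 1 for i, fa in enumerate(ranked_labels) if fa), None)
    fa_at_k = {k: sum(ranked_labels[:k]) for k in cutoffs}
    return first_fa, fa_at_k

# toy usage: a ranked list whose first false alarm appears at rank 15
answers = [False] * 14 + [True] + [False] * 15
print(false_alarm_counts(answers))   # (15, {10: 0, 20: 1, 30: 1})
```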
In Figure 4 we show a graphical representation of the first 30 tiles retrieved with a malignant query (tile #104). The rm curve represents the malignant retrieval trend, while rb is the false-alarm curve. The dm and db curves represent, respectively, the malignant and benign distribution trends in the testing dataset.
Figure 4. Retrieval trend with the malignant query tile.
5. Conclusions
The main idea we have proposed with the HER method is to consider the maxima as the most important features of a signal. The importance of the maxima lies not only in their position but rather in their "mutual position" inside the signal. HER is a hierarchical method that selects maxima considering their relative values and reciprocal distances. The signal is represented by means of a vector containing pairs of elements, where the former is the distance of a maximum from the first one and the latter is the associated entropy. HER is a non-linear transform that presents several useful invariances: translation, rotation, reflection, luminance shifting and scale. Experimentation using contour signals has shown encouraging results. Such results are strictly connected with the procedure we followed to transform a shape into a 1-D signal. HER for contours allows us to obtain important information on the number and on the shape of the elongations of the object under investigation. The sampling theorem clarifies the differences between the proposed method and Fourier descriptors. The comparison with the moment-based technique has also shown the validity of the HER method. Some considerations can be made about the results obtained using the Brodatz dataset of textures: the transformations applied to the tiles (rotation, reflection, luminance shifting and contrast shifting) do not modify the low
frequencies of the signal; this allows the Fourier Transform to obtain good results using only a few coefficients. However, such results are not better than those obtained with HER and the wavelets. One of the most important properties of the HER method is its low computational cost compared with all the other techniques taken into account. Furthermore, all the experimentation has been conducted using only 30% of the whole signal information. In conclusion, we can affirm that the experimentation with HER has shown results comparable with (and sometimes better than) the Fourier Transform and wavelets. These kinds of results are confirmed by the latest experiments on medical images (the mammography database). Considering the results obtained on medical images, we think our method could be used as the basis for a computer-aided detection (CAD) system. Finding similar images, with the aim of drawing the radiologist's attention to possible lesion sites, is surely an important way to provide aid during clinical practice. The importance of a content-based image retrieval system in computer-aided detection is to help radiologists when they need reference cases to interpret an image under analysis. Our future objective is the development of an efficient database methodology for retrieving patterns in medical images representing pathological processes.
References
1. Casanova A., Fraschini M., Vitulano S., Hierarchical Entropy Approach for Image and Signal Retrieval, Proc. FSKD02, Singapore, L. Wang et al. Eds.
2. Distasi R., Nappi M., Tucci M., Vitulano S., CONTEXT: A Technique for Image Retrieval Integrating CoNtour and TEXture Information, Proc. of ICIAP 2001, Palermo, Italy, IEEE Comp. Soc.
3. Brodatz P., Textures: A Photographic Album for Artists and Designers, Dover Publications, New York, 1966. Available in a single .tar file: ftp://ftp.cps.msu.edu/pub/prip/textures/
4. Issam El Naqa, Yongyi Yang, et al., Content-based Image Retrieval for Digital Mammography, ICIP 2002.
ISSUES IN IMAGE UNDERSTANDING *
VITO DI GESÙ
DMA, University of Palermo, Italy
IEF, University of Paris Sud, Orsay, France
E-mail:
[email protected]
The aim of this paper is to address some fundamental issues and viewpoints about machine vision systems. Among them, image understanding is one of the most challenging. Even in the case of human vision its meaning is ambiguous: it depends on the context and on the goals to be achieved. Here, a pragmatic view will be considered, focusing the discussion on the algorithmic aspects of artificial vision and its applications.
1. Visual Science
Visual science is considered one of the most important fields of investigation in perception studies. One of the reasons is that the eyes collect most of the environmental information, and this makes the related computation very complex. Moreover, the eyes interact with the other perceptive senses (e.g. hearing, touch, smell), and this interaction is not fully understood. Mental models, stored somewhere in the brain, are perhaps used to elaborate all the information that flows from our senses to the brain. One of the results of this process is an update of our mental models by means of a sort of feedback loop. This scenario shows that the understanding of the visual scene surrounding us is a challenging problem. The observation of visual forms plays a considerable role in the majority of human activities. For example, in our daily life we stop the car at the red traffic light, select ripe tomatoes, discarding the bad ones, and read a newspaper to update our knowledge. The previous three examples are related to three different levels of understanding. In the first example an instinctive action is performed as a
*This work has been partly supported by the European action COST-283 and by the French ministry of education.
Figure 1. Axial slices through five regions of activity of the human brain.
Figure 2. Hardware based on bio-chips.
consequence of a visual stimulus. The second example concerns a conscious decision-making activity, where attentive mechanisms are alerted by the visual task: recognize the color of the tomato and decide whether it is ripe. In this case the understanding implies training and learning procedures. The third example involves a higher level of understanding. In fact, the reading of a sequence of typed words may produce or remind us of concepts, for example images and emotions. Reading may generate mental forms that are different, depending on the reader's culture, education, and past experiences. At this point we may argue that visual processes imply different degrees of complexity in the elaboration of information. Visual perception has been an interesting investigation topic since the beginning of human history, because of its presence in most human activities. It can be used for communication, decoration and ritual
purposes. For example, scenes of hunting have been represented by graffiti on the walls of prehistoric caves. Graffiti can be considered the first example of a visual language that uses an iconic technique to pass on history. They also suggest how the external world was internalized by prehistoric men. Over the centuries, painters and sculptors have discovered most of the color combination rules and spatial geometry relationships, allowing them to generate realistic scene representations, intriguing imaginary landscapes, and visual paradoxes. This evolution was due not only to the study of our surrounding environment; it was also stimulated by the emergence of more and more internal concepts and utilities. In fact, with the beginning of writing, visual representation became an independent form of human expression. Since 4000 B.C., visual information has been processed by Babylonian and Assyrian astronomers to generate sky maps representing and predicting planet and star trajectories. Today astronomers use computer algorithms to analyze very large sky images at different frequency ranges to infer galaxy models and to predict the evolution of our universe. Physicians perform most diagnoses by means of biomedical images and signals. Intelligent computer algorithms have been implemented to perform automatic analysis of MRI (Magnetic Resonance Imaging) and CTA (Computerized Tomography Analysis). Here, intelligence stands for the ability to retrieve useful information by guessing the most likely disease. Visual science has been motivated by all the previously outlined arguments; it aims to understand how we see and how we interpret the scenes surrounding us, starting from the visual information that is collected by the eyes and processed by our brain. One of the goals of visual science is the design and realization of artificial visual systems closer and closer to the human being. Recent advances in the inspection of the human brain and future technology will allow us both to explore our physical brain in depth (see Figure 1) and to design artificial visual systems whose behavior will be closer and closer to that of the human being (see Figure 2). However, advances in technology will not be sufficient to realize such advanced artificial visual systems, since their design would require a perfect knowledge of our visual system (from the eyes to the brain) 1,2.
2. The origin of artificial vision
The advent of digital computers has determined the development of computer vision. The beginning of computer vision can be dated to around 1940, when cybernetics started. In that period the physicist Norbert Wiener and the physiologist Arturo Rosenblueth promoted, at the Harvard Medical School, meetings between young researchers to debate interdisciplinary scientific topics. The guideline of those meetings was the formalization of biological systems, including human behavior 5. The program was ambitious, and the results were not always what people had hoped for; however, the advent of cybernetics marked the beginning of a new scientific approach to natural science. Physicists, mathematicians, neurophysiologists, psychologists, and physicians cooperated to cover all aspects of human knowledge. The result of such integration was not merely an exchange of information coming from different cultures; it contributed to improving human thought. In this framework, Frank Rosenblatt introduced, in a collection of papers and books, the concept of the perceptron. The perceptron is a multi-layer machine that stores a set of visual patterns, collected by an artificial retina, which are used as a training set for an automaton; features and weights learned during training are then used to recognize unknown patterns. The intuitive idea is that each partial spatial predicate recognized by the perceptron should provide evidence about whether a given pattern belongs to a universe of patterns. The dream of building machines able to recognize any pattern after a suitable training was shattered, either because of the intrinsic structural complexity of the natural visual system (even in the case of simple animals like frogs), or because of the insufficient technological development of that time. Nevertheless, the idea of the perceptron must be considered the first paradigm of a parallel machine vision system, defined as the mutual interchange of information (data and instructions) among a set of cooperating processing units. For example, it suggested the architecture of interesting machine vision systems 6,7,8,9,10,11.
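For illustration, a minimal single-unit perceptron in the spirit of Rosenblatt's learning rule is sketched below; it is a didactic simplification of the multi-layer machine described above, and all names and parameters are chosen for the example only.

```python
import numpy as np

def train_perceptron(patterns, labels, epochs=20, lr=1.0):
    """Perceptron training on binary visual patterns.

    patterns: array of shape (n_samples, n_features), e.g. flattened retina images
    labels:   sequence of +1 / -1 class labels
    Returns the learned weight vector (with a bias term appended)."""
    x = np.hstack([np.asarray(patterns, dtype=float),
                   np.ones((len(patterns), 1))])          # append a bias input
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(x, labels):
            if yi * np.dot(w, xi) <= 0:                   # misclassified pattern
                w += lr * yi * xi                         # perceptron update rule
    return w

def classify(w, pattern):
    return 1 if np.dot(w, np.append(pattern, 1.0)) > 0 else -1
```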
3. The pragmatic paradigm
The pragmatic paradigm of artificial vision is goal oriented, and the goal is to build machines that perform visual tasks in a specific environment, according to user requirements. It follows that the machine design can be based on models that are not necessarily suggested by natural visual systems.
Here, the choice of the visual model is usually based on optimization criteria. Even if the pragmatic approach has been developed under the stimulus of practical requirements, it has also contributed to a better understanding of some vision mechanisms. For example, graph-theoretical algorithms have been successfully applied to recognize Gestalt clusters according to human perception 12,13. The relation between graphs and the natural grouping of patterns could be grounded on the fact that our neural system could be seen as a very dense multi-graph with billions of billions of paths. Of course, the question is still open and probably will never be solved. Artificial visual systems can be described through several layers of increasing abstraction, each one corresponding to a set of iterated transformations. The general purpose is to reach a given goal, starting from an input scene, X, represented, for example, as an array of 2D pixels or 3D voxels defined on a set of gray levels G. The computation paradigm follows four phases (see Figure 3): low level vision, intermediate level vision, high level vision, and interpretation. Note that these steps do not operate as a simple pipeline process; they may interact through semantic networks and mechanisms of control based on feedback. For example, the parameters and operators used in the low-level phase can be modified if the result is inconsistent with an internal model used during the interpretation phase. The logical sequence of the vision phases is only weakly related to natural vision processes; in the following, a pragmatic approach is considered, where the implementation of each visual procedure is performed by means of mathematical and physical principles, which may or may not have a neuro-physiological counterpart. The pragmatic approach has achieved promising and useful results in many application fields, among them robotic vision 14, face expression analysis 15, document analysis 16, medical imaging 17, and pictorial databases 18.
3.1. Low Level Vision
Here, vision operators are applied point-wise and in neighborhood spatial domains to perform geometric and intensity transformations. Examples are digital linear and non-linear filters, histogram equalization via the cumulative histogram 19, and mathematical morphology 20. Figures 4a,b show an example of morphological erosion using min-based definitions 21. The purpose of this stage is to perform a preprocessing of the input image that reduces the effect
Figure 3. The classical paradigm of an artificial vision system, from early vision to cognitive vision.
Figure 4. The input image (a); its erosion (b).
of random noise, performs sharpening, and detects structural and shape features. A second goal of this phase is the selection of areas of interest inside the scene, where more complex analysis is to be performed. The Discrete Symmetry Transform (DST), as defined in 22,23, is an example of an attentive operator that extracts areas of interest based on the gray-level circular symmetry around each pixel (see Figures 5a-d). The definition of an interesting area depends on the problem and is based on information-theoretic methods.
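As an example of a point-wise low-level operator mentioned above, here is a minimal sketch of histogram equalization via the cumulative histogram for an 8-bit gray-level image:

```python
import numpy as np

def equalize_histogram(image):
    """Histogram equalization of an 8-bit gray-level image via the cumulative
    histogram: each gray level is remapped through the normalized CDF."""
    img = np.asarray(image, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)              # gray-level histogram
    cdf = np.cumsum(hist).astype(float)                         # cumulative histogram
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-12)   # normalize to [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)                  # look-up table
    return lut[img]
```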
Figure 5. The attentive operator DST: a) the input image; b) the application of the DST; c) the selection of points of interest; d) the selection of the eyes.
Low-level vision operators can often be implemented directly in artificial retinas, both to reduce the cost of the whole computation and to enhance performance. So-called active retinas have been included in active visual systems 24,25. Active visual systems have mechanisms that can actively control camera parameters such as orientation, focus, zoom, aperture and vergence in response to the requirements of the task and to external stimuli. More broadly, active vision encompasses attention, selectively sensing in space, resolution and time, whether this is achieved by modifying physical camera parameters or the way data is processed after leaving the camera. The tight coupling between perception and action proposed in the active vision paradigm does not end with camera movements. The processing is tied closely to the activities it supports (navigation, manipulation, signaling danger or opportunity, etc.), allowing simplified control algorithms and scene representations, quick response times, and increased success at supporting the goals of the activities. Active vision offers higher feasibility and performance at a lower cost. The application of active vision facilitates certain tasks that would be impossible using passive vision. The improvement in performance can be measured in terms of reliability, repeatability, speed and efficiency in performing specific tasks, as well as the generality of the kinds of tasks performed by the system. In active vision, a foveated sensor coupled with
a positioning system can replace a higher-resolution sensor array. Moreover, less data needs to be acquired and processed, which significantly reduces hardware costs 26,27.
3.2. Intermediate Level Vision
Neurologists argue that there are quite a number of visual analyses carried out by the brain that are categorized as intermediate-level vision. Included are our ability to identify objects when they undergo various transformations, when they are partially occluded, to perceive them as the same when they undergo changes in size and perspective, to put them into categories, to learn to recognize new objects upon repeated encounters, and to select objects in the visual scene by looking at them or reaching for them 28. In the case of artificial visual systems, the intermediate-level vision computation is performed on selected areas of interest. The task is the extraction of features that carry shape information. Features can be of a geometrical nature (borders, blobs, edges, lines, corners, global symmetries, etc.) or computed on pixel intensity values (regions, segmentation). These features are stored at an intermediate level of abstraction. Note that such features are free of domain information - they are not specifically objects or entities of the domain of understanding, but they contain spatial and other information. It is the spatial/geometric (and other) information that can be analyzed in terms of the domain in order to interpret the images. Yet, as in the natural case, all the features involved must be invariant under geometrical and topological transformations. This property is not always satisfied in real applications.
Geometrical features. Canny's edge detector 29 is one of the most robust and is widely used in the literature. The absolute value of the derivative of a Gaussian is a good approximation to one member of his family of filters. Ruzon and Tomasi 30 introduced an edge detector that also uses color information. The RGB components are combined following two different strategies: a) the edges of the three components are computed after the application of a gradient operator and then the fusion of the three edge images is performed; b) the edges are detected on the image obtained after fusing the gradient images. 2D and 3D modelling 31 and snake computation 32 are other examples of techniques to retrieve object contours. Snakes are based on an elastic model of a continuous, flexible open (or closed) parametric curve, v(s) with s in [0, 1], which is imposed upon and matched to an image. Borders follow the evolution of the dynamic
system describing the snake under constraints that are imposed by the image features. The algorithm is based on the evolution of the dynamic system:

\[ \frac{\partial \mathbf{v}}{\partial t} \;=\; \left[ \frac{\partial}{\partial s}\!\left(w_1 \frac{\partial \mathbf{v}}{\partial s}\right) - \frac{\partial^2}{\partial s^2}\!\left(w_2 \frac{\partial^2 \mathbf{v}}{\partial s^2}\right) \right] \;-\; \nabla E_{feature}(\mathbf{v}) \]

The first term of this equation represents the internal forces, where w_1 is the elasticity and w_2 is the stiffness. The second term is the external force; it depends on the image features. The solution of this equation is found by an iterative procedure and corresponds to the minimization of the system energy:

\[ E(\mathbf{v}) = \int_0^1 \Big( E_{int}(\mathbf{v}(s)) + E_{feature}(\mathbf{v}(s)) \Big)\, ds \]

where \( E_{int}(\mathbf{v}(s)) = w_1 \left| \frac{\partial \mathbf{v}}{\partial s} \right|^2 + w_2 \left| \frac{\partial^2 \mathbf{v}}{\partial s^2} \right|^2 \) and \( E_{feature} = -(\nabla I)^2 \). A global solution is usually not easily found and piecewise solutions are searched for: the energy of each snake piece is minimized, the ends are pulled to the true contour, and the snake-growing process is repeated. Numerical solutions are based on finite-difference methods and dynamic programming 33 (see Figure 6). Methods to compute global object symmetries are also considered at this level of the analysis: the Smoothed Local Symmetry 35 has been introduced to retrieve a global symmetry (if it exists) from the local curvature of contours. In 36,37,38 the mathematical background for extracting object skewed symmetries under scaled Euclidean, affine and projective image transformations is proposed; an algorithm for back-projection is given and the case of non-coplanarity is studied. The authors also introduce the concept of an invariant signature for matching problems (see Figure 7). A third example of an intermediate-level operation is the DCAD decomposition 39, which is an extension of the Cylindrical Algebraic Decomposition 40. The DCAD decomposes a connected component of a binary input image by digital straight paths that are parallel to the Y (X) axis and cross the internal digital borders of the component where they correspond to concavities, convexities, and bends. From this construction a connectivity graph, CG, is derived as a new representation of the input connected components.
Figure 6. (a) input image; (b) edge detection; (c) snake computation.
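A minimal sketch of the edge-detection and snake-fitting steps illustrated in Figure 6, using scikit-image; the test image, the initial circle and all parameter values are illustrative choices, not those of the works cited:

```python
import numpy as np
from skimage import data, feature, filters, segmentation

img = data.coins()                         # sample gray-level image
edges = feature.canny(img, sigma=2.0)      # Canny edge map (boolean array)

# initialize a circular snake and let it evolve towards nearby contours;
# centre, radius and the elasticity/stiffness parameters are illustrative
s = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([60 + 30 * np.sin(s), 300 + 30 * np.cos(s)])
snake = segmentation.active_contour(
    filters.gaussian(img, sigma=3.0),      # external forces from a smoothed image
    init, alpha=0.015, beta=10.0, gamma=0.001)
```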
The CG allows us to study topological properties and visibility relations among all the components in the image. In this phase both topological information and structural relations need to be represented by high-level data structures, where full orders (sequences) and/or partial orders define the relations between image components. Examples of partially ordered data structures are the connectivity graphs and trees 39,41,42.
Grouping and segmentation. The automatic perception and description of a scene is the result of a complex sequence of image-transformations starting from an input image. All transformations are performed by vision operators that are embedded in a vision loop representing their flow (from the low to
Figure 7. (a) Cipolla's skewed symmetries detection; (b) examples of global symmetry detection.
the high-level vision). Within the vision loop, the segmentation of images into homogeneous components is one of the most important phases 43. For example, it plays a relevant role in the selection of the parts of an object on which to concentrate further analysis. Therefore, the accuracy of the segmentation step may influence the performance of the whole recognition procedure 44. More formally, we may associate to the input digital image X a weighted undirected graph G = (X, E, b), whose nodes are all the pixels, whose arcs E depend on the digital connectivity (e.g. 4 or 8), and where the function \( b : E \to [0,1] \) is the arc weight, which could be a normalized distance between pixels. The segmentation is then defined by an equivalence relation, \( \sim \), that determines the graph partition:

\[ x \sim y \iff \exists\ \text{a path } x = p_0, p_1, \ldots, p_n = y \ \text{in } G \ \text{such that } b(p_{i-1}, p_i) \le \phi, \quad i = 1, \ldots, n \]

where \( 0 \le \phi \le 1 \) is a given threshold. One graph partition corresponds to each threshold. Spanning all the values of \( \phi \) provides a large set of segmentation solutions, and this makes the segmentation problem hard. On the other hand, image segmentation depends on the context and it is subjective: the decision process is driven by the goal or purpose of the visual
task. Therefore, general solutions do not exist and each proposed technique is suitable for a class of problems. In this sense, image segmentation is an ill-posed problem that does not admit a unique solution. Moreover, the segmentation problem is often hard because the probability distribution of the features is not well known. Often, the assumption of a Gaussian distribution of the features is a rough approximation that invalidates the linear separation between classes. In the literature the segmentation problem has been formulated from different perspectives. For example, in 45 a two-step procedure that uses only the data included in the boundary is described; this approach has been extended to boundary surfaces by combining splines and superquadrics to define global shape parameters 46,47. Other techniques use elastic surface models that are deformed under the action of internal forces to fit object contours using a minimum-energy criterion 48. A model-driven approach to the segmentation of range images is proposed in 49. Shi and Malik 50 have considered 2D image segmentation as a Graph Partitioning Problem (GPP) solved by a normalized-cut criterion. The method finds an approximate solution by solving a generalized eigenvalue system; moreover, the authors consider both spatial and intensity pixel features in the evaluation of the similarity between pixels. Recently, the problem of extracting the largest image regions that satisfy uniformity conditions in the intensity/spatial domains has been related to a Global Optimization Problem (GOP) 51 by modelling the image as a weighted graph, where the edge weight is a function of both intensity and spatial information. The chosen solution is the one for which a given objective function attains its smallest value, hopefully the global minimum. In 52 a genetic algorithm is proposed to solve the segmentation problem as a GOP using a tree regression strategy 53. The evaluation of a segmentation method is not an easy task, because the expected results are subjective and they depend on the application. One evaluation could be the comparison with a robust and well-tested method, but this choice is not always feasible; whenever possible, the evaluation should be done by combining the judgement of more than one human expert. For example, the comparison could be performed using a vote strategy.
Figure 8. (a) input image; (b) human segmentation; (c) GS, (d) NMC, (e) SL, and (f) C-means segmentations.
For each segment k, the vote compares #agr_k, the number of pixels on which the human and the machine agree, with |HP_k|, the cardinality of the segment defined by the human, and with the cardinality of the segment found by the algorithm. Figure 8 shows how different segmentation methods (Genetic Segmentation (GS), Normalized Minimum Cut (NMC), Single Link (SL), C-means) segment the same image. Figure 8b shows the human segmentation obtained using the vote strategy.
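As a concrete illustration of the thresholded graph formulation given earlier (a single-link style partition), the following sketch labels the connected components of the graph whose arcs have weight b <= phi; the choice of the normalized gray-level difference as the weight b is an assumption made for the example:

```python
import numpy as np
import networkx as nx

def threshold_segmentation(image, phi=0.1):
    """Pixels are nodes; 4-connected neighbours are joined by an arc weighted
    with the normalized gray-level difference b. Two pixels are equivalent when
    a path of arcs with b <= phi connects them, so the segments are the
    connected components of the thresholded graph."""
    img = np.asarray(image, dtype=float)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    h, w = img.shape
    g = nx.Graph()
    g.add_nodes_from(range(h * w))
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):               # 4-connectivity
                rr, cc = r + dr, c + dc
                if rr < h and cc < w and abs(img[r, c] - img[rr, cc]) <= phi:
                    g.add_edge(r * w + c, rr * w + cc)
    labels = np.zeros(h * w, dtype=int)
    for k, comp in enumerate(nx.connected_components(g)):
        labels[list(comp)] = k
    return labels.reshape(h, w)
```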
3.3. High Level Vision
In this phase the decision tasks concern the classification and recognition of objects and the structural description of a scene. For example, high-level vision provides the medical domain with objective measurements of features related to diseases, such as the narrowing of arteries, the volume changes of a pumping heart, or the localization of the points attaching muscles to bones (used to analyze human motion) 54. The classification of cells is another example of a hard problem in the biological domain; for example, both statistical
Figure 9. The axial moment values of an α (a) and a β (b) cell.
and shape features are used in the classification and recognition of α and β retinal ganglion cells. In 55 a quantitative approach is presented in which several features are combined, such as diameter, eccentricity, fractal dimension, influence histogram, influence area, convex hull area, and convex hull diameter. The classification is performed by integrating the results from three different clustering methods (Ward's hierarchical scheme, K-Means and a Genetic Algorithm) using a voting strategy. The experiments indicated the superiority of some features, also suggesting possible biological implications; among them is the eccentricity derived from the axial moments of the cell (see Figure 9). Autonomous robots equipped with visual systems are able to recognize their environment and to cooperate in finding satisfactory solutions. For example, in 56 a probabilistic, vision-based state estimation method for individual, autonomous robots is developed. A team of mobile robots is able to estimate their joint positions in a known environment and track the positions of autonomously moving objects. The state estimators of different robots cooperate to increase the accuracy and reliability of the estimation process. The method has been empirically validated in experiments with a team of physical robots playing soccer 57. The concept of an internal model is central in this phase of the analysis.
Often, a geometric model is matched against the image features previously computed and embedded in the data structures derived in the intermediate phase. The model parameters are optimized by minimizing a cost function. Different minimization strategies (e.g. dynamic programming, gradient descent, or genetic algorithms) can be considered. Two main techniques are used in model matching: bottom-up, when the primary direction of the flow of processing is from lower abstraction levels (images) to higher levels (objects), and, conversely, top-down, when the processing is guided by expectations from the application domain 58. Matching results also depend on the chosen parameter space. For example, the classification of human bones from MRI scans requires the combination of multi-view data and the problem does not admit an exact solution 59; human face recognition has been treated by considering a face as an element of a multi-dimensional vector space 60; in 61 the recognition of faces under different expressions and partial occlusions has been considered. To resolve the occlusion problem, each face is divided into local regions that are individually analyzed. The match is flexible and based on probabilistic methods. The recognition system is less sensitive to the differences between the facial expressions displayed in the training and testing images, because the author weights the results obtained on each local area on the basis of how much of that local area is affected by the expression displayed in the current test image.
3.4. Interpretation
This phase exploits the semantic part of the visual system. The result belongs to an interpretation space; examples are linguistic descriptions and the definition of physical models. This phase could be considered the conscious component of the visual system. However, in a pragmatic approach it is simply a set of semantic rules that are given, for example, by a knowledge base. The technical problem is that of automatically deriving a sensible interpretation from an image. This task depends on the application or the domain of interest within which the description makes sense. Typically, in a domain there are named objects and characteristics that can be used in a report or to make a decision. Obviously, there is a wide gap between the nature of images (essentially arrays of numbers) and their descriptions, and the intermediate level of the analysis is the necessary link between image data and domain descriptions. There are researchers who take clues from
biological systems to develop theories, and there are those who focus on mathematical theories and on the physics of the imaging process. Eventually, however, theory becomes practice in the specification of an algorithm embodied in an executable program with appropriate data representations. There are alternative views of vision, resulting in other paradigms for image understanding and research. In image interpretation, knowledge about the application domain is manipulated to arrive at an understanding of the recorded part of the world. Knowledge representation schemes that are studied include semantic networks 62, Bayesian and belief networks 63, and fuzzy expert systems 64. Some of the issues addressed within these schemes are: the incorporation of procedural and declarative information, the handling of uncertainty, conflict resolution, and the mapping of existing knowledge onto a specific representation scheme. The resulting interpretation systems have been successfully applied to interpreting utility maps, music scores and face images. Future developments will focus on the central theme of fusing knowledge representations. In particular, attention will be paid to information fusion, distributed knowledge in multi-agent systems and the mixing of knowledge derived from learning techniques with knowledge from context and experts. Moreover, recognition systems must be able to handle uncertainty and to include the subjective interpretation of a scene 65,66. Fuzzy logic 67 can provide good theoretical support to model this kind of information. For example, consider evaluating the degree of truth of the propositions:
- the chair, beyond the table, is small;
- the chair, beyond the table, is very small;
- the chair, beyond the table, is quite small;
- few objects have a straight medial axis;
it is necessary to represent the fuzzy predicate small, the fuzzy attributes very and quite, and the fuzzy quantifier few. The evaluation of each proposition depends on the meaning that is assigned to small, very, quite, and few. Moreover, the objects chair and table, and the spatial relation beyond, must be recognized with some degree of truth. These simple examples suggest the need for fuzzy logic to describe the spatial relations often used in high-level vision problems. However, the meaning of the term soft-vision cannot simply be limited to the application of fuzzy operators and logic in vision. This example shows that new visual systems should include soft tools to express abstract or not fully defined concepts 68, following the paradigms of soft computing 69.
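A minimal sketch of how such fuzzy predicates and hedges can be evaluated; the membership function for small and the modelling of very and quite as concentration and dilation operators are illustrative assumptions:

```python
def small(size_cm):
    """Membership degree of the fuzzy predicate 'small' (membership function is illustrative)."""
    if size_cm <= 30:
        return 1.0
    if size_cm >= 80:
        return 0.0
    return (80.0 - size_cm) / 50.0

# classical fuzzy hedges: 'very' modelled as concentration, 'quite' as dilation (assumed mapping)
very = lambda mu: mu ** 2
quite = lambda mu: mu ** 0.5

chair_size = 55                 # hypothetical chair size in cm
mu = small(chair_size)
print(mu, very(mu), quite(mu))  # degrees of truth of 'small', 'very small', 'quite small'
```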
Figure 10. The Kanizsa triangle illusion.
4. Final remarks
This review has shown some problems and solutions in visual systems. Today, more than 10,000 researchers are working on visual science around the world. Visual science has become one of the most popular fields among scientists. Physicists, neurophysiologists, psychologists, and philosophers cooperate to reach a full understanding of visual processes from different perspectives, whose fusion and integration will allow us to make consistent progress in this fascinating subject. Moreover, we note that anthropomorphic elements should be introduced to design complex artificial visual systems. For example, the psychology of perception may suggest new approaches to solve ambiguous 2D and 3D segmentation problems. Figure 10 shows the well-known Kanizsa illusion 70. Here the perceived edges have no physical support whatsoever in the original signal.
References
1. D. Marr, W.H. Freeman, San Francisco (1982).
2. S.E. Palmer, MIT Press (1999).
3. M. D'Esposito, J.A. Detre, G.K. Aguirre, M. Stallcup, D.C. Alsop, L.J. Tippet, M.J. Farah, Neuropsychologia, 35(5), 725 (1997).
4. M. Conrad, Advances in Computers, 31, 235 (1990).
5. N. Wiener, Massachusetts Institute of Technology, MIT Press, Cambridge (1965).
6. F. Rosenblatt, Proceedings of a Symposium on the Mechanization of Thought Processes, 421, London (1959).
7. F. Rosenblatt, Self-organizing Systems, Pergamon Press, NY, 63 (1960).
8. F. Rosenblatt, Spartan Books, NY (1962).
9. V. Cantoni, V. Di Gesù, M. Ferretti, S. Levialdi, R. Negrini, R. Stefanelli, Journal of VLSI Signal Processing, 2, 195 (1991).
10. A. Merigot, P. Clemont, J. Mehat, F. Devos, and B. Zavidovique, in Pyramidal Systems for Computer Vision, V. Cantoni and S. Levialdi (Eds.), Springer-Verlag, Berlin (1986).
11. W.D. Hillis, The Connection Machine, MIT Press, Cambridge MA (1992).
12. C.T. Zahn, IEEE Trans. on Comp., C-20, 68 (1971).
13. V. Di Gesù, Int. Journal of Fuzzy Sets and Systems, 68, 293 (1994).
14. E.D. Dickmanns, Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, Nagoya, 1577 (1997).
15. B. Kolb and L. Taylor, Cognitive Neuroscience of Emotion, R.D. Lane and L. Nadel, eds., Oxford Univ. Press, 62 (2000).
16. P.M. Devaux, D.B. Lysak, R. Kasturi, International Journal on Document Analysis and Recognition, 2(2/3), 120 (1999).
17. A.J. Fitzgerald, E. Berry, N.N. Zinovev, G.C. Walker, M.A. Smith and J.M. Chamberlain, Physics in Medicine and Biology, 47, 67 (2002).
18. J. Assfalg, A. Del Bimbo, P. Pala, IEEE Transactions on Visualization and Computer Graphics, 8(4), 305 (2002).
19. R.C. Gonzales, P. Wintz, Prentice Hall (2002).
20. J. Serra, Academic Press, New York (1982).
21. L. Vincent and P. Soille, IEEE Transactions on PAMI, 13(6), 583 (1991).
22. V. Di Gesù, C. Valenti, Vistas in Astronomy, Pergamon, 40(4), 461 (1996).
23. V. Di Gesù, C. Valenti, Advances in Computer Vision (Solina, Kropatsch, Klette and Bajcsy, editors), Springer-Verlag (1997).
24. Y. Aloimonos, CVGIP: Image Understanding, 840 (1992).
25. R. Bajcsy, Proceedings of the IEEE, 76, 996 (1988).
26. T.M. Bernard, B.Y. Zavidovique, and F.J. Devos, IEEE Journal of Solid-State Circuits, 28(7), 789 (1993).
27. G. Indiveri, R. Murer, and J. Kramer, IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, 48(5), 492 (2001).
28. P.H. Schiller, http://web.mit.edu/bcs/schillerlab/index.html.
29. J. Canny, IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6), 679 (1986).
30. M. Ruzon and C. Tomasi, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Ft. Collins CO, 2, 160 (1999).
31. O.D. Faugeras, MIT Press, 302 (1993).
32. D. Terzopoulos and K. Fleischer, The Visual Computer, 4, 306 (1988).
33. A. Blake and M. Isard, Springer-Verlag, London (1998).
34. H. Blum and R.N. Nagel, Pattern Recognition, 10, 167 (1978).
35. M. Brady, H. Asada, The International Journal of Robotics Research, 3(3), 36 (1984).
36. D.P. Mukhergee, A. Zisserman, M. Brady, Philosophical Transactions of the Royal Society of London, 351, 77 (1995).
37. T.J. Chan, R. Cipolla, Image and Vision Computing, 13(5), 439 (1995).
38. J. Sato, R. Cipolla, Image and Vision Computing, 15(5), 627 (1997).
39. V. Di Gesù, C. Valenti, Journal of Linear Algebra and its Applications, Springer Verlag, 339, 205 (2001).
40. G.E. Collins, Proc. of the Second GI Conference on Automata Theory and Formal Languages, Springer Lecture Notes in Computer Science, 33, 515 (1975).
41. A. Rosenfeld, Journal of ACM, 20, 81 (1974).
42. H. Samet, ACM Computing Surveys, 16(2), 187 (1984).
43. R. Duda and P. Hart, Wiley and Sons, NY (1973).
44. K.S. Fu, Pattern Recognition, 13, 3 (1981).
45. A. Pentland, Int. J. Comput. Vision, 4, 107 (1990).
46. D. Terzopoulos, D. Metaxas, IEEE Trans. on PAMI, 13(7) (1991).
47. N. Raja and A. Jain, Image and Vision Computing, 10(3), 179 (1992).
48. I. Cohen, L.D. Cohen, N. Ayache, ECCV'92, Second European Conference on Computer Vision, Italy, 19 (1992).
49. A. Gupta, R. Bajcsy, Image Understanding, 58, 302 (1993).
50. J. Shi, J. Malik, IEEE Trans. on PAMI, 22(8), 1 (2000).
51. R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization, Kluwer, Dordrecht (1995).
52. G. Lo Bosco, Proceedings of the 11th International Conference on Image Analysis and Processing, IEEE Comp. Soc. Publishing (2001).
53. L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Wadsworth International Group (1984).
54. S. Vitulano, C. Di Ruberto, M. Nappi, Proceedings of the Third IEEE Int. Conf. on Electronics, Circuits and Systems, 2, 1111 (1996).
55. R.C. Coelho, V. Di Gesù, G. Lo Bosco, C. Valenti, Real Time Imaging, 8, 213 (2002).
56. T. Schmitt, R. Hanek, M. Beetz, S. Buck, and B. Radig, IEEE Transactions on Robotics and Automation, 18(5), 670 (2002).
57. M. Beetz, S. Buck, R. Hanek, T. Schmitt, and B. Radig, First International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS), 805 (2002).
58. V. Cantoni, L. Carrioli, M. Diani, M. Ferretti, L. Lombardi, and M. Savini, Image Analysis and Processing, V. Cantoni, V. Di Gesù, and S. Levialdi (Eds.), Plenum Press, 329 (1988).
59. P. Ghosh, D.H. Laidlaw, K.W. Fleischer, A.H. Barr, and R.E. Jacobs, IEEE Transactions on Medical Imaging, 14(3) (1995).
60. A. Pentland, T. Starner, N. Etcoff, N. Masoiu, O. Oliyide, and M. Turk, Proc. Workshop Int'l Joint Conf. Artificial Intelligence, Looking at People (1993).
61. A.M. Martinez, IEEE Trans. on PAMI, 24(6) (2002).
62. A.T. McCray, S.J. Nelson, Meth. Inform. Med., 34, 193 (1995).
63. J. Pearl, Morgan Kaufmann (1988).
64. N. Kasabov, MIT Press (1996).
65. C.V. Negoita, The Benjamin/Cummings Publishing Company (1985).
66. L. Sombé, Wiley Professional Computing (1990).
67. L. Zadeh, Information and Control, 8, 338 (1965).
68. V. Di Gesù, Fundamenta Informaticae, 37, 101 (1999).
69. L.A. Zadeh, Communications of the ACM, 37(3), 77 (1994).
70. G. Kanizsa, Rivista di Psicologia, 49(1), 7 (1955).
INFORMATION SYSTEM IN THE CLINICAL-HEALTH AREA

G. MADONNA
Sistemi Informativi
1. The external context and objectives of the information system
The reference context must consider the analysis contained in the guidelines of the "White Book on Welfare", whose objective is that of setting down a reference picture for the creation and strengthening of the Country's social cohesion. From this viewpoint, two fundamental aspects characterising the Italian situation are analysed - the matter of population and the role of the family - and two main objectives are identified: to favour the birth rate and to improve family policies. As far as these focal themes are concerned, this document does not constitute a closed package of proposals; rather, it aims at representing a basis for a discussion about a new model of social policy. The policy on solidarity must be set into a framework of broad-ranging actions aimed at guaranteeing social cohesion as a condition itself for development: this is the way institutional changes are going, and they are already underway both in Europe, with the Charter of Fundamental Rights and the Lisbon summit, and in Italy, with the modification of Title V of the Constitution. Support to the family, the elderly and the disabled: these are therefore the main objectives the "White Book on Welfare" wishes to achieve. A revolution which wishes the family to be enhanced to the utmost, where the word family is to be understood as the fundamental and essential nucleus for the development of civil society. The National Health System is to be inserted into this context, a system which is today the subject of profound changes concerning its welfare mission, its organisation and its underlying system of financing. The Legislative Decree 229/99 (Riforma Ter) reformed the "SSN" - the National Health System - reiterating and confirming the concept of turning the system into a "company", characterised by the strategic, managerial and financial autonomy of the single health structures.
This scenario can, however, be placed into a process of transformation going back a long way, right from the Legislative Decree 502/92, and which, as a whole, implies: the consolidation of the trend to move the attention of public action away from the formalities of procedures to the verification of the effectiveness of results; the orientation towards people's satisfaction and the transparency of public action and, in the case of the health system, towards the achievement of the best welfare levels compatible with the resources available; the attention to budget obligations, and the adoption of appropriate tools for governing them, such as the afore-mentioned process of rendering the system a "company". Of course, both from the viewpoint of the information system and from that of the strategic objectives of the company, to achieve at the same time the objective of satisfying users (effectiveness of results, quality of the level of service) and that of the control of costs on the basis of a constrained availability of resources (the objective of efficiency and productivity) is not an easy task. Figure 1 represents the situation in an elementary way: a Company can achieve excellent performance results, but at the cost of a use of resources which does not pay attention to economic obligations and which therefore after a short time becomes unsustainable; on the contrary, an approach which is too attentive to economic obligations alone (an "obsession" with costs) ends up transforming itself into low costs per unit of product at the price of poor-quality products, and implies that the essential public aim of prevention and healthcare for the citizens is not achieved.
Figure 1. Trade-off effectiveness/efficiency.
The pursuit of these objectives is to be placed into a context of interactions with external entities which exchange data, information and requests. In the diagram in Figure 2 this situation is represented very simply, but effectively, so as to clarify the basic problems there may be: the crucial role of the health service played by family doctors, and therefore the necessity of constructing an effective relationship between them and the Company; the importance of the Region, as this is the office which issues not only financing but also standards, and the shifting of the central State Administration to a role of secondary level, almost always mediated by the Region, as regards the Company;
181
the emergence of consumer protection associations as significant actors who ask for information and system transparency, especially from the point of view of outcomes and behaviour; the emergence, alongside the more traditional interactions of the Company with certified and agreed structures on the one hand and with suppliers of various services on the other, of further possibilities of interaction and the utilisation of suppliers of health and external welfare services (service co-operatives, non-profit companies, voluntary workers, collaborators, etc.).
Figure 2. Context diagram of the health Company
To sum up, all of these interactions together require the information system supporting them to be able to: concentrate on the management of the fundamental interactions (those with the patient), so as to ensure the governing of the strategic objectives pursued; handle other exchanges of information adequately, in particular by progressively adopting a logic of close integration of the information flows with external parties (for example, entrusting GPs with computerised appointment-taking operations); automatically produce a full and explanatory set of information, in the public domain or with controlled access, so as to guarantee the necessary transparency of processes and therefore make them usable by third parties by way of portals.
2. Internal organisation: objectives and strategic variables
The primary objective of the Company is that of guaranteeing essential levels of welfare, according to what is laid out in the National Health Plan: collective health and medical care for life and at work; district medical care; hospital medical care. The division of these objectives can be immediately transformed into a structural division of the Company into Departments, Districts and Hospitals, each of which expresses its specific information needs. The three types of structures, visualised in Figure 3, must be supported by service structures and must be governed by a system of objectives and strategic variables which integrates them. As to the former, the district, department and hospital structures are visualised as the heart of the components the company is made up of; a heart supported by a group of general technical and administrative services, and which exchanges data, information, processes and activities with the outside in two ways: processes of access by the Company's users; processes of control by the strategic management over the Company.
Figure 3. Organisational components of the health Company
The strategic variables transversal to the three types of structure can then be identified on the basis of the two essential functions of access and control:
- management control;
- the information and IT system, decisive both for the circulation of data for access purposes and for their analysis for control purposes;
- the quality system, i.e. new attention paid to the service offered to the user;
- human resources development.
3. The role of the information system: a system for transforming data into information
Having an "information system", and not just scattered chunks of IT applications, represents the decisive leap in quality which needs to be taken; to move in this direction, some fixed points of method must be respected, in that they constitute the "critical success factors" of an information system.
The Information System generates information useful for improving management and therefore takes account of, and provides an answer to, the following aspects:
- the operational processes which constitute the activities of the Company at its various levels, and their integration points;
- the decisions which must be taken on a daily basis;
- the assessments which must be carried out in order to make decisions;
- the organisational responsibilities of those called upon to carry out the assessments and make the decisions.

In order to achieve these objectives, the system must meet the following requirements:
- it is not parallel to the management system but an integral part of it, in that all the information, both for efficiency purposes and for system quality, can be constructed and generated by ordinary operational processes;
- within a homogeneous procedure, the same datum is collected only once and from one place only, since the duplication of information sources, far from being a factor of data quality, is certainly a cause of redundant procedures and even of system errors;
- it has the utmost flexibility and the highest degree of openness to the evolution of solutions, both technological and, especially, organisational, so as to be able to cope with the inevitable multiplicity, diversification and poor standardisation of activities.

The best possible integration between the applications and the archives which constitute the information system must be envisaged, since only through integration is it possible to obtain both process efficiency and the availability of top-level information. In other words, information, which is the significant aggregation of data processed according to interpretative models, is often the result of combining data contained in the archives of different subsystems. There could be many examples, but it is perhaps sufficient to recall the problems of management control, the typical management function through which the Company ensures that the activity carried out by the organisation goes in the direction set out by the units concerned and that it operates in the most economical conditions possible. From an information point of view, management control is an area where very diversified information is collected and summed up into economic data and activity data or indicators of results, so that the costs of the activities and the relative production factors can be compared. Expressed in these terms, the reasoning is not particularly complex. The nodal point, nevertheless, is that the design of such a system requires the
identification of the different levels at which the data must be treated and processed and, in a correlated way, the identification of the integration mechanisms for the data themselves. Since integration is the solution to the functioning problems of the company's information system, an integration able to exchange flows of information is to be achieved by means of solutions which enable the sharing of archives by all subsystems. It is therefore clear how the help of a strong, organic and elastic support system is fundamental to information and management activities. The system must be organised in such a way that data originate only from primary sources, identified as follows:
- original data, generated by management processes;
- second-level data, produced by processing procedures;
- complex data, resulting from automatic acquisition from more than one archive.

The three above-mentioned types are analysed and described as follows.
Original data: We can define as original all the data that, with respect to the information process as a whole, are collected at the origin of the process, strictly correlated to the operational management of the activity. Original data are therefore all the current information collected and managed by the part of the information system that can be defined as transactional, or On Line Transaction Processing (OLTP).
Second-level data: This is data obtained through the processing of information already acquired by the information system, possibly by procedures of a subsystem different from the one which is using the data.
Complex data: This is data necessary for activities of control, management and statistical and epidemiological assessment, which originate from the crossing of data present in more than one archive at the moment in which they are correlated, in order to express significant values. This therefore involves the definition of one or more data warehouses, starting from which the specific application systems see to processing at a more aggregated level, in an On Line Analytical Processing (OLAP) logic.
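To make the distinction concrete, the following is a minimal sketch (not taken from the system described here; archive and field names are invented) of how original OLTP records, second-level derived values and complex warehouse aggregates relate to one another:

```python
# Minimal sketch of the three data levels: original (OLTP) records,
# second-level derived data, and complex data obtained by crossing two
# hypothetical archives into an OLAP-style aggregate.
from collections import defaultdict

# Original data: transactional records as produced by operational processes.
admissions = [
    {"patient_id": 1, "ward": "cardiology", "days": 5},
    {"patient_id": 2, "ward": "cardiology", "days": 3},
    {"patient_id": 3, "ward": "surgery", "days": 7},
]
ward_costs = [  # a second archive, fed by the administrative subsystem
    {"ward": "cardiology", "cost_per_day": 420.0},
    {"ward": "surgery", "cost_per_day": 610.0},
]

# Second-level data: a derived value computed from information already acquired.
total_days_per_ward = defaultdict(int)
for adm in admissions:
    total_days_per_ward[adm["ward"]] += adm["days"]

# Complex data: crossing the two archives to obtain an aggregate suitable for
# management control (cost of activity vs. production level).
cost_index = {c["ward"]: c["cost_per_day"] for c in ward_costs}
warehouse = [
    {"ward": w, "in_patient_days": d, "estimated_cost": d * cost_index[w]}
    for w, d in total_days_per_ward.items()
]

for row in warehouse:
    print(row)
```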
Figure 4. The layers of the health information system.

The division by types of data discussed above can be interpreted in the light of the centrality of the patient in the health information system. Figure 4 visualises the analogy between the levels of data complexity and the thematic layers of the health information system related to the patient:
- the system centres upon the patient, who is at the base of the information pyramid;
- the next layer is constituted by the original data, which are related to the patient;
- further up sits the processing characteristic of the operational systems (second-level data), which operates fundamentally in the
service of the patient, both directly, in that the patient is its direct user, and indirectly, as support to the work of health staff;
- the top two layers, both characterised by complex data resulting from statistical or OLAP processing, divide the information about the patient according to the two axes recalled most often, effectiveness and efficiency:
  - in the first direction, the essential processing is of the medical and clinical sort, supporting the quality of the health outcome and the capability of the system to guarantee the necessary levels of assistance;
  - in the second direction, the typical processing is that supporting managerial and strategic decisions and the control of expenditure and the relative production levels.
4. Orientation to the patient and analysis capability
As a whole, the information system of a health Company must be able to undertake many and diverse tasks, and it is necessary to computerise adequately all the processes inside the Company's administrative machine. In this sense the general orientation to the patient, which constitutes the reading key favoured here, is obviously not sufficient. In fact, there are clearly specific areas of management and analysis where it is essential to support specific processes; for example, in support of hotel-type services or in occupational medicine interventions, it is necessary to interface with companies more than with individuals. However, according to the approach presented here, the computerisation of internal processes (accounts, staff management, supplies), even though essential, must be the result of an overall design which puts at its centre the management of the processes orientated to users outside the Company. And this for two reasons: firstly, because the average level of computerisation of administrative processes is often already higher than that of the strictly productive ones, so that it is in the latter that effectiveness and efficiency can still be improved; secondly, because only by making data flow directly and in an integrated way from production processes to administrative ones, in particular according to the effectiveness and efficiency analysis mentioned above, is it possible to provide real added value to the Company through its information system. The ambition of the proposal of computerisation based on an integrated information system presented here moves in this direction. This can be illustrated with the aid of a diagram, shown in Figure 5, of the main (though certainly not the only) ways in which the patient accesses the health system.
Figure 5. Simplified diagram of access to health services
The three "ways of access" identified in the figure (medical check-up/prescription from the family doctor, hospital access through admissions and hospitalisation, information access by way of the PR Office) set up a path which, through the automation of the phases of the patient's entry into the health and medical care system (Appointments Centre, ADT, support to the PR Office), constructs a precise information route with numerous uses:
- firstly, the patients' register acts as the place for the co-ordination and integrated management of the different information elements;
- the various care processes act on this level (admissions, tests and check-ups); these produce both technical information elements (for example the Hospital Discharge Card) and administrative and economic effects, with the subsequent assessment of the services delivered;
- the administrative and control processes, i.e. in general all the internal aspects necessary for developing the analysis capability of the Company and its top management, are the consequence of the access process and of the complex welfare and medical care path.
All in all, therefore, building up an information system orientated to the patient, if the flows handled are correctly integrated, also turns out to be an effective and complete way of supporting the internal processes of an administrative sort, because of their natural correlation to the welfare and medical care path.
5. The Hospital Information System (HIS)
The Hospital Information System, in order to manage the complex wealth of company information and at the same time guarantee the involvement of the operative departments, must satisfy the objectives illustrated below.
Centrality of the patient. The information generated about a patient at the moment they come into contact with the structure and receive the services requested is collected and aggregated at the level of the patient themselves.

Co-ordination of the welfare process. The co-ordination and planned co-operation between operative units within the health structures (hospital specialists, family doctors, district teams in the hospital-territory system) enable the activation of a welfare process which ensures the highest levels of quality, timeliness and efficiency in the access to services and their delivery.

Control of production processes. Recent health legislation imposes an optimisation of expenditure and a revision of structures and processes. The transformation into company-style modules requires that medical care be carried out in the context of a global process which contemplates both clinical and administrative aspects.

The primary objectives that the Clinical/Hospital subsystem proposes to address are as follows:
- improvement of the organisation of work for a freer and more rational use of health structures;
- improvement of the efficiency and effectiveness of services in the light of the qualitative growth of clinical welfare;
- reduction of the time spent by the patient in the health structure, by means of the organisation of waiting times between the request for a service and the relative delivery (average in-patient times);
- rationalisation of the management of health information about the patient: complete and exhaustive previous clinical histories;
- availability of information for investigations and statistical surveys, clinical and epidemiological research;
- use of the system as a decision support.

Figure 6 represents the integration between the modules which compose the Clinical/Hospital Area subsystem. It confirms the centrality of the register, i.e. of the patient, which acts as the primary node of the whole system.
Figure 6. Integration of modules
6. Integration of modules - processes

The integration of modules, i.e. the notion of an Integrated Health System, cannot be separated from a new company vision: a vision based on "Processes". This statement proposes another determining key for the "information system" as a strategic variable: the capability of the system to support company processes and, therefore, to map and configure itself flexibly on these processes. In short, the design logic requested of an ERP system must regard not only administrative and accounting systems - now an acquired fact - but health ones too, both of the health-administrative sort and of the health-professional sort. Each process is in itself complex and integrates administrative, health, economic and welfare elements. Each process therefore requires information integration towards itself and other correlated processes. As Figure 7 shows, an implicit logical information flow does exist which transports information from health processes ("production" in the strictest sense) towards directional processes (the "government" of production), passing through processes which are to a certain extent auxiliary (but obviously essential to the functioning of the company's production machine), of the health/administrative and the accounting/administrative type.
Figure 7. Integration of processes
The architecture of the system must enable the support of the processes described in the figure and of the information flows which tie these processes together, and in particular:
- the territory-hospital integration in favour of the patient and the relative welfare processes. The integration of welfare processes between hospital operative units and operators across the territory (family doctors and pharmacies) is activated through support for the functionality of the whole process of service delivery: from information activities, prescription, appointment making and delivery, to the payment of tickets and the withdrawal/acquisition of return information;
- the integration of the operative units, whether clinical or not, as an auxiliary service to the medical and health staff in carrying out the activities relative to the care of the patient. The information system must handle in a unitary and facilitated way the activities of the care process, controlling the process of delivery of the services requested and their outcome, so as to obtain an improvement in quality and efficiency;
- the integration of clinical-welfare information in order to guarantee the compactness of the welfare process. Visibility of the status of the overall clinical-welfare process towards the patient is made possible by access to previous clinical records;
- the integration of information for directional purposes as an auxiliary service to the management personnel of the Company. Following the latest reforms, Companies are pursuing management improvement, guaranteeing the delivery of services at the highest levels of quality and aiming at the final objective of health, i.e. the total welfare outcome. From this comes the need to collect, reconcile and integrate information coming from the different information, administrative and health subsystems;
- the information integration with the General Management of the Regional Health Office, in order to facilitate both the administrative operations aimed at reimbursements (communications with local health authorities and with the Regional office concerning budgets and services delivered), as well as the control and regional supervision activities concerning health expenditure and, lastly, activities of health (epidemiological) supervision. The interaction of the Company with the GM of the Regional Office has a double scope: to transmit promptly the documentation necessary to receive the reimbursements and regional financial support the Company is due for services delivered, and to provide the Region with the information necessary to support the governing of expenditure, the management of financial support, the planning and rebalancing of the Regional Health System and the improvement of services for the population.
What follows is an illustration of three examples of health processes which involve a series of integrated modules to achieve their objective. The key identifies the fundamental health processes carried out at the hospital structure (represented on the left-hand side of the figures which follow), as well as the specific activities necessary for carrying out the processes themselves (in the green circles in the figures).
6.1. Delivery of Services Process

The Delivery of Services process starts with the request (prescription) for the service itself from the family doctor, the hospital doctor or the Emergency Units, and concludes with the issue of the referral (the health process). The administrative/accounting process is completed with the handling of payment, the production of the flows relative to Ambulatory Services, and the handling of the flows relative to the Mobility of Specialist Services.
Figure 8 - Delivery of services process
The process of making appointments (request) can be activated by the modules:
- Appointment Centre and Web Appointment Centre (external patients);
- Internal Appointment Centre - Admissions, Discharges and Transfers (ADT) and Ward Management (internal patients);
- Emergency Department and Admissions (management of urgent requests);
- Out-Patient Management (direct acceptance of the service).

All the services within the preceding modules must automatically generate/feed the work lists of the Delivery Units of the services themselves (Ambulatory Management) and, if subject to payment, produce accounting entries visible to the Cash Department Management module (the movement is recalled by the identification code of the appointment or, alternatively, by personal data). The acceptance and delivery of the service activates the referral process and the management of its progress; the issuing of the referral makes it visible to the modules: Ambulatory/Laboratories Management, Admissions, Discharges and Transfers (ADT), Ward Management, Emergency and Admissions Department (EAD), Operating Theatres, and the Departments concerned. The services registered as delivered activate the administrative/accounting flow processes: control and analysis of ambulatory services (production of File "C"), and control and data validation for feeding the Mobility of Specialist Services.

6.2. Hospitalisation process

The Hospitalisation process starts with the request (prescription) for admission to hospital from the family doctor, hospital doctor or Emergency Units, and concludes with the discharge of the patient and the closing of their Clinical Record (the health process). The administrative/accounting process is completed with the handling of the Hospital Discharge Card (HDC) and the pricing of the HDC itself (Grouper 3M integration), the production of the flows supplied by the closed HDCs, and the management of the flows relative to Hospitalisation Mobility.
Figure 9 - Hospitalisation process

The Hospitalisation process (request) can be activated by the modules:
- Admissions, Discharges and Transfers (ADT), for programmed hospitalisation and for day-hospital (DH) admission (both as administrative admission and as waiting list);
- Ward Management (programmed hospitalisation);
- Emergency Department and Admissions (urgent hospitalisation);
- Internal Appointment Centre, for DH admissions with services/procedures by appointment.

Once admission has been carried out, regardless of the module used, the patient is placed directly onto the in-patients list for the ward, or onto the list of admissions to be confirmed for the ward. From the ADT and Ward Management modules it is possible to visualise the preceding hospitalisation and ambulatory service reports, i.e. the patient's clinical history.
The cards relative to a hospitalisation, once filled out, are recorded in the HDC/DRG module, which sees to the pricing (Grouper 3M) and the formal controls, and then to feeding the Hospitalisation Mobility module. In-patient Ward Management represents in itself the evolution of a clinical-health process within the administrative process (admission and filling out of the Discharge Card). The degree of integration ensures that the list of in-patients of a single ward is fed by all the modules of the Health System which can carry out the administrative hospitalisation. Data relative to therapies and the administration of medicines, and the pages of the specialist clinical record, are visible in the complete clinical history of the patient. The administration of a medicine produces an automatic discharge entry from the ward pharmacy cabinet, which is integrated with the Pharmacy Store for the management of the sub-stock and the relative supply orders. From the specialist clinical record it is possible to access all the administrative and clinical data relative to the hospitalisation itself as well as to previous hospitalisations (including all the data about ambulatory services, with the referrals available inside Ambulatory Management). The figure which follows represents the logical flow relative to the integrated management of an in-patient ward.
Figure 10 - Integration of an in-patient ward
6.3. Welfare process

The Delivery of Welfare Services process starts with the request (opening of a file) for the service itself from the family doctor or hospital doctor, and is defined with the management of the Multi-dimensional Assessment file for Adults and the Elderly and the placing of the patient onto the waiting list according to the type of welfare regime. The health process is completed with the discharge from the welfare structure and the closing of the Clinical Record. The administrative/accounting process is fed by the recording of the activities delivered, so the price lists relative both to private structures and to those in the National Health Service apply. On the basis of the planned activities it is possible to produce expenditure forecasts for each structure and each type of welfare regime. For every welfare structure it is possible to import
specific plans (defined by the regions) relative to the activities carried out on the patients on their lists. On the basis of the data from the files it is possible to carry out checks on the suitability of those data; such checks enable a transparent management of the payments made to private structures operating under a welfare regime.
Figure 11 - Welfare Process
The welfare process (request, opening of a file) can be activated by the modules:
- Admissions, Discharges and Transfers (ADT), for the management of integrated home assistance and protected residential assistance, should this activity be strictly connected to the post-hospitalisation phase, i.e. a protected discharge;
- Social-Health Department (management of the Multi-dimensional Assessment file for Adults and the Elderly and the placing onto the waiting lists).

The assessment of the patient for whom the request for assistance has been presented activates the specific welfare regime:
- Hospitalisation in Protected Welfare Accommodation;
- Day Centre Care;
- Temporary Social Hospitality;
- Protected Accommodation;
- Day Centre for the demented;
- Rehabilitation assistance;
- Integrated Home Assistance;
- Hospice.
A WIRELESS-BASED SYSTEM FOR AN INTERACTIVE APPROACH TO MEDICAL PARAMETERS EXCHANGE

GIANNI FENU
University of Cagliari, Department of Mathematics and Informatics
e-mail: [email protected]

ANTONIO CRISPONI
University of Cagliari, Department of Mathematics and Informatics
e-mail: [email protected]

SIMONE CUGIA
University of Cagliari, Department of Mathematics and Informatics
e-mail: [email protected]

MASSIMILIANO PICCONI
University of Cagliari, Department of Mathematics and Informatics
e-mail: [email protected]

The use of computer technology in the medical sector has lately seen the study of different applications suited to the management of clinical data, whether textual or image-based, across networks in which priority has always been given to policies of flexibility and safety. In the model presented here we have summarised the flexibility factors necessary for the development of systems with a broad application spectrum, attributable to the investment in wireless technology, allowing safety and guaranteeing a certified exchange in the client-server network. Moreover, the need to offer a broad base on the client side has suggested the use of different PDA models, making the application largely portable and allowing architectural independence. The same wireless smart-client network allows cell-based growth of the area of use.
1. Introduction
The diffusion of computer tools in different sectors of modern medicine has marked an evident discontinuity in the development of scientific activities. Notable benefits have been brought to the field of medicine by the improvement of hospital services and by the consequent growth in the quality of care, which in different ways is related not only to
information processing in the strict sense, but also to the transfer of information. In particular, the transfer of information has seen over time an improvement of applications both inside and outside the hospital environment. The evolution of computer technology and of wireless networks, combined with the development of applications designed to accommodate the mobility of users, offers several application fields. This contribution was born among such innovative services and technologies; its aim is the study and implementation of a client-server network in a hospital environment, using palm devices for the consultation, visualisation and insertion of data relating to the clinical folders of patients in an interactive way. It allows the retrieval of information relating to any previous admission of a patient and to clinical folder data, or simply the verification of the quality and/or quantity of specific parametric data. The opportunity to consult the patient database within the hospital structure allows the analysis and monitoring of therapies without the need for paperwork, suggesting a different method of visualisation of the data in each department. The client interface has been designed to guarantee easier use and consultation, making it user-friendly for users with little computer knowledge and allowing security and flexibility in hospital departments. Particular attention has been given to the internal radio frequencies in hospitals, through the use of the ISM (Industrial, Scientific and Medical) frequencies specified by the ITU Radio Regulations for scientific and medical devices, with bands between 2.4 and 2.5 GHz, allowing the network to comply with the law.
2. Architectural Model
The solutions for the development of wireless applications can be divided into browser-based, synchronization-based, and smart-client [3]. The browser-based approach has the disadvantage of requiring a permanent connection, with the consequent problems of a data exchange higher than the user's actual needs and of caching of information which may not be up to date. The synchronization-based solution has the opposite disadvantage, offline operation: it does not allow the system to work in real time over the wireless network, since the application uses a cache of data on the handheld device. The smart-client solution allows the network exchange of the requested information only, guaranteeing data rate and a simple query mode; a further advantage is independence from the network architecture, integrating into the existing server architecture.
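As an illustration of the smart-client principle, the following minimal sketch (function and field names are invented for the example) shows a client that keeps no local archive and asks the server only for the record it needs:

```python
# Minimal sketch (hypothetical names) of the smart-client idea described above:
# the handheld client keeps no local copy of the archive and requests from the
# server only the record it needs, when it needs it.
from typing import Optional


def fetch_patient_record(send_request, patient_id: str) -> Optional[dict]:
    """Ask the server for a single patient's record; nothing is cached locally."""
    response = send_request({"command": "GET_PATIENT", "parameter": patient_id})
    return response if response else None


# Usage with a stand-in transport; in the real system the request would travel
# over the Wi-Fi socket connection discussed in the following sections.
if __name__ == "__main__":
    fake_server = {"123": {"surname": "Rossi", "ward": "cardiology", "bed": "6"}}
    transport = lambda msg: fake_server.get(msg["parameter"])
    print(fetch_patient_record(transport, "123"))
```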
Figure 1. Smart-client architecture: Wi-Fi connection between a PDA client and a server.

Client-server systems were born as the coordination of two logical systems (the client and the server) and of their components into applications. The planning of a client-server application requires the choice of an appropriate architecture; client-server software has to be modular, because many parts of an application are executed on more than one system [1]. The separation of application functions into different components makes client-server processes easier, as components provide clear locations when the applications are distributed. It is also necessary to divide application modules into independent components, so that a request component (client) expects its output from a response component (server). Starting from the classical definition of client-server architecture as the interaction among distributed application components, it can be noted that the execution location of the components becomes the main point for the performance of the application and of the system. Some application components can be executed more efficiently on the server or on the client. The tasks of each side are given below.

Tasks of the client:
- visualisation/presentation;
- interaction with users;
- request formulation to the server.

Tasks of the server:
- queries on the database;
- possible arithmetic and graphic processing;
- patient data management;
- communications and authentication of users;
- response processing.
To make the code executable on multiple handheld devices (Pocket PC and Palm), it is important to eliminate the dependencies of the application code on the operating system; since currently the hardware interfaces and peripherals of PDAs are not standardized, this allows use on different operating systems (Palm OS and Windows CE). In the network architecture, the management of client-server interactions uses communication techniques based on sockets, currently considered the most flexible technology. The client negotiates the socket with the server and establishes a connection on that socket; through this channel all the information of every single client converges on the server, so that the same client, communicating with the server at different moments, may not receive data from the same port of the server.
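A minimal sketch of this socket-based interaction is given below; host, port and message format are assumptions made only for the example, not values taken from the deployed system:

```python
# Minimal socket sketch (hypothetical host, port and message format) of the
# client-server interaction described above: the client opens a TCP socket,
# sends a request and reads the reply on the same connection.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5050  # assumed values for the example only


def serve_once():
    """A toy server: accept one client, read its request, answer it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024).decode()
            conn.sendall(f"echo:{request}".encode())


def client_request(message: str) -> str:
    """The client negotiates a socket, sends the request and waits for the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(message.encode())
        return cli.recv(1024).decode()


if __name__ == "__main__":
    threading.Thread(target=serve_once, daemon=True).start()
    time.sleep(0.2)  # give the toy server time to start listening
    print(client_request("GET_PATIENT#123"))
```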
3. Patients' data management
The filing, the management and the consultation of the database are managed by a DBMS engine, which allows the abstraction of the data from the way in which they are stored and managed and, above all, the possibility of performing queries with a high-level language. Particular importance is again given to the authentication system, to avoid database consultations by unauthorized users. To implement such a mechanism, different user profiles are provided, diversified into several authorization levels, at a minimum into two areas: a restricted area and a reserved area. Based on the skeleton of the clinical folders of the patients, a relational structure has been designed containing the information fields of the patients: the bed in which the patient is accommodated, the information on the hospitalization, the temperature curve, the water balance, the parameters relating to the breathing and cardiac rate, the specific therapy and the medical examinations of the patient.
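As an illustration, the relational structure listed above could be rendered roughly as follows; the table and column names are hypothetical and do not reproduce the actual database scheme (which is shown in Figure 2):

```python
# A sketch (table and column names are hypothetical) of the relational structure
# listed above: patient, hospitalization, and recorded clinical parameters.
import sqlite3

schema = """
CREATE TABLE patient (
    patient_id    INTEGER PRIMARY KEY,
    surname       TEXT,
    name          TEXT,
    bed           TEXT
);
CREATE TABLE hospitalization (
    admission_id  INTEGER PRIMARY KEY,
    patient_id    INTEGER REFERENCES patient(patient_id),
    ward          TEXT,
    admitted_on   TEXT
);
CREATE TABLE parameters (
    admission_id  INTEGER REFERENCES hospitalization(admission_id),
    recorded_at   TEXT,
    temperature   REAL,      -- sample of the temperature curve
    water_balance REAL,
    heart_rate    INTEGER,
    resp_rate     INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
print("schema created")
```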
Figure 2. Scheme of the database.
4. Client interface
The application answers three fundamental requirements: providing users with informative reports, providing security and reliability in data transfer, and providing the communication of data in different formats compatible with the computational architecture and the interface of the clients. The communication/synchronization mechanism between server and client is implemented through a pattern-matching system which interprets, on the server side, the client commands and, on the client side, the server commands [1][4]. The steps of the communication between server and client are:
- the client sends a string composed of the association <command>#<parameter>;
- the parser on the server side decodes the command string;
- the command is executed and the corresponding operation is performed on the database;
- the server encodes the response information (using the same #-separated format);
- the server sends the information;
- the parser on the client side decodes the response string.
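The following sketch illustrates this pattern-matching exchange; since the paper shows only the '#' separator, the exact token layout and the handler names are assumptions:

```python
# Sketch of the pattern-matching protocol described above. The exact token
# layout ("<command>#<parameter>") is an assumption; the paper only shows the
# '#' separator. Handler names are hypothetical.
def encode(command: str, parameter: str) -> str:
    """Client side: build the '#'-separated command string."""
    return f"{command}#{parameter}"


def parse(message: str) -> tuple[str, str]:
    """Server side: split the incoming string into command and parameter."""
    command, _, parameter = message.partition("#")
    return command, parameter


def dispatch(message: str, handlers: dict) -> str:
    """Interpret the command and call the matching response function."""
    command, parameter = parse(message)
    handler = handlers.get(command)
    return handler(parameter) if handler else "ERROR#unknown command"


if __name__ == "__main__":
    handlers = {"GET_TEMP": lambda pid: f"TEMP#{pid}:37.2"}  # toy database query
    print(dispatch(encode("GET_TEMP", "123"), handlers))
```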
Figure 3. Searching mask of patients.

Figure 4. Searching mask of patient's personal data.

Figure 5. Searching mask of cardio-breathing parameters.
During the implementation of the communication protocol, some problems emerged in the management of the data flow; these depend on the different implementation of sockets between client and server across the various platforms. To solve those problems and overcome the limitations, it was chosen to limit the information sent from the server to the client to a fixed size of 100 bytes per step. In general, the server-side code interprets the commands arriving from the client and performs the calls to the response functions.
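A possible rendering of this fixed-size transfer is sketched below; the length prefix used to signal the end of the message is an assumption, as the paper does not specify how message boundaries are detected:

```python
# Minimal sketch of the fixed-size chunking described above: the server splits
# its response into 100-byte pieces and sends them one step at a time. The
# framing (how the client detects the end of the message) is not given in the
# paper, so an explicit 4-byte length prefix is assumed here for illustration.
CHUNK_SIZE = 100  # fixed size chosen in the paper


def send_in_chunks(sock, payload: bytes) -> None:
    """Send the payload length first (assumption), then 100-byte chunks."""
    sock.sendall(len(payload).to_bytes(4, "big"))
    for start in range(0, len(payload), CHUNK_SIZE):
        sock.sendall(payload[start:start + CHUNK_SIZE])


def receive_in_chunks(sock) -> bytes:
    """Read the announced length, then accumulate chunks until complete."""
    expected = int.from_bytes(sock.recv(4), "big")
    data = b""
    while len(data) < expected:
        data += sock.recv(CHUNK_SIZE)
    return data
```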
Figure 6. Searching mask of water balance parameters.
Figure 7. Searching mask of the parameters of the temperature curve.
Particular attention has been given to the implementation of the temperature curve; from the client side it is possible to visualise the graph of the daily or weekly temperature. This element is of notable help, for example, to verify during the medical discussion the post-operative feedback of patients.
5. Standards for transmission
To provide the required interactivity between client and server for the exchange of real-time information, a communication system has been implemented which is extremely flexible and suitable for guaranteeing the mobility of the PDA [2][6][7]. Different technological solutions are currently present on the market to satisfy the stated requisites. Transmission occurs in the ISM band (Industrial Scientific Medical) around 2.45 GHz; it uses spread-spectrum techniques to avoid interfering signals. The signal is forced onto a band wider than necessary and, in this way, more systems can transmit in the same band [5]. Within the classes of wireless devices there is a difference between the devices born to integrate perfectly with pre-existing Ethernet LANs (IEEE 802.11), and devices that use an alternative technology to implement personal area networks (PAN), such as Bluetooth. The use of the IEEE 802.11b protocol, which defines a standard for the physical layer and for the MAC sublayer for the implementation of Wireless LANs, represents a communication system that extends a traditional LAN over radio technology, and so it facilitates integration into existing departments. The adopted WLAN may be configured in two separate modes:
- Peer-to-peer: a connection without any fixed network device; two or more devices equipped with wireless cards are able to create a LAN on the fly.
- Client/server: this mode permits the connection of several devices to an Access Point, which works as a bridge between them and the wired LAN.

Bluetooth [8] is the name of an open standard for wireless communications, designed for short-range, low-power transmissions with ease of use. Bluetooth works at the frequency of 2.4 GHz in the ISM band in TDD (Time Division Duplex) and with Frequency Hopping (FHSS). It uses two different types of transmission: the first, on an asynchronous transmission bus, supports a maximum asymmetric rate of 721 Kbps; the second works on a synchronous bus, with a symmetric rate of 432.6 Kbps. To communicate with each other, two or more BDs (Bluetooth Devices) have to form a piconet (a radio LAN with a maximum of eight components). The baseband level reserves a slot for the master and a slot for the slave, with a typically alternating course (master-Slave1-master-Slave2-...), for the resolution of bus conflicts. The master can transmit only in odd slots, while the slaves transmit in even slots. The forwarding scheme of the packets is entrusted to the ARQ (Automatic Repeat request) mechanism. The notification of an ACK
(ACKnowledgment) to the sender of the packet testifies to the successful result of the transmission. The smart-client architecture works on both wireless systems. To be able to work over a wide area, the use of 802.11b technology is recommended, as it turns out to be altogether low-cost and compatible with the Internet and Ethernet standards; it allows high transmission speed with good QoS (Quality of Service), ensuring low packet losses in real-time applications.
6. Cryptography and Security
One of the weak points in any radio communication is privacy and data security. Usually, for this reason, when a protocol for radio communications is implemented, advanced security standards are adopted. In this case too, security has been entrusted to the functions of data authentication and cryptography. The authentication process can take place in two ways:
- Open System Authentication;
- Shared Key Authentication.

The first way does not involve a real authentication, so any device is permitted access. The second way takes place with a pre-shared key. This process is similar to the authentication process in use in the GSM architecture: when the server receives an authentication request, it sends the client a pseudorandom number. The user terminal, based on the pre-shared key and on the pseudorandom number, calculates the output (through a non-reversible function) and sends it to the server. The server makes the same computation and compares the two values; in this way it determines whether the user is qualified for access. In this way the authentication between PDA and server is guaranteed. One of the aspects in the use of the 802.11b standard is its security, which is entrusted to a protocol called WEP (Wired Equivalent Privacy), which deals with the authentication of nodes and with cryptography. The logical diagram of the WEP algorithm is represented in Figure 8.
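Before turning to the WEP diagram, the shared-key challenge-response just described can be sketched as follows; the one-way function is not named in the paper, so HMAC-SHA256 is used here purely as a stand-in, and the key and challenge sizes are illustrative:

```python
# Sketch of the challenge-response (shared-key) authentication described above.
# The paper does not name the one-way function, so HMAC-SHA256 is used as a
# stand-in; key and challenge sizes are illustrative assumptions.
import hmac
import hashlib
import os

PRE_SHARED_KEY = b"example-shared-secret"  # assumed; provisioned on PDA and server


def server_challenge() -> bytes:
    """Server side: generate the pseudorandom number sent to the client."""
    return os.urandom(16)


def client_response(challenge: bytes) -> bytes:
    """Client side: derive the response from the challenge and the shared key."""
    return hmac.new(PRE_SHARED_KEY, challenge, hashlib.sha256).digest()


def server_verify(challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare."""
    expected = hmac.new(PRE_SHARED_KEY, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    nonce = server_challenge()
    print("authenticated:", server_verify(nonce, client_response(nonce)))
```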
Figure 8. Operation diagram of the WEP algorithm
The initialization vector (IV) is a 24-bit key, concatenated with the secret key (a 40-bit key). In this way a set of 64 bits is obtained, which is fed as input to a generator of pseudorandom codes (the WEP PRNG), creating the key sequence. The user data (plaintext) are concatenated with a 4-byte value, called the Integrity Check Value (ICV), generated by the integrity algorithm. At this point the key sequence and the result of the concatenation of plaintext and ICV are submitted to a XOR operation, producing the ciphertext. The IV and the ciphertext are then concatenated and transmitted. The IV changes for every transmission, and it is the only part transmitted in clear, so that the message can be reconstructed in the reception phase.
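The operation in Figure 8 can be summarised in the following sketch, which uses an RC4-style keystream generator and a CRC-32 integrity value; the key and IV values are illustrative only, and the code is meant as an explanation of the mechanism, not as a secure implementation:

```python
# Sketch of the WEP-style operation in Figure 8: IV || secret key seeds an
# RC4-style pseudorandom generator, the plaintext is extended with a CRC-32
# ICV, and the result is XORed with the keystream. Key and IV values are
# illustrative; this is an explanatory sketch, not a secure implementation.
import zlib


def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` bytes of RC4 keystream from `key` (KSA + PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out, i, j = bytearray(), 0, 0
    for _ in range(length):                   # pseudorandom generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def wep_encrypt(iv: bytes, secret_key: bytes, plaintext: bytes) -> bytes:
    """Return IV || ciphertext, where ciphertext = (plaintext || ICV) XOR keystream."""
    icv = zlib.crc32(plaintext).to_bytes(4, "little")      # 4-byte integrity value
    data = plaintext + icv
    keystream = rc4_keystream(iv + secret_key, len(data))  # 24-bit IV + 40-bit key
    ciphertext = bytes(a ^ b for a, b in zip(data, keystream))
    return iv + ciphertext                                 # the IV is sent in clear


if __name__ == "__main__":
    frame = wep_encrypt(b"\x01\x02\x03", b"\xaa\xbb\xcc\xdd\xee", b"temperature=37.2")
    print(frame.hex())
```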
7. Conclusions
The described architecture is characterized by its simplicity of implementation and by its insertion into complex existing architectures. This integration model is characterized by portability, security and interactivity, and it allows the user to interact with the server even at a distance. Some enhancements are under study to develop different models and criteria for direct data exchange between PDA architectures. In any case, the model of a user without constraints in the treatment of, and access to, the parametric data of the patient represents at this moment a simple and reliable smart-client architecture.
References

1. E. Guttman and J. Kempf, Automatic Discovery of Thin Servers: SLP, Jini and the SLP-Jini Bridge, Proc. 25th Ann. Conf. IEEE Industrial Electronics Soc. (IECON 99), IEEE Press, Piscataway, N.J., 1999.
2. AA.VV., a cura di F. Muratore, Le comunicazioni mobili del futuro. UMTS: il nuovo sistema del 2001, CSELT, 2000.
3. R. Kuruppillai, M. Dontamsetti and J. Casentino, Tecnologie Wireless, McGraw-Hill, 1999.
4. L. Bright, S. Bhattacharjee and L. Rashid, Supporting diverse mobile applications with client profiles, Proc. 5th ACM Int. Workshop on Wireless Mobile Multimedia, pp. 88-95, Atlanta, Georgia, 2002.
5. European Telecommunications Standards Institute, official site: http://www.etsi.org
6. PDA Palm official site: http://www.palm.com
7. Italian Palm Users Group, official site: http://www.itapug.it
8. Bluetooth official site: http://www.bluetooth.com