Rough Fuzzy Image Analysis
Foundations and Methodologies
Chapman & Hall/CRC Mathematical and Computational Imaging Sciences

Series Editors

Chandrajit Bajaj
Center for Computational Visualization, The University of Texas at Austin

Guillermo Sapiro
Department of Electrical and Computer Engineering, University of Minnesota
Aims and Scope

This series aims to capture new developments and summarize what is known over the whole spectrum of mathematical and computational imaging sciences. It seeks to encourage the integration of mathematical, statistical and computational methods in image acquisition and processing by publishing a broad range of textbooks, reference works and handbooks. The titles included in the series are meant to appeal to students, researchers and professionals in the mathematical, statistical and computational sciences, application areas, as well as interdisciplinary researchers involved in the field. The inclusion of concrete examples and applications, and programming code and examples, is highly encouraged.
Proposals for the series should be submitted to the series editors above or directly to:
CRC Press, Taylor & Francis Group
4th Floor, Albert House
1-4 Singer Street
London EC2A 4BQ, UK
Chapman & Hall/CRC Mathematical and Computational Imaging Sciences
Rough Fuzzy Image Analysis
Foundations and Methodologies
Edited by
Sankar K. Pal
James F. Peters
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4398-0329-5 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Rough fuzzy image analysis : foundations and methodologies / editors, Sankar K. Pal, James F. Peters.
p. cm. "A CRC title."
Includes bibliographical references and index.
ISBN 978-1-4398-0329-5 (hardcover : alk. paper)
1. Image analysis. 2. Fuzzy sets. I. Pal, Sankar K. II. Peters, James F. III. Title.
TA1637.R68 2010
621.36'7--dc22
2009053741
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
Preface

This book introduces the foundations and applications in the state of the art of rough-fuzzy image analysis. Fuzzy sets* and rough sets** as well as a generalization of rough sets called near sets*** provide important as well as useful stepping stones in various approaches to image analysis that are given in the chapters of this book. These three types of sets and various hybridizations provide powerful frameworks for image analysis. Image analysis focuses on the extraction of meaningful information from digital images. This subject has its roots in studies of space and the senses by J.H. Poincaré during the early 1900s, studies of visual perception and the topology of the brain by E.C. Zeeman, and picture processing by A.P. Rosenfeld****. The basic picture processing approach pioneered by A.P. Rosenfeld was to extract meaningful patterns in given digital images representing real scenes as opposed to images synthesized by the computer. Underlying picture processing is an interest in filtering a picture to detect given patterns embedded in digital images and approximating a given image with simpler, similar images with lower information content (this, of course, is at the heart of the near set-based approach to image analysis). This book calls attention to the utility that fuzzy sets, near sets and rough sets have in image analysis. One of the earliest fuzzy set-based image analysis studies was published in 1982 by S.K. Pal*****. The spectrum of fuzzy set-oriented image analysis studies includes edge ambiguity, scene analysis, image enhancement using smoothing, image description, motion frame analysis, medical imaging, remote sensing, thresholding and image frame analysis. The application of rough sets in image analysis was launched in a seminal paper published in 1993 by A. Mrózek and L. Plonka******. Near sets are a recent generalization of rough sets that have proven to be useful in image analysis and pattern recognition*******.
* See, e.g., Zadeh, L.A., Fuzzy sets. Information and Control (1965), 8 (3), 338-353; Zadeh, L.A., Toward a theory of fuzzy granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems 90 (1997), 111-127. See, also, Rosenfeld, A., Fuzzy digital topology, in Bezdek, J.C., Pal, S.K., Eds., Fuzzy Models for Pattern Recognition, IEEE Press, 1991, 331-339; Banerjee, M., Kundu, M.K., Maji, P., Content-based image retrieval using visually significant point features, Fuzzy Sets and Systems 160, 1 (2009), 3323-3341; http://en.wikipedia.org/wiki/Fuzzy_set
** See, e.g., Peters, J.F., Skowron, A.: Zdzislaw Pawlak: Life and Work, Transactions on Rough Sets V (2006), 1-24; Pawlak, Z., Skowron, A.: Rudiments of rough sets, Information Sciences 177 (2007), 3-27; Pawlak, Z., Skowron, A.: Rough sets: Some extensions, Information Sciences 177 (2007), 28-40; Pawlak, Z., Skowron, A.: Rough sets and Boolean reasoning, Information Sciences 177 (2007), 41-73; http://en.wikipedia.org/wiki/Rough_set
*** See, e.g., Peters, J.F., Puzio, L., Image analysis with anisotropic wavelet-based nearness measures, Int. J. of Computational Intelligence Systems 79, 3-4 (2009), 1-17; Peters, J.F., Wasilewski, P., Foundations of near sets, Information Sciences 179 (2009), 3091-3109; http://en.wikipedia.org/wiki/Near_sets. See, also, http://wren.ee.umanitoba.ca
**** See, e.g., Rosenfeld, A.P., Picture processing by computer, ACM Computing Surveys 1, 3 (1969), 147-176.
***** Pal, S.K., A note on the quantitative measure of image enhancement through fuzziness, IEEE Trans. on Pat. Anal. & Machine Intelligence 4, 2 (1982), 204-208.
****** Mrózek, A., Plonka, L., Rough sets in image analysis, Foundations of Computing and Decision Sciences 18, 3-4 (1993), 268-273.
This volume fully reflects the diversity and richness of rough fuzzy image analysis, both in terms of its underlying set theories as well as its diverse methods and applications. From the lead chapter by J.F. Peters and S.K. Pal, it can be observed that fuzzy sets, near sets and rough sets are, in fact, instances of different incarnations of Cantor sets. These three types of Cantor sets provide a foundation for what A. Rosenfeld points to as the stages in pictorial pattern recognition, i.e., image transformation, feature extraction and classification. The chapters by P. Maji and S.K. Pal on rough-fuzzy clustering, D. Malyszko and J. Stepaniuk on rough-fuzzy measures, and by A.E. Hassanien, H. Al-Qaheri, A. Abraham on rough-fuzzy clustering for segmentation point to the utility of hybrid approaches that combine fuzzy sets and rough sets in image analysis. The chapters by D. Sen, S.K. Pal on rough set-based image thresholding, H. Fashandi, J.F. Peters on rough set-based mathematical morphology as well as an image partition topology, and M.M. Mushrif, A.K. Ray on image segmentation illustrate how image analysis can be carried out with rough sets by themselves. Tolerance spaces and a perceptual approach in image analysis can be found in the chapters by C. Henry, A.H. Meghdadi, J.F. Peters, S. Shahfar, and S. Ramanna (these papers carry forward the work on visual perception by J.H. Poincaré and E.C. Zeeman). A rich harvest of applications of rough fuzzy image analysis can be found in the chapters by A.E. Hassanien, H. Al-Qaheri, A. Abraham, W. Tarnawski, G. Schaefer, T. Nakashima, L. Miroslaw, C. Henry, S. Shahfar, A.H. Meghdadi and S. Ramanna. Finally, a complete, downloadable implementation of near sets in image analysis called NEAR is presented by C. Henry.

The Editors of this volume extend their profound gratitude to the many reviewers for their generosity and many helpful comments concerning the chapters in this volume. Every chapter was extensively reviewed and revised before final acceptance. We also received many helpful suggestions from the reviewers of the original proposal for this CRC Press book. In addition, we are very grateful for the help that we have received from S. Kumar, A. Rodriguez, R.B. Stern, S.K. White, J. Vakili and others at CRC Press during the preparation of this volume. The editors of this volume have been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) research grant 185986, Manitoba Centre of Excellence Fund (MCEF) grant, Canadian Network of Excellence (NCE) and Canadian Arthritis Network (CAN) grant SRI-BIO-05, and the J.C. Bose Fellowship of the Government of India.

March 2010
Sankar K. Pal
James F. Peters

******* See, e.g., Gupta, S., Patnaik, S., Enhancing performance of face recognition by using the near set approach for selecting facial features, J. Theor. Appl. Inform. Technol. 4, 5 (2008), 433-441; Henry, C., Peters, J.F., Perception-based image analysis, Int. J. Bio-Inspired Comp. 2, 2 (2009), in press; Peters, J.F., Tolerance near sets and image correspondence, Int. J. of Bio-Inspired Computation 1(4) (2009), 239-245; Peters, J.F., Corrigenda and addenda: Tolerance near sets and image correspondence, Int. J. of Bio-Inspired Computation 2(5) (2010), in press; Ramanna, S., Perceptually near Pawlak partitions, Transactions on Rough Sets XII, 2010, in press; Ramanna, S., Meghdadi, A., Measuring resemblances between swarm behaviours: A perceptual tolerance near set approach, Fundamenta Informaticae 95(4), 2009, 533-552.
Table of Contents

1  Cantor, Fuzzy, Near, and Rough Sets in Image Analysis
   James F. Peters and Sankar K. Pal . . . . . . . . . . . . . . . . . . . . . 1-1
2  Rough-Fuzzy Clustering Algorithm for Segmentation of Brain MR Images
   Pradipta Maji and Sankar K. Pal . . . . . . . . . . . . . . . . . . . . . . 2-1
3  Image Thresholding using Generalized Rough Sets
   Debashis Sen and Sankar K. Pal . . . . . . . . . . . . . . . . . . . . . . 3-1
4  Mathematical Morphology and Rough Sets
   Homa Fashandi and James F. Peters . . . . . . . . . . . . . . . . . . . . 4-1
5  Rough Hybrid Scheme: An application of breast cancer imaging
   Aboul Ella Hassanien, Hameed Al-Qaheri, Ajith Abraham . . . . . . . . . . 5-1
6  Applications of Fuzzy Rule-based Systems in Medical Image Understanding
   Wojciech Tarnawski, Gerald Schaefer, Tomoharu Nakashima and Lukasz Miroslaw . . 6-1
7  Near Set Evaluation And Recognition (NEAR) System
   Christopher Henry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
8  Perceptual Systems Approach to Measuring Image Resemblance
   Amir H. Meghdadi and James F. Peters . . . . . . . . . . . . . . . . . . 8-1
9  From Tolerance Near Sets to Perceptual Image Analysis
   Shabnam Shahfar, Amir H. Meghdadi and James F. Peters . . . . . . . . . . 9-1
10 Image Segmentation: A Rough-set Theoretic Approach
   Milind M. Mushrif and Ajoy K. Ray . . . . . . . . . . . . . . . . . . . . 10-1
11 Rough Fuzzy Measures in Image Segmentation and Analysis
   Dariusz Malyszko and Jaroslaw Stepaniuk . . . . . . . . . . . . . . . . . 11-1
12 Discovering Image Similarities. Tolerance Near Set Approach
   Sheela Ramanna . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I-1
1 Cantor, Fuzzy, Near, and Rough Sets in Image Analysis
James F. Peters
Computational Intelligence Laboratory, Electrical & Computer Engineering, Rm. E2-390 EITC Bldg., 75A Chancellor's Circle, University of Manitoba, Winnipeg R3T 5V6, Manitoba, Canada

Sankar K. Pal
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, 700 108, India
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–1
1.2 Cantor Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2
1.3 Near Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2
    Near Sets and Rough Sets • Basic Near Set Approach • Near Sets, Psychophysics and Merleau-Ponty • Visual Acuity Tolerance • Sets of Similar Images • Tolerance Near Sets • Near Sets in Image Analysis
1.4 Fuzzy Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–8
    Notion of a Fuzzy Set • Near Fuzzy Sets • Fuzzy Sets in Image Analysis
1.5 Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–9
    Sample Non-Rough Set • Sample Rough Set • Rough Sets in Image Analysis
1.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–11
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–11
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–12

1.1 Introduction
The chapters in this book consider how one might utilize fuzzy sets, near sets, and rough sets, taken separately or taken together in hybridizations, in solving a variety of problems in image analysis. A brief consideration of Cantor sets (Cantor, 1883, 1932) provides a backdrop for an understanding of several recent types of sets useful in image analysis. Fuzzy, near and rough sets provide a wide spectrum of practical solutions to image analysis problems such as image understanding, image pattern recognition, image retrieval and image correspondence, mathematical morphology, perceptual tolerance relations in image analysis, and segmentation evaluation. Fuzzy sets result from the introduction of a membership function that generalizes the traditional characteristic function. The notion of a fuzzy set was introduced by L. Zadeh in 1965 (Zadeh, 1965). Sixteen years later, rough sets were introduced by Z. Pawlak in 1981 (Pawlak, 1981a). A set is considered rough whenever the boundary between its lower and upper approximation is non-empty. Of the three forms of sets, near sets are the newest, introduced in 2007 by J.F. Peters in a perception-based approach to the study of the nearness of observable objects in a physical continuum (Peters and Henry, 2006; Peters, 2007c,a; Peters, Skowron, and Stepaniuk, 2007; Henry and Peters, 2009b; Peters and Wasilewski, 2009; Peters, 2010). This chapter highlights a context for three forms of sets that are now part of the computational intelligence spectrum of tools useful in image analysis and pattern recognition. The principal contribution of this chapter is an overview of the high utility of fuzzy sets, near sets and rough sets with the emphasis on how these sets can be used in image analysis, especially in classifying parts of the digital images presented in this book.
1.2 Cantor Set
To establish a context for the various sets utilized in this book, this section briefly presents the notion of a Cantor set. From the definition of a Cantor set, it is pointed out that fuzzy sets, near sets and rough sets are special forms of Cantor sets. In addition, this chapter points to links between the three types of sets that are part of the computational intelligence spectrum. Probe functions in near set theory provide a link between fuzzy sets and near sets, since every fuzzy membership function is a particular form of probe function. Probe functions are real-valued functions introduced by M. Pavel in 1993 as part of a study of image registration and a topology of images (Pavel, 1993). Z. Pawlak originally thought of a rough set as a new form of fuzzy set (Pawlak, 1981a). It has been shown that every rough set is a near set (this is Theorem 4.8 in (Peters, 2007b)) but not every near set is a rough set. For this reason, near sets are considered a generalization of rough sets. The contribution of this chapter is an overview of the links between fuzzy sets, near sets and rough sets as well as the relation between these sets and the original notion of a set introduced by Cantor in 1883 (Cantor, 1883).

By a 'manifold' or 'set' I understand any multiplicity, which can be thought of as one, i.e., any aggregate [Inbegriff] of determinate elements which can be united into a whole by some law.
–Foundations of a General Theory of Manifolds,
–G. Cantor, 1883.
. . . A set is formed by the grouping together of single objects into a whole. –Set Theory –F. Hausdorff, 1914.
In this mature interpretation of the notion of a set, G. Cantor points to a property or law that determines elementhood in a set and “unites [the elements] into a whole” (Cantor, 1883), elaborated in (Cantor, 1932), and commented on in Lavine (1994). In 1851, Bolzano (Bolzano, 1959) writes that “an aggregate so conceived that is indifferent to the arrangement of its members I call a set”. At that time, the idea that a set could contain just one element or no elements (null set) was not contemplated. This is important in the current conception of a near set, since such a set must contain pairs of perceptual objects with similar descriptions and such a set is never null. That is, a set is a perceptual near set if, and only if it is never empty and it contains pairs of perceived objects that have descriptions that are within some tolerance of each other (see Def. 2).
1.3 Near Sets

How Near

How near to the bark of a tree
are drifting snowflakes, swirling gently round,
down from winter skies?
How near to the ground are icicles,
slowly forming on window ledges?
–Fragment of a Philosophical Poem.
–Z. Pawlak & J.F. Peters, 2002.
The basic idea in the near set approach to object recognition is to compare object descriptions. Sets of objects X,Y are considered near each other if the sets contain objects with at least partial matching descriptions. –Near sets. General theory about nearness of objects, –J.F. Peters, 2007.
TABLE 1.1  Nomenclature

Symbol          Interpretation
O, X, Y         set of perceptual objects, X, Y ⊆ O, A ⊂ X, x ∈ X, y ∈ Y
F, B            sets of probe functions, B ⊆ F, φ_i ∈ B, φ_i : X → ℜ
φ_i(x)          ith probe function value, representing a feature of x
φ_B(x)          (φ_1(x), φ_2(x), ..., φ_i(x), ..., φ_k(x)), description of x of length k
ε               ε ∈ ℜ (reals) such that ε ≥ 0
‖·‖_2           ‖·‖_2 = (Σ_{i=1}^{k} (·_i)²)^{1/2}, L2 (Euclidean) norm
≅_{B,ε}         {(x, y) ∈ O × O : ‖φ(x) − φ(y)‖_2 ≤ ε}, tolerance relation
≅_B             shorthand for ≅_{B,ε}
A ⊂ ≅_{B,ε}     ∀x, y ∈ A, x ≅_{B,ε} y (i.e., A is a preclass in ≅_{B,ε})
C_{≅_{B,ε}}     tolerance class, maximal preclass of ≅_{B,ε}
X ⋈_{B,ε} Y     X resembles (is near) Y ⟺ X ≅_{B,ε} Y
Set Theory Law 1 Near Sets
Near sets contain elements with similar descriptions.

Near sets are disjoint sets that resemble each other (Henry and Peters, 2010). Resemblance between disjoint sets occurs whenever there are observable similarities between the objects in the sets. Similarity is determined by comparing lists of object feature values. Each list of feature values defines an object's description. Comparison of object descriptions provides a basis for determining the extent to which disjoint sets resemble each other. Objects that are perceived as similar based on their descriptions are grouped together. These groups of similar objects can provide information and reveal patterns about objects of interest in the disjoint sets. Collections of digital images viewed as disjoint sets of points, for example, provide a rich hunting ground for near sets. Near sets can be found in the favite pentagona coral fragment in Fig. 1.1a, from a coral reef near Japan. If we consider the greyscale level, the sets X, Y in Fig. 1.1b are near sets, since there are many pixels in X with grey levels that are very similar to pixels in Y.
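The resemblance just described can be made concrete with a short sketch (not taken from the book, written in Python purely as an illustration): two greyscale regions X and Y are treated as near when at least one pair of their subimages has average grey levels within a tolerance eps. The patch size, the single grey-level probe, and the value of eps are assumptions of this sketch.

```python
import numpy as np

def grey_descriptions(region, patch=8):
    """Describe a 2-D greyscale region by the average grey level of each patch x patch subimage."""
    h, w = region.shape
    return np.array([region[r:r + patch, c:c + patch].mean()
                     for r in range(0, h - patch + 1, patch)
                     for c in range(0, w - patch + 1, patch)])

def resemble(X, Y, eps=10.0):
    """True if some subimage of X and some subimage of Y have grey-level
    descriptions within eps of each other, i.e., the regions contain
    pairs of objects with similar descriptions."""
    dX, dY = grey_descriptions(X), grey_descriptions(Y)
    return bool((np.abs(dX[:, None] - dY[None, :]) <= eps).any())
```

For Fig. 1.1b, X and Y would be the two rectangular crops of the coral image; resemble(X, Y) returns True as soon as one pair of subimage grey levels agrees within eps.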
1.3.1 Near Sets and Rough Sets
FIGURE 1.1: Sample Near Sets. (1.1a) favite coral; (1.1b) near sets.

Near sets are a generalization of rough sets. It has been shown that every rough set is, in fact, a near set but not every near set is a rough set (Peters, 2007b). Near set theory originated from an interest in comparing similarities between digital images. Unlike rough sets, the near set approach does not require set approximation (Peters and Wasilewski, 2009). Simple examples of near sets can sometimes be found in tolerance classes in pairs of image coverings, if, for instance, a subimage of a class in one image has a description that is similar to the description of a subimage in a class in the second image. In general, near sets are discovered by discerning objects, either within a single set or across sets, with descriptions that are similar. From the beginning, the near set approach to perception has had direct links to rough sets in its approach to the perception of objects (Pawlak, 1981a; Orłowska, 1982) and the classification of objects (Pawlak, 1981a; Pawlak and Skowron, 2007c,b,a). This is evident in the early work on nearness of objects and the extension of the approximation space model (see, e.g., (Peters and Henry, 2006; Peters et al., 2007)). Unlike the focus on the approximation boundary of a set, the study of near sets focuses on the discovery of affinities between perceptual granules such as digital images viewed as sets of points. In the context of near sets, the term affinity means a close relationship between perceptual granules (particularly images) based on common description. Affinities are discovered by comparing the descriptions of perceptual granules, e.g., descriptions of objects contained in classes found in coverings defined by the tolerance relation ≅_{F,ε}.
1.3.2 Basic Near Set Approach
Near set theory provides methods that can be used to extract resemblance information from objects contained in disjoint sets, i.e., it provides a formal basis for the observation, comparison, and classification of objects. The discovery of near sets begins with choosing the appropriate method to describe observed objects. This is accomplished by the selection of probe functions representing observable object features. A basic model for a probe function was introduced by M. Pavel (Pavel, 1993) in the context of image registration and image classification. In near set theory, a probe function is a mapping from an object to a real number representing an observable feature value (Peters, 2007a). For example, when comparing fruit such as apples, the redness of an apple (observed object) can be described by a probe function representing colour, and the output of the probe function is a number representing the degree of redness. Probe functions provide a basis for describing and discerning affinities between objects as well as between groups of similar objects (Peters and Ramanna, 2009). Objects that have, in some degree, affinities are considered near each other. Similarly, groups of objects (i.e., sets) that have, in some degree, affinities are also considered near each other.
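As a small, hedged illustration of probe functions (the particular probes below, avg_grey and redness, are invented for this sketch and are not taken from the chapter), each probe simply maps a perceptual object, here an RGB image patch, to a real feature value, and an object's description is the vector of such values:

```python
import numpy as np

# Probe functions: each maps a perceptual object (an RGB image patch stored
# as a NumPy array) to a single real feature value.
def avg_grey(patch):
    """Average intensity of the patch."""
    return float(patch.mean())

def redness(patch):
    """Degree of redness: dominance of the red channel, scaled to [0, 1]."""
    r, g, b = (patch[..., i].mean() for i in range(3))
    return float(max(r - (g + b) / 2.0, 0.0) / 255.0)

def description(patch, probes=(avg_grey, redness)):
    """phi_B(x): the vector of probe-function values describing an object."""
    return np.array([phi(patch) for phi in probes])
```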
1.3.3 Near Sets, Psychophysics and Merleau-Ponty
Near sets offer an ideal framework for solving problems based on human perception that arise in areas such as image processing and computer vision, as well as in engineering and science. In near set theory, perception is a combination of the view of perception in psychophysics (Hoogs, Collins, Kaucic, and Mundy, 2003; Bourbakis, 2002) with a view of perception found in Merleau-Ponty's work (Merleau-Ponty, 1945, 1965). In the context of psychophysics, perception of an object (i.e., in effect, our knowledge about an object) depends on sense inputs that are the source of signal values (stimulations) in the cortex of the brain. In this view of perception, the transmission of sensory inputs to cortex cells is likened to probe functions defined in terms of mappings of sets of sensed objects to sets of real values representing signal values (the magnitude of each cortex signal value represents a sensation) that are a source of object feature values assimilated by the mind.

Perception in animals is modelled as a mapping from sensory cells to brain cells. For example, visual perception is modelled as a mapping from stimulated retina sensory cells to visual cortex cells (see Fig. 1.2). Such mappings are called probe functions. A probe measures observable physical characteristics of objects in our environment. In other words, a probe function provides a basis for what is commonly called feature extraction (Guyon, Gunn, Nikravesh, and Zadeh, 2006). The sensed physical characteristics of an object are identified with object features. The term feature is used in S. Watanabe's sense of the word (Watanabe, 1985), i.e., a feature corresponds to an observable property of physical objects. Each feature has a 1-to-many relationship to real-valued functions called probe functions representing the feature. For each feature (such as colour) one or more probe functions can be introduced to represent the feature (such as grayscale or RGB values). Objects and sets of probe functions form the basis of near set theory and are sometimes referred to as perceptual objects due to the focus on assigning values to perceived object features.

FIGURE 1.2: Sample Visual Perception

Axiom 1 An object is perceivable if, and only if the object is describable.

In Merleau-Ponty's view (Merleau-Ponty, 1945, 1965), an object is perceived to the extent that it can be described. In other words, object description goes hand-in-hand with object perception. It is our mind that identifies relationships between object descriptions to form perceptions of sensed objects. It is also the case that near set theory has been proven to be quite successful in finding solutions to perceptual problems such as measuring image correspondence and segmentation evaluation.

The notion of a sensation in Poincaré (Poincaré, 1902) and a physical model for a probe function from near set theory (Peters and Wasilewski, 2009; Peters, 2010) is implicitly explained by Zeeman (Zeeman, 1962) in terms of visual perception. That is, 'seeing' consists of mappings from
sense inputs from sensory units in the retina of the eye to cortex cells of the brain stimulated by sense inputs. A sense input can be represented by a number representing the intensity of the light from the visual field (i.e., everything in the physical world that causes light to fall on the retina) impacting on the retina. The intensity of light from the visual field will determine the level of stimulation of a cortex cell from retina sensory input. Over time, varying cortex cell stimulation has the appearance of an electrical signal. The magnitude of cortex cell stimulation is a real value. The combination of an activated sensory cell in the retina and the resulting retina-originated impulses sent to cortex cells (visual stimulation) is likened to what Poincaré calls a sensation in his essay on separate sets of similar sensations leading to a perception of a physical continuum (Poincaré, 1902). This model for a sensation underlies what is known as a probe function in near set theory (Peters, 2007b; Peters and Wasilewski, 2009).
DEFINITION 1.1 Visual Probe Function
Let O = {perceptual objects}. A perceptual object is something in the visual field that is a source of reflected light. Let ℜ denote the set of reals. Then a probe φ is a mapping φ : X → ℜ. For x ∈ X, φ(x) denotes an amplitude in a visual perception (see, e.g., Fig. 1.2).

In effect, a probe function value φ(x) measures the strength of a feature value extracted from each sensation. In Poincaré, sets of sensations are grouped together because they are, in some sense, similar within a specified distance, i.e., tolerance. Implicit in this idea in Poincaré is the perceived feature value of a particular sensation that makes it possible for us to measure the closeness of an individual sensation to other sensations. A human sensation modelled as a probe measures observable physical characteristics of objects in our environment. The sensed physical characteristics of an object are identified with object features. In Merleau-Ponty's view, an object is perceived to the extent that it can be described (Merleau-Ponty, 1945, 1965). In other words, object description goes hand-in-hand with object perception. It is our mind that identifies relationships between object descriptions to form perceptions of sensed objects. It is also the case that near set theory has been proven to be quite successful in finding solutions to perceptual problems such as measuring image correspondence and segmentation evaluation.

Axiom 2 Formulate object description to achieve object perception.

In a more recent interpretation of the notion of a near set, the nearness of sets is considered in the context of perceptual systems (Peters and Wasilewski, 2009). Poincaré's idea of perception of objects such as digital images in a physical continuum can be represented by means of perceptual systems, which is akin to but not the same as what has been called a perceptual information system (Peters and Wasilewski, 2009; Peters, 2010). A perceptual system is a pair ⟨O, F⟩ where O is a non-empty set of perceptual objects and F is a non-empty, countable set of probe functions (see Def. 1).

Definition 1 Perceptual System (Peters, 2010)
A perceptual system ⟨O, F⟩ consists of a sample space O containing a finite, non-empty set of sensed sample objects and a non-empty, countable set F containing probe functions representing object features.

The perception of physical objects and their description within a perceptual system facilitates pattern recognition and the discovery of sets of similar objects. In the near set approach to image analysis, one starts by identifying a perceptual system and then defining a cover on the sample space with an appropriate perceptual tolerance relation.

Method 1 Perceptual Tolerance
1. identify a sample space O and a set F to formulate a perceptual system ⟨O, F⟩, and then
2. introduce a tolerance relation τ_ε that defines a cover on O.

A small sketch of these two steps is given below.
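The sketch is an assumption-laden illustration rather than an implementation from the chapter: it describes every object with the chosen probes and covers the sample space with tolerance neighbourhoods (one per object) instead of maximal preclasses, which are more expensive to enumerate.

```python
import numpy as np

def tolerance_cover(objects, probes, eps):
    """Steps 1-2 of Method 1: describe each object of the perceptual system
    (objects, probes) and cover the sample space with neighbourhoods of the
    tolerance relation ||phi_B(x) - phi_B(y)||_2 <= eps."""
    descr = np.array([[phi(x) for phi in probes] for x in objects])
    cover = []
    for i in range(len(objects)):
        dist = np.linalg.norm(descr - descr[i], axis=1)
        cover.append(set(np.flatnonzero(dist <= eps)))  # indices of objects tolerant to object i
    return cover
```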
1.3.4 Visual Acuity Tolerance
Zeeman (Zeeman, 1962) introduces a tolerance space (X, τ_ε), where X is the visual field of the right eye and ε is the least angular distance so that all points indistinguishable from x ∈ X are within ε of x. In this case, there is an implicit perceptual system ⟨O, F⟩, where O := X consists of points that are sources of reflected light in the visual field and F contains probes used to extract feature values from each x ∈ O.
1.3.5 Sets of Similar Images
Consider ⟨O, F⟩, where O consists of points representing image pixels and F contains probes used to extract feature values from each x ∈ O. Let B ⊆ F. Then introduce the tolerance relation ≅_{B,ε} to define covers on X, Y ⊂ O. In the case where X, Y resemble each other, i.e., X ⋈_{B,ε} Y, one can then measure the degree of similarity (nearness) of X, Y (a publicly available toolset that makes it possible to complete this example for any set of digital images is available at (Henry and Peters, 2010, 2009a)). See Table 1.1 (also (Peters and Wasilewski, 2009; Peters, 2009b, 2010)) for details about the bowtie notation ⋈_{B,ε} used to denote resemblance between X and Y, i.e., X ⋈_{B,ε} Y.
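One simple way to quantify such a degree of similarity, once both images have been described, is sketched below. It is only an illustration and is not the tolerance nearness measure implemented in the NEAR toolset cited above; descr_X and descr_Y are assumed to be arrays of subimage descriptions such as those produced by the earlier sketches.

```python
import numpy as np

def nearness_degree(descr_X, descr_Y, eps):
    """Simplified degree of resemblance between two covered sets: the fraction
    of description pairs (one subimage description from X, one from Y) that lie
    within eps of each other in the L2 norm."""
    d = np.linalg.norm(descr_X[:, None, :] - descr_Y[None, :, :], axis=2)
    return float((d <= eps).mean())
```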
FIGURE 1.3: Lena Tolerance Near Sets (TNS). (1.3a) Lena; (1.3b) Lena TNS.
1.3.6 Tolerance Near Sets
In near set theory, the trivial case is excluded. That is, an element x ∈ X is not considered near itself. In addition, the empty set is excluded from near sets, since the empty set is never something that we perceive, i.e., a set of perceived objects is never empty. In the case where one set X is near another set Y, this leads to the realization that there is a third set containing pairs of elements x, y ∈ X × Y with similar descriptions. The key to an understanding of near sets is the notion of a description. The description of each perceived object is specified by a vector of feature values, and each feature is represented by what is known as a probe function that maps an object to a real value. Since our main interest is in detecting similarities between seemingly quite disjoint sets, such as subimages in an image or pairs of classes in coverings on a pair of images, a near set is defined in the context of a tolerance space.

FIGURE 1.4: Photographer Tolerance Near Sets. (1.4a) Photographer; (1.4b) Photographer TNS.

Definition 2 Tolerance Near Sets (Peters, 2010)
Let ⟨O, F⟩ be a perceptual system. Put ε ∈ ℜ, B ⊂ F. Let X, Y ⊂ O denote disjoint sets with coverings determined by a tolerance relation ≅_{B,ε}. Sets X, Y are tolerance near sets if, and only if there are preclasses A ⊂ X, B ⊂ Y such that A ⋈_{B,ε} B.
1.3.7 Near Sets in Image Analysis
The subimages in Fig. 1.3b and Fig. 1.4b delineate tolerance classes (each displayed with its own grey level) covering subregions of the original images in Fig. 1.3a and Fig. 1.4a. The tolerance classes in these images are dominated by light grey, medium grey and dark grey subimages, along with a few very dark subimages in Fig. 1.3b and many very dark subimages in Fig. 1.4b. From Def. 2, it can be observed that the images in Fig. 1.3a and Fig. 1.4a are examples of tolerance near sets, i.e., Image_{Fig. 1.4a} ⋈_{F,ε} Image_{Fig. 1.3a}. Examples of the near set approach to image analysis can be found in, e.g., (Henry and Peters, 2007, 2008, 2009a; Gupta and Patnaik, 2008; Peters, 2009a,b, 2010; Peters and Wasilewski, 2009; Peters and Puzio, 2009; Hassanien, Abraham, Peters, Schaefer, and Henry, 2009; Meghdadi, Peters, and Ramanna, 2009; Fashandi, Peters, and Ramanna, 2009) and in a number of chapters of this book. From set composition Law 1, near sets are Cantor sets containing one or more pairs of objects (e.g., image patches, one from each digital image) that resemble each other as enunciated in Def. 2, i.e., X, Y ⊂ O are near sets if, and only if X ⋈_{F,ε} Y.
1.4 Fuzzy Sets

A fuzzy set is a class of objects with a continuum of grades of membership.
–Fuzzy sets, Information and Control 8
–L.A. Zadeh, 1965.
. . . A fuzzy set is characterized by a membership function which assigns to each object its grade of membership (a number lying between 0 and 1) in the fuzzy set. –A new view of system theory –L.A. Zadeh, 20-21 April 1965.
Set Theory Law 2 Fuzzy Sets
Every element in a fuzzy set has a graded membership.
1.4.1 Notion of a Fuzzy Set
The notion of a fuzzy set was introduced by L.A. Zadeh in 1965 (Zadeh, 1965). In effect, a Cantor set is a fuzzy set if, and only if every element of the set has a grade of membership assigned to it by a specified membership function. Notice that a membership function φ : X → [0, 1] is a special case of what is known as a probe function in near set theory.
1.4.2 Near Fuzzy Sets
A fuzzy set X is a near set relative to a set Y if the grade of membership of the objects in the sets X, Y is assigned to each object by the same membership function φ and there is at least one pair of objects x, y ∈ X × Y such that ‖φ(x) − φ(y)‖_2 ≤ ε, i.e., the description of x is similar to the description of y within some ε.
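Because only one membership function is involved here, the norm condition above reduces to an absolute difference, which makes the check easy to express. The sketch below is just an illustration of this definition, with X and Y as iterables of objects and mu a membership function returning values in [0, 1].

```python
def near_fuzzy_sets(X, Y, mu, eps):
    """X and Y are near as fuzzy sets under the membership function mu if at
    least one pair (x, y) in X x Y has memberships within eps of each other."""
    return any(abs(mu(x) - mu(y)) <= eps for x in X for y in Y)
```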
1.4.3 Fuzzy Sets in Image Analysis
Fuzzy sets have been widely used in image analysis (see, e.g., (Rosenfeld, 1979; Pal and King, 1980, 1981; Pal, 1982; Pal, King, and Hashim, 1983; Pal, 1986, 1992; Pal and Leigh, 1995; Pal and Mitra, 1996; Nachtegael and Kerre, 2001; Deng and Heijmans, 2002; Martino, Sessa, and Nobuhara, 2008; Sussner and Valle, 2008; Hassanien et al., 2009)). Using the notion of fuzzy sets, Pal and King (1980, 1981) defined an image of dimension M × N with L levels as an array of fuzzy singletons, each with a membership value denoting the degree of having brightness or some property relative to some brightness level l, where l = 0, 1, 2, . . . , L − 1. The literature on fuzzy image analysis is based on the realization that the basic concepts of edge, boundary, region, and relation in an image do not lend themselves to precise definition. From set composition Law 2, it can be observed that fuzzy sets are Cantor sets.
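A minimal sketch of this view of an image follows. The normalized grey level used here is only the simplest possible membership choice; it is not necessarily the exact membership function of Pal and King, whose enhancement work also employs S-shaped membership functions.

```python
import numpy as np

def brightness_memberships(image, L=256):
    """View an M x N image with L grey levels as an array of fuzzy singletons:
    each pixel receives a membership in [0, 1] expressing its degree of
    brightness relative to the maximum level L - 1."""
    return image.astype(float) / (L - 1)
```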
1.5 Rough Sets

A new approach to classification, based on information systems theory, is given in this paper. . . . This approach leads to a new formulation of the notion of fuzzy sets (called here the rough sets). The axioms for such sets are given, which are the same as the axioms of topological closure and interior.
–Classification of objects by means of attributes.
–Z. Pawlak, 1981.
TABLE 1.2  Pawlak Indiscernibility Relation and Partition Symbols

Symbol        Interpretation
∼_B           ∼_B = {(x, y) ∈ X × X | f(x) = f(y) ∀ f ∈ B}, indiscernibility, cf. (Pawlak, 1981a)
x/∼_B         x/∼_B = {y ∈ X | y ∼_B x}, elementary set (class)
U/∼_B         U/∼_B = {x/∼_B | x ∈ U}, quotient set
B_*(X)        B_*(X) = ⋃_{x/∼_B ⊆ X} x/∼_B, lower approximation of X
B^*(X)        B^*(X) = ⋃_{x/∼_B ∩ X ≠ ∅} x/∼_B, upper approximation of X
Set Theory Law 3 Rough Sets
Any non-empty set X is a rough set if, and only if the approximation boundary of X is not empty.

Rough sets were introduced by Z. Pawlak in (Pawlak, 1981a) and elaborated in (Pawlak, 1981b; Pawlak and Skowron, 2007c,b,a). In a rough set approach to classifying sets of objects X, one considers the size of the boundary region in the approximation of X. By contrast, in a near set approach to classification, one does not consider the boundary region of a set. In particular, assume that X is a non-empty set belonging to a universe U and that F is a set of features defined either by total or partial functions. The lower approximation of X relative to B ⊆ F is denoted by B_*(X) and the upper approximation of X is denoted by B^*(X), where

B_*(X) = ⋃_{x/∼_B ⊆ X} x/∼_B,      B^*(X) = ⋃_{x/∼_B ∩ X ≠ ∅} x/∼_B.

The B-boundary region of an approximation of a set X is denoted by Bnd_B(X), where

Bnd_B(X) = B^*(X) \ B_*(X) = {x | x ∈ B^*(X) and x ∉ B_*(X)}.

Definition 3 Rough Set (Pawlak, 1981a)
A non-empty, finite set X is a rough set if, and only if |B^*(X) − B_*(X)| ≠ 0.

A set X is roughly classified whenever Bnd_B(X) is not empty. In other words, X is a rough set whenever the boundary region Bnd_B(X) ≠ ∅. In sum, a rough set is a Cantor set if, and only if its approximation boundary is non-empty. It should also be noted that rough sets differ from near sets, since near sets are defined without reference to an approximation boundary region. This means, for example, that with near sets the image correspondence problem can be solved without resorting to set approximation.

Method 2 Rough Set Approach
1. Let (U, B) denote a sample space (universe) U and a set of object features B.
2. Using the relation ∼_B, partition the universe U.
3. Determine the size of the boundary of a set X.
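A minimal sketch of Method 2 is given below (an illustration, not code from the book): the universe is partitioned by equality of feature values, and the lower approximation, upper approximation and boundary of a set X are assembled from the resulting elementary sets. For a digital image, the universe could be the set of pixel coordinates and the single feature a quantized grey level; these choices are assumptions of the sketch.

```python
def approximations(universe, X, features):
    """Partition the universe by indiscernibility of feature values (step 2 of
    Method 2) and return the lower approximation, upper approximation and
    boundary of the set X (step 3).  Objects must be hashable, e.g. pixel
    coordinates; each feature is a function on objects."""
    classes = {}
    for obj in universe:
        key = tuple(f(obj) for f in features)      # objects with equal feature values
        classes.setdefault(key, set()).add(obj)    # share one elementary set
    lower, upper = set(), set()
    for elem in classes.values():
        if elem <= X:
            lower |= elem                          # classes wholly contained in X
        if elem & X:
            upper |= elem                          # classes that meet X
    return lower, upper, upper - lower             # boundary region Bnd_B(X)
```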
1.5.1 Sample Non-Rough Set
Let x ∈ U. x/∼B (any elementary set) is a non-rough set.
1.5.2 Sample Rough Set

Any set X ⊂ U for which the union of the elementary sets of U/∼_B contained in X does not equal X, i.e.,

⋃_{x/∼_B ⊆ X} x/∼_B ≠ X,

is a rough set. In other words, if a set X does not equal its lower approximation, then the set X is rough, i.e., it is roughly approximated by the equivalence classes in the quotient set U/∼_B.
1.5.3 Rough Sets in Image Analysis

The essence of our approach consists in viewing a digitized image as a universe of a certain information system and synthesizing an indiscernibility relation to identify objects and measure some of their parameters.
–Adam Mrózek and Leszek Plonka, 1993.
In terms of rough sets and image analysis, it can be observed that A. Mrózek and L. Plonka were pioneers (Mrózek and Plonka, 1993). They were among the first to introduce a rough set approach to image analysis and to view a digital image as a universe, i.e., as a set of points. The features of pixels (points) in a digital image are a source of knowledge discovery. Using Z. Pawlak's indiscernibility relation, it is then a straightforward task to partition an image and to consider set approximation relative to interesting objects contained in subsets of an image. This work on digital images by A. Mrózek and L. Plonka appeared six or more years before the publication of papers on approximate mathematical morphology by Lech Polkowski (Polkowski, 1999) (see also (Polkowski, 1993; Polkowski and Skowron, 1994)) and the connections between mathematical morphology and rough sets pointed to by Isabelle Bloch (Bloch, 2000). The early work on the use of rough sets in image analysis has been followed by a number of articles by S.K. Pal and others (see, e.g., (Pal and Mitra, 2002; Pal, UmaShankar, and Mitra, 2005; Peters and Borkowski, 2004; Borkowski and Peters, 2006; Borkowski, 2007; Maji and Pal, 2008; Mushrif and Ray, 2008; Sen and Pal, 2009)). From set composition Law 3, it can be observed that rough sets are Cantor sets.
1.6 Conclusion
In sum, fuzzy sets, near sets and rough sets are particular forms of Cantor sets. In addition, each of these sets in the computational intelligence spectrum offers a very useful approach to image analysis, especially to classifying objects.
Acknowledgements This research by James Peters has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) grant 185986, Manitoba Centre of Excellence Fund (MCEF) grant, Canadian Centre of Excellence (NCE) and Canadian Arthritis Network grant SRI-BIO-05, and Manitoba Hydro grant T277 and that of Sankar Pal has been supported by the J.C. Bose Fellowship of the Govt. of India.
Bibliography

Bloch, I. 2000. On links between mathematical morphology and rough sets. Pattern Recognition 33(9):1487–1496.
Bolzano, B. 1959. Paradoxien des Unendlichen (Paradoxes of the Infinite), trans. by D.A. Steele. London: Routledge and Kegan Paul.
Borkowski, M. 2007. 2D to 3D conversion with direct geometrical search and approximation spaces. Ph.D. thesis, Dept. Elec. Comp. Engg. http://wren.ee.umanitoba.ca/.
Borkowski, M., and J.F. Peters. 2006. Matching 2D image segments with genetic algorithms and approximation spaces. Transactions on Rough Sets V(LNAI 4100):63–101.
Bourbakis, N.G. 2002. Emulating human visual perception for measuring difference in images using an SPN graph approach. IEEE Transactions on Systems, Man, and Cybernetics, Part B 32(2):191–201.
Cantor, G. 1883. Über unendliche, lineare Punktmannigfaltigkeiten. Mathematische Annalen 21:545–591.
———. 1932. Gesammelte Abhandlungen mathematischen und philosophischen Inhalts, ed. E. Zermelo. Berlin: Springer.
Deng, T.Q., and H.J.A.M. Heijmans. 2002. Grey-scale morphology based on fuzzy logic. J. Math. Imag. Vis. 16:155–171.
Fashandi, H., J.F. Peters, and S. Ramanna. 2009. L2 norm length-based image similarity measures: Concrescence of image feature histogram distances. In Signal and Image Processing, Int. Assoc. of Science & Technology for Development, 178–185. Honolulu, Hawaii.
Gupta, S., and K.S. Patnaik. 2008. Enhancing performance of face recognition systems by using near set approach for selecting facial features. J. Theoretical and Applied Information Technology 4(5):433–441.
Guyon, I., S. Gunn, M. Nikravesh, and L.A. Zadeh. 2006. Feature Extraction: Foundations and Applications. Berlin: Springer.
Hassanien, A.E., A. Abraham, J.F. Peters, G. Schaefer, and C. Henry. 2009. Rough sets and near sets in medical imaging: A review. IEEE Trans. Info. Tech. in Biomedicine 13(6):955–968. Digital object identifier: 10.1109/TITB.2009.2017017.
Henry, C., and J.F. Peters. 2007. Image pattern recognition using approximation spaces and near sets. In Proc. 11th Int. Conf. on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2007), Joint Rough Set Symposium (JRS 2007), Lecture Notes in Artificial Intelligence 4482, 475–482. Heidelberg, Germany.
———. 2008. Near set image segmentation quality index. In GEOBIA 2008: Pixels, Objects, Intelligence. Geographic Object Based Image Analysis for the 21st Century, 1–6. University of Calgary, Alberta.
———. 2009a. Near set evaluation and recognition (NEAR) system. Tech. Rep., Computational Intelligence Laboratory, University of Manitoba. UM CI Laboratory Technical Report No. TR-2009-015.
———. 2009b. Perception-based image analysis. Int. J. of Bio-Inspired Computation 2(2). In press.
———. 2010. Near sets. Wikipedia. http://en.wikipedia.org/wiki/Near_sets.
Hoogs, A., R. Collins, R. Kaucic, and J. Mundy. 2003. A common set of perceptual observables for grouping, figure-ground discrimination, and texture classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(4):458–474.
Lavine, S. 1994. Understanding the Infinite. Cambridge, MA: Harvard University Press.
Maji, P., and S.K. Pal. 2008. Maximum class separability for rough-fuzzy c-means based brain MR image segmentation. Transactions on Rough Sets IX, LNCS 5390:114–134.
Martino, F.D., S. Sessa, and H. Nobuhara. 2008. Eigen fuzzy sets and image information retrieval. In Handbook of Granular Computing, ed. W. Pedrycz, A. Skowron, and V. Kreinovich, 863–872. West Sussex, England: John Wiley & Sons, Ltd.
Meghdadi, A.H., J.F. Peters, and S. Ramanna. 2009. Tolerance classes in measuring image resemblance. Intelligent Analysis of Images & Videos, KES 2009, Part II, Knowledge-Based and Intelligent Information and Engineering Systems, LNAI 5712:127–134. ISBN 978-3-642-04591-2, doi 10.1007/978-3-642-04592-9_16.
Merleau-Ponty, Maurice. 1945, 1965. Phenomenology of Perception. Paris and New York: Smith, Gallimard, Paris and Routledge & Kegan Paul. Trans. by Colin Smith.
Mrózek, A., and L. Plonka. 1993. Rough sets in image analysis. Foundations of Computing and Decision Sciences 18(3-4):268–273.
Mushrif, M., and A.K. Ray. 2008. Color image segmentation: Rough-set theoretic approach. Pattern Recognition Letters 29(4):483–493.
Nachtegael, M., and E.E. Kerre. 2001. Connections between binary, grayscale and fuzzy mathematical morphologies. Fuzzy Sets and Systems 124:73–85.
Orłowska, E. 1982. Semantics of vague concepts. Applications of rough sets. Polish Academy of Sciences 469. In G. Dorn, P. Weingartner (Eds.), Foundations of Logic and Linguistics. Problems and Solutions, Plenum Press, London/NY, 1985, 465–482.
Pal, S.K. 1982. A note on the quantitative measure of image enhancement through fuzziness. IEEE Trans. Pattern Anal. Machine Intell. PAMI-4(2):204–208.
———. 1986. A measure of edge ambiguity using fuzzy sets. Pattern Recognition Letters 4(1):51–56.
———. 1992. Fuzziness, image information and scene analysis. In An Introduction to Fuzzy Logic Applications in Intelligent Systems, ed. R.R. Yager and L.A. Zadeh, 147–183. Dordrecht: Kluwer Academic Publishers.
Pal, S.K., and R.A. King. 1980. Image enhancement with fuzzy set. Electronics Letters 16(10):376–378.
———. 1981. Image enhancement using smoothing with fuzzy set. IEEE Trans. Syst. Man and Cyberns. SMC-11(7):495–501.
Pal, S.K., R.A. King, and A.A. Hashim. 1983. Image description and primitive extraction using fuzzy sets. IEEE Trans. Syst. Man and Cyberns. SMC-13(1):94–100.
Pal, S.K., and A.B. Leigh. 1995. Motion frame analysis and scene abstraction: Discrimination ability of fuzziness measures. J. Intelligent & Fuzzy Systems 3:247–256.
Pal, S.K., and P. Mitra. 2002. Multispectral image segmentation using rough set initialized EM algorithm. IEEE Transactions on Geoscience and Remote Sensing 11:2495–2501.
Pal, S.K., and S. Mitra. 1996. Noisy fingerprint classification using multi layered perceptron with fuzzy geometrical and textural features. Fuzzy Sets and Systems 80(2):121–132.
Pal, S.K., B. UmaShankar, and P. Mitra. 2005. Granular computing, rough entropy and object extraction. Pattern Recognition Letters 26(16):401–416.
Pavel, M. 1993. Fundamentals of Pattern Recognition. 2nd ed. N.Y., U.S.A.: Marcel Dekker, Inc.
Pawlak, Z. 1981a. Classification of objects by means of attributes. Polish Academy of Sciences 429.
———. 1981b. Rough sets. International J. Comp. Inform. Science 11:341–356.
Pawlak, Z., and A. Skowron. 2007a. Rough sets and Boolean reasoning. Information Sciences 177:41–73.
———. 2007b. Rough sets: Some extensions. Information Sciences 177:28–40.
———. 2007c. Rudiments of rough sets. Information Sciences 177:3–27.
Peters, J.F. 2007a. Near sets. General theory about nearness of objects. Applied Mathematical Sciences 1(53):2609–2629.
———. 2007b. Near sets. General theory about nearness of objects. Applied Mathematical Sciences 1(53):2609–2629.
———. 2007c. Near sets. Special theory about nearness of objects. Fundamenta Informaticae 75(1-4):407–433.
———. 2009a. Discovering affinities between perceptual granules: L2 norm-based tolerance near preclass approach. In Man-Machine Interactions, Advances in Intelligent & Soft Computing 59, 43–55. The Beskids, Kocierz Pass, Poland.
———. 2009b. Tolerance near sets and image correspondence. Int. J. of Bio-Inspired Computation 1(4):239–245.
———. 2010. Corrigenda and addenda: Tolerance near sets and image correspondence. Int. J. Bio-Inspired Computation 2(5). In press.
Peters, J.F., and M. Borkowski. 2004. K-means indiscernibility relation over pixels. In Lecture Notes in Computer Science 3066, ed. S. Tsumoto, R. Slowinski, K. Komorowski, and J.W. Gryzmala-Busse, 580–585. Berlin: Springer. doi 10.1007/b97961.
Peters, J.F., and C. Henry. 2006. Reinforcement learning with approximation spaces. Fundamenta Informaticae 71:323–349.
Peters, J.F., and L. Puzio. 2009. Image analysis with anisotropic wavelet-based nearness measures. International Journal of Computational Intelligence Systems 3(2):1–17.
Peters, J.F., and S. Ramanna. 2009. Affinities between perceptual granules: Foundations and perspectives. In Human-Centric Information Processing Through Granular Modelling, SCI 182, ed. A. Bargiela and W. Pedrycz, 49–66. Berlin: Springer-Verlag.
Peters, J.F., A. Skowron, and J. Stepaniuk. 2007. Nearness of objects: Extension of approximation space model. Fundamenta Informaticae 79(3-4):497–512.
Peters, J.F., and P. Wasilewski. 2009. Foundations of near sets. Information Sciences. An International Journal 179:3091–3109. Digital object identifier: doi:10.1016/j.ins.2009.04.018.
Poincaré, H. 1902. La science et l'hypothèse. Paris: Ernest Flammarion. Later ed., Champs sciences, Flammarion, 1968; Science and Hypothesis, trans. by J. Larmor, Walter Scott Publishing, London, 1905.
Polkowski, L. 1993. Mathematical morphology of rough sets. Bull. Polish Acad. Sci. Math., Warsaw: Polish Academy of Sciences.
———. 1999. Approximate mathematical morphology: Rough set approach. In Rough and Fuzzy Sets in Soft Computing. Berlin: Springer-Verlag.
Polkowski, L., and A. Skowron. 1994. Analytical morphology: Mathematical morphology of decision tables. Fundamenta Informaticae 27:255–271.
Rosenfeld, A. 1979. Fuzzy digital topology. Information and Control 40(1):76–87.
Sen, D., and S.K. Pal. 2009. Histogram thresholding using fuzzy and rough means of association error. IEEE Trans. Image Processing 18(4):879–888.
Sussner, P., and M.E. Valle. 2008. Fuzzy associative memories and their relationship to mathematical morphology. In Handbook of Granular Computing, ed. W. Pedrycz, A. Skowron, and V. Kreinovich, 733–753. West Sussex, England: John Wiley & Sons, Ltd.
Watanabe, S. 1985. Pattern Recognition: Human and Mechanical. Chichester: John Wiley & Sons.
Zadeh, L.A. 1965. Fuzzy sets. Information and Control 8(3):338–353.
Zeeman, E.C. 1962. The topology of the brain and visual perception. New Jersey: Prentice Hall. In K.M. Fort, Ed., Topology of 3-Manifolds and Selected Topics, 240–256.
2 Rough-Fuzzy Clustering Algorithm for Segmentation of Brain MR Images

Pradipta Maji
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, 700 108, India

Sankar K. Pal
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, 700 108, India

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–1
2.2 Fuzzy C-Means and Rough Sets . . . . . . . . . . . . . . . . . . 2–3
    Fuzzy C-Means • Rough Sets
2.3 Rough-Fuzzy C-Means Algorithm . . . . . . . . . . . . . . . . . 2–5
    Objective Function • Cluster Prototypes • Details of the Algorithm
2.4 Pixel Classification of Brain MR Images . . . . . . . . . . . . 2–7
2.5 Segmentation of Brain MR Images . . . . . . . . . . . . . . . . 2–9
    Feature Extraction • Selection of Initial Centroids
2.6 Experimental Results and Discussion . . . . . . . . . . . . . . 2–13
    Haralick's Features Versus Proposed Features • Random Versus Discriminant Analysis Based Initialization • Comparative Performance Analysis
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–18
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–18
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–19

2.1 Introduction
Segmentation is a process of partitioning an image space into some non-overlapping meaningful homogeneous regions. The success of an image analysis system depends on the quality of segmentation (Rosenfeld and Kak, 1982). A segmentation method is supposed to find those sets that correspond to distinct anatomical structures or regions of interest in the image. In the analysis of medical images for computer-aided diagnosis and therapy, segmentation is often required as a preliminary stage. However, medical image segmentation is a complex and challenging task due to the intrinsic nature of the images. The brain has a particularly complicated structure and its precise segmentation is very important for detecting tumors, edema, and necrotic tissues, in order to prescribe appropriate therapy (Suetens, 2002). In medical imaging technology, a number of complementary diagnostic tools such as x-ray computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) are available. MRI is an important diagnostic imaging technique for the early detection of abnormal changes in tissues and organs. Its unique advantage over other modalities is that it can provide multispectral images of tissues with a variety of
contrasts based on the three MR parameters ρ, T1, and T2. Therefore, the majority of research in medical image segmentation concerns MR images (Suetens, 2002). Conventionally, the brain MR images are interpreted visually and qualitatively by radiologists. Advanced research requires quantitative information, such as the size of the brain ventricles after a traumatic brain injury or the relative volume of ventricles to brain. Fully automatic methods sometimes fail, producing incorrect results and requiring the intervention of a human operator. This is often true due to restrictions imposed by image acquisition, pathology and biological variation. So, it is important to have a faithful method to measure various structures in the brain. One such method is the segmentation of images to isolate objects and regions of interest. Many image processing techniques have been proposed for MR image segmentation, most notably thresholding (Lee, Hun, Ketter, and Unser, 1998; Maji, Kundu, and Chanda, 2008), region-growing (Manousakes, Undrill, and Cameron, 1998), edge detection (Singleton and Pohost, 1997), pixel classification (Pal and Pal, 1993; Rajapakse, Giedd, and Rapoport, 1997) and clustering (Bezdek, 1981; Leemput, Maes, Vandermeulen, and Suetens, 1999; Wells III, Grimson, Kikinis, and Jolesz, 1996). Some algorithms using the neural network approach have also been investigated for MR image segmentation problems (Cagnoni, Coppini, Rucci, Caramella, and Valli, 1993; Hall, Bensaid, Clarke, Velthuizen, Silbiger, and Bezdek, 1992).

One of the main problems in medical image segmentation is uncertainty. Some of the sources of this uncertainty include imprecision in computations and vagueness in class definitions. In this background, the possibility concept introduced by fuzzy set theory (Zadeh, 1965) and rough set theory (Pawlak, 1991) has gained popularity in modeling and propagating uncertainty. Both fuzzy sets and rough sets provide a mathematical framework to capture uncertainties associated with the human cognition process (Dubois and Prade, 1990; Maji and Pal, 2007b; Pal, Mitra, and Mitra, 2003). The segmentation of MR images using fuzzy c-means has been reported in (Bezdek, 1981; Brandt, Bohan, Kramer, and Fletcher, 1994; Hall et al., 1992; Li, Goldgof, and Hall, 1993; Xiao, Ho, and Hassanien, 2008). Image segmentation using rough sets has also been done (Mushrif and Ray, 2008; Pal and Mitra, 2002; Widz, Revett, and Slezak, 2005a,b; Widz and Slezak, 2007; Hassanien, 2007).

In this chapter, a hybrid algorithm called the rough-fuzzy c-means (RFCM) algorithm is presented for segmentation of brain MR images. Details of this algorithm have been reported in (Maji and Pal, 2007a,c). The RFCM algorithm is based on both rough sets and fuzzy sets. While the membership function of fuzzy sets enables efficient handling of overlapping partitions, the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition. Each partition is represented by a cluster prototype (centroid), a crisp lower approximation, and a fuzzy boundary. The lower approximation influences the fuzziness of the final partition. The cluster prototype (centroid) depends on the weighted average of the crisp lower approximation and fuzzy boundary. However, an important issue of the RFCM based brain MR image segmentation method is how to select initial prototypes of different classes or categories.
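Before turning to the initialization issue, the prototype computation just described, a weighted average of the crisp lower approximation and the fuzzy boundary, can be sketched as follows. This is only an illustration of the idea; the weight w, the fuzzifier m and the handling of an empty region are assumptions of the sketch, and the exact objective function and update rules of RFCM are given in Section 2.3.

```python
import numpy as np

def rfcm_centroid(lower_pts, boundary_pts, boundary_memberships, w=0.95, m=2.0):
    """Sketch of one rough-fuzzy cluster prototype: a weighted combination of
    the plain mean of the crisp lower approximation and the membership-weighted
    mean of the fuzzy boundary."""
    lower_pts = np.asarray(lower_pts, dtype=float)
    boundary_pts = np.asarray(boundary_pts, dtype=float)
    parts = []
    if lower_pts.size:
        parts.append((w, lower_pts.mean(axis=0)))                     # crisp lower approximation
    if boundary_pts.size:
        u = np.asarray(boundary_memberships, dtype=float) ** m        # fuzzified memberships
        parts.append((1.0 - w, (u[:, None] * boundary_pts).sum(axis=0) / u.sum()))
    total = sum(wt for wt, _ in parts)                                # renormalize if one part is empty
    return sum(wt * v for wt, v in parts) / total
```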
The concept of discriminant analysis, based on the maximization of class separability, is used to circumvent the initialization and local minima problems of the RFCM, and it enables efficient segmentation of brain MR images (Maji and Pal, 2008). The effectiveness of the RFCM algorithm, along with a comparison with other c-means algorithms, is demonstrated on a set of brain MR images using some standard validity indices. The chapter is organized as follows: Section 2.2 briefly introduces the necessary notions of fuzzy c-means and rough sets. In Section 2.3, the RFCM algorithm is described based on the theory of rough sets and fuzzy c-means. While Section 2.4 deals with the pixel classification problem, Section 2.5 gives an overview of the feature extraction techniques employed in the
segmentation of brain MR images, along with the initialization method of the c-means algorithm based on the maximization of class separability. Implementation details, experimental results, and a comparison among the different c-means algorithms are presented in Section 2.6. Concluding remarks are given in Section 2.7.
2.2 Fuzzy C-Means and Rough Sets
This section presents the basic notions of fuzzy c-means and rough sets. The rough-fuzzy c-means (RFCM) algorithm is developed based on these algorithms.
2.2.1 Fuzzy C-Means
Let X = \{x_1, \cdots, x_j, \cdots, x_n\} be the set of n objects and V = \{v_1, \cdots, v_i, \cdots, v_c\} be the set of c centroids, where x_j \in \Re^m, v_i \in \Re^m, and v_i \in X. The fuzzy c-means provides a fuzzification of the hard c-means (Bezdek, 1981; Dunn, 1974). It partitions X into c clusters by minimizing the objective function

J = \sum_{j=1}^{n} \sum_{i=1}^{c} (\mu_{ij})^{\acute{m}} \, \|x_j - v_i\|^2     (2.1)

where 1 \le \acute{m} < \infty is the fuzzification factor, v_i is the ith centroid corresponding to cluster \beta_i, \mu_{ij} \in [0, 1] is the fuzzy membership of the pattern x_j to cluster \beta_i, and \|\cdot\| is the distance norm, such that

v_i = \frac{1}{n_i} \sum_{j=1}^{n} (\mu_{ij})^{\acute{m}} x_j, \quad \text{where } n_i = \sum_{j=1}^{n} (\mu_{ij})^{\acute{m}}     (2.2)

and

\mu_{ij} = \left( \sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{\frac{2}{\acute{m}-1}} \right)^{-1}, \quad \text{where } d_{ij}^2 = \|x_j - v_i\|^2     (2.3)

subject to

\sum_{i=1}^{c} \mu_{ij} = 1, \; \forall j, \quad \text{and} \quad 0 < \sum_{j=1}^{n} \mu_{ij} < n, \; \forall i.
The process begins by randomly choosing c objects as the centroids (means) of the c clusters. The memberships are calculated based on the relative distance of the object x_j to the centroids by Equation 2.3. After computing the memberships of all the objects, the new centroids of the clusters are calculated as per Equation 2.2. The process stops when the centroids stabilize, that is, when the centroids from the previous iteration are identical to those generated in the current iteration. The basic steps are outlined as follows:

1. Assign initial means v_i, i = 1, 2, \cdots, c. Choose values for \acute{m} and the threshold \epsilon. Set the iteration counter t = 1.
2. Compute memberships \mu_{ij} by Equation 2.3 for c clusters and n objects.
3. Update the means (centroids) v_i by Equation 2.2.
4. Repeat steps 2 and 3, incrementing t, until |\mu_{ij}(t) - \mu_{ij}(t - 1)| < \epsilon.

Although fuzzy c-means is a very useful clustering method, the resulting membership values do not always correspond well to the degrees of belonging of the data, and it may
be inaccurate in a noisy environment (Krishnapuram and Keller, 1993, 1996). In real data analysis, noise and outliers are unavoidable. Hence, to reduce this weakness of fuzzy c-means, and to produce memberships that better reflect the degrees of belonging of the data, Krishnapuram and Keller (Krishnapuram and Keller, 1993, 1996) proposed a possibilistic approach to clustering, which uses a possibilistic type of membership function to describe the degree of belonging. However, the possibilistic c-means sometimes generates coincident clusters (Barni, Cappellini, and Mecocci, 1996). Recently, the use of both fuzzy (probabilistic) and possibilistic memberships in clustering has been proposed in (Pal, Pal, Keller, and Bezdek, 2005).
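To make the update rules in Equations 2.2 and 2.3 concrete, the following is a minimal NumPy sketch of the fuzzy c-means iteration described above. It is an illustration only; the random initialization from the data, the tolerance, and the iteration cap are assumptions, not values taken from the chapter.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy c-means: X is an (n, d) array, c the number of clusters."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Start from c randomly chosen objects as the initial centroids.
    V = X[rng.choice(n, size=c, replace=False)].astype(float)
    U = np.zeros((c, n))
    for _ in range(max_iter):
        # Squared distances d_ij^2 = ||x_j - v_i||^2 used in Equation 2.3.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                      # avoid division by zero
        U_new = 1.0 / (d2 ** (1.0 / (m - 1.0)))
        U_new /= U_new.sum(axis=0, keepdims=True)    # memberships sum to 1 over clusters
        # Centroid update (Equation 2.2).
        W = U_new ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < eps:            # stop when memberships stabilize
            U = U_new
            break
        U = U_new
    return V, U
```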
2.2.2 Rough Sets
The theory of rough sets begins with the notion of an approximation space, which is a pair <U, R>, where U is a non-empty set (the universe of discourse) and R an equivalence relation on U, i.e., R is reflexive, symmetric, and transitive. The relation R decomposes the set U into disjoint classes in such a way that two elements x, y are in the same class iff (x, y) \in R. Let U/R denote the quotient set of U by the relation R, with U/R = \{X_1, X_2, \cdots, X_m\}, where X_i is an equivalence class of R, i = 1, 2, \cdots, m. If two elements x, y \in U belong to the same equivalence class X_i \in U/R, then x and y are called indistinguishable. The equivalence classes of R and the empty set \emptyset are the elementary sets in the approximation space <U, R>. Given an arbitrary set X \in 2^U, in general it may not be possible to describe X precisely in <U, R>. One may characterize X by a pair of lower and upper approximations defined as follows (Pawlak, 1991):

\underline{R}(X) = \bigcup_{X_i \subseteq X} X_i ; \qquad \overline{R}(X) = \bigcup_{X_i \cap X \neq \emptyset} X_i
That is, the lower approximation \underline{R}(X) is the union of all the elementary sets that are subsets of X, and the upper approximation \overline{R}(X) is the union of all the elementary sets that have a non-empty intersection with X. The interval [\underline{R}(X), \overline{R}(X)] is the representation of an ordinary set X in the approximation space <U, R>, or simply the rough set of X. The lower (resp., upper) approximation \underline{R}(X) (resp., \overline{R}(X)) is interpreted as the collection of those elements of U that definitely (resp., possibly) belong to X. Further,

• a set X \in 2^U is said to be definable (or exact) in <U, R> iff \underline{R}(X) = \overline{R}(X);
• for any X, Y \in 2^U, X is said to be roughly included in Y, denoted by X \,\tilde{\subset}\, Y, iff \underline{R}(X) \subseteq \underline{R}(Y) and \overline{R}(X) \subseteq \overline{R}(Y);
• X and Y are said to be roughly equal in <U, R>, denoted by X \simeq_R Y, iff \underline{R}(X) = \underline{R}(Y) and \overline{R}(X) = \overline{R}(Y).

In (Pawlak, 1991), Pawlak discusses two numerical characterizations of the imprecision of a subset X in the approximation space <U, R>: accuracy and roughness. The accuracy of X, denoted by \alpha_R(X), is simply the ratio of the number of objects in its lower approximation to that in its upper approximation, namely

\alpha_R(X) = \frac{|\underline{R}(X)|}{|\overline{R}(X)|}
The roughness of X, denoted by \rho_R(X), is defined by subtracting the accuracy from 1:

\rho_R(X) = 1 - \alpha_R(X) = 1 - \frac{|\underline{R}(X)|}{|\overline{R}(X)|}
Note that the lower the roughness of a subset, the better its approximation. Further, the following observations are easily obtained:

1. Since \underline{R}(X) \subseteq X \subseteq \overline{R}(X), 0 \le \rho_R(X) \le 1.
2. By convention, when X = \emptyset, \underline{R}(X) = \overline{R}(X) = \emptyset and \rho_R(X) = 0.
3. \rho_R(X) = 0 if and only if X is definable in <U, R>.
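The approximations and the roughness measure defined above are straightforward to compute once the granulation U/R is given explicitly. The sketch below assumes the granules are supplied as a partition of U; the small example universe at the end is hypothetical.

```python
def approximations(granules, X):
    """Lower and upper approximation of X with respect to a partition (granules) of U."""
    X = set(X)
    lower, upper = set(), set()
    for g in granules:
        g = set(g)
        if g <= X:          # granule entirely inside X -> definitely in X
            lower |= g
        if g & X:           # granule meets X -> possibly in X
            upper |= g
    return lower, upper

def roughness(granules, X):
    """rho_R(X) = 1 - |lower| / |upper|; by convention 0 when X is empty."""
    lower, upper = approximations(granules, X)
    return 0.0 if not upper else 1.0 - len(lower) / len(upper)

# Example: U = {1,...,6} granulated into {1,2}, {3,4}, {5,6}; X = {1,2,3}.
granules = [{1, 2}, {3, 4}, {5, 6}]
print(approximations(granules, {1, 2, 3}))   # ({1, 2}, {1, 2, 3, 4})
print(roughness(granules, {1, 2, 3}))        # 1 - 2/4 = 0.5
```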
2.3 Rough-Fuzzy C-Means Algorithm
Incorporating both fuzzy sets and rough sets, a recently introduced c-means algorithm, termed rough-fuzzy c-means (RFCM) (Maji and Pal, 2007a,c), is described next. The RFCM algorithm adds the concept of fuzzy membership from fuzzy sets, and of lower and upper approximations from rough sets, to the c-means algorithm. While the membership of fuzzy sets enables efficient handling of overlapping partitions, the rough sets deal with uncertainty, vagueness, and incompleteness in class definition.
2.3.1 Objective Function
Let \underline{A}(\beta_i) and \overline{A}(\beta_i) be the lower and upper approximations of cluster \beta_i, and let B(\beta_i) = \overline{A}(\beta_i) - \underline{A}(\beta_i) denote the boundary region of cluster \beta_i. The RFCM partitions a set of n objects into c clusters by minimizing the objective function

J_{RF} = \begin{cases} w \times A_1 + \tilde{w} \times B_1 & \text{if } \underline{A}(\beta_i) \neq \emptyset,\; B(\beta_i) \neq \emptyset \\ A_1 & \text{if } \underline{A}(\beta_i) \neq \emptyset,\; B(\beta_i) = \emptyset \\ B_1 & \text{if } \underline{A}(\beta_i) = \emptyset,\; B(\beta_i) \neq \emptyset \end{cases}     (2.4)

where

A_1 = \sum_{i=1}^{c} \sum_{x_j \in \underline{A}(\beta_i)} \|x_j - v_i\|^2 ; \qquad B_1 = \sum_{i=1}^{c} \sum_{x_j \in B(\beta_i)} (\mu_{ij})^{\acute{m}} \|x_j - v_i\|^2
Here v_i represents the centroid of the ith cluster \beta_i, the parameters w and \tilde{w} correspond to the relative importance of the lower bound and the boundary region, and w + \tilde{w} = 1. Note that \mu_{ij} has the same meaning of membership as in fuzzy c-means. In the RFCM, each cluster is represented by a centroid, a crisp lower approximation, and a fuzzy boundary (Fig. 2.1). The lower approximation influences the fuzziness of the final partition. According to the definitions of the lower approximation and boundary of rough sets, if an object x_j \in \underline{A}(\beta_i), then x_j \notin \underline{A}(\beta_k), \forall k \neq i, and x_j \notin B(\beta_i), \forall i. That is, the object x_j is contained in \beta_i definitely. Thus, the weights of the objects in the lower approximation of a cluster should be independent of other centroids and clusters, and should not be coupled with their similarity with respect to other centroids. Also, the objects in the lower approximation of a cluster should have similar influence on the corresponding centroid and cluster. In contrast, if x_j \in B(\beta_i), then the object x_j possibly belongs to \beta_i and potentially belongs to another cluster; hence, the objects in boundary regions should have different influence on the centroids and clusters. So, in the RFCM, the membership values of objects in the lower approximation are \mu_{ij} = 1, while those in the boundary region are the same as in fuzzy c-means (Equation 2.3). In other words, the RFCM algorithm first partitions the data into two classes, lower approximation and boundary; only the objects in the boundary are fuzzified.
FIGURE 2.1 RFCM: cluster \beta_i is represented by a crisp lower approximation \underline{A}(\beta_i) (with \mu_{ij} = 1) and a fuzzy boundary B(\beta_i) (with \mu_{ij} \in [0, 1])

2.3.2 Cluster Prototypes
The new centroid is calculated as a weighted average of the crisp lower approximation and the fuzzy boundary. The computation of the centroid is modified to include the effects of both the fuzzy memberships and the lower and upper bounds. The modified centroid calculation for the RFCM is obtained by solving Equation 2.4 with respect to v_i:

v_i^{RF} = \begin{cases} w \times C_1 + \tilde{w} \times D_1 & \text{if } \underline{A}(\beta_i) \neq \emptyset,\; B(\beta_i) \neq \emptyset \\ C_1 & \text{if } \underline{A}(\beta_i) \neq \emptyset,\; B(\beta_i) = \emptyset \\ D_1 & \text{if } \underline{A}(\beta_i) = \emptyset,\; B(\beta_i) \neq \emptyset \end{cases}     (2.5)

where

C_1 = \frac{1}{|\underline{A}(\beta_i)|} \sum_{x_j \in \underline{A}(\beta_i)} x_j, \quad \text{with } |\underline{A}(\beta_i)| \text{ the cardinality of } \underline{A}(\beta_i),

and

D_1 = \frac{1}{n_i} \sum_{x_j \in B(\beta_i)} (\mu_{ij})^{\acute{m}} x_j, \quad \text{where } n_i = \sum_{x_j \in B(\beta_i)} (\mu_{ij})^{\acute{m}}.
Thus, the cluster prototypes (centroids) depend on the parameters w and \tilde{w}, and the fuzzification factor \acute{m} rules their relative influence. The correlated influence of these parameters and the fuzzification factor makes it somewhat difficult to determine their optimal values. Since the objects lying in the lower approximation definitely belong to a cluster, they are assigned a higher weight w compared with the weight \tilde{w} of the objects lying in the boundary region. Hence, for the RFCM, the values are given by 0 < \tilde{w} < w < 1. From the above discussion, the following properties of the RFCM algorithm can be derived:

1. \bigcup_i \overline{A}(\beta_i) = U, where U is the set of objects of concern.
2. \underline{A}(\beta_i) \cap \underline{A}(\beta_k) = \emptyset, \forall i \neq k.
3. \underline{A}(\beta_i) \cap B(\beta_i) = \emptyset, \forall i.
4. \exists i, k, \; B(\beta_i) \cap B(\beta_k) \neq \emptyset.
5. \mu_{ij} = 1, \forall x_j \in \underline{A}(\beta_i).
6. \mu_{ij} \in [0, 1], \forall x_j \in B(\beta_i).

Let us briefly comment on some of these properties. Property 2 says that if an object x_j \in \underline{A}(\beta_i), then x_j \notin \underline{A}(\beta_k), \forall k \neq i; that is, the object x_j is contained in \beta_i definitely. Property 3 establishes the fact that if x_j \in \underline{A}(\beta_i), then x_j \notin B(\beta_i); that is, an object may not be in both the lower approximation and the boundary region of a cluster \beta_i. Property 4 says that
if x_j \in B(\beta_i), then \exists k, x_j \in B(\beta_k). It means that an object x_j \in B(\beta_i) possibly belongs to \beta_i and potentially belongs to another cluster. Properties 5 and 6 are of great importance in computing the objective function J_{RF} and the cluster prototype v^{RF}. They say that the membership values of the objects in the lower approximation are \mu_{ij} = 1, while those in the boundary region are the same as in fuzzy c-means. That is, each cluster \beta_i consists of a crisp lower approximation \underline{A}(\beta_i) and a fuzzy boundary B(\beta_i).
2.3.3 Details of the Algorithm
Approximate optimization of J_{RF} (Equation 2.4) by the RFCM is based on Picard iteration through Equations 2.3 and 2.5. This type of iteration is called alternating optimization. The process starts by randomly choosing c objects as the centroids of the c clusters. The fuzzy memberships of all objects are calculated using Equation 2.3. Let \mu_i = (\mu_{i1}, \cdots, \mu_{ij}, \cdots, \mu_{in}) represent the fuzzy cluster \beta_i associated with the centroid v_i. After computing \mu_{ij} for c clusters and n objects, the values of \mu_{ij} for each object x_j are sorted, and the difference between the two highest memberships of x_j is compared with a threshold value \delta. Let \mu_{ij} and \mu_{kj} be the highest and second highest memberships of x_j. If (\mu_{ij} - \mu_{kj}) > \delta, then x_j \in \underline{A}(\beta_i) as well as x_j \in \overline{A}(\beta_i); otherwise x_j \in \overline{A}(\beta_i) and x_j \in \overline{A}(\beta_k). After assigning each object to the lower approximation or boundary region of the different clusters based on \delta, the memberships \mu_{ij} of the objects are modified. The values of \mu_{ij} are set to 1 for the objects in lower approximations, while those in boundary regions remain unchanged. The new centroids of the clusters are calculated as per Equation 2.5. The main steps of the RFCM algorithm proceed as follows:

1. Assign initial centroids v_i, i = 1, 2, \cdots, c. Choose values for the fuzzification factor \acute{m} and the thresholds \epsilon and \delta. Set the iteration counter t = 1.
2. Compute \mu_{ij} by Equation 2.3 for c clusters and n objects.
3. If \mu_{ij} and \mu_{kj} are the two highest memberships of x_j and (\mu_{ij} - \mu_{kj}) \le \delta, then x_j \in \overline{A}(\beta_i) and x_j \in \overline{A}(\beta_k); furthermore, x_j is not part of any lower bound.
4. Otherwise, x_j \in \underline{A}(\beta_i); in addition, by the properties of rough sets, x_j \in \overline{A}(\beta_i).
5. Modify \mu_{ij}, considering the lower and boundary regions, for c clusters and n objects.
6. Compute the new centroids as per Equation 2.5.
7. Repeat steps 2 to 6, incrementing t, until |\mu_{ij}(t) - \mu_{ij}(t - 1)| < \epsilon.

The performance of the RFCM depends on the value of \delta, which determines the class labels of all the objects. In other words, the RFCM partitions the data set into two classes, lower approximation and boundary, based on the value of \delta. In the present work, the following definition is used:

\delta = \frac{1}{n} \sum_{j=1}^{n} (\mu_{ij} - \mu_{kj})     (2.6)

where n is the total number of objects, and \mu_{ij} and \mu_{kj} are the highest and second highest memberships of x_j. That is, the value of \delta represents the average difference between the two highest memberships of all the objects in the data set. A good clustering procedure should make the value of \delta as high as possible. The value of \delta is, therefore, data dependent.
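The following is a compact NumPy sketch of one reading of the RFCM iteration following the steps above. It is a simplification: boundary weights are kept for every cluster rather than only for the two clusters with the highest memberships, and the default values of w, the fuzzifier, and the stopping tolerance are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def rfcm(X, c, m=2.0, w=0.95, eps=1e-5, max_iter=100, seed=0):
    """Sketch of rough-fuzzy c-means: crisp lower approximations plus fuzzy boundaries."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, size=c, replace=False)].astype(float)
    U_prev = np.zeros((c, n))
    in_lower = np.zeros(n, dtype=bool)
    for _ in range(max_iter):
        # Fuzzy memberships as in FCM (Equation 2.3).
        d2 = np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        U = 1.0 / (d2 ** (1.0 / (m - 1.0)))
        U /= U.sum(axis=0, keepdims=True)
        # delta = average gap between the two highest memberships (Equation 2.6).
        top2 = -np.sort(-U, axis=0)[:2]            # highest and second highest, per object
        delta = (top2[0] - top2[1]).mean()
        first = U.argmax(axis=0)
        in_lower = (top2[0] - top2[1]) > delta     # confident objects -> crisp lower approximation
        # Centroid update (Equation 2.5): weighted mix of lower approximation and boundary.
        for i in range(c):
            low = in_lower & (first == i)
            bnd_w = np.where(in_lower, 0.0, U[i] ** m)   # boundary objects keep fuzzy weights
            C1 = X[low].mean(axis=0) if low.any() else None
            D1 = (bnd_w[:, None] * X).sum(axis=0) / bnd_w.sum() if bnd_w.sum() > 0 else None
            if C1 is not None and D1 is not None:
                V[i] = w * C1 + (1.0 - w) * D1
            elif C1 is not None:
                V[i] = C1
            elif D1 is not None:
                V[i] = D1
        if np.abs(U - U_prev).max() < eps:
            break
        U_prev = U
    return V, U, in_lower
```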
2.4 Pixel Classification of Brain MR Images
In this section, we present the results of different c-means algorithms on pixel classification of brain MR images, that is, the results of clustering based only on the gray values of pixels. More than 100 MR images with different sizes and 16-bit gray levels are tested with the different c-means algorithms. All the brain MR images were collected from the Advanced Medicare and Research Institute, Salt Lake, Kolkata, India. The comparative performance of the different c-means algorithms is reported with respect to the DB and Dunn indices, as well as the β index (Pal, Ghosh, and Sankar, 2000), which are described next.

Davies-Bouldin (DB) Index:
The Davies-Bouldin (DB) index (Bezdek and Pal, 1988) is a function of the ratio of the sum of within-cluster distances to the between-cluster separation and is given by

DB = \frac{1}{c} \sum_{i=1}^{c} \max_{k \neq i} \left\{ \frac{S(v_i) + S(v_k)}{d(v_i, v_k)} \right\}, \quad 1 \le i, k \le c.

The DB index minimizes the within-cluster distance S(v_i) and maximizes the between-cluster separation d(v_i, v_k). Therefore, for a given data set and value of c, the higher the similarity within the clusters and the larger the between-cluster separation, the lower the DB index value. A good clustering procedure should make the value of the DB index as low as possible.

Dunn Index:
The Dunn index (Bezdek and Pal, 1988) is also designed to identify sets of clusters that are compact and well separated. The Dunn index maximizes

Dunn = \min_{i} \left\{ \min_{k \neq i} \left\{ \frac{d(v_i, v_k)}{\max_{l} S(v_l)} \right\} \right\}, \quad 1 \le i, k, l \le c.

A good clustering procedure should make the value of the Dunn index as high as possible.

β Index:
The β index of Pal et al. (Pal et al., 2000) is defined as the ratio of the total variation to the within-cluster variation, and is given by

\beta = \frac{N}{M}, \quad \text{where } N = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \|x_{ij} - v\|^2, \quad M = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \|x_{ij} - v_i\|^2, \quad \sum_{i=1}^{c} n_i = n.

Here n_i is the number of objects in the ith cluster (i = 1, 2, \cdots, c), n is the total number of objects, x_{ij} is the jth object in cluster i, v_i is the mean or centroid of the ith cluster, and v is the mean of all n objects.
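For reference, the three indices can be computed from a hard assignment as sketched below. Here S(v_i) is taken to be the mean distance of the members of cluster i to its centroid, which is one common choice; the chapter does not spell out its exact definition, so this is an assumption, and every cluster is assumed to be non-empty.

```python
import numpy as np

def validity_indices(X, labels, V):
    """DB, Dunn, and beta indices for a hard partition (labels in 0..c-1, centroids V)."""
    c = V.shape[0]
    # Within-cluster scatter S(v_i): mean distance of members to their centroid.
    S = np.array([np.linalg.norm(X[labels == i] - V[i], axis=1).mean() for i in range(c)])
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)        # centroid separations
    np.fill_diagonal(D, np.inf)                                       # exclude k == i terms
    db = np.mean([np.max((S[i] + S) / D[i]) for i in range(c)])
    dunn = D[np.isfinite(D)].min() / S.max()
    v_all = X.mean(axis=0)
    N = ((X - v_all) ** 2).sum()                                      # total variation
    M = sum(((X[labels == i] - V[i]) ** 2).sum() for i in range(c))   # within-cluster variation
    beta = N / M
    return db, dunn, beta
```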
For a given image and value of c, the higher the homogeneity within the segmented regions, the higher the β value. The value of β also increases with c. Consider the image of Fig. 2.3 as an example, which represents an MR image (I-20497774) of size 256 × 180 with 16-bit gray levels; the number of objects in the data set of I-20497774 is therefore 46080. Table 2.1 depicts the values of the DB index, Dunn index, and β index of FCM and RFCM for different values of c on the data set of I-20497774, considering only the gray value of each pixel.

TABLE 2.1 Performance of FCM and RFCM on I-20497774

Value of c   DB Index (FCM, RFCM)   Dunn Index (FCM, RFCM)   β Index (FCM, RFCM)
2            0.51, 0.21             2.30, 6.17               2.15, 2.19
3            0.25, 0.17             1.11, 1.62               3.55, 3.74
4            0.16, 0.15             1.50, 1.64               9.08, 9.68
5            0.39, 0.17             0.10, 0.64               10.45, 10.82
6            0.20, 0.19             0.66, 1.10               16.93, 17.14
7            0.23, 0.27             0.98, 0.12               21.63, 22.73
8            0.34, 0.27             0.09, 0.31               25.82, 26.38
9            0.32, 0.28             0.12, 0.13               31.75, 32.65
10           0.30, 0.24             0.08, 0.12               38.04, 39.31

The results reported here with respect to the DB and Dunn indices confirm that both FCM and RFCM achieve their best results for c = 4 (background, gray matter, white matter, and cerebrospinal fluid). Also, the value of the β index, as expected, increases with the value of c. For a particular value of c, the performance of RFCM is better than that of FCM.

Fig. 2.2 shows the scatter plots of the highest and second highest memberships of all the objects in the data set of I-20497774 at the first and final iterations, respectively, considering w = 0.95, \acute{m} = 2.0, and c = 4. The diagonal line represents the zone where the two highest memberships of an object are equal. From Fig. 2.2, it is observed that although the average difference between the two highest memberships of the objects is very low at the first iteration (δ = 0.145), it becomes very high at the final iteration (δ = 0.652).
FIGURE 2.2 Scatter plots of the two highest membership values of all objects in the data set I-20497774, at the first iteration and after the 20th (final) iteration (x-axis: highest membership value; y-axis: second highest membership value)
Table 2.2 compares the performance of the different c-means algorithms on some brain MR images with respect to the DB, Dunn, and β indices, considering c = 4 (background, gray matter, white matter, and CSF). All the results reported in Table 2.2 confirm that the RFCM algorithm produces more promising pixel clusters than the conventional methods do. Some of the existing algorithms, such as PCM and FPCM, fail to produce multiple clusters, as they generate coincident clusters even when they are initialized with the final prototypes of FCM. Also, the values of the DB, Dunn, and β indices of RFCM are better than those of the other c-means algorithms.
TABLE 2.2 Performance of Different C-Means Algorithms

Data Set     Algorithm   DB Index   Dunn Index   β Index
I-20497761   HCM         0.16       2.13         12.07
I-20497761   FCM         0.14       2.26         12.92
I-20497761   RCM         0.15       2.31         11.68
I-20497761   RFCM        0.13       2.39         13.06
I-20497763   HCM         0.18       1.88         12.02
I-20497763   FCM         0.16       2.02         12.63
I-20497763   RCM         0.15       2.14         12.59
I-20497763   RFCM        0.11       2.12         13.30
I-20497774   HCM         0.18       1.17         8.11
I-20497774   FCM         0.16       1.50         9.08
I-20497774   RCM         0.17       1.51         9.10
I-20497774   RFCM        0.15       1.64         9.68
I-20497777   HCM         0.17       2.01         8.68
I-20497777   FCM         0.16       2.16         9.12
I-20497777   RCM         0.15       2.34         9.28
I-20497777   RFCM        0.14       2.39         9.81

2.5 Segmentation of Brain MR Images
In this section, the feature extraction methodology for segmentation of brain MR images is first described. Next, the methodology to select initial centroids for different c-means algorithms is provided based on the concept of maximization of class separability (Maji and Pal, 2008).
2.5.1 Feature Extraction
Statistical texture analysis derives a set of statistics from the distribution of pixel values or blocks of pixel values. There are different types of statistical texture measures, namely first-order, second-order, and higher-order statistics, based on the number of pixel combinations used to compute the textures. The first-order statistics, such as the mean, standard deviation, range, entropy, and the qth moment about the mean, are calculated using the histogram formed by the gray scale value of each pixel. These statistics consider the properties of the gray scale values, but not their spatial distribution. The second-order statistics are based on pairs of pixels and thus take the spatial distribution of the gray scale values into account. In the present work, only first- and second-order statistical textures are considered. A set of 13 input features is used for clustering the brain MR images. These include the gray value of the pixel, two recently introduced features (first-order statistics), namely the homogeneity and the edge value of the pixel (Maji and Pal, 2008), and 10 of Haralick's textural features (Haralick, Shanmugam, and Dinstein, 1973) (second-order statistics): angular second moment, contrast, correlation, inverse difference moment, sum average, sum variance, sum entropy, second-order entropy, difference variance, and difference entropy. They are useful in characterizing images and can be used as features of a pixel. Hence, these features have promising application in clustering-based brain MRI segmentation.

Homogeneity
If H is the homogeneity of a pixel I_{m,n} within a 3 × 3 neighborhood, then

H = 1 - \frac{1}{6(I_{\max} - I_{\min})} \Big\{ |I_{m-1,n-1} + I_{m+1,n+1} - I_{m-1,n+1} - I_{m+1,n-1}| + |I_{m-1,n-1} + 2I_{m,n-1} + I_{m+1,n-1} - I_{m-1,n+1} - 2I_{m,n+1} - I_{m+1,n+1}| \Big\}
where Imax and Imin represent the maximum and minimum gray values of the image. The region that is entirely within an organ will have a high H value. On the other hand, the regions that contain more than one organ will have lower H values (Maji and Pal, 2008).
Edge Value
In MR imaging, the histogram of the given image is in general unimodal. One side of the peak may display a shoulder or slope change, or one side may be less steep than the other, reflecting the presence of two peaks that are close together or that differ greatly in height. The histogram may also contain a third, usually smaller, population corresponding to points on the object-background border. These points have gray levels intermediate between those of the object and background; their presence raises the level of the valley floor between the two peaks, or if the peaks are already close together, makes it harder to detect the fact that they are not a single peak. As the histogram peaks are close together and very unequal in size, it may be difficult to detect the valley between them. In determining how each point of the image should contribute to the segmentation method, the current method takes into account the rate of change of gray level at the point, as well as the point’s gray level (edge value); that is, the maximum of differences of average gray levels in pairs of horizontally and vertically adjacent 2 × 2 neighborhoods (Maji et al., 2008; Weszka and Rosenfeld, 1979). If ∆ is the edge value at a given point Im,n , then ∆=
\frac{1}{4} \max\Big\{ |I_{m-1,n} + I_{m-1,n+1} + I_{m,n} + I_{m,n+1} - I_{m+1,n} - I_{m+1,n+1} - I_{m+2,n} - I_{m+2,n+1}|, \; |I_{m,n-1} + I_{m,n} + I_{m+1,n-1} + I_{m+1,n} - I_{m,n+1} - I_{m,n+2} - I_{m+1,n+1} - I_{m+1,n+2}| \Big\}
According to the image model, points interior to the object and background should generally have low edge values, since they are highly correlated with their neighbors, while those on the object-background border should have high edge values (Maji et al., 2008).
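A possible vectorized implementation of the two features just defined is sketched below; the border handling (edge padding) is an implementation choice not specified in the text.

```python
import numpy as np

def homogeneity_and_edge(img):
    """Per-pixel homogeneity H and edge value Delta as defined above (2-D float image)."""
    I = np.pad(img.astype(float), 2, mode='edge')   # padding gives room for the m+2 / n+2 terms
    rows, cols = img.shape
    r = np.arange(2, 2 + rows)[:, None]              # row indices of I matching pixel (m, n)
    s = np.arange(2, 2 + cols)[None, :]
    imax, imin = img.max(), img.min()
    # Homogeneity: diagonal and horizontal gradient terms over the 3x3 neighborhood.
    t1 = np.abs(I[r-1, s-1] + I[r+1, s+1] - I[r-1, s+1] - I[r+1, s-1])
    t2 = np.abs(I[r-1, s-1] + 2*I[r, s-1] + I[r+1, s-1]
                - I[r-1, s+1] - 2*I[r, s+1] - I[r+1, s+1])
    H = 1.0 - (t1 + t2) / (6.0 * max(imax - imin, 1e-12))
    # Edge value: differences of average gray levels of adjacent 2x2 blocks.
    v = np.abs(I[r-1, s] + I[r-1, s+1] + I[r, s] + I[r, s+1]
               - I[r+1, s] - I[r+1, s+1] - I[r+2, s] - I[r+2, s+1])
    h = np.abs(I[r, s-1] + I[r, s] + I[r+1, s-1] + I[r+1, s]
               - I[r, s+1] - I[r, s+2] - I[r+1, s+1] - I[r+1, s+2])
    delta = 0.25 * np.maximum(v, h)
    return H, delta
```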
Haralick's Textural Features

Texture is one of the important features used in identifying objects or regions of interest in an image. It is often described as a set of statistical measures of the spatial distribution of gray levels in an image. This scheme has been found to provide a powerful input feature representation for various recognition problems. Haralick et al. (Haralick et al., 1973) proposed different textural properties for image classification. Haralick's textural measures are based upon the moments of a joint probability density function that is estimated as the joint co-occurrence matrix or gray level co-occurrence matrix (Haralick et al., 1973; Rangayyan, 2004). It reflects the distribution of the probability of occurrence of a pair of gray levels separated by a given distance d at angle θ. Based upon the normalized gray level co-occurrence matrix, Haralick proposed several quantities as measures of texture, such as energy, contrast, correlation, sum of squares, inverse difference moments, sum average, sum variance, sum entropy, entropy, difference variance, difference entropy, and information measures of correlation 1 and 2. In (Haralick et al., 1973), these properties were calculated for large blocks in aerial photographs. Every pixel within each of these large blocks was then assigned the same texture values, which leads to a significant loss of resolution that is unacceptable in medical imaging. In the present work, the texture values are assigned to a pixel by using a 3 × 3 sliding window centered about that pixel. The gray level co-occurrence matrix is constructed by mapping the gray level co-occurrence probabilities based on the spatial relations of pixels in different angular directions (θ = 0°, 45°, 90°, 135°) with unit pixel distance, while scanning the window (centered about a pixel) from left to right and top to bottom (Haralick et al., 1973; Rangayyan, 2004). Ten texture measures, namely angular second moment, contrast, correlation, inverse difference moment, sum average, sum variance, sum entropy, second-order
entropy, difference variance, and difference entropy are computed for each window. For the four angular directions, a set of four values is obtained for each of the ten measures. The mean of each of the ten measures, averaged over the four values, along with the gray value, homogeneity, and edge value of the pixel, comprises the set of 13 features that is used as the feature vector of the corresponding pixel.
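A sketch of how such window-based texture features can be computed is given below: it builds a normalized gray level co-occurrence matrix for a small window and evaluates two of the ten measures (angular second moment and contrast), averaged over the four directions. The quantization to a small number of gray levels and the exact offset conventions are assumptions made for illustration.

```python
import numpy as np

def glcm(window, offset, levels=8):
    """Normalized, symmetric co-occurrence matrix of a quantized window for one pixel offset."""
    q = np.floor(window.astype(float) / (window.max() + 1e-12) * (levels - 1)).astype(int)
    dr, dc = offset
    P = np.zeros((levels, levels))
    rows, cols = q.shape
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                P[q[r, c], q[rr, cc]] += 1
                P[q[rr, cc], q[r, c]] += 1   # symmetric counting
    return P / max(P.sum(), 1.0)

def asm_and_contrast(window):
    """Angular second moment and contrast, averaged over the four directions."""
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]   # roughly 0, 45, 90, 135 degrees
    asm, con = [], []
    for off in offsets:
        P = glcm(window, off)
        i, j = np.indices(P.shape)
        asm.append((P ** 2).sum())              # angular second moment (energy)
        con.append(((i - j) ** 2 * P).sum())    # contrast
    return np.mean(asm), np.mean(con)
```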
2.5.2 Selection of Initial Centroids
A limitation of the c-means algorithm is that it can only achieve a local optimum solution, which depends on the initial choice of the centroids. Consequently, computing resources may be wasted in that some initial centroids get stuck in regions of the input space with a scarcity of data points and may therefore never have the chance to move to new locations where they are needed. To overcome this limitation, a method to select the initial centroids is described next, which is based on a discriminant analysis maximizing a measure of class separability (Otsu, 1979). It enables the algorithm to converge to an optimum or near-optimum solution (Maji and Pal, 2008). Before describing the method for selecting the initial centroids, a quantitative measure of class separability (Otsu, 1979) is provided, given by
J(T) = \frac{P_1(T)\, P_2(T)\, [m_1(T) - m_2(T)]^2}{P_1(T)\,\sigma_1^2(T) + P_2(T)\,\sigma_2^2(T)}     (2.7)

where

P_1(T) = \sum_{z=0}^{T} h(z); \qquad P_2(T) = \sum_{z=T+1}^{L-1} h(z) = 1 - P_1(T)

m_1(T) = \frac{1}{P_1(T)} \sum_{z=0}^{T} z\, h(z); \qquad m_2(T) = \frac{1}{P_2(T)} \sum_{z=T+1}^{L-1} z\, h(z)

\sigma_1^2(T) = \frac{1}{P_1(T)} \sum_{z=0}^{T} [z - m_1(T)]^2 h(z); \qquad \sigma_2^2(T) = \frac{1}{P_2(T)} \sum_{z=T+1}^{L-1} [z - m_2(T)]^2 h(z)
Here, L is the total number of discrete values, which range over [0, L − 1], T is the threshold value, which maximizes J(T), and h(z) represents the fraction of the data having the discretized feature value z. To maximize J(T), the means of the two classes should be as well separated as possible and the variances in both classes should be as small as possible. Based on the concept of maximization of class separability, the method for selecting the initial centroids is described next. Its main steps proceed as follows.

1. The data set X = \{x_1, \cdots, x_j, \cdots, x_n\}, with x_j \in \Re^m, is first discretized to facilitate the class separation method. Suppose the possible value range of a feature f_m in the data set is (f_{m,min}, f_{m,max}) and the real value that the data element x_j takes at f_m is f_{mj}; then the discretized value of f_{mj} is

Discretized(f_{mj}) = (L - 1) \times \frac{f_{mj} - f_{m,min}}{f_{m,max} - f_{m,min}}     (2.8)

where L is the total number of discrete values ranging over [0, L − 1].
2. For each feature f_m, calculate h(z) for 0 ≤ z < L.
3. Calculate the threshold value T_m for the feature f_m which maximizes class separability along that feature.
4. Based on the threshold T_m, discretize the corresponding feature f_m of the data element x_j as follows:

\bar{f}_{mj} = \begin{cases} 1 & \text{if Discretized}(f_{mj}) \ge T_m \\ 0 & \text{otherwise} \end{cases}

5. Repeat steps 2 to 4 for all the features and generate the set of discretized objects \bar{X} = \{\bar{x}_1, \cdots, \bar{x}_j, \cdots, \bar{x}_n\}.
6. Calculate the total number of similar discretized objects N(x_i) and the mean v(x_i) of the objects similar to x_i as

N(x_i) = \sum_{j=1}^{n} \delta_j \quad \text{and} \quad v(x_i) = \frac{1}{N(x_i)} \sum_{j=1}^{n} \delta_j \times x_j, \quad \text{where } \delta_j = \begin{cases} 1 & \text{if } \bar{x}_j = \bar{x}_i \\ 0 & \text{otherwise} \end{cases}

7. Sort the n objects according to their values of N(x_i) such that N(x_1) > N(x_2) > \cdots > N(x_n).
8. If \bar{x}_i = \bar{x}_j, then N(x_i) = N(x_j), and v(x_j) should not be considered as a centroid (mean), resulting in a reduced set of objects to be considered for the initial centroids.
9. Let there be \acute{n} objects in the reduced set having N(x_i) values such that N(x_1) > N(x_2) > \cdots > N(x_{\acute{n}}). A heuristic threshold function can then be defined as follows (Banerjee, Mitra, and Pal, 1998):

Tr = \frac{R}{\tilde{\epsilon}}, \quad \text{where } R = \sum_{i=1}^{\acute{n}} \frac{1}{N(x_i) - N(x_{i+1})}
where \tilde{\epsilon} is a constant (say, 0.5), so that all the means v(x_i) of the objects in the reduced set having an N(x_i) value higher than Tr are regarded as candidates for the initial centroids (means). The value of Tr is high if most of the N(x_i)'s are large and close to each other, a condition that occurs when a small number of large clusters are present. On the other hand, if the N(x_i)'s vary widely, then the number of clusters with smaller size increases, and Tr automatically attains a lower value. Note that the main motive for introducing this threshold function lies in reducing the number of centroids; it attempts to eliminate noisy centroids (data representatives having lower values of N(x_i)) from the whole data set. The whole approach is, therefore, data dependent.
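A sketch of the per-feature threshold selection is given below: it discretizes a feature as in Equation 2.8, forms the normalized histogram h(z), and scans all candidate thresholds for the T that maximizes J(T) of Equation 2.7 (essentially Otsu's criterion). The remaining steps (5 to 9) of the centroid-selection procedure are omitted for brevity, and the rounding used in the discretization is an assumption.

```python
import numpy as np

def discretize(f, L=101):
    """Map a feature column to integer levels 0..L-1 (Equation 2.8); rounding is assumed."""
    fmin, fmax = f.min(), f.max()
    return np.round((L - 1) * (f - fmin) / max(fmax - fmin, 1e-12)).astype(int)

def best_threshold(f, L=101):
    """Threshold T maximizing the class separability J(T) of Equation 2.7."""
    z = np.arange(L)
    h = np.bincount(discretize(f, L), minlength=L) / len(f)   # normalized histogram h(z)
    best_T, best_J = None, -np.inf
    for T in range(L - 1):
        P1, P2 = h[:T + 1].sum(), h[T + 1:].sum()
        if P1 <= 0 or P2 <= 0:
            continue
        m1 = (z[:T + 1] * h[:T + 1]).sum() / P1
        m2 = (z[T + 1:] * h[T + 1:]).sum() / P2
        s1 = (((z[:T + 1] - m1) ** 2) * h[:T + 1]).sum() / P1
        s2 = (((z[T + 1:] - m2) ** 2) * h[T + 1:]).sum() / P2
        denom = P1 * s1 + P2 * s2
        J = P1 * P2 * (m1 - m2) ** 2 / denom if denom > 0 else np.inf
        if J > best_J:
            best_T, best_J = T, J
    return best_T
```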
2.6 Experimental Results and Discussion
In this section, the performance of different c-means algorithms on segmentation of brain MR images is presented. Details of the experimental setup, data collection, and the objective of the experiments are the same as those in Section 2.4. Consider Fig. 2.3 as an example, which shows an MR image (I-20497774) along with the segmented images obtained using different c-means algorithms. Each image is of size 256 × 180 with 16-bit gray levels, so the number of objects in the data set of I-20497774 is 46080. The parameters generated by the discriminant analysis based initialization method are shown in Table 2.3, only for the I-20497774 data set, along with the values of the input parameters. The threshold values for the 13 features of the given data set are also reported in this table. Table 2.4 depicts the values of the DB index, Dunn index, and β index of FCM and
FIGURE 2.3 I-20497774: original and segmented images of HCM, FCM, RCM, and RFCM
RFCM for different values of c on the data set of I-20497774, considering w = 0.95 and \acute{m} = 2.0. The results reported here with respect to the DB and Dunn indices confirm that both FCM and RFCM achieve their best results for c = 4. Also, the value of the β index, as expected, increases with the value of c. For a particular value of c, the performance of RFCM is better than that of FCM.

TABLE 2.3 Values of Different Parameters

Size of image = 256 × 180
Minimum gray value = 1606, Maximum gray value = 2246
Samples per pixel = 1, Bits allocated = 16, Bits stored = 12
Number of objects = 46080
Number of features = 13, Value of L = 101
Threshold values: Gray value = 1959, Homogeneity = 0.17, Edge value = 0.37,
Angular second moment = 0.06, Contrast = 0.12, Correlation = 0.57,
Inverse difference moment = 0.18, Sum average = 0.17, Sum variance = 0.14,
Sum entropy = 0.87, Entropy = 0.88, Difference variance = 0.07, Difference entropy = 0.79
Finally, Table 2.5 provides the comparative results of different c-means algorithms on I-20497774 with respect to the values of the DB index, Dunn index, and β index. The corresponding segmented images, along with the original one, are presented in Fig. 2.3. The results reported in Fig. 2.3 and Table 2.5 confirm that the RFCM algorithm produces a more promising segmented image than the conventional c-means algorithms do. Some of the existing algorithms such as PCM and FPCM fail to produce multiple segments, as they generate coincident clusters even when they are initialized with the final prototypes of FCM.
TABLE 2.4 Performance of FCM and RFCM on I-20497774 data set

Value of c   DB Index (FCM, RFCM)   Dunn Index (FCM, RFCM)   β Index (FCM, RFCM)
2            0.38, 0.19             2.17, 3.43               3.62, 4.23
3            0.22, 0.16             1.20, 1.78               7.04, 7.64
4            0.15, 0.13             1.54, 1.80               11.16, 13.01
5            0.29, 0.19             0.95, 1.04               11.88, 14.83
6            0.24, 0.23             0.98, 1.11               19.15, 19.59
7            0.23, 0.21             1.07, 0.86               24.07, 27.80
8            0.31, 0.21             0.46, 0.95               29.00, 33.02
9            0.30, 0.24             0.73, 0.74               35.06, 40.07
10           0.30, 0.22             0.81, 0.29               41.12, 44.27
TABLE 2.5 Performance of Different C-Means on I-20497774 data set

Algorithms   DB Index   Dunn Index   β Index
HCM          0.17       1.28         10.57
FCM          0.15       1.54         11.16
RCM          0.16       1.56         11.19
RFCM         0.13       1.80         13.01

TABLE 2.6 Haralick's and Proposed Features on I-20497774 data set

Algorithms   Features      DB Index   Dunn Index   β Index   Time (ms)
HCM          H-13          0.19       1.28         10.57     4308
HCM          H-10          0.19       1.28         10.57     3845
HCM          P-2           0.18       1.28         10.57     1867
HCM          H-10 ∪ P-2    0.17       1.28         10.57     3882
FCM          H-13          0.15       1.51         10.84     36711
FCM          H-10          0.15       1.51         10.84     34251
FCM          P-2           0.15       1.51         11.03     14622
FCM          H-10 ∪ P-2    0.15       1.54         11.16     43109
RCM          H-13          0.19       1.52         11.12     5204
RCM          H-10          0.19       1.52         11.12     5012
RCM          P-2           0.17       1.51         11.02     1497
RCM          H-10 ∪ P-2    0.16       1.56         11.19     7618
RFCM         H-13          0.13       1.76         12.57     15705
RFCM         H-10          0.13       1.76         12.57     15414
RFCM         P-2           0.13       1.77         12.88     6866
RFCM         H-10 ∪ P-2    0.13       1.80         13.01     17084

2.6.1 Haralick's Features Versus Proposed Features
Table 2.6 presents the comparative results of the different c-means algorithms for Haralick's features and the features proposed in (Maji and Pal, 2008) on the I-20497774 data set. While P-2 and H-13 stand for the set of two proposed features (Maji and Pal, 2008) and the thirteen Haralick features, H-10 represents the set of ten Haralick features used in the current study. The proposed features are found to be as important as Haralick's ten features for clustering-based segmentation of brain MR images. The set of 13 features, comprising the gray value, the two proposed features, and the ten Haralick features, improves the performance of all c-means algorithms with respect to DB, Dunn, and β. It is also observed that Haralick's remaining three features (sum of squares and the information measures of correlation 1 and 2) do not contribute any extra information for segmentation of brain MR images.
2.6.2 Random Versus Discriminant Analysis Based Initialization
Table 2.7 provides comparative results of different c-means algorithms with random initialization of centroids and the discriminant analysis based initialization method described in
Section 2.5.2 for the data sets I-20497761, I-20497763, and I-20497777 (Fig. 2.4).

TABLE 2.7 Performance of Random and Discriminant Analysis Based Initialization

Data Set     Algorithm   Initialization   DB Index   Dunn Index   β Index   Time (ms)
I-20497761   HCM         Random           0.23       1.58         9.86      8297
I-20497761   HCM         Proposed         0.15       2.64         12.44     4080
I-20497761   FCM         Random           0.19       1.63         12.73     40943
I-20497761   FCM         Proposed         0.12       2.69         13.35     38625
I-20497761   RCM         Random           0.19       1.66         10.90     9074
I-20497761   RCM         Proposed         0.14       2.79         12.13     6670
I-20497761   RFCM        Random           0.15       2.07         11.89     19679
I-20497761   RFCM        Proposed         0.11       2.98         13.57     16532
I-20497763   HCM         Random           0.26       1.37         10.16     3287
I-20497763   HCM         Proposed         0.16       2.03         13.18     3262
I-20497763   FCM         Random           0.21       1.54         10.57     46157
I-20497763   FCM         Proposed         0.15       2.24         13.79     45966
I-20497763   RCM         Random           0.21       1.60         10.84     10166
I-20497763   RCM         Proposed         0.14       2.39         13.80     6770
I-20497763   RFCM        Random           0.17       1.89         11.49     19448
I-20497763   RFCM        Proposed         0.10       2.38         14.27     15457
I-20497777   HCM         Random           0.33       1.52         6.79      4322
I-20497777   HCM         Proposed         0.16       2.38         8.94      3825
I-20497777   FCM         Random           0.28       1.67         7.33      42284
I-20497777   FCM         Proposed         0.15       2.54         10.02     40827
I-20497777   RCM         Random           0.27       1.71         7.47      8353
I-20497777   RCM         Proposed         0.13       2.79         9.89      7512
I-20497777   RFCM        Random           0.19       1.98         8.13      18968
I-20497777   RFCM        Proposed         0.11       2.83         11.04     16930

FIGURE 2.4 Examples of some brain MR images: I-20497761, I-20497763, I-20497777
The discriminant analysis based initialization method is found to improve the performance in terms of the DB index, Dunn index, and β index, as well as to reduce the time requirement of all the c-means algorithms. It is also observed that HCM with this initialization method performs similarly to RFCM with random initialization, although RFCM is expected to be superior to HCM in partitioning the objects. While with random initialization the c-means algorithms get stuck in local optima, the discriminant analysis based initialization method enables the algorithms to converge to an optimum or near-optimum solution. In effect, the execution time required by the different c-means algorithms is lower with this scheme than with random initialization.
2.6.3 Comparative Performance Analysis
Table 2.8 compares the performance of different c-means algorithms on some brain MR images with respect to DB, Dunn, and β index. The segmented versions of different c-
means are shown in Figs. 2.5-2.7.

TABLE 2.8 Performance of Different C-Means Algorithms

Data Set     Algorithm   DB Index   Dunn Index   β Index   Time (ms)
I-20497761   HCM         0.15       2.64         12.44     4080
I-20497761   FCM         0.12       2.69         13.35     38625
I-20497761   RCM         0.14       2.79         12.13     6670
I-20497761   RFCM        0.11       2.98         13.57     16532
I-20497763   HCM         0.16       2.03         13.18     3262
I-20497763   FCM         0.15       2.24         13.79     45966
I-20497763   RCM         0.14       2.39         13.80     6770
I-20497763   RFCM        0.10       2.38         14.27     15457
I-20497777   HCM         0.16       2.38         8.94      3825
I-20497777   FCM         0.15       2.54         10.02     40827
I-20497777   RCM         0.13       2.79         9.89      7512
I-20497777   RFCM        0.11       2.83         11.04     16930

All the results reported in Table 2.8 and Figs. 2.5-2.7
confirm that although each c-means algorithm, except PCM and FPCM, generates good segmented images, the values of the DB, Dunn, and β indices of RFCM are better than those of the other c-means algorithms. Both PCM and FPCM fail to produce multiple segments of the brain MR images, as they generate coincident clusters even when they are initialized with the final prototypes of other c-means algorithms. Table 2.8 also reports the execution time (in milliseconds) of the different c-means algorithms. The execution time required by RFCM is significantly lower than that of FCM. For HCM and RCM, although the execution time is lower, the performance is considerably poorer than that of RFCM. The following conclusions can be drawn from the results reported in this chapter:
FIGURE 2.5 I-20497761: segmented versions of HCM, FCM, RCM, and RFCM
FIGURE 2.6 I-20497763: segmented versions of HCM, FCM, RCM, and RFCM
1. It is observed that RFCM is superior to the other c-means algorithms. However, RFCM requires more time than HCM and RCM, and less time
FIGURE 2.7 I-20497777: segmented versions of HCM, FCM, RCM, and RFCM
than FCM. However, the performance of RFCM with respect to DB, Dunn, and β is significantly better than that of all the other c-means algorithms. The performance of FCM and RCM is intermediate between that of RFCM and HCM.
2. The discriminant analysis based initialization is found to improve the values of DB, Dunn, and β, as well as to reduce the time requirement substantially, for all c-means algorithms.
3. The two features proposed in (Maji and Pal, 2008) are as important as Haralick's ten features for clustering-based segmentation of brain MR images.
4. The use of rough sets and fuzzy memberships adds a small computational load to the HCM algorithm; however, the corresponding integrated method (RFCM) shows a definite increase in the Dunn index and a decrease in the DB index.

The best performance of the segmentation method in terms of DB, Dunn, and β is achieved for the following reasons:
1. the discriminant analysis based initialization of centroids enables the algorithm to converge to an optimum or near-optimum solution;
2. the membership function of the RFCM handles overlapping partitions efficiently; and
3. the concept of crisp lower bound and fuzzy boundary of the RFCM algorithm deals with uncertainty, vagueness, and incompleteness in class definition.

In effect, promising segmented brain MR images are obtained using the RFCM algorithm.
2.7 Conclusion
A robust segmentation technique, integrating the merits of rough sets, fuzzy sets, and the c-means algorithm, is presented in this chapter for brain MR images. Some new measures, based on the local properties of MR images, are reported for accurate segmentation. The method based on the concept of maximization of class separability is found to be successful in effectively circumventing the initialization and local minima problems of iterative refinement clustering algorithms such as c-means. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated on a set of brain MR images. The extensive experimental results show that the rough-fuzzy c-means algorithm produces more promising segmented images than the conventional algorithms do.
Acknowledgments. The authors thank Advanced Medicare and Research Institute, Kolkata, India, for providing brain MR images. This work was done when S. K. Pal was a Govt. of India J.C. Bose Fellow.
Bibliography Banerjee, Mohua, Sushmita Mitra, and Sankar K Pal. 1998. Rough Fuzzy MLP: Knowledge Encoding and Classification. IEEE Transactions on Neural Networks 9(6): 1203–1216. Barni, M., V. Cappellini, and A. Mecocci. 1996. Comments on A Possibilistic Approach to Clustering. IEEE Transactions on Fuzzy Systems 4(3):393–396. Bezdek, J. C. 1981. Pattern Recognition with Fuzzy Objective Function Algorithm. New York: Plenum. Bezdek, J. C., and N. R. Pal. 1988. Some New Indexes for Cluster Validity. IEEE Transactions on System, Man, and Cybernetics, Part B 28:301–315. Brandt, M. E., T. P. Bohan, L. A. Kramer, and J. M. Fletcher. 1994. Estimation of CSF, White and Gray Matter Volumes in Hydrocephalic Children Using Fuzzy Clustering of MR Images. Computerized Medical Imaging and Graphics 18:25–34. Cagnoni, S., G. Coppini, M. Rucci, D. Caramella, and G. Valli. 1993. Neural Network Segmentation of Magnetic Resonance Spin Echo Images of the Brain. Journal of Biomedical Engineering 15(5):355–362. Dubois, D., and H.Prade. 1990. Rough Fuzzy Sets and Fuzzy Rough Sets. International Journal of General Systems 17:191–209. Dunn, J. C. 1974. A Fuzzy Relative of the ISODATA Process and its Use in Detecting Compact, Well-Separated Clusters. Journal of Cybernetics 3:32–57. Hall, L. O., A. M. Bensaid, L. P. Clarke, R. P. Velthuizen, M. S. Silbiger, and J. C. Bezdek. 1992. A Comparison of Neural Network and Fuzzy Clustering Techniques in Segmenting Magnetic Resonance Images of the Brain. IEEE Transactions on Neural Networks 3(5):672–682. Haralick, R. M., K. Shanmugam, and I. Dinstein. 1973. Textural Features for Image Classification. IEEE Transactions on Systems, Man and Cybernetics SMC-3(6): 610–621. Hassanien, Aboul Ella. 2007. Fuzzy Rough Sets Hybrid Scheme for Breast Cancer Detection. Image Vision Computing 25(2):172–183. Krishnapuram, R., and J. M. Keller. 1993. A Possibilistic Approach to Clustering. IEEE Transactions on Fuzzy Systems 1(2):98–110. ———. 1996. The Possibilistic C-Means Algorithm: Insights and Recommendations. IEEE Transactions on Fuzzy Systems 4(3):385–393. Lee, C., S. Hun, T. A. Ketter, and M. Unser. 1998. Unsupervised Connectivity Based Thresholding Segmentation of Midsaggital Brain MR Images. Computers in Biology and Medicine 28:309–338. Leemput, K. V., F. Maes, D. Vandermeulen, and P. Suetens. 1999. Automated ModelBased Tissue Classification of MR Images of the Brain. IEEE Transactions on Medical Imaging 18(10):897–908.
Li, C. L., D. B. Goldgof, and L. O. Hall. 1993. Knowledge-Based Classification and Tissue Labeling of MR Images of Human Brain. IEEE Transactions on Medical Imaging 12(4):740–750. Maji, Pradipta, Malay K. Kundu, and Bhabatosh Chanda. 2008. Second Order Fuzzy Measure and Weighted Co-Occurrence Matrix for Segmentation of Brain MR Images. Fundamenta Informaticae 88(1-2):161–176. Maji, Pradipta, and Sankar K. Pal. 2007a. RFCM: A Hybrid Clustering Algorithm Using Rough and Fuzzy Sets. Fundamenta Informaticae 80(4):475–496. ———. 2007b. Rough-Fuzzy C-Medoids Algorithm and Selection of Bio-Basis for Amino Acid Sequence Analysis. IEEE Transactions on Knowledge and Data Engineering 19(6):859–872. ———. 2007c. Rough Set Based Generalized Fuzzy C-Means Algorithm and Quantitative Indices. IEEE Transactions on System, Man and Cybernetics, Part B, Cybernetics 37(6):1529–1540. ———. 2008. Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation. LNCS Transactions on Rough Sets IX(5390):114–134. Manousakes, I. N., P. E. Undrill, and G. G. Cameron. 1998. Split and Merge Segmentation of Magnetic Resonance Medical Images: Performance Evaluation and Extension to Three Dimensions. Computers and Biomedical Research 31(6):393–412. Mushrif, Milind M., and Ajoy K. Ray. 2008. Color Image Segmentation: Rough-Set Theoretic Approach. Pattern Recognition Letters 29(4):483–493. Otsu, N. 1979. A Threshold Selection Method from Gray Level Histogram. IEEE Transactions on System, Man, and Cybernetics 9(1):62–66. Pal, N. R., K. Pal, J. M. Keller, and J. C. Bezdek. 2005. A Possibilistic Fuzzy C-Means Clustering Algorithm. IEEE Transactions on Fuzzy Systems 13(4):517–530. Pal, N. R., and S. K. Pal. 1993. A Review on Image Segmentation Techniques. Pattern Recognition 26(9):1277–1294. Pal, S. K., A. Ghosh, and B. Uma Sankar. 2000. Segmentation of Remotely Sensed Images with Fuzzy Thresholding, and Quantitative Evaluation. International Journal of Remote Sensing 21(11):2269–2300. Pal, S. K., and P. Mitra. 2002. Multispectral Image Segmentation Using Rough Set Initiatized EM Algorithm. IEEE Transactions on Geoscience and Remote Sensing 40(11):2495–2501. Pal, Sankar K, Sushmita Mitra, and Pabitra Mitra. 2003. Rough-Fuzzy MLP: Modular Evolution, Rule Generation, and Evaluation. IEEE Transactions on Knowledge and Data Engineering 15(1):14–25. Pawlak, Z. 1991. Rough Sets, Theoretical Aspects of Resoning About Data. Dordrecht, The Netherlands: Kluwer. Rajapakse, J. C., J. N. Giedd, and J. L. Rapoport. 1997. Statistical Approach to Segmentation of Single Channel Cerebral MR Images. IEEE Transactions on Medical Imaging 16:176–186.
Rangayyan, Rangaraj M. 2004. Biomedical Image Analysis. CRC Press. Rosenfeld, A., and A. C. Kak. 1982. Digital Picture Processing. Academic Press, Inc. Singleton, H. R., and G. M. Pohost. 1997. Automatic Cardiac MR Image Segmentation Using Edge Detection by Tissue Classification in Pixel Neighborhoods. Magnetic Resonance in Medicine 37(3):418–424. Suetens, Paul. 2002. Fundamentals of Medical Imaging. Cambridge University Press. Wells III, W. M., W. E. L. Grimson, R. Kikinis, and F. A. Jolesz. 1996. Adaptive Segmentation of MRI Data. IEEE Transactions on Medical Imaging 15(4):429–442. Weszka, J. S., and A. Rosenfeld. 1979. Histogram Modification for Threshold Selection. IEEE Transactions on System, Man, and Cybernetics SMC-9(1):62–66. Widz, Sebastian, Kenneth Revett, and Dominik Slezak. 2005a. A Hybrid Approach to MR Imaging Segmentation Using Unsupervised Clustering and Approximate Reducts. Proceedings of the 10th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing 372–382. ———. 2005b. A Rough Set-Based Magnetic Resonance Imaging Partial Volume Detection System. Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence 756–761. Widz, Sebastian, and Dominik Slezak. 2007. Approximation Degrees in Decision Reduct-Based MRI Segmentation. Proceedings of the Frontiers in the Convergence of Bioscience and Information Technologies 431–436. Xiao, Kai, Sooi Hock Ho, and Aboul Ella Hassanien. 2008. Automatic Unsupervised Segmentation Methods for MRI Based on Modified Fuzzy C-Means. Fundamenta Informaticae 87(3-4):465–481. Zadeh, L. A. 1965. Fuzzy Sets. Information and Control 8:338–353.
3
Image Thresholding using Generalized Rough Sets

Debashis Sen
Center for Soft Computing Research, Indian Statistical Institute

Sankar K. Pal
Center for Soft Computing Research, Indian Statistical Institute

3.1 Introduction
3.2 Generalized Rough Set based Entropy Measures with respect to the Definability of a Set of Elements
    Roughness of a Set in a Universe • The Lower and Upper Approximations of a Set • The Entropy Measures • Relation between ρ_R(X) and ρ_R(X^c) • Properties of the Proposed Classes of Entropy Measures
3.3 Measuring Grayness Ambiguity in Images
3.4 Image Thresholding based on Association Error
    Bilevel Thresholding • Multilevel Thresholding
3.5 Experimental Results
    Qualitative analysis • Quantitative analysis
3.6 Conclusion
Bibliography

3.1 Introduction
Real-life images are inherently embedded with various ambiguities. In order to perceive the nature of ambiguities in images, let us consider a 1001 × 1001 grayscale image (see Figure 3.1(a)) that has sinusoidal gray value gradations in the horizontal direction. When an attempt is made to mark the boundary of an arbitrary region in the image, an exact boundary cannot be defined, as a consequence of the presence of steadily changing gray values (gray value gradation). This is evident from Figure 3.1(b), which shows a portion of the image where it is known that the pixels in the 'white' shaded area uniquely belong to a region. However, the boundary (on the left and right sides) of this region is vague, as it can lie anywhere in the gray value gradations present in the portion. Value gradation is a common phenomenon in real-life images, and hence it is widely accepted (Pal, 1982; Pal, King, and Hashim, 1983; Udupa and Saha, 2003) that regions in an image have fuzzy boundaries. Moreover, the gray levels at various pixels in grayscale images are considered to be imprecise, which means that a gray level resembles other nearby gray levels to certain extents. It is also true that pixels in a neighborhood with nearby gray levels have limited discernibility due to the inadequacy of contrast. For example, Figure 3.1(c) shows a 6 × 6 portion cut from the image in Figure 3.1(a). Although the portion contains gray values separated by 6 gray levels, it appears to be almost homogeneous. The aforementioned ambiguities in
FIGURE 3.1: Ambiguities in a grayscale image with sinusoidal gray value gradations in the horizontal direction: (a) a grayscale image, (b) fuzzy boundary, (c) rough resemblance
images due to the fuzzy boundaries of various regions and the rough resemblance of nearby gray levels are studied and modeled in this chapter. Note that the aforementioned ambiguities are related to the indefiniteness in deciding whether an image pixel is white or black, and hence they can be collectively referred to as grayness ambiguity (Pal, 1999). The fuzzy set theory of Lotfi Zadeh is based on the concept of vague boundaries of sets in the universe of discourse (Klir and Yuan, 2005). The rough set theory of Zdzislaw Pawlak, on the other hand, focuses on ambiguity in terms of limited discernibility of sets in the domain of discourse (Pawlak, 1991). Therefore, fuzzy sets can be used to represent the grayness ambiguity in images due to the vague definition of region boundaries (fuzzy boundaries), and rough sets can be used to represent the grayness ambiguity due to the indiscernibility between individual or groups of gray levels (rough resemblance). Rough set theory, which was initially developed considering crisp equivalence approximation spaces (Pawlak, 1991), has been generalized by considering fuzzy (Dubois and Prade, 1990, 1992; Thiele, 1998) and tolerance (Skowron and Stepaniuk, 1996) approximation spaces. Furthermore, rough set theory, which was initially developed to approximate crisp sets, has also been generalized to approximate fuzzy sets (Dubois and Prade, 1990, 1992; Thiele, 1998). In this chapter, we propose the use of rough set theory and certain of its generalizations to quantify grayness ambiguity in images. Here the generalizations of rough set theory based on the approximation of crisp and fuzzy sets considering crisp equivalence, fuzzy equivalence, crisp tolerance, and fuzzy tolerance approximation spaces in different combinations are studied. All these combinations give rise to different concepts for modeling vagueness, which can be quantified using the roughness measure (Pawlak, 1991). We propose classes of entropy measures which use the roughness measures obtained considering the aforementioned various concepts for modeling vagueness. We perform a rigorous theoretical analysis of the proposed entropy measures and provide some properties which they satisfy. We then use the proposed entropy measures to quantify grayness ambiguity in images, giving an account of the manner in which the grayness ambiguity is captured. We show that the aforesaid generalizations of rough set theory regarding the approximation of fuzzy sets can be used to quantify grayness ambiguity due to both fuzzy boundaries and rough resemblance. We then propose an image thresholding methodology that employs the grayness ambiguity
measure obtained using the proposed classes of entropies. The strength of the proposed methodology lies in the fact that, unlike many existing thresholding techniques, it does not make any prior assumptions about the image. We present a novel bilevel thresholding scheme that performs thresholding by assigning a bin in the graylevel histogram of an image to one of two classes based on the computation of certain association errors. In the methodology, the graylevel histogram is first divided into three regions, say, bright (a region of larger gray values), dark (a region of smaller gray values), and an undefined region. These regions are obtained using two predefined gray values, which are called the seed values. It is known (prior knowledge) that the bins of a graylevel histogram representing the smallest and the largest gray values would belong to the dark and bright regions, respectively. Hence, we consider that the graylevel bins of the histogram below the smaller seed value belong to the dark region and those above the larger seed value belong to the bright region. The rest of the graylevel bins form the undefined region. Then, each graylevel bin in the undefined region is associated with the defined regions, dark and bright, followed by the use of the grayness ambiguity measure to obtain the errors due to the associations. The thresholding is then achieved by comparing the association errors and assigning each graylevel bin of the undefined region to the defined region that corresponds to the lower association error.

To carry out multilevel thresholding in a manner similar to the bilevel thresholding, more than two seed values would be required. Unlike bilevel thresholding, in the case of multilevel thresholding we do not possess the prior knowledge required to assign all the seed values. Hence, we present a binary tree structured technique that uses the proposed bilevel thresholding scheme in order to carry out multilevel thresholding. In this technique, each region (node) obtained at a particular depth is further separated using the proposed bilevel thresholding method to obtain the regions at the next higher depth. The required number of regions is obtained by proceeding to a sufficient depth and then discarding some regions at that depth using a certain criterion. As a region in the graylevel histogram of an image corresponds to a region in the image, the aforementioned thresholding methodology divides the image into a predefined (required) number of regions.

Image thresholding operations for segmentation and edge extraction are carried out in this chapter employing the grayness ambiguity measure obtained based on the proposed classes of entropies. The aforesaid image thresholding operations are performed in two ways, namely, by the ambiguity minimization method reported in (Pal et al., 1983) and by the proposed image thresholding methodology. Qualitative and quantitative experimental results obtained using the aforementioned methods are compared with those obtained using a few popular existing image thresholding techniques, in order to demonstrate the utility of the proposed entropy measures and the effectiveness of the proposed image thresholding methodology.

The organization of this chapter is as follows. In Section 3.2, the proposed entropy measures and their properties are presented after briefly mentioning the existing entropy measures based on rough set theory. The use of the proposed entropy measures for quantifying grayness ambiguity in images is presented in Section 3.3.
The explanation of the proposed image thresholding methodology is given in Section 3.4. In Section 3.5, experimental results are presented to demonstrate the utility and effectiveness of the proposed entropy measures and image thresholding methodology. The chapter concludes with Section 3.6.
3.2
Generalized Rough Set based Entropy Measures with respect to the Definability of a Set of Elements
Defining entropy measures based on rough set theory has been considered by researchers in the past decade. Probably the first such work was reported in (Beaubouef, Petry, and Arora, 1998), where a 'rough entropy' of a set in a universe was proposed. This rough entropy measure is defined based on the uncertainty in granulation (obtained using a relation defined over the universe (Pawlak, 1991)) and the definability of the set. Another entropy measure, called the 'rough schema entropy', has been proposed in (Beaubouef et al., 1998) in order to quantify the uncertainty in granulation alone. Other entropy measures of granulation have been defined in (Düntsch and Gediga, 1998; Wierman, 1999; Liang, Chin, Dang, and Yam, 2002). Later, entropy measures of fuzzy granulation have been reported in (Bhatt and Gopal, 2004; Mi, Li, Zhao, and Feng, 2007). It is worthwhile to mention here that (Yager, 1992) and (Hu and Yu, 2005) respectively present and analyze an entropy measure which, although not based on rough set theory, quantifies information with the underlying elements having limited discernibility between them.
Incompleteness of knowledge about a universe leads to granulation (Pawlak, 1991), and hence a measure of the uncertainty in granulation quantifies this incompleteness of knowledge. Therefore, apart from the 'rough entropy' in (Beaubouef et al., 1998), which quantifies the incompleteness of knowledge about a set in a universe, the other aforesaid entropy measures quantify the incompleteness of knowledge about a universe. The effect of incompleteness of knowledge about a universe becomes evident only when an attempt is made to define a set in it. Note that the definability of a set in a universe is not always affected by a change in the uncertainty in granulation. This is evident in a few examples given in (Beaubouef et al., 1998), which we do not repeat here for the sake of brevity. Hence, a measure of incompleteness of knowledge about a universe with respect to only the definability of a set is required. The first attempt at formulating an entropy measure with respect to the definability of a set was made in (Pal, Shankar, and Mitra, 2005), where it was used for image segmentation. However, as pointed out in (Sen and Pal, 2007), this measure does not satisfy the necessary property that the entropy value is maximum (or optimum) when the uncertainty (in this case, incompleteness of knowledge) is maximum.
In this section, we propose classes of entropy measures which quantify the incompleteness of knowledge about a universe with respect to the definability of a set of elements (in the universe) holding a particular property (representing a category). An inexactness measure of a set, like the 'roughness' measure (Pawlak, 1991), quantifies the definability of the set. We measure the incompleteness of knowledge about a universe with respect to the definability of a set by considering the roughness measure of the set and also that of its complement in the universe.
3.2.1
Roughness of a Set in a Universe
Let U denote a universe of elements and X be an arbitrary set of elements in U holding a particular property. According to rough set theory (Pawlak, 1991) and its generalizations, limited discernibility draws elements in U together governed by an indiscernibility relation R and hence granules of elements are formed in U . An indiscernibility relation (Pawlak, 1991) in a universe refers to the similarities that every element in the universe has with the other elements of the universe. The family of all granules obtained using the relation R is represented as U/R. The indiscernibility relation among elements and sets in U results in an inexact definition of X. However, the set X can be approximately represented by two
exactly definable sets $\underline{R}X$ and $\overline{R}X$ in U, which are obtained as
\[
\underline{R}X = \bigcup\{Y \in U/R : Y \subseteq X\} \tag{3.1}
\]
\[
\overline{R}X = \bigcup\{Y \in U/R : Y \cap X \neq \emptyset\} \tag{3.2}
\]
In the above, $\underline{R}X$ and $\overline{R}X$ are respectively called the R-lower approximation and the R-upper approximation of X. In essence, the pair of sets $\langle \underline{R}X, \overline{R}X \rangle$ is the representation of any arbitrary set X ⊆ U in the approximation space $\langle U, R \rangle$, where X cannot be defined. As given in (Pawlak, 1991), an inexactness measure of the set X can be defined as
\[
\rho_R(X) = 1 - \frac{|\underline{R}X|}{|\overline{R}X|} \tag{3.3}
\]
where $|\underline{R}X|$ and $|\overline{R}X|$ are respectively the cardinalities of the sets $\underline{R}X$ and $\overline{R}X$ in U. The inexactness measure $\rho_R(X)$ is called the R-roughness measure of X and it takes a value in the interval [0, 1].
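As a concrete illustration of (3.1)–(3.3), the following minimal Python sketch computes the lower approximation, the upper approximation and the roughness of a crisp set under a crisp equivalence relation, with the granulation U/R supplied directly as a partition. The universe, partition and set in the example are illustrative choices and do not come from the chapter.

```python
# A minimal sketch of (3.1)-(3.3) for a crisp set X under a crisp equivalence
# relation, with the granulation U/R given as a partition of the universe.

def roughness(partition, X):
    """Return (lower, upper, roughness) of the crisp set X w.r.t. a partition of U."""
    X = set(X)
    lower, upper = set(), set()
    for granule in partition:
        g = set(granule)
        if g <= X:        # granule entirely inside X -> part of the lower approximation
            lower |= g
        if g & X:         # granule overlapping X -> part of the upper approximation
            upper |= g
    rho = 1.0 - len(lower) / len(upper) if upper else 0.0
    return lower, upper, rho

# Example: U = {0,...,7} granulated into pairs; X is not a union of granules,
# so it is only roughly definable.
partition = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
X = {1, 2, 3, 4}
print(roughness(partition, X))   # lower = {2, 3}, upper = {0,...,5}, rho = 1 - 2/6
```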
3.2.2
The Lower and Upper Approximations of a Set
The expressions for the lower and upper approximations of the set X depend on the type of relation R and on whether X is a crisp (Klir and Yuan, 2005) or a fuzzy (Klir and Yuan, 2005) set. Here we shall consider the upper and lower approximations of the set X when R denotes an equivalence, a fuzzy equivalence, a tolerance or a fuzzy tolerance relation and X is a crisp or a fuzzy set. When X is a crisp or a fuzzy set and the relation R is a crisp or a fuzzy equivalence relation, we consider the expressions for the lower and the upper approximations of the set X as
\[
\underline{R}X = \{(u, \underline{M}(u)) \mid u \in U\}, \qquad \overline{R}X = \{(u, \overline{M}(u)) \mid u \in U\} \tag{3.4}
\]
where
\[
\underline{M}(u) = \sum_{Y \in U/R} m_Y(u) \times \inf_{\varphi \in U} \max\big(1 - m_Y(\varphi), \mu_X(\varphi)\big),
\qquad
\overline{M}(u) = \sum_{Y \in U/R} m_Y(u) \times \sup_{\varphi \in U} \min\big(m_Y(\varphi), \mu_X(\varphi)\big) \tag{3.5}
\]
where the membership function $m_Y$ represents the belongingness of every element u in the universe U to a granule Y ∈ U/R and takes values in the interval [0, 1] such that $\sum_Y m_Y(u) = 1$, and $\mu_X$, which takes values in the interval [0, 1], is the membership function associated with X. When X is a crisp set, $\mu_X$ would take values only from the set {0, 1}. Similarly, when R is a crisp equivalence relation, $m_Y$ would take values only from the set {0, 1}. In the above, the symbols $\sum$ (sum) and × (product) respectively represent specific fuzzy union and intersection operations (Klir and Yuan, 2005), which are chosen judging their suitability with respect to the underlying application of measuring ambiguity.
In the above, we have considered the indiscernibility relation R ⊆ U × U to be an equivalence relation, that is, R satisfies crisp or fuzzy reflexivity, symmetry and transitivity properties (Klir and Yuan, 2005). We shall also consider here the case when the transitivity property is not satisfied. Such a relation R is said to be a tolerance relation (Skowron and Stepaniuk, 1996).
TABLE 3.1: The different names of $\langle \underline{R}X, \overline{R}X \rangle$

X     | R                                                 | $\langle \underline{R}X, \overline{R}X \rangle$
Crisp | $m_Y(u) \in \{0,1\}$ (crisp equivalence)          | rough set of X
Fuzzy | $m_Y(u) \in \{0,1\}$ (crisp equivalence)          | rough-fuzzy set of X
Crisp | $m_Y(u) \in [0,1]$ (fuzzy equivalence)            | fuzzy rough set of X
Fuzzy | $m_Y(u) \in [0,1]$ (fuzzy equivalence)            | fuzzy rough-fuzzy set of X
Crisp | $S_R : U \times U \to \{0,1\}$ (crisp tolerance)  | tolerance rough set of X
Fuzzy | $S_R : U \times U \to \{0,1\}$ (crisp tolerance)  | tolerance rough-fuzzy set of X
Crisp | $S_R : U \times U \to [0,1]$ (fuzzy tolerance)    | tolerance fuzzy rough set of X
Fuzzy | $S_R : U \times U \to [0,1]$ (fuzzy tolerance)    | tolerance fuzzy rough-fuzzy set of X
When R is a tolerance relation, we consider the expressions for the membership values corresponding to the lower and upper approximations (see (3.5)) of an arbitrary set X in U as
\[
\underline{M}(u) = \inf_{\varphi \in U} \max\big(1 - S_R(u, \varphi), \mu_X(\varphi)\big),
\qquad
\overline{M}(u) = \sup_{\varphi \in U} \min\big(S_R(u, \varphi), \mu_X(\varphi)\big) \tag{3.6}
\]
where $S_R(u, \varphi)$ is a value representing the tolerance relation R between u and $\varphi$. Note that two different notions of expressing the upper and lower approximations of a set exist in the literature pertaining to rough set theory (Radzikowska and Kerre, 2002). Of the two notions, one is based on the concept of similarity and the other is based on the concept of granulation due to limited discernibility. We use the first notion in (3.6) and the second in (3.5), considering aspects of their practical implementation for measuring ambiguity. We refer to the pair of sets $\langle \underline{R}X, \overline{R}X \rangle$ differently depending on whether X is a crisp or a fuzzy set, and whether the relation R is a crisp or a fuzzy equivalence, or a crisp or a fuzzy tolerance relation. The different names are listed in Table 3.1.
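A small sketch of (3.6) is given below: it computes the membership values of the lower and upper approximations of a fuzzy set X under a fuzzy tolerance relation $S_R$. The particular triangular form chosen for $S_R$ and the membership values of X are assumptions made only for the example.

```python
import numpy as np

# A sketch of (3.6): lower and upper approximation memberships of a fuzzy set X
# under a fuzzy tolerance relation S_R; the toy values below are illustrative only.

def lower_upper_tolerance(S_R, mu_X):
    """S_R: (n, n) array with values in [0, 1]; mu_X: (n,) membership of X on U."""
    lower = np.min(np.maximum(1.0 - S_R, mu_X[None, :]), axis=1)   # inf_phi max(1 - S_R, mu_X)
    upper = np.max(np.minimum(S_R, mu_X[None, :]), axis=1)         # sup_phi min(S_R, mu_X)
    return lower, upper

n = 5
u = np.arange(n)
S_R = np.clip(1.0 - np.abs(u[:, None] - u[None, :]) / 2.0, 0.0, 1.0)  # a fuzzy tolerance relation
mu_X = np.array([1.0, 0.8, 0.4, 0.1, 0.0])                            # a fuzzy set X on U
print(lower_upper_tolerance(S_R, mu_X))
```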
3.2.3
The Entropy Measures
As mentioned earlier, the lower and upper approximations of a vaguely definable set X in a universe U can be used in the expression given in (3.3) in order to get an inexactness measure of the set X called the roughness measure $\rho_R(X)$. The vague definition of X in U signifies incompleteness of knowledge about U. Here we propose two classes of entropy measures based on the roughness measures of a set and its complement in order to quantify the incompleteness of knowledge about a universe.
One of the two proposed classes of entropy measures is obtained by measuring the 'gain in information', or in our case the 'gain in incompleteness', using a logarithmic function as suggested in Shannon's theory (Shannon, 1948). This proposed class of entropy measures for quantifying the incompleteness of knowledge about U with respect to the definability of a set X ⊆ U is given as
\[
H_R^L(X) = -\frac{1}{2}\left[\rho_R(X)\log_\beta\frac{\rho_R(X)}{\beta} + \rho_R(X^{c})\log_\beta\frac{\rho_R(X^{c})}{\beta}\right] \tag{3.7}
\]
where β denotes the base of the logarithmic function used and $X^{c} \subseteq U$ stands for the
complement of the set X in the universe. The various entropy measures of this class are obtained by calculating the roughness values $\rho_R(X)$ and $\rho_R(X^{c})$ considering the different ways of obtaining the lower and upper approximations of the vaguely definable set X. Note that the 'gain in incompleteness' term is taken as $-\log_\beta\frac{\rho_R}{\beta}$ in (3.7), and for β > 1 it takes a value in the interval [1, ∞].
The other proposed class of entropy measures is obtained by considering an exponential function (Pal and Pal, 1991) to measure the 'gain in incompleteness'. This second proposed class of entropy measures for quantifying the incompleteness of knowledge about U with respect to the definability of a set X ⊆ U is given as
\[
H_R^E(X) = \frac{1}{2}\left[\rho_R(X)\,\beta^{\,1-\rho_R(X)} + \rho_R(X^{c})\,\beta^{\,1-\rho_R(X^{c})}\right] \tag{3.8}
\]
where β denotes the base of the exponential function used. The authors in (Pal and Pal, 1991) considered only the case where β equals e. Similar to the class of entropy measures $H_R^L$, the various entropy measures of this class are obtained by using the different ways of obtaining the lower and upper approximations of X in order to calculate $\rho_R(X)$ and $\rho_R(X^{c})$. The 'gain in incompleteness' term is taken as $\beta^{\,1-\rho_R}$ in (3.8), and for β > 1 it takes a value in the finite interval [1, β]. Note that an analysis of the appropriate values that β in $H_R^L$ and $H_R^E$ can take is given later in Section 3.2.5.
We shall name a proposed entropy measure using attributes that represent the class (logarithmic or exponential) it belongs to and the type of the pair of sets $\langle \underline{R}X, \overline{R}X \rangle$ considered. For example, if $\langle \underline{R}X, \overline{R}X \rangle$ represents a tolerance rough-fuzzy set and the expression of the proposed entropy in (3.8) is considered, then we call such an entropy the exponential tolerance rough-fuzzy entropy. Some other examples of names for the proposed entropy measures are the logarithmic rough entropy, the exponential fuzzy rough entropy and the logarithmic tolerance fuzzy rough-fuzzy entropy.
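The two classes of measures translate directly into code once the roughness values are available. The sketch below is a direct transcription of (3.7) and (3.8) as functions of $\rho_R(X)$ and $\rho_R(X^{c})$; the convention $0\log_\beta 0 = 0$ of Section 3.2.5 is applied, and the default base β = e is only a convenient choice within the admissible ranges discussed there.

```python
import math

# A sketch of the two proposed classes of entropy measures, (3.7) and (3.8),
# written directly in terms of the two roughness values rho(X) and rho(X^c).

def H_log(rho_x, rho_xc, beta=math.e):
    """Logarithmic class H_R^L; beta should be >= e for the properties to hold."""
    def term(r):
        return 0.0 if r == 0.0 else r * math.log(r / beta, beta)
    return -0.5 * (term(rho_x) + term(rho_xc))

def H_exp(rho_x, rho_xc, beta=math.e):
    """Exponential class H_R^E; beta should lie in (1, e]."""
    return 0.5 * (rho_x * beta ** (1.0 - rho_x) + rho_xc * beta ** (1.0 - rho_xc))

print(H_log(1.0, 1.0), H_exp(1.0, 1.0))   # both equal 1.0 at maximum uncertainty
print(H_log(0.0, 0.0), H_exp(0.0, 0.0))   # both equal 0.0 when X and X^c are exactly definable
```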
3.2.4
Relation between $\rho_R(X)$ and $\rho_R(X^{c})$
Let us first consider a brief discussion on fuzzy set theory based uncertainty measures. Assume that a set FS is fuzzy in nature and is associated with a membership function $\mu_{FS}$. As mentioned in (Pal and Bezdek, 1994), most of the appropriate fuzzy set theory based uncertainty measures can be grouped into two classes, namely, the multiplicative class and the additive class. It should be noted from (Pal and Bezdek, 1994) that the measures belonging to these classes are functions of $\mu_{FS}$ and $\mu_{FS^{c}}$, where $\mu_{FS} = 1 - \mu_{FS^{c}}$. Now, as mentioned in (Jumarie, 1990) and pointed out in (Pal and Bezdek, 1994), the existence of an exact relation between $\mu_{FS}$ and $\mu_{FS^{c}}$ suggests that they 'theoretically' convey the same information. However, sometimes such unnecessary terms should be retained, as dropping them would cause the corresponding measures to fail certain important properties (Pal and Bezdek, 1994).
We shall now analyze the relation between $\rho_R(X)$ and $\rho_R(X^{c})$, and show that there exist no unnecessary terms in the classes of entropy measures (see (3.7) and (3.8)) proposed using rough set theory and certain of its generalizations. As it is known that $\rho_R(X)$ takes a value in the interval [0, 1], let us consider
\[
\rho_R(X) = \frac{1}{C}, \qquad 1 \le C \le \infty \tag{3.9}
\]
Let us now find the range of values that $\rho_R(X^{c})$ can take when the value of $\rho_R(X)$ is given. Let the total number of elements in the universe U under consideration be n. As we have
$X \cup X^{c} = U$, it can be easily deduced that $\underline{R}X \cup \overline{R}X^{c} = U$ and $\overline{R}X \cup \underline{R}X^{c} = U$. Using these deductions, from (3.3) we get
\[
\rho_R(X) = 1 - \frac{|\underline{R}X|}{|\overline{R}X|} \tag{3.10}
\]
\[
\rho_R(X^{c}) = 1 - \frac{|\underline{R}X^{c}|}{|\overline{R}X^{c}|} = 1 - \frac{n - |\overline{R}X|}{n - |\underline{R}X|} \tag{3.11}
\]
From (3.9), (3.10) and (3.11), we deduce that
\[
\rho_R(X^{c}) = \rho_R(X)\left(\frac{|\overline{R}X|}{n - |\underline{R}X|}\right) = \frac{1}{C}\left(\frac{|\overline{R}X|}{n - |\underline{R}X|}\right) \tag{3.12}
\]
We shall now separately consider three cases of (3.12), where we have 1 < C < ∞, C = 1 and C = ∞.
When we have 1 < C < ∞, we get the relation $|\underline{R}X| = \frac{C-1}{C}|\overline{R}X|$ from (3.9). Using this relation in (3.12) we obtain
\[
\rho_R(X^{c}) = \frac{1}{C}\left(\frac{|\underline{R}X|\left(\frac{C}{C-1}\right)}{n - |\underline{R}X|}\right) \tag{3.13}
\]
After some algebraic manipulations, we deduce
\[
\rho_R(X^{c}) = \frac{1}{C-1}\left(\frac{1}{\frac{n}{|\underline{R}X|} - 1}\right) \tag{3.14}
\]
Note that when 1 < C < ∞, $\rho_R(X)$ takes a value in the interval (0, 1). Therefore, in this case, the value of $|\overline{R}X|$ could range from a positive infinitesimal quantity, say $\epsilon$, to a maximum value of n. Hence, we have
\[
\frac{C-1}{C}\,\epsilon \;\le\; |\underline{R}X| \;\le\; \frac{C-1}{C}\,n \tag{3.15}
\]
Using (3.15) in (3.14), we get
\[
\frac{\epsilon}{nC - (C-1)\epsilon} \;\le\; \rho_R(X^{c}) \;\le\; 1 \tag{3.16}
\]
As 1 < C < ∞, $\epsilon \ll 1$ and usually $n \gg 1$, we may write (3.16) as
\[
0 < \rho_R(X^{c}) \le 1 \tag{3.17}
\]
Thus, we may conclude that for a given non-zero and non-unity value of $\rho_R(X)$, $\rho_R(X^{c})$ may take any value in the interval (0, 1].
When C = 1, that is, when $\rho_R(X)$ takes a unity value, $|\underline{R}X| = 0$ and the value of $|\overline{R}X|$ could range from $\epsilon$ to a maximum value of n. Therefore, it is easily evident from (3.12) that $\rho_R(X^{c})$ may take any value in the interval (0, 1] when $\rho_R(X) = 1$.
Let us now consider the case when C = ∞, that is, $\rho_R(X) = 0$. In such a case, the value of $|\underline{R}X|$ could range from zero to a maximum value of n and $|\overline{R}X| = |\underline{R}X|$. As evident from (3.12), when C = ∞, irrespective of any other term, we get $\rho_R(X^{c}) = 0$. This is obvious, as an exactly definable set X should imply an exactly definable set $X^{c}$.
Therefore, we find that the relation between $\rho_R(X)$ and $\rho_R(X^{c})$ is such that, if one of them is considered to take a non-zero value (that is, the underlying set is vaguely definable or inexact), the value of the other, which would also be a non-zero quantity, cannot be uniquely specified. Therefore, there exist no unnecessary terms in the proposed classes of entropy measures given in (3.7) and (3.8). However, from (3.10) and (3.11), it is easily evident that $\rho_R(X)$ and $\rho_R(X^{c})$ are positively correlated.
3.2.5
Properties of the Proposed Classes of Entropy Measures
In the previous two subsections we have proposed two classes of entropy measures and we have shown that the expressions for the proposed entropy measures do not have any unnecessary terms. However, the base parameters β (see (3.7) and (3.8)) of the two classes of entropy measures incur certain restrictions, so that the proposed entropies satisfy some important properties. In this subsection, we shall discuss the restrictions regarding the base parameters and then provide a few properties of the proposed entropies.

Range of Values for the Base β

The proposed classes of entropy measures $H_R^L$ and $H_R^E$, respectively given in (3.7) and (3.8), must be consistent with the fact that maximum information (entropy) is available when the uncertainty is maximum and the entropy is zero when there is no uncertainty. Note that, in our case, maximum uncertainty represents maximum possible incompleteness of knowledge about the universe. Therefore, maximum uncertainty occurs when both the roughness values used in $H_R^L$ and $H_R^E$ equal unity, and uncertainty is zero when both of them are zero. It can be easily shown that, in order to satisfy the aforesaid condition, the base β in $H_R^L$ must take a finite value greater than or equal to e (≈ 2.7183) and the base β in $H_R^E$ must take a value in the interval (1, e]. When β ≥ e in $H_R^L$ and 1 < β ≤ e in $H_R^E$, the values taken by both $H_R^L$ and $H_R^E$ lie in the range [0, 1]. Note that, when β takes an appropriate value, the proposed entropy measures attain the minimum value of zero only when $\rho_R(X) = \rho_R(X^{c}) = 0$ and the maximum value of unity only when $\rho_R(X) = \rho_R(X^{c}) = 1$.
Properties
Here we present a few properties of the proposed logarithmic and exponential classes of entropy measures, expressing $H_R^L$ and $H_R^E$ as functions of two parameters representing roughness measures. We may respectively rewrite the expressions given in (3.7) and (3.8) in parametric form as follows
\[
H_R^L(A, B) = -\frac{1}{2}\left[A\log_\beta\frac{A}{\beta} + B\log_\beta\frac{B}{\beta}\right] \tag{3.18}
\]
\[
H_R^E(A, B) = \frac{1}{2}\left[A\,\beta^{\,1-A} + B\,\beta^{\,1-B}\right] \tag{3.19}
\]
where the parameters A (∈ [0, 1]) and B (∈ [0, 1]) represent the roughness values $\rho_R(X)$ and $\rho_R(X^{c})$, respectively. Considering the convention $0\log_\beta 0 = 0$, let us now discuss the properties of $H_R^L(A, B)$ and $H_R^E(A, B)$ along the lines of (Ebanks, 1983).
P1. Nonnegativity: We have $H_R^L(A, B) \ge 0$ and $H_R^E(A, B) \ge 0$, with equality in both cases if and only if A = 0 and B = 0.
P2. Continuity: As all first-order partial and total derivatives of $H_R^E(A, B)$ exist for A, B ∈ [0, 1], $H_R^E(A, B)$ is a continuous function of A and B. On the other hand, all first-order partial and total derivatives of $H_R^L(A, B)$ exist only for A, B ∈ (0, 1]. However, it can be easily shown that $\lim_{A\to 0,\, B\to 0} H_R^L(A, B) = 0$ ($H_R^L(A, B)$ tends to zero when A and B tend to zero) using L'Hôpital's rule, and we have $0\log_\beta 0 = 0$. Therefore, $H_R^L(A, B)$ is a continuous function of A and B, where A, B ∈ [0, 1].
P3. Sharpness: It is evident that both $H_R^L(A, B)$ and $H_R^E(A, B)$ equal zero if and only if the roughness values A and B equal zero, that is, A and B are 'sharp'.
P4. Maximality: Both $H_R^L(A, B)$ and $H_R^E(A, B)$ attain their maximum value of unity if and only if the roughness values A and B are unity. That is, we have $H_R^L(A, B) \le H_R^L(1, 1) = 1$ and $H_R^E(A, B) \le H_R^E(1, 1) = 1$, where A, B ∈ [0, 1].
P5. Resolution: We have $H_R^L(A^*, B^*) \le H_R^L(A, B)$ and $H_R^E(A^*, B^*) \le H_R^E(A, B)$, where $A^*$ and $B^*$ are respectively the sharpened versions of A and B, that is, $A^* \le A$ and $B^* \le B$.
P6. Symmetry: It is evident that $H_R^L(A, B) = H_R^L(B, A)$ and $H_R^E(A, B) = H_R^E(B, A)$. Hence $H_R^L(A, B)$ and $H_R^E(A, B)$ are symmetric about the line A = B.
P7. Monotonicity: The first-order partial and total derivatives of $H_R^L(A, B)$, when A, B ∈ (0, 1], are
\[
\frac{\partial H_R^L}{\partial A} = \frac{dH_R^L}{dA} = -\frac{1}{2}\left[\log_\beta\frac{A}{\beta} + \frac{1}{\ln\beta}\right],
\qquad
\frac{\partial H_R^L}{\partial B} = \frac{dH_R^L}{dB} = -\frac{1}{2}\left[\log_\beta\frac{B}{\beta} + \frac{1}{\ln\beta}\right] \tag{3.20}
\]
For the appropriate values of β in $H_R^L(A, B)$, where A, B ∈ (0, 1], we have
\[
\frac{\partial H_R^L}{\partial A} = \frac{dH_R^L}{dA} \ge 0
\quad\text{and}\quad
\frac{\partial H_R^L}{\partial B} = \frac{dH_R^L}{dB} \ge 0 \tag{3.21}
\]
Since we have $H_R^L(A, B) = 0$ when A = B = 0 and $H_R^L(A, B) > 0$ when A, B ∈ (0, 1], we may conclude from (3.20) and (3.21) that $H_R^L(A, B)$ is a monotonically non-decreasing function. In a similar manner, for appropriate values of β in $H_R^E(A, B)$, where A, B ∈ (0, 1], we have
\[
\frac{\partial H_R^E}{\partial A} = \frac{dH_R^E}{dA} = \frac{1}{2}\left[\beta^{(1-A)} - A\,\beta^{(1-A)}\ln\beta\right] \ge 0,
\qquad
\frac{\partial H_R^E}{\partial B} = \frac{dH_R^E}{dB} = \frac{1}{2}\left[\beta^{(1-B)} - B\,\beta^{(1-B)}\ln\beta\right] \ge 0 \tag{3.22}
\]
We also have $H_R^E(A, B) = 0$ when A = B = 0 and $H_R^E(A, B) > 0$ when A, B ∈ (0, 1], and hence we may conclude from (3.22) that $H_R^E(A, B)$ is a monotonically non-decreasing function.
P8. Concavity: A two-dimensional function fun(A, B) is concave on a two-dimensional interval $\langle [a_{\min}, a_{\max}], [b_{\min}, b_{\max}] \rangle$ if, for any four points $a_1, a_2 \in [a_{\min}, a_{\max}]$ and $b_1, b_2 \in [b_{\min}, b_{\max}]$, and any $\lambda_a, \lambda_b \in (0, 1)$,
\[
fun(\lambda_a a_1 + (1-\lambda_a)a_2,\; \lambda_b b_1 + (1-\lambda_b)b_2) \;\ge\; \lambda_{11} fun(a_1, b_1) + \lambda_{12} fun(a_1, b_2) + \lambda_{21} fun(a_2, b_1) + \lambda_{22} fun(a_2, b_2) \tag{3.23}
\]
where $\lambda_{11} = \lambda_a\lambda_b$, $\lambda_{12} = \lambda_a(1-\lambda_b)$, $\lambda_{21} = (1-\lambda_a)\lambda_b$ and $\lambda_{22} = (1-\lambda_a)(1-\lambda_b)$.
Both $H_R^L(A, B)$ and $H_R^E(A, B)$ are concave functions of A and B, where A, B ∈ [0, 1], as they satisfy the inequality given in (3.23) when appropriate values of β and the convention $0\log_\beta 0 = 0$ are considered.
Plots of the proposed classes of entropies $H_R^L$ and $H_R^E$ as functions of A and B are given in Figures 3.2 and 3.3. In Figure 3.2, the values of $H_R^L$ and $H_R^E$ are shown for all possible values of the roughness measures A and B, considering β = e. Figure 3.3 shows plots of the proposed entropies for different values of the base β, when A = B.

FIGURE 3.2: Plots of the proposed classes of entropy measures for various roughness values A and B: (a) the logarithmic class; (b) the exponential class.

FIGURE 3.3: Plots of the proposed entropy measures for a few values of the base β, when A = B.
3.3
Measuring Grayness Ambiguity in Images
In this Section, we shall use the entropy measures proposed in the previous section in order to quantify the grayness ambiguity (See Section 3.1) in a grayscale image. As we shall see later, the entropy measures based on the generalization of rough set theory regarding the
approximation of fuzzy sets (that is, when the set X considered in the previous section is fuzzy) can be used to quantify grayness ambiguity due to both fuzzy boundaries and rough resemblance, whereas the entropy measures based on the generalization of rough set theory regarding the approximation of crisp sets (that is, when the set X considered in the previous section is crisp) can be used to quantify grayness ambiguity due to rough resemblance only.
Now, we shall obtain the grayness ambiguity measure by considering the fuzzy boundaries of regions formed based on the global gray value distribution and the rough resemblance between nearby gray levels. The image is considered as an array of gray values, and the measure of the consequence of the incompleteness of knowledge about the universe of gray levels in the array quantifies the grayness ambiguity. Note that the measure of incompleteness of knowledge about a universe with respect to the definability of a set must be used here, as the set would be employed to capture the vagueness in region boundaries.
Let G be the universe of gray levels and $\Upsilon_T$ be a set in G, that is $\Upsilon_T \subseteq G$, whose elements hold a particular property to extents given by a membership function $\mu_T$ defined on G. Let $O_I$ be the graylevel histogram of the image I under consideration. The fuzzy boundaries and rough resemblance in I causing the grayness ambiguity are related to the incompleteness of knowledge about G, which can be quantified using the proposed classes of entropy measures of Section 3.2.3. We shall consider $\Upsilon_T$ such that it represents the category 'dark areas' in the image I, and the associated property 'darkness' given by the membership function $\mu_T$ shall be modeled using the Z-function (Klir and Yuan, 2005) as given below
\[
\mu_T(l) = Z(l; T, \Delta) =
\begin{cases}
1, & l \le T - \Delta \\
1 - 2\left[\dfrac{l - (T - \Delta)}{2\Delta}\right]^2, & T - \Delta \le l \le T \\
2\left[\dfrac{l - (T + \Delta)}{2\Delta}\right]^2, & T \le l \le T + \Delta \\
0, & l \ge T + \Delta
\end{cases}
\qquad l \in G \tag{3.24}
\]
where T and ∆ are respectively called the crossover point and the bandwidth. We shall consider the value of ∆ to be a constant; different definitions of the property 'darkness' can then be obtained by changing the value of T, where T ∈ G. In order to quantify the grayness ambiguity in the image I using the proposed classes of entropy measures, we consider the following sets
\[
\Upsilon_T = \{(l, \mu_T(l)) \mid l \in G\}, \qquad \Upsilon_T^{c} = \{(l, 1 - \mu_T(l)) \mid l \in G\} \tag{3.25}
\]
The fuzzy sets $\Upsilon_T$ and $\Upsilon_T^{c}$ considered above capture the fuzzy boundary aspect of the grayness ambiguity. Furthermore, we consider limited discernibility among the elements in G, which results in vague definitions of the fuzzy sets $\Upsilon_T$ and $\Upsilon_T^{c}$, and hence the rough resemblance aspect of the grayness ambiguity is also captured.
Granules, with crisp or fuzzy boundaries, are induced in G as its elements are drawn together due to the presence of limited discernibility (or an indiscernibility relation) among them, and this process is referred to as graylevel granulation. We assume that the indiscernibility relation is uniform in G and hence the granules formed have a constant support cardinality (size), say, ω. Now, using (3.4), (3.5) and (3.6), we get general expressions for the different lower and upper approximations of $\Upsilon_T$ and $\Upsilon_T^{c}$ obtained considering the
different indiscernibility relations discussed in Section 3.2.2 as follows
\[
\underline{\Upsilon}_T = \{(l, \underline{M}_{\Upsilon_T}(l)) \mid l \in G\}, \quad
\overline{\Upsilon}_T = \{(l, \overline{M}_{\Upsilon_T}(l)) \mid l \in G\}, \quad
\underline{\Upsilon}_T^{c} = \{(l, \underline{M}_{\Upsilon_T^{c}}(l)) \mid l \in G\}, \quad
\overline{\Upsilon}_T^{c} = \{(l, \overline{M}_{\Upsilon_T^{c}}(l)) \mid l \in G\} \tag{3.26}
\]
where we have
\[
\begin{aligned}
\underline{M}_{\Upsilon_T}(l) &= \sum_{i=1}^{\gamma} m_{z_i^\omega}(l) \times \inf_{\varphi \in G} \max\big(1 - m_{z_i^\omega}(\varphi), \mu_T(\varphi)\big) \\
\overline{M}_{\Upsilon_T}(l) &= \sum_{i=1}^{\gamma} m_{z_i^\omega}(l) \times \sup_{\varphi \in G} \min\big(m_{z_i^\omega}(\varphi), \mu_T(\varphi)\big) \\
\underline{M}_{\Upsilon_T^{c}}(l) &= \sum_{i=1}^{\gamma} m_{z_i^\omega}(l) \times \inf_{\varphi \in G} \max\big(1 - m_{z_i^\omega}(\varphi), 1 - \mu_T(\varphi)\big) \\
\overline{M}_{\Upsilon_T^{c}}(l) &= \sum_{i=1}^{\gamma} m_{z_i^\omega}(l) \times \sup_{\varphi \in G} \min\big(m_{z_i^\omega}(\varphi), 1 - \mu_T(\varphi)\big)
\end{aligned} \tag{3.27}
\]
when an equivalence indiscernibility relation is considered, and we have
\[
\begin{aligned}
\underline{M}_{\Upsilon_T}(l) &= \inf_{\varphi \in G} \max\big(1 - S_\omega(l, \varphi), \mu_T(\varphi)\big), &
\overline{M}_{\Upsilon_T}(l) &= \sup_{\varphi \in G} \min\big(S_\omega(l, \varphi), \mu_T(\varphi)\big) \\
\underline{M}_{\Upsilon_T^{c}}(l) &= \inf_{\varphi \in G} \max\big(1 - S_\omega(l, \varphi), 1 - \mu_T(\varphi)\big), &
\overline{M}_{\Upsilon_T^{c}}(l) &= \sup_{\varphi \in G} \min\big(S_\omega(l, \varphi), 1 - \mu_T(\varphi)\big)
\end{aligned} \tag{3.28}
\]
when a tolerance indiscernibility relation is considered. In the above, γ denotes the number of granules formed in the universe G and $m_{z_i^\omega}(l)$ gives the membership grade of l in the ith granule $z_i^\omega$. These membership grades may be calculated using any concave, symmetric and normal membership function (with support cardinality ω), such as one having a triangular, trapezoidal or bell (for example, the π function) shape. Note that the sum of these membership grades over all the granules must be unity for a particular value of l. In (3.28), $S_\omega : G \times G \to [0, 1]$, which can be any concave and symmetric function, gives the relation between any two gray levels in G. The value of $S_\omega(l, \varphi)$ is zero when the difference between l and $\varphi$ is greater than ω, and $S_\omega(l, \varphi)$ equals unity when l equals $\varphi$.
The lower and upper approximations of the sets $\Upsilon_T$ and $\Upsilon_T^{c}$ take different forms depending on the nature of rough resemblance considered, and on whether the need is to capture grayness ambiguity due to both fuzzy boundaries and rough resemblance or only that due to rough resemblance. The nature of rough resemblance may be considered such that an equivalence relation between gray levels induces granules having crisp (crisp $z_i^\omega$) or fuzzy (fuzzy $z_i^\omega$) boundaries, or there exists a tolerance relation between gray levels that may be crisp ($S_\omega : G \times G \to \{0, 1\}$) or fuzzy ($S_\omega : G \times G \to [0, 1]$). When the sets $\Upsilon_T$ and $\Upsilon_T^{c}$ considered are fuzzy sets, grayness ambiguity due to both fuzzy boundaries and rough resemblance is captured, whereas when they are crisp sets, only the grayness ambiguity due to rough resemblance is captured. The different forms of the lower and upper approximations of $\Upsilon_T$ are shown graphically in Figure 3.4.
We shall now quantify the grayness ambiguity in the image I by measuring the consequence of the incompleteness of knowledge about the universe of gray levels G in I. This measurement is done by calculating the following values
\[
\varrho_\omega(\Upsilon_T) = 1 - \frac{\sum_{l \in G} \underline{M}_{\Upsilon_T}(l)\,O_I(l)}{\sum_{l \in G} \overline{M}_{\Upsilon_T}(l)\,O_I(l)},
\qquad
\varrho_\omega(\Upsilon_T^{c}) = 1 - \frac{\sum_{l \in G} \underline{M}_{\Upsilon_T^{c}}(l)\,O_I(l)}{\sum_{l \in G} \overline{M}_{\Upsilon_T^{c}}(l)\,O_I(l)} \tag{3.29}
\]
FIGURE 3.4: The different forms that the lower and upper approximation of $\Upsilon_T$ can take when used to get the grayness ambiguity measure: (a) crisp $\Upsilon_T$ and crisp $z_i^\omega$; (b) fuzzy $\Upsilon_T$ and crisp $z_i^\omega$; (c) crisp $\Upsilon_T$ and fuzzy $z_i^\omega$; (d) fuzzy $\Upsilon_T$ and fuzzy $z_i^\omega$; (e) crisp $\Upsilon_T$ and $S_\omega : G \times G \to \{0, 1\}$; (f) fuzzy $\Upsilon_T$ and $S_\omega : G \times G \to \{0, 1\}$; (g) crisp $\Upsilon_T$ and $S_\omega : G \times G \to [0, 1]$; (h) fuzzy $\Upsilon_T$ and $S_\omega : G \times G \to [0, 1]$.
The grayness ambiguity measure Λ of I is obtained as a function of T, which characterizes the underlying set $\Upsilon_T$, as follows
\[
\Lambda^L(T) = -\frac{1}{2}\left[\varrho_\omega(\Upsilon_T)\log_\beta\frac{\varrho_\omega(\Upsilon_T)}{\beta} + \varrho_\omega(\Upsilon_T^{c})\log_\beta\frac{\varrho_\omega(\Upsilon_T^{c})}{\beta}\right] \tag{3.30}
\]
Note that the above expression is obtained by using $\varrho_\omega(\Upsilon_T)$ and $\varrho_\omega(\Upsilon_T^{c})$, instead of roughness measures, in the proposed logarithmic (L) class of entropy functions given in (3.7). When the proposed exponential (E) class of entropy functions is used, we get
\[
\Lambda^E(T) = \frac{1}{2}\left[\varrho_\omega(\Upsilon_T)\,\beta^{\,1-\varrho_\omega(\Upsilon_T)} + \varrho_\omega(\Upsilon_T^{c})\,\beta^{\,1-\varrho_\omega(\Upsilon_T^{c})}\right] \tag{3.31}
\]
It should be noted that the values $\varrho_\omega(\Upsilon_T)$ and $\varrho_\omega(\Upsilon_T^{c})$ in (3.29) are obtained by considering 'weighted cardinality' measures instead of the cardinality measures used for
calculating roughness values (see (3.3)). The weights considered are the numbers of occurrences of gray values given by the graylevel histogram $O_I$ of the image I. Therefore, the weighted cardinality of the underlying set (in G) gives the number of pixels in the image I that take the gray values belonging to that set. From (3.30) and (3.31), we see that the grayness ambiguity measure lies in the range [0, 1], where a larger value means higher ambiguity.
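Putting the pieces together, the sketch below computes $\Lambda^L(T)$ of (3.30) for a graylevel histogram under the tolerance fuzzy rough-fuzzy setting of (3.28) and (3.29), using the Z-function of (3.24) for $\mu_T$. The triangular form assumed for $S_\omega$ (concave, symmetric, vanishing beyond ω) and the random test histogram are illustrative choices, not prescriptions from the chapter.

```python
import numpy as np

# Sketch of the grayness ambiguity Lambda^L(T) of (3.30) in the tolerance
# fuzzy rough-fuzzy setting, for gray levels G = {0, ..., L-1}.

def z_function(l, T, delta):
    """Z-function membership of (3.24)."""
    l = np.asarray(l, dtype=float)
    out = np.zeros_like(l)
    out[l <= T - delta] = 1.0
    m1 = (l > T - delta) & (l <= T)
    out[m1] = 1.0 - 2.0 * ((l[m1] - (T - delta)) / (2.0 * delta)) ** 2
    m2 = (l > T) & (l <= T + delta)
    out[m2] = 2.0 * ((l[m2] - (T + delta)) / (2.0 * delta)) ** 2
    return out

def grayness_ambiguity(hist, T, delta=8, omega=6, beta=np.e):
    G = np.arange(len(hist), dtype=float)
    mu = z_function(G, T, delta)                                          # mu_T on G
    S = np.clip(1.0 - np.abs(G[:, None] - G[None, :]) / omega, 0.0, 1.0)  # assumed S_omega
    def weighted_roughness(m):                                            # (3.28) and (3.29)
        lower = np.min(np.maximum(1.0 - S, m[None, :]), axis=1)
        upper = np.max(np.minimum(S, m[None, :]), axis=1)
        return 1.0 - np.dot(lower, hist) / np.dot(upper, hist)
    r, rc = weighted_roughness(mu), weighted_roughness(1.0 - mu)
    def term(x):
        return 0.0 if x == 0.0 else x * np.log(x / beta) / np.log(beta)
    return -0.5 * (term(r) + term(rc))                                    # Lambda^L(T) of (3.30)

hist = np.bincount(np.random.default_rng(0).integers(0, 256, 10000), minlength=256)
print(grayness_ambiguity(hist, T=128))
```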
3.4
Image Thresholding based on Association Error
In this section, we propose a new methodology to perform image thresholding using the grayness ambiguity measure presented in the previous section. The proposed methodology does not make any prior assumptions about the image, unlike many existing thresholding techniques. As boundaries of regions in an image are in general not well-defined and nearby gray values are indiscernible, we consider here that the various areas in an image are ambiguous in nature. We then use grayness ambiguity measures of regions in an image to perform thresholding in that image.
3.4.1
Bilevel Thresholding
Here we propose a methodology to carry out bilevel image thresholding based on the analysis of the graylevel histogram of the image under consideration. Let us consider two regions in the graylevel histogram of an image I containing a few graylevel bins corresponding to the dark and bright areas of the image, respectively. These regions are obtained using two predefined gray values, say $g_d$ and $g_b$, with the graylevel bins in the range $[g_b, g_{\max}]$ representing the initial bright region and the graylevel bins in the range $[g_{\min}, g_d]$ representing the initial dark region. The symbols $g_{\min}$ and $g_{\max}$ represent the lowest and highest gray values of the image, respectively. A third region, given by the graylevel bins in the range $(g_d, g_b)$, is referred to as the undefined region. Now, let the association of a graylevel bin from the undefined region to the initial bright region cause an error of $Err_b$ units and the association of a graylevel bin from the undefined region to the initial dark region result in an error of $Err_d$ units. Then, if $Err_d > Err_b$ ($Err_b > Err_d$), it would be appropriate to assign the graylevel bin from the undefined region to the bright (dark) region.

The Proposed Methodology
Here we present the methodology to calculate the error caused by the association of a graylevel bin from the undefined region to a defined region. Using this method we shall obtain the association errors corresponding to the dark and bright regions, that is, $Err_d$ and $Err_b$. Each of these association errors comprises two constituent error measures, referred to as the proximity error and the change error.
Let $H_i$ represent the value of the ith bin of the graylevel histogram of a grayscale image I. We may define $S_b$, the array of all the graylevel bins in the initial bright region, as
\[
S_b = [H_i : i \in G_b], \quad \text{where } G_b = [g_b, g_b + 1, \ldots, g_{\max}] \tag{3.32}
\]
and $S_d$, the array of all the graylevel bins in the initial dark region, as
\[
S_d = [H_i : i \in G_d], \quad \text{where } G_d = [g_{\min}, \ldots, g_d - 1, g_d] \tag{3.33}
\]
Now, consider that a graylevel bin from the undefined region corresponding to a gray value ga has been associated to the initial bright region. The bright region after the association
is represented by an array $S_b^a$ as
\[
S_b^a = [H_i^a : i \in G_b^a], \quad \text{where } G_b^a = [g_a, \ldots, g_b, \ldots, g_{\max}],\;
H_i^a = H_i \text{ when } (i = g_a \text{ or } i \ge g_b),\; H_i^a = 0 \text{ elsewhere.} \tag{3.34}
\]
In a similar manner, the dark region after the association is represented by an array $S_d^a$ as
\[
S_d^a = [H_i^a : i \in G_d^a], \quad \text{where } G_d^a = [g_{\min}, \ldots, g_d, \ldots, g_a],\;
H_i^a = H_i \text{ when } (i = g_a \text{ or } i \le g_d),\; H_i^a = 0 \text{ elsewhere.} \tag{3.35}
\]
In order to decide whether the graylevel bin corresponding to the gray value $g_a$ belongs to the bright or the dark region, we need to determine the corresponding errors $Err_d$ and $Err_b$. As mentioned earlier, our measure of an association error (Err) comprises a proximity error measure ($e_p$) and a change error measure ($e_c$). We represent an association error as
\[
Err = (\alpha + \beta e_c) + e_p \tag{3.36}
\]
where α and β are constants such that $\alpha + \beta e_c$ and $e_p$ take values from the same range, say, [0, 1]. In order to determine the errors $e_p$ and $e_c$ corresponding to the bright and dark regions, let us consider the arrays $S_b^a$ and $S_d^a$, respectively. We define the change error due to the association in the bright region as
\[
e_c^b = \frac{GA(S_b^a) - GA(\acute{S}_b^a)}{GA(S_b^a) + GA(\acute{S}_b^a)} \tag{3.37}
\]
where the array $\acute{S}_b^a$ is obtained by replacing $H_{g_a}^a$ by 0 in $S_b^a$, and $GA(S_\Omega)$ gives the grayness ambiguity in the image region represented by the graylevel bins in an array $S_\Omega$. The grayness ambiguity in the image region is calculated using the expression in (3.30) or (3.31). Note that the grayness ambiguity is calculated here for a region in an image and not for the whole image as presented in Section 3.3. Now, in a similar manner, the change error due to the association in the dark region is given as
\[
e_c^d = \frac{GA(S_d^a) - GA(\acute{S}_d^a)}{GA(S_d^a) + GA(\acute{S}_d^a)} \tag{3.38}
\]
where the array $\acute{S}_d^a$ is obtained by replacing $H_{g_a}^a$ by 0 in $S_d^a$. It is evident that the expressions in (3.37) and (3.38) measure the change in grayness ambiguity of the regions due to the association of $g_a$, and hence we refer to the measures as the change errors. The form of these expressions is chosen so as to represent the measured change as the contrast in grayness ambiguity, which is given by the ratio of the difference in grayness ambiguity to the average grayness ambiguity. As can be deduced from (3.37) and (3.38), the change errors take values in the range [−1, 1]. It is also evident from (3.37) and (3.38) that the change error may take a pathological value of 0/0. In such a case we consider the change error to be 1.
Next, we define the proximity errors due to the associations in the bright and dark regions respectively as
\[
e_p^b = 1 - GA(\acute{S}_b^a) \tag{3.39}
\]
and
\[
e_p^d = 1 - C \times GA(\acute{S}_d^a) \tag{3.40}
\]
In the above, we take $e_p^d = 0$ if $C \times GA(\acute{S}_d^a) > 1$. It will be evident later, from the explanation of the function GA(·), that the grayness ambiguity measures in (3.39) and
(3.40) increase with the increase in proximity of the graylevel bin corresponding to $g_a$ to the corresponding regions. Thus the expressions in (3.39) and (3.40) give measures of the farness of the graylevel bin corresponding to $g_a$ from the regions, and hence we refer to the measures as the proximity errors. The symbol C is a constant chosen such that the values of $e_p^b$ and $e_p^d$ when $g_a$ equals $g_b - 1$ and $g_d + 1$, respectively, are the same, and hence the proximity error values are not biased towards any region. As can be deduced from (3.39) and (3.40), the proximity errors take values in the range [0, 1]. The various arrays defined in this section are shown graphically in Figure 3.5.
FIGURE 3.5: The various defined arrays shown for a multimodal histogram
From Section 3.3, we find that we need to define the crossover point T, the bandwidth ∆ of the Z-function and the granule size ω in order to measure grayness ambiguity. For the calculation of the association errors corresponding to the bright and dark regions, we define the respective crossover points as
\[
T_b = \frac{g_a + g_b}{2} \tag{3.41}
\]
\[
T_d = \frac{g_d + g_a}{2} \tag{3.42}
\]
Considering the above expressions for the crossover points and the explanation in Section 3.3, it can be easily deduced that the grayness ambiguity measures in (3.39) and (3.40) increase with the increase in proximity of the graylevel bin corresponding to $g_a$ to the defined regions. While calculating the association errors corresponding to both the bright and dark regions, it is important that the same bandwidth (∆) and the same granule size (ω) be considered. As presented earlier in (3.36), the errors due to the association of a gray value from the
undefined region to the dark and the bright regions are given as
\[
Err_d = (\alpha + \beta e_c^d) + e_p^d \tag{3.43}
\]
\[
Err_b = (\alpha + \beta e_c^b) + e_p^b \tag{3.44}
\]
We calculate the association errors $Err_d$ and $Err_b$ for all graylevel bins corresponding to $g_a \in (g_d, g_b)$, that is, the graylevel bins of the undefined region. We then compare the corresponding association errors and assign each of these graylevel bins to the one of the two defined (dark and bright) regions that corresponds to the lower association error. In (3.43) and (3.44), we consider α = β = 0.5 and hence force the range of contribution from the change errors to [0, 1], the same as that of the proximity errors. Thus bilevel thresholding is achieved by separating the bins of the graylevel histogram into two regions, namely, the dark and the bright regions. As a region in the graylevel histogram of an image corresponds to a region in the image, the aforesaid bilevel thresholding divides the image into two regions. A sketch of this decision rule is given below.
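The sketch assumes a hypothetical helper GA(bins, T) that returns, via (3.30) or (3.31), the grayness ambiguity of the image region represented by the histogram array bins with crossover point T (for instance, a region-restricted variant of the grayness_ambiguity sketch of Section 3.3); GA, the constant C of (3.40) and the seeds $g_d$, $g_b$ are treated as given. It is an illustrative outline under these assumptions, not the authors' implementation.

```python
import numpy as np

def bilevel_threshold(hist, g_d, g_b, GA, C=1.0, alpha=0.5, beta_c=0.5):
    """Assign each undefined bin in (g_d, g_b) to the dark (0) or bright (1) region."""
    labels = np.zeros(len(hist), dtype=int)           # 0 = dark, 1 = bright
    labels[g_b:] = 1                                   # initial bright region [g_b, g_max]
    for g_a in range(g_d + 1, g_b):                    # bins of the undefined region
        errs = {}
        for region, bright in (("dark", False), ("bright", True)):
            S = np.zeros(len(hist), dtype=float)       # S_d^a or S_b^a of (3.34)-(3.35)
            if bright:
                S[g_b:] = hist[g_b:]
            else:
                S[: g_d + 1] = hist[: g_d + 1]
            S_acute = S.copy()                         # region without the bin g_a
            S[g_a] = hist[g_a]                         # region with g_a associated
            T = (g_a + g_b) / 2.0 if bright else (g_d + g_a) / 2.0       # (3.41)-(3.42)
            ga_with, ga_without = GA(S, T), GA(S_acute, T)
            denom = ga_with + ga_without
            e_c = 1.0 if denom == 0 else (ga_with - ga_without) / denom  # (3.37)/(3.38)
            e_p = (1.0 - ga_without) if bright else max(0.0, 1.0 - C * ga_without)  # (3.39)/(3.40)
            errs[region] = (alpha + beta_c * e_c) + e_p                  # (3.36), (3.43)-(3.44)
        labels[g_a] = 1 if errs["bright"] < errs["dark"] else 0
    return labels
```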
3.4.2
Multilevel Thresholding
Here we extend the novel bilevel image thresholding methodology given above to the multilevel image thresholding problem. Note that we do not possess the prior knowledge required to assign more than two seed values for carrying out multilevel thresholding. Therefore, we understand that the concept of thresholding based on association error can be used to separate a histogram only into two regions, and then these regions can further be separated only into two regions each, and so on. From this understanding, we find that the proposed concept of thresholding using association error could be used in a binary tree structured technique in order to carry out multilevel thresholding.
Now, let us consider that we require a multilevel image thresholding technique using association error in order to separate an image into Θ regions. Let D be a non-negative integer such that $2^{D-1} < \Theta \le 2^D$. In our approach to multilevel image thresholding for obtaining Θ regions, we first separate the graylevel histogram of the image into $2^D$ regions. The implementation of this approach can be achieved using a binary tree structured algorithm (Breiman, Friedman, Olshen, and Stone, 1998). Note that in (Breiman et al., 1998), the binary tree structure has been used for classification purposes, which is not our concern. In our case, we use the binary tree structure to achieve multilevel image thresholding using association error, which is an unsupervised technique. We list below a few characteristics of a binary tree, stating what they represent when used for association error based multilevel image thresholding.
1. A node of the binary tree represents a region in the histogram.
2. The root node of the binary tree represents the histogram of the whole image.
3. The depth of a node is given by D. At any depth D we always have $2^D$ nodes (regions).
4. Splitting at each node is performed using the bilevel image thresholding technique based on association error proposed in the previous section.
5. All the nodes at a depth D are terminal nodes when our goal is to obtain $2^D$ regions in the histogram.
In order to get Θ regions from the $2^D$ regions, we need to declare the bilevel thresholding of certain histogram regions (nodes) at depth D − 1 as invalid. In order to do so, we define a measure (ι) of a histogram region based on the association errors $Err_d$ and $Err_b$ obtained for the
values of $g_a$ (see Section 3.4.1) corresponding to the histogram region as follows
\[
\iota = \sum_{g_a \in (g_d, g_b)} Err_d(g_a) + Err_b(g_a) \tag{3.45}
\]
where $g_a$, $g_d$ and $g_b$ are the same as explained in the previous section, except for the fact that they are defined for the underlying histogram region and not for the entire histogram. We use the expression in (3.45) to measure the suitability of applying the bilevel image thresholding technique to each of the regions in the graylevel histogram at depth D − 1. The larger the value of ι for a region of the graylevel histogram, the larger is the corresponding average association error and hence the greater the suitability. Hence, in order to get Θ regions, we declare the bilevel thresholding of the $2^D - \Theta$ least suitable (based on ι) regions at depth D − 1 as invalid, and we are thus left with Θ regions at depth D. Now, as a region in the graylevel histogram of an image corresponds to a region in the image, the aforesaid multilevel thresholding divides the image into Θ regions. Figure 3.6 graphically demonstrates the use of the proposed multilevel thresholding technique based on association error to obtain three regions (Regions 1, 2 and 3) in the histogram. The values $\iota_1$ and $\iota_2$ give the suitability of the application of the bilevel thresholding to the two regions at depth D = 1.

FIGURE 3.6: Separation of a histogram into three regions using the proposed multilevel thresholding based on association error
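The control flow of the binary tree structured scheme can be outlined as below, assuming a hypothetical function split(region) that applies the bilevel method to a histogram region and returns the two child regions together with the suitability value ι of (3.45). The region bookkeeping is deliberately abstract; this is a sketch of the tree construction and pruning, not the authors' code.

```python
import math

# Sketch of the binary tree structured multilevel scheme of Section 3.4.2.
# split(region) -> (left_region, right_region, iota) is assumed to be supplied.

def multilevel_threshold(root_region, theta, split):
    D = max(1, math.ceil(math.log2(theta)))        # depth with 2^(D-1) < theta <= 2^D
    nodes = [root_region]
    for _ in range(D - 1):                         # grow the full tree to depth D - 1
        nodes = [child for r in nodes for child in split(r)[:2]]
    candidates = [split(r) for r in nodes]         # proposed splits at depth D - 1
    # keep only the theta - 2^(D-1) most suitable splits (largest iota); the rest
    # are declared invalid and their parent regions are kept whole
    keep = set(sorted(range(len(candidates)), key=lambda i: candidates[i][2],
                      reverse=True)[: theta - len(nodes)])
    regions = []
    for i, r in enumerate(nodes):
        if i in keep:
            regions.extend(candidates[i][:2])
        else:
            regions.append(r)
    return regions                                 # exactly theta regions
```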
3.5
Experimental Results
In this section, we demonstrate the utility of the proposed entropy measures and the effectiveness of the proposed image thresholding methodology by considering some image segmentation and edge extraction tasks. Grayness ambiguity measures based on the proposed entropies are employed to carry out image thresholding in order to perform image segmentation and edge extraction. As mentioned in Section 3.1, the aforesaid image thresholding is performed in two ways, namely, by the ambiguity minimization method reported in (Pal et al., 1983) and by the image thresholding methodology proposed earlier in this chapter. Results obtained using a few popular existing image thresholding algorithms are also
considered for qualitative and quantitative performance comparison with those obtained using the two aforementioned techniques. Throughout this section, we consider the grayness ambiguity measure given in (3.30), which signifies measuring the ambiguity using the proposed logarithmic class of entropy functions. The quantities in (3.29) that are used in (3.30) are calculated considering that the pairs of lower and upper approximations of the sets $\Upsilon_T$ and $\Upsilon_T^{c}$ represent a tolerance fuzzy rough-fuzzy set. The aforesaid statement, according to the terminology given in Section 3.2.3, signifies that the logarithmic tolerance fuzzy rough-fuzzy entropy is used in this section to get the grayness ambiguity measure. We consider the values of the parameters ∆ and ω as 8 and 6 gray levels, respectively, and the base β as e, without loss of generality. Note that the logarithmic tolerance fuzzy rough-fuzzy entropy is a representative of the proposed entropies which can be used to capture grayness ambiguity due to both fuzzy boundaries and rough resemblance.
FIGURE 3.7: Segmentation obtained using the various thresholding algorithms applied to separate dark and bright regions in an image: (a) the image; (b) graylevel histogram; (c)–(i) segmentation by algorithms (i)–(vii).
3.5.1
Qualitative analysis
Segmentation
Let us consider here a qualitative assessment of segmentation results on different images in order to evaluate the performance of the various techniques. The techniques considered for comparison are: (i) the proposed thresholding methodology using the aforesaid grayness ambiguity measure, (ii) the ambiguity minimization based thresholding method reported in (Pal et al., 1983) using the aforesaid grayness ambiguity measure, (iii) the thresholding method by Otsu (Otsu, 1979), (iv) the thresholding method by Kapur et al. (Kapur, Sahoo, and Wong, 1985), (v) the thresholding method by Kittler et al. (Kittler and Illingworth, 1986), (vi) the thresholding method by Tsai (Tsai, 1985), and (vii) the thresholding method by Pal et al. (Pal et al., 1983). The methods considered will henceforth be referred to by their corresponding numbers.
FIGURE 3.8: Performance of the various thresholding algorithms applied to find the core and extent of the galaxy in an image: (a) the image; (b) graylevel histogram; (c)–(i) segmentation by algorithms (i)–(vii).
In Figure 3.7, we consider an image with an almost bell-shaped graylevel histogram. Separation of dark and bright regions in this image is a non-trivial task, as the histogram is not well-defined for thresholding using many existing algorithms. It is evident from the figure that the proposed bilevel thresholding methodology (algorithm (i)), algorithm (iii) and algorithm (vi) perform much better than the others in separating the dark areas in the image from the bright ones. An image of a galaxy is considered in Figure 3.8. The graylevel histogram of this image is almost unimodal in nature and hence extracting multiple regions from it is a non-trivial task. We use the proposed multilevel thresholding scheme and the various other schemes to find the total extent and the core region of the galaxy. It is evident from the figure that the results obtained using the proposed thresholding methodology are as good as some of the others.
FIGURE 3.9: Performance of the various thresholding algorithms applied to segment a 'low contrast' image into three regions: (a) the image; (b) graylevel histogram; (c)–(i) segmentation by algorithms (i)–(vii).
While the 'white' shaded area in the result obtained using algorithm (i) represents a region slightly larger than the core region, the 'white' shaded
area in the result obtained using algorithm (vi) represents a region slightly smaller than the core region. Figure 3.9 presents an image where the sand, sea and sky regions are to be separated. The image has a multimodal histogram, and it is evident from the image that the average gray values of the three regions do not differ by much. As can be seen from the figure, the proposed multilevel thresholding methodology (algorithm (i)) performs better than some of the others and as well as algorithms (ii), (iv) and (vii). The results in Figures 3.7(c) and (d), Figures 3.8(c) and (d) and Figures 3.9(c) and (d) demonstrate the utility of the proposed logarithmic tolerance fuzzy rough-fuzzy entropy. Note that, as described in Section 3.4, two values $g_d$ and $g_b$ need to be predefined in order to use the proposed thresholding methodology. We have considered $g_d = g_{\min} + 20$ and $g_b = g_{\max} - 20$.

Edge Extraction
Let us consider here a qualitative assessment of edge extraction results on different images in order to evaluate the performance of the various techniques. We consider the gradient magnitude at every pixel in an image and determine thresholds from the associated gradient magnitude histogram in order to perform edge extraction in that image. Gradient magnitude histograms are in general unimodal and positively (right) skewed in nature. In the literature, very few techniques have been proposed to carry out bilevel thresholding in such histograms. Among these techniques, we consider the following for comparison: (viii) the unimodal histogram thresholding technique by Rosin (Rosin, 2001) and (ix) the thresholding technique by Henstock et al. (Henstock and Chelberg, 1996). In addition to the aforesaid techniques, we also consider here some of the existing thresholding techniques mentioned previously in this section.
FIGURE 3.10: Performance of the various thresholding algorithms applied to mark the edges in a gradient image: (a) the image; (b) the histogram; (c)–(h) edges by algorithms (i), (ii), (iii), (v), (viii) and (ix).

FIGURE 3.11: Qualitative performance of the various thresholding algorithms applied to obtain the edge, non-edge and possible edge regions in a gradient image: (a) the image; (b) the histogram; (c)–(h) edges by algorithms (i), (ii), (iii), (iv), (v) and (vi).
As described in Section 3.4, two values $g_d$ and $g_b$ need to be predefined in order to use the proposed thresholding methodology. While using the proposed thresholding methodology on gradient magnitude images, $g_d$ and $g_b$ represent two gradient magnitude values, and we consider the input parameters as $g_d = g_{\min} + \max([10,\ g_{3\%}])$ and $g_b = g_{\max} - \max([10,\ g_{97\%}])$. The notation $g_{\rho\%}$ denotes the ρth percentile of the gradient magnitude in the distribution (gradient magnitude histogram). Figures 3.10 and 3.11 give the edge extraction performance of the various thresholding algorithms. In Figure 3.10, we find that the proposed technique does much better than the others in determining the valid edges and eliminating those due to the inherent noise and texture. In Figure 3.11, we find three regions in the gradient image. One (white) represents the gradient values which surely correspond to valid edges, another (black) represents those which surely do not correspond to valid edges, and the third region (gray) represents the gradient values which could possibly correspond to valid edges. Such multilevel thresholding in gradient magnitude histograms could be used along with the hysteresis technique suggested in (Canny, 1986) in order to determine the actual edges. We see from the figure that the proposed techniques perform as well as or better than the others. Note that the gradient magnitude at every pixel in an image is obtained using the operator given in (Canny, 1986), and edge thinning has not been done in the results shown in Figures 3.10 and 3.11, as it is not of much significance with respect to the intended comparisons.
3.5.2
Quantitative analysis
Here, we consider human-labeled ground truth based quantitative evaluation of bilevel thresholding based segmentation in order to carry out a rigorous quantitative analysis.

The Image Dataset Considered
We consider 100 grayscale images from the 'Berkeley Segmentation Dataset and Benchmark' (Martin, Fowlkes, Tal, and Malik, 2001). Each of the 100 images considered is associated with multiple segmentation results hand labeled by multiple human subjects, and hence we have multiple segmentation ground truths for every single image.

The Evaluation Measure Considered
We use the local consistency error (LCE) measure defined in (Martin et al., 2001) in order to judge the appropriateness of segmentation results obtained by a bilevel thresholding algorithm. Consider $S_H$ as a segmentation result hand labeled by a human subject and $S_A$ as a segmentation result obtained by applying an algorithm to be analyzed. The LCE measure representing the appropriateness of $S_A$ with reference to the ground truth $S_H$ is given as
\[
LCE(S_H, S_A) = \frac{1}{n}\sum_{i=1}^{n}\min\Big\{E(S_H, S_A, p_i),\; E(S_A, S_H, p_i)\Big\} \tag{3.46}
\]
where
\[
E(S_1, S_2, p) = \frac{|R(S_1, p) \setminus R(S_2, p)|}{|R(S_1, p)|} \tag{3.47}
\]
In the above, $\setminus$ represents set difference, |x| represents the cardinality of a set x, R(S, p) represents the set of pixels corresponding to the region in segmentation S that contains pixel p, and n represents the number of pixels in the image under consideration. The LCE takes values in the range [0, 1], where a smaller value indicates greater appropriateness of the segmentation result $S_A$ (with reference to the ground truth $S_H$).
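The LCE of (3.46) and (3.47) can be computed directly from two label images, as in the sketch below; the per-pixel region sets R(S, p) are handled through region sizes and joint label counts, and the two small label arrays at the end are purely illustrative.

```python
import numpy as np

# A sketch of the LCE measure of (3.46)-(3.47) for two label images of the same
# size, each entry giving the region label of a pixel.

def lce(S_H, S_A):
    S_H, S_A = np.asarray(S_H).ravel(), np.asarray(S_A).ravel()
    n = S_H.size
    joint = {}                                            # joint label counts
    for h, a in zip(S_H, S_A):
        joint[(h, a)] = joint.get((h, a), 0) + 1
    size_h = {h: c for h, c in zip(*np.unique(S_H, return_counts=True))}
    size_a = {a: c for a, c in zip(*np.unique(S_A, return_counts=True))}
    total = 0.0
    for h, a in zip(S_H, S_A):
        e_ha = (size_h[h] - joint[(h, a)]) / size_h[h]    # E(S_H, S_A, p)
        e_ah = (size_a[a] - joint[(h, a)]) / size_a[a]    # E(S_A, S_H, p)
        total += min(e_ha, e_ah)
    return total / n

S_H = np.array([[0, 0, 1], [0, 2, 1]])   # a hand-labeled segmentation (toy)
S_A = np.array([[0, 0, 0], [1, 1, 1]])   # a bilevel thresholding result (toy)
print(lce(S_H, S_A))
```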
As we have considered the evaluation of bilevel thresholding based segmentation here, every image under consideration is separated into two regions. However, the number of regions in the human labeled segmentation ground truths of the 100 images considered is always more than two. Now, the LCE measure penalizes an algorithm only if both $S_H$ and $S_A$ are not refinements of each other at a pixel, and it does not penalize an algorithm if either one of them is a refinement of the other at a pixel (Martin et al., 2001). Therefore, the use of the LCE measure is desirable in our experiments, as we do not want to penalize an algorithm when $S_A$ is not a refinement of $S_H$ at a pixel but $S_H$ is a refinement of $S_A$ at that pixel. The aforesaid case is highly probable in our experiments, as the number of regions associated with $S_A$ would be much less than that associated with $S_H$.
FIGURE 3.12: Box plot based summarization of segmentation performance by the various thresholding algorithms: (a)–(g) box plots for algorithms (i)–(vii).
Analysis of Performance
Consider Figure 3.12, which shows box plots (Tukey, 1977) that graphically depict the LCE values corresponding to the segmentation achieved by bilevel thresholding using algorithms (i) to (vii) mentioned earlier in this section. A box plot, which in Figure 3.12 summarizes the LCE values obtained corresponding to all the segmentation ground truths available for an image, is given for each of the 100 images considered. We find from the box plots that the LCE values corresponding to algorithms (i), (ii), (v) and (vii) are in general smaller compared with those corresponding to the other algorithms. It is also evident that algorithms (ii) and (vii) perform almost equally well, with algorithm (ii) doing slightly better. From all the box plots in Figure 3.12, we find that the segmentation results achieved by algorithms (i) and (v) are equally good and that they give the best performance among the algorithms considered. The average of all the LCE values obtained using an algorithm is minimum when algorithm (v) is used. However, the maximum number of zero LCE values is obtained when algorithm (i) is used. Hence, we conclude from the quantitative analysis that algorithms (i) and (v) are equally good and that they give the best segmentation results.
3.6
Conclusion
In this chapter, image thresholding operations using rough set theory and certain of its generalizations have been introduced. Classes of entropy measures based on generalized rough sets have been proposed and their properties have been discussed. A novel image thresholding methodology based on grayness ambiguity in images has then been presented. For bilevel thresholding, every element of the graylevel histogram of an image has been associated with one of the two regions by comparing the corresponding errors of association. The errors of association have been based on the grayness ambiguity measures of the underlying regions, and the grayness ambiguity measures have been calculated using the proposed entropy measures. Multilevel thresholding has been carried out using the proposed bilevel thresholding method in a binary tree structured algorithm. Segmentation and edge extraction have been performed using the proposed image thresholding methodology. Qualitative and quantitative experimental results have been given to demonstrate the utility of the proposed entropy measures and the effectiveness of the proposed image thresholding methodology.
Bibliography

Beaubouef, Theresa, Frederick E Petry, and Gurdial Arora. 1998. Information-theoretic measures of uncertainty for rough sets and rough relational databases. Information Sciences 109(1-4):185–195.

Bhatt, R B, and M Gopal. 2004. Frid: fuzzy-rough interactive dichotomizers. In Proceedings of the IEEE international conference on fuzzy systems, 1337–1342.

Breiman, L, J H Friedman, R A Olshen, and C J Stone. 1998. Classification and regression trees. Boca Raton, Florida, U.S.A.: CRC Press.

Canny, J. 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6):679–698.

Düntsch, Ivo, and Günther Gediga. 1998. Uncertainty measures of rough set prediction. International Journal of General Systems 106(1):109–137.

Dubois, D, and H Prade. 1990. Rough fuzzy sets and fuzzy rough sets. International Journal of General Systems 17(2-3):191–209.

———. 1992. Putting fuzzy sets and rough sets together. In Słowiński, R. (ed.), Intelligent decision support, handbook of applications and advances of the rough sets theory, 203–232.

Ebanks, Bruce R. 1983. On measures of fuzziness and their representations. Journal of Mathematical Analysis and Applications 94(1):24–37.

Henstock, Peter V, and David M Chelberg. 1996. Automatic gradient threshold determination for edge detection. IEEE Trans. Image Process. 5(5):784–787.

Hu, Qinghua, and Daren Yu. 2005. Entropies of fuzzy indiscernibility relation and its operations. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12(5):575–589.

Jumarie, G. 1990. Relative information: theories and applications. New York, NY, USA: Springer-Verlag New York, Inc.

Kapur, J N, P K Sahoo, and A K C Wong. 1985. A new method for gray-level picture thresholding using the entropy of the histogram. Computer Vision, Graphics, and Image Processing 29:273–285.

Kittler, J, and J Illingworth. 1986. Minimum error thresholding. Pattern Recognition 19(1):41–47.

Klir, George, and Bo Yuan. 2005. Fuzzy sets and fuzzy logic: Theory and applications. New Delhi, India: Prentice Hall.

Liang, Jiye, K S Chin, Chuangyin Dang, and Richard C M Yam. 2002. A new method for measuring uncertainty and fuzziness in rough set theory. International Journal of General Systems 31(4):331–342.

Martin, D, C Fowlkes, D Tal, and J Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the 8th international conference on computer vision, vol. 2, 416–423.
4
Mathematical Morphology and Rough Sets

Homa Fashandi
Computational Intelligence Laboratory, University of Manitoba, Winnipeg R3T 5V6 Manitoba Canada

James F. Peters
Computational Intelligence Laboratory, University of Manitoba, Winnipeg R3T 5V6 Manitoba Canada

4.1 Introduction . . . . . . . . . . 4–1
4.2 Basic Concepts from Topology . . . . . . . . . . 4–1
4.3 Mathematical Morphology . . . . . . . . . . 4–3
4.4 Rough Sets . . . . . . . . . . 4–5
4.5 Mathematical Morphology and Rough Sets . . . . . . . . . . 4–9
    Some Experiments
4.6 Conclusion . . . . . . . . . . 4–13
Bibliography . . . . . . . . . . 4–14

4.1 Introduction
This chapter focuses on the relation between mathematical morphology (MM) (Serra, 1983) operations and rough sets (Pawlak, 1981, 1982; Pawlak and Skowron, 2007c,b,a), mainly based on topological spaces considered in the context of image retrieval (see, e.g., (Fashandi, Peters, and Ramanna, 2009)) and the basic image correspondence problem (see, e.g., (Peters, 2009, 2010; Meghdadi, Peters, and Ramanna, 2009)). There are some obvious similarities between MM operations and set approximations in rough set theory, and there have been several attempts to link the two. Two major works have been published in this area (Polkowski, 1993; Bloch, 2000). L. Polkowski defines a hit-or-miss topology on rough sets and proposes a scheme to approximate mathematical morphology within the general paradigm of soft computing (Polkowski, 1993, 1999). Later, I. Bloch demonstrates a direct link between MM and rough sets through relations, a pair of dual operations and neighbourhood systems (Bloch, 2000). I. Bloch's approach is carried forward by J.G. Stell, who defines a single framework that includes the principal constructions of both mathematical morphology and rough sets (Stell, 2007). To make this chapter fairly self-contained, background information on the basics of topology is presented first. The chapter then presents the basics of mathematical morphology. Next, the principles of rough set theory are considered and the links between the two fields are discussed. Finally, a proposed application of the ideas from these two areas is given in terms of image retrieval.
4.2 Basic Concepts from Topology
This section introduces the basic concepts of topology (Engelking, 1989; Gemignani, 1990). For the sake of completeness, basic definitions from topology are briefly presented in this
section. Readers who are familiar with these concepts may skip this section. The main reference for this section is the book by M.C. Gemignani (Gemignani, 1990).
Topology: Let X be a non-empty set. A collection τ of subsets of X is said to be a topology on X if
• X and φ belong to τ
• the union of any finite or infinite number of sets in τ belongs to τ
• the intersection of any two sets in τ belongs to τ.
The pair (X, τ) is called a topological space. Several topologies can be defined on any set X.
Discrete Topology: If (X, τ) is a topological space such that, for every x ∈ X, the singleton set {x} is in τ, then τ is the discrete topology.
Open and closed sets are pivotal concepts in topology.
Open Sets: Let (X, τ) be a topological space. Then the members of τ are called open sets. Therefore,
• X and φ are open sets.
• The union of any finite or infinite number of open sets is an open set.
• The intersection of any finite number of open sets is an open set.
Closed Sets: Let (X, τ) be a topological space. A set S ⊆ X is said to be closed if X\S is open.
• φ and X are closed sets.
• The intersection of any finite or infinite number of closed sets is a closed set.
• The union of any finite number of closed sets is a closed set.
Some subsets of X may be both closed and open. In a discrete space every set is both open and closed, while in an indiscrete space all subsets of X other than X and φ are neither open nor closed.
Clopen Sets: A subset S of a topological space (X, τ) is said to be clopen if it is both closed and open in (X, τ).
The concept of limit points is closely related to the topological closure of a set.
Limit Points: Let A be a subset of a topological space (X, τ). A point x ∈ X is said to be a limit point (cluster point or accumulation point) of A if every open set O containing x contains a point of A different from x.
The following propositions provide a way of testing whether a set is closed or not.
Proposition 4.2.1 Let A be a subset of a topological space (X, τ). Then A is closed if and only if it contains all of its limit points.
Proposition 4.2.2 Let A be a subset of a topological space (X, τ), and let A′ be the set of all limit points of A. Then A ∪ A′ is closed.
The topological concepts of closure and interior play an important role in this chapter. A brief explanation of these concepts is given next.
Closure: Let A be a subset of a topological space (X, τ). Then the set A ∪ A′, consisting of A together with all of its limit points, is called the closure of A and is denoted by A̅.
Interior: Let (X, τ) be any topological space and A be any subset of X. The largest open set contained in A is called the interior of A and is denoted by Å.
Recall that in algebra every vector is a linear combination of the basis vectors. In topology, every open set can be obtained as a union of members of the basis.
Basis of a Topology: Let (X, τ) be a topological space. A collection B of open subsets of X is said to be a basis for the topology τ if every open set is a union of members of B. In other words, B generates the topology.
4.3 Mathematical Morphology
Objects or images in our application are considered as subsets of the euclidean space E^n or subsets of an affinely closed subspace X ⊆ E^n. For digital objects (images) the space is considered to be Z^n, where Z is the set of integer numbers. Dilation and erosion are the two primary mathematical morphology operators and can be defined by the Minkowski sum and Minkowski difference:

A ⊕ B = {x + y : x ∈ A, y ∈ B}.    (4.1)

where A, B ⊆ X and '+' is the sum in the euclidean space E^n.

A ⊖ B = {x ∈ X : x ⊕ B ⊆ A}.    (4.2)

For simplicity, the set B is assumed to be symmetric about the origin, therefore B = −B = {−x : x ∈ B}. Mathematical morphology operators are defined in different ways. For example, consider two binary images A ⊂ Z² and B ⊂ Z². The dilation of A by B is also defined as

A ⊕ B = {x | B̂_x ∩ A ≠ φ}.    (4.3)

where B̂_x is obtained by first reflecting B about its origin and then shifting it such that its origin is located at point x. B is called a structuring element (SE) and it can have any shape, size and connectivity. The characteristics of the SE are application dependent. As mentioned earlier, for simplicity, we consider B̂_x = B_x. Based on equation 4.3, the dilation of an image A by B is the set of all points x such that B̂_x and A overlap. The erosion of a binary image A by B is defined as

A ⊖ B = {x | B_x ⊆ A}    (4.4)

Erosion of A by B is the collection of all points x such that B_x is contained in A. To be consistent with Polkowski (Polkowski, 1999), we use d_B(A) for the dilation of A by B and e_B(A) for the erosion of A by B. New morphological operations can be obtained by composition of mappings. Opening (o_B(A)) and closing (c_B(A)) are two operators obtained by the following compositions, respectively:

o_B(A) = d_B(e_B(A)) = {x ∈ X : ∃y, (x ∈ {y} ⊕ B ⊆ A)}.    (4.5)

c_B(A) = e_B(d_B(A)) = {x ∈ X : ∀y, (x ∈ {y} ⊕ B ⇒ A ∩ ({y} ⊕ B) ≠ φ)}.    (4.6)
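To make these set-theoretic definitions concrete, here is a small Python sketch (an illustration only, not code from the chapter; the test image and the cross-shaped structuring element are assumptions made for the example) of the dilation, erosion, opening and closing operators of equations (4.1)–(4.6) for binary images stored as NumPy arrays.

```python
import numpy as np

def dilation(A, B):
    """d_B(A): translate every foreground point of A by every offset of B (eqs. 4.1/4.3, symmetric B)."""
    out = np.zeros_like(A)
    offsets = np.argwhere(B) - np.array(B.shape) // 2   # offsets of B relative to its center (origin)
    for x in np.argwhere(A):
        for dx in offsets:
            y = x + dx
            if 0 <= y[0] < A.shape[0] and 0 <= y[1] < A.shape[1]:
                out[y[0], y[1]] = 1
    return out

def erosion(A, B):
    """e_B(A): keep the points x such that B translated to x lies entirely inside A (eqs. 4.2/4.4)."""
    out = np.zeros_like(A)
    offsets = np.argwhere(B) - np.array(B.shape) // 2
    H, W = A.shape
    for i in range(H):
        for j in range(W):
            ok = True
            for dx, dy in offsets:
                r, c = i + dx, j + dy
                if not (0 <= r < H and 0 <= c < W and A[r, c]):
                    ok = False
                    break
            out[i, j] = 1 if ok else 0
    return out

def opening(A, B):   # o_B(A) = d_B(e_B(A)), eq. (4.5)
    return dilation(erosion(A, B), B)

def closing(A, B):   # c_B(A) = e_B(d_B(A)), eq. (4.6)
    return erosion(dilation(A, B), B)

if __name__ == "__main__":
    A = np.zeros((7, 7), dtype=int)
    A[2:5, 2:6] = 1                               # hypothetical rectangular object
    B = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=int)          # symmetric cross-shaped structuring element
    print("erosion:\n", erosion(A, B))
    print("opening:\n", opening(A, B))
```

Pixels outside the image border are treated as background in this sketch, which matters only for objects touching the border.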
By moving the structuring element B on the image A, we gather information about the medium A in terms of B. The simplest relationships can be obtained by moving B on the medium A, where B ⊂ A (as in erosion or opening) and A ∩ B ≠ φ (as in dilation or closing). We clarify this idea by citing what G. Matheron wrote in his book in 1975, page xi (Matheron, 1975) (Serra also referred to this passage in his book (Serra, 1983, p. 84)):
"In general, the structure of an object is defined as the set of relationships existing between elements or parts of the object. In order to experimentally determine this structure, we must try, one after the other, each of the possible relations and examine whether or not it is verified. Of course, this image constructed by such a process will depend to the greatest extent on the choice made for the system ℛ of relationships considered as possible. Hence this choice plays a priori a constructive role (in the Kantian meaning) and determines the relative worth of the concept of structure at which we will arrive. In the case of a porous medium, let A be a solid component (union of grains), and A^c the porous network. In this medium, we shall move a figure B, called the structuring pattern, playing the role of a probe collecting information. This operation is experimentally attainable."
We can get more information about the object by gathering more information through such relations, whether each of them turns out to be false or true. Assume that we have a family of structuring elements ℬ; each B ∈ ℬ and each relation (B ⊂ A, B ∩ A ≠ φ) gives us some information about A. As an example of a set of structuring elements, consider a sequence {B_i} made up of compact disks of radius r_i = r_0 + 1/i, which tend toward the compact disk of radius r_0.

Topology and Mathematical Morphology
Topological properties of mathematical morphology have been introduced and studied by Matheron and Serra in (Matheron, 1975) and (Serra, 1983), respectively. Here we briefly mention some of them. We start with opening and closing. The concepts of topological closure and interior are comparable with morphological closing and opening. The only difference is that in morphology the closing and opening are obtained with respect to a given structuring element (Serra, 1983), while in topology closure and interior are defined in terms of the closed and open sets of the topology (Engelking, 1989). To blur this difference and obtain a closer relation between mathematical morphology operators and the topological interior and closure, consider the following proposition.

Proposition 4.3.1 Let (X, d) be a metric space. Then the collection of open balls is a basis for a topology τ on X (Gemignani, 1990).

The topology τ induced by the metric d makes (X, τ) the induced topological space. If d is the euclidean metric on R, then the set of open balls is a basis for the topology τ induced by the metric d. Let B̊_r and B_r be an open ball and a closed ball of radius r, respectively. Consider the sets of structuring elements ℬ and ℬ̊ as follows:

ℬ = {B_r | r > 0}.    (4.7)

ℬ̊ = {B̊_r | r > 0}.    (4.8)

Based on the above proposition, ℬ̊ forms a basis for the topology on the euclidean space or image plane. The following equations show the relation between mathematical morphology's opening and closing and the topological interior and closure for a set A ⊂ X in euclidean space; let Å and A̅ be the interior and closure of the set A. Then

Å = ⋃_{B ∈ ℬ} o_B(A) = ⋃_{B ∈ ℬ̊} o_B(A).    (4.9)

and

A̅ = ⋂_{B ∈ ℬ} c_B(A) = ⋂_{B ∈ ℬ̊} c_B(A).    (4.10)

The following equations also relate the interior and closure to erosion and dilation (Serra, 1983):

Å = ⋃_{B ∈ ℬ} e_B(A) = ⋃_{B ∈ ℬ̊} e_B(A).    (4.11)

and

A̅ = ⋂_{B ∈ ℬ} d_B(A) = ⋂_{B ∈ ℬ̊} d_B(A).    (4.12)

In words, the interior of a set A is the union of the openings or erosions of A with open or closed balls of different sizes, and the closure of a set A is the intersection of the closings or dilations of A with open or closed balls of different sizes.
4.4 Rough Sets
In rough set theory, objects in a universe X are perceived by means of their attributes (features). Let ϕ_i denote a real-valued function that represents an object feature. Each element x ∈ A ⊆ X is described by its feature vector ϕ(x) = (ϕ_1(x), ϕ_2(x), ..., ϕ_n(x)) (Peters and Wasilewski, 2009). An equivalence relation (called an indiscernibility relation (Pawlak and Skowron, 2007c)) can be defined on X. Let ∼ be an equivalence relation defined on X, i.e., ∼ is reflexive, symmetric and transitive. An equivalence relation (∼) on X classifies objects (x ∈ X) into classes called equivalence classes. Objects in each class have the same feature-value vectors and are treated as one generalized item. The indiscernibility (equivalence) relation ∼_{X,ϕ} is defined in (4.13).

∼_{X,ϕ} = {(x, y) ∈ X × X : ϕ(x) = ϕ(y)}.    (4.13)

∼_{X,ϕ} partitions the universe X into non-overlapping equivalence classes denoted by X/∼_ϕ or simply X/∼. Let x/∼ denote the class containing an element x, as in (4.14).

x/∼ = {y ∈ X | x ∼_ϕ y}.    (4.14)

Let X/∼ denote the quotient set as defined in (4.15).

X/∼ = {x/∼ | x ∈ X}.    (4.15)

The relation ∼ holds for all members of each class x/∼ in a partition. In a rough set model, elements of the universe are described based on the available information about them. For each subset A ⊆ X, rough set theory defines two approximations based on equivalence classes, the lower approximation A_− and the upper approximation A^−:

A_− = {x ∈ X : x/∼ ⊆ A}    (4.16)

A^− = {x ∈ X : x/∼ ∩ A ≠ φ}    (4.17)

The set A ⊂ X is called a rough set if A_− ≠ A^−; otherwise it is called an exact set (Pawlak, 1991). Figure 4.1 shows a set A and its lower and upper approximations in terms of the partitioned space. The space X is partitioned into squares of the form (j, j + 1]².
FIGURE 4.1: Upper and lower approximation of a set A in the partitioned space of the form (j, j + 1]²: (4.1a) the set A and the partitioned space; (4.1b) lower approximation; (4.1c) upper approximation.
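The approximations in equations (4.16) and (4.17) are easy to compute once the equivalence classes are available. The following Python sketch (illustrative only; the universe, the feature function and the target set are invented for the example) builds the partition induced by a feature function ϕ and returns the lower and upper approximations of a set A.

```python
from collections import defaultdict

def partition(universe, phi):
    """Group objects into equivalence classes x/~ of the indiscernibility relation (4.13)."""
    classes = defaultdict(set)
    for x in universe:
        classes[phi(x)].add(x)
    return list(classes.values())

def approximations(universe, phi, A):
    """Lower approximation A_- (4.16) and upper approximation A^- (4.17) of A."""
    A = set(A)
    lower, upper = set(), set()
    for cls in partition(universe, phi):
        if cls <= A:          # class entirely contained in A
            lower |= cls
        if cls & A:           # class hits A
            upper |= cls
    return lower, upper

if __name__ == "__main__":
    X = range(12)                     # hypothetical universe of 12 objects
    phi = lambda x: x // 3            # objects with equal value of x // 3 are indiscernible
    A = {2, 3, 4, 5, 7}               # hypothetical target set
    low, up = approximations(X, phi, A)
    print("lower:", sorted(low))      # [3, 4, 5]
    print("upper:", sorted(up))       # [0, 1, 2, 3, 4, 5, 6, 7, 8]
    print("rough set?", low != up)    # True: A is a rough set
```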
Topology of Rough sets
To study the topological properties of rough sets, we define a partition topology on X based on the partition induced by the equivalence relation ∼_{X,ϕ}. The equivalence classes x/∼ form the basis for the topology τ (Steen and Seebach, 1995). Let A̅ denote the closure of a set A. Since the topology is a partition topology, to find the closure A̅ of a subset A ⊂ X we have to consider all of the closed sets containing the set A and then select the smallest closed set. The interior of a set A (denoted Å) is the largest open set that is contained in A. Open sets and their corresponding closed sets in the topology are:
• X is an open set, φ is the corresponding closed set.
• φ is an open set, X is the corresponding closed set.
• x_i/∼ is an open set ⇒ X\x_i/∼ = x^c_i/∼ is a closed set,
where i = 1, ..., n, n is the total number of equivalence classes in the partition topology and \ is the set difference. Recall that the union of any finite or infinite number of open sets is an open set and the intersection of any finite number of open sets is an open set. For a set A ⊆ X and the partition topology τ, we have:

A̅ = A^−    (4.18)

Å = A_−    (4.19)

where A̅, Å, A^− and A_− are the closure, interior, upper and lower approximation of A, respectively. Now it is clear that we can define the properties of a rough set in the language of topology. Next, define the π-boundary of A as A_b = A̅\Å, where A̅ and Å are the closure and interior of A (Lashin, Kozae, Abo Khadra, and Medhat, 2005). A set A is said to be rough if A_b ≠ φ; otherwise it is an exact set. Generally, for a given topology τ and A ⊆ X, we have:
• A is totally definable, if A is an exact set, A = Å = A̅.
• A is internally definable, if A = Å, A̅ ≠ A.
• A is externally definable, if A ≠ Å, A̅ = A.
• A is undefinable, if A ≠ Å and A̅ ≠ A.
FIGURE 4.2: Equivalence relation between sets: (4.2a) set A in the partitioned space X, where the boundary region contains shaded rectangles; (4.2b) sets A (solid line) and Y (dashed line) have an equivalence relation with each other.
Polkowski (Polkowski, 1993) also defined an equivalence relation based on the topological properties of rough sets. For a rough subset of the universe, A ⊂ X, with Å ≠ A̅, the equivalence class A/∼ is defined as:

A/∼ = {Y ⊆ X | Å = Y̊ and A̅ = Y̅}.    (4.20)
In other words, the equivalence class of a set A is the collection of those sets with the same interior and closure as A. Notice that in equation 4.14 an equivalence class is based on an element x ∈ X, whereas in equation 4.20 an equivalence class of a set A ⊆ X is calculated. Figure 4.2 demonstrates the idea of sets that have an equivalence relation with each other. Those rough sets with equal interior and closure have an equivalence relation with each other; in other words, all those sets that fall into the boundary region of the set A have an equivalence relation with A. Figure 4.2b shows the boundary region of a set A (solid line) and a sample set Y (dashed line) that has an equivalence relation with A.
Instead of using an equivalence relation, arbitrary binary relations can be used to form an approximation space that is called a generalized approximation space. In this case, instead of having a partition of X, a covering can be defined by a tolerance relation. That is, if we use the tolerance relation ≅_{X,ϕ,ε} defined in (4.21) instead of the equivalence relation ∼, then ≅_{X,ϕ,ε} defines a covering on X, i.e., the tolerance classes in the covering may or may not be disjoint sets. The resulting construction from A. Skowron and J. Stepaniuk is called a tolerance approximation space (Skowron and Stepaniuk, 1996). E.C. Zeeman formally defined a tolerance relation ≅ on a set X as a reflexive and symmetric relation and introduced the notion of a tolerance space (Zeeman, 1962)*. A special kind of tolerance relation is the well known equivalence relation, which is reflexive, symmetric and transitive and is similar to equation 4.13. For example, we can define a tolerance relation on the set X as given in (4.21).

≅_{X,ϕ,ε} = {(x, y) ∈ X × X : |ϕ(x) − ϕ(y)| ≤ ε}.    (4.21)

* It has been observed by A.B. Sossinsky (Sossinsky, 1986) that it was J.H. Poincaré who informally introduced tolerance spaces in the context of sets of similar sensations (Poincaré, 1913). Both E.C. Zeeman and J.H. Poincaré introduce tolerance spaces in the context of sensory experience.
FIGURE 4.3: Tolerance relation between sets; the space X is partitioned into squares, the sets A (solid line) and Y (dashed line) have a tolerance relation, and A̅ ∩ Y̅ is colored in gray: (4.3a) two overlapping sets with a tolerance relation; (4.3b) two disjoint sets with a tolerance relation; (4.3c) inclusion and tolerance relation.

The relation ≅_{X,ϕ,ε} is a special case of the tolerance near set relation ≅_{B,ε} (Peters, 2009) (see also the weak nearness relation in (Peters and Wasilewski, 2009)). For conciseness, ≅ is used to denote ≅_{X,ϕ,ε}. In a manner similar to what L. Polkowski has done in defining the equivalence class of a rough set A ⊂ X, we introduce the following equation as a tolerance class for a rough subset of the universe, A ⊂ X, with Å ≠ A̅:

A/≅ = {Y ⊆ U | A̅ ∩ Y̅ ≠ φ}.    (4.22)
In other words, we are suggesting that two rough sets A, Y ⊂ X of the universe have a tolerance relation (≅) with each other iff A̅ ∩ Y̅ ≠ φ. Notice that equation 4.21 is defined on elements of a set X; by contrast, equation 4.22 is defined for two sets.

Proposition 4.4.1 ≅ is a tolerance relation.
Proof. To show that ≅ is a tolerance relation, we have to show that it is reflexive and symmetric:
• If A = Y then A̅ = Y̅, therefore A̅ ∩ Y̅ ≠ φ, so A ≅ Y. This means that ≅ is a reflexive relation, A ≅ A.
• If A ≅ Y then A̅ ∩ Y̅ ≠ φ ⇒ Y̅ ∩ A̅ ≠ φ ⇒ Y ≅ A; therefore ≅ is a symmetric relation.

Proposition 4.4.2 Equation 4.20 is a specialization of equation 4.22.
Proof. We have to show that the equivalence class of a set A is a special case of its tolerance class; in other words, that the equivalence class A/∼ is included in A/≅. Let Y ⊂ X and Y ∈ A/∼; we have to show that Y ∈ A/≅. Since Y ∈ A/∼, we have A̅ = Y̅ and Å = Y̊. We only need the first part: from A̅ = Y̅ we get A̅ ∩ Y̅ = A̅ ≠ φ. Therefore A/∼ is included in A/≅.

Figure 4.3 shows three different sets that have a tolerance relation with the set A ⊂ X. In figure 4.3b the set Y ⊂ X is disjoint from the set A, with A ∩ Y = φ, but it has a tolerance relation with A because A̅ ∩ Y̅ ≠ φ.
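As a small illustration of the set-level tolerance relation in equation (4.22) (the grid, blocks and sets below are hypothetical, and the closure is computed here as the upper approximation with respect to a partition, following equation (4.18)), the Python sketch below tests whether two rough sets tolerate each other, i.e., whether their closures intersect.

```python
def closure(A, blocks):
    """Closure of A in the partition topology: the union of all blocks that hit A (eq. 4.18)."""
    A = set(A)
    hit = [b for b in blocks if b & A]
    return set().union(*hit) if hit else set()

def tolerates(A, Y, blocks):
    """A tolerates Y in the sense of eq. (4.22): the closures of A and Y intersect."""
    return bool(closure(A, blocks) & closure(Y, blocks))

if __name__ == "__main__":
    # hypothetical 4x4 pixel grid partitioned into four 2x2 blocks
    blocks = [{(i, j) for i in range(r, r + 2) for j in range(c, c + 2)}
              for r in (0, 2) for c in (0, 2)]
    A = {(0, 0), (0, 1)}     # lives in the top-left block
    Z = {(1, 1)}             # disjoint from A, but in the same block
    Y = {(0, 3)}             # lives in the top-right block
    print(A & Z, tolerates(A, Z, blocks))   # set() True  -- disjoint sets, overlapping closures
    print(A & Y, tolerates(A, Y, blocks))   # set() False -- closures are disjoint
```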
4.5 Mathematical Morphology and Rough Sets
There are two main papers connecting mathematical morphology to rough set theory: one by Polkowski (Polkowski, 1999), who uses the language of topology to connect the two fields, and Bloch's work, which is mainly based on the language of relations (Bloch, 2000). We begin this section with some examples from (Polkowski, 1999). The first one is partitioning Z ⊆ E^n into a collection {P_1, P_2, ..., P_n}, where P_i is the partition of the i-th axis E_i into intervals (j, j + 1]. In the second example, a structuring element B = (0, 1]^n is selected. It is easily seen that o_B(X) = X_− for each X ⊆ Z. Also, c_B(X) = X^−. These two examples clarify the relation between mathematical morphology and rough sets in a cogent way. Mathematical morphology is mainly developed for the image plane, and a structuring element has a geometrical shape in this space. L. Polkowski defined a partition in this space (not necessarily through the definition of an equivalence relation) and then obtained the upper and lower approximations of a set X based on these partitions. The equivalence classes have the same characteristics and are of the form (j, j + 1]^n. At the same time, a structuring element B with the same characteristics as the equivalence classes is defined, (0, 1]^n. Then B_x, the translation of B by x, can hit (overlap) any of the equivalence classes. In classical rough set theory, objects are perceived by their attributes and classified into equivalence classes based on the indiscernibility of the attribute values. In the above examples, the geometrical positions of the pixels in the image plane act as attributes and form the partitions. In (Polkowski, 1993), the morphological operators are defined on equivalence classes.
In her article, I. Bloch tried to connect rough set theory to mathematical morphology based on relations (Bloch, 2000). She uses general approximation spaces, where, instead of the indiscernibility relation, an arbitrary binary relation is used. She suggests that the upper and lower approximations can be obtained from dilation and erosion, respectively (Bloch, 2000). The binary relation defined in her work is xRy ⇔ y ∈ B_x. Then from the relation R, r(x) is derived in the following way:

∀x ∈ X, r(x) = {y ∈ X | y ∈ B_x} = B_x

Consider ∀x ∈ X, x ∈ B_x and let B be symmetric. Then erosion and lower approximation coincide:

∀A ⊂ X, A_− = {x ∈ X | r(x) ⊂ A} = {x ∈ X | B_x ⊂ A} = e_B(A)

The same method is used to show that the upper approximation and dilation coincide. I. Bloch also extends the idea to dual operators and neighbourhood systems. The common result of both (Polkowski, 1999) and (Bloch, 2000) is the suggestion that the upper and lower approximations can be linked to closing (dilation) and opening (erosion). Topology, neighbourhood systems, dual operators and relations are used to show the connection.
4.5.1 Some Experiments

In the following subsection, we describe some experiments that demonstrate ways of incorporating both fields in image processing.

Lower Approximation as Erosion Operator
When mathematical morphology is used in image processing applications, a structuring element that is mainly localized in the image plane is used. The result is the interaction of the structuring element with the image underneath. On the other hand, in rough set theory the universe is partitioned or covered by classes (indiscernibility, tolerance or arbitrary) and the objects in the universe are perceived based on the knowledge available in these classes. The interaction of the structuring element with the image underneath is localized in space; in other words, the underlying image is being seen (characterized) through the small window opened by the structuring element. In rough set theory, a set is approximated by the knowledge gathered in equivalence classes from the whole universe. This is the main difference between mathematical morphology and rough sets. In mathematical morphology, especially when the concept of a lattice is introduced into the field, the universe consists of all possible images. But when we apply morphological operators to images, there is only one image and a set of structuring elements, and the result is a new image belonging to the universe. In applications of rough set theory, the available data in databases form the universe and all the approximations are based on the available data. To have almost the same milieu, we also consider a finite set of images to be our universe (this is possible if we view the universe X as a set of points and each image A in the universe as a subset of the universe, i.e., a particular set of points A ⊂ X). Ten sets of images in different categories are considered. Each set consists of 100 images. Figure 4.4 shows samples of some of the categories. A category of images is defined as a collection of images with visual/semantic similarities. For instance, the categories of seaside images, mountain images, dinosaurs or elephants can be derived from images in the Simplicity image archive (Group, 2009). The categorization is done by an individual and it is not unique. Each image may belong to different categories. For instance, in figure 4.5a the elephant pictures are categorized into the elephant category, but in figure 4.5b they are in the animal and/or nature categories. Therefore the categorization is completely application dependent and subjective.
FIGURE 4.4: Some image samples from different categories
FIGURE 4.5: Image universe and categories: (4.5a) sample image categories; (4.5b) sample image categories.

FIGURE 4.6: Obtaining the lower approximation of a query image in terms of the universe, with three color components as features: (4.6a) the subspace C, some samples from the flowers category (the set C contains 100 flower images); (4.6b) original query image A; (4.6c) lower approximation in terms of the universe X.

The aim is to approximate a query image, I ∈ X, based on one and/or several of these categories. This will reveal important information about the degree of similarity between the query image and the categories. We define an equivalence relation on the set of images X. Three color components, in addition to an image index, are considered as features:

ϕ(p_ij) = (ϕ_1(p_ij), ϕ_2(p_ij), ϕ_3(p_ij), ϕ_4(p_ij)),
ϕ_1(p_ij) = R(p_ij), ϕ_2(p_ij) = G(p_ij), ϕ_3(p_ij) = B(p_ij), ϕ_4(p_ij) = i,
i = 1, 2, ..., j = 1, 2, ..., M_i,

where p_ij is the j-th pixel of the i-th image (I_i) in the universe and M_i is the number of pixels in the image I_i. R extracts the red component of the pixel, G the green component and B the blue component. Equivalence classes are formed on each image I ∈ X, so each image I is partitioned into equivalence classes I/∼,ϕ based on the features defined by ϕ.
FIGURE 4.7: Obtaining the lower approximation of a query image in terms of the universe, with three color components as features: (4.7a) the subspace C, some samples from the elephant category (the set C contains 100 elephant images); (4.7b) original query image A; (4.7c) lower approximation in terms of the universe X.
FIGURE 4.8: (Please see color insert for Figures 4.8b and c) Obtaining the lower approximation of a query image in terms of the universe and three color components as features: (4.8a) subspace C, some sea shore samples (set X contains 100 sea shore images); (4.8b) sample seashore query image; (4.8c) sample lower approximation in terms of the universe X, excluding the query image; (4.8d) lower approximation in terms of the universe X, including the query image.
Then we define a partition topology on X, where the basis of the topology is the set of partitions formed on the images inside the universe. The empty set φ is also added to the basis (Steen and Seebach, 1995). Let (X, τ) be the partition topology on X. We consider each category of images as a subspace topology. Let C be a non-empty subset of X. The collection τ_C = {T ∩ C : T ∈ τ} of subsets of C is called the subspace topology, and the topological space (C, τ_C) is said to be a subspace of (X, τ) (Gemignani, 1990). Let I ∈ X be an image. We want to find the interior of the set I in terms of the open sets of different subspaces:

I̊_C = ⋃{c ∈ τ_C | c ⊆ I}    (4.23)

where (C, τ_C) is the subspace topology for a category of images. Based on equation 4.19, we can say that

I̊_C = I_{C−}    (4.24)

In other words, the interior of a set I relative to the subspace C is equal to the lower approximation of the set I with respect to the subspace C. Figures 4.6 and 4.7 demonstrate the results of the lower approximation of a query image in terms of the specified subspace. As is obvious in the examples, the more similar the query image is to the subspace in terms of the predefined features, the more complete the lower approximation is. The black pixels in the lower approximation image are those parts of the image that are filtered out. In these examples the features are the three color components in RGB space. Notice that we used 4 features to form the partition topology.
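A drastically simplified prototype of this retrieval idea is sketched below in Python (illustrative only: the tiny synthetic "images", the coarse RGB quantization and the omission of the image-index feature are all assumptions made for the sake of a runnable example). Query pixels whose quantized color occurs among the category's images are kept; the rest are blackened, mimicking the filtered pixels in Figures 4.6–4.8.

```python
import numpy as np

def color_classes(image, levels=8):
    """Quantize RGB values so that pixels with the same quantized color are indiscernible."""
    q = (image.astype(int) * levels) // 256
    return {tuple(c) for c in q.reshape(-1, 3)}

def lower_approximation(query, category_images, levels=8):
    """Keep query pixels whose quantized color occurs in the category; blacken the rest
    (a simplified reading of eq. 4.24 with color-only features)."""
    category_colors = set()
    for img in category_images:
        category_colors |= color_classes(img, levels)
    q = (query.astype(int) * levels) // 256
    mask = np.array([tuple(c) in category_colors for c in q.reshape(-1, 3)])
    out = query.reshape(-1, 3).copy()
    out[~mask] = 0
    return out.reshape(query.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # hypothetical 8x8 RGB "images": a bright category and an arbitrary query image
    category = [rng.integers(180, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(3)]
    query = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
    approx = lower_approximation(query, category)
    retained = np.count_nonzero(approx.reshape(-1, 3).any(axis=1)) / 64
    print(f"fraction of query pixels retained: {retained:.2f}")
```

The fraction of retained pixels is one crude way to score how similar the query is to a category, in the spirit of the proposal in the conclusion below.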
4.6 Conclusion
In summary, this chapter presents the basic definitions of mathematical morphology and rough set theory. Mathematical morphology was defined and developed in the image processing domain; rough set theory is applied here to image archives. Although the initial domains and applications of these two fields are different, there are connections between the two, and this chapter brings together their common aspects. The lower approximation of rough set theory is analogous to the opening/erosion of mathematical morphology, and the same is true for the upper approximation and closing/dilation. We have proposed a method that uses the idea of the lower approximation to find the similarity between images. A partition topology is defined on images gathered as a universe of images. Four features, including color information and image indices, are used to form the image partitions. Subspace topologies are used to model each category of images. The interior of a query image is then calculated based on different subspaces; in other words, we find the lower approximation of the query image in terms of different subspaces. We propose that the closer the lower approximated image is to the query image, the more similar the query image is to the subspace.
Bibliography

Bloch, I. 2000. On links between mathematical morphology and rough sets. Pattern Recognition 33(9):1487–1496.
Engelking, R. 1989. General topology, revised & completed edition. Berlin: Heldermann Verlag.
Fashandi, H., J.F. Peters, and S. Ramanna. 2009. L2-norm length-based image similarity measures: Concrescence of image feature histogram distances. In Signal & Image Processing, Int. Assoc. of Science & Technology for Development, Honolulu, Hawaii, 178–185.
Gemignani, M.C. 1990. Elementary topology. Courier Dover Publications.
Group, James Z. Wang Research. 2009. Simplicity: Content based image retrieval image database search engine.
Lashin, E.F., A.M. Kozae, A.A. Abo Khadra, and T. Medhat. 2005. Rough set theory for topological spaces. International Journal of Approximate Reasoning 40(1-2):35–43.
Matheron, G. 1975. Random sets and integral geometry. Wiley New York.
Meghdadi, A.H., J.F. Peters, and S. Ramanna. 2009. Tolerance classes in measuring image resemblance. In KES 2009, Part II, LNAI 5712, 127–134. Berlin: Springer.
Pawlak, Z. 1981. Classification of objects by means of attributes. Polish Academy of Sciences 429.
———. 1982. Rough sets. International J. Comp. Inform. Science 11:341–356.
———. 1991. Rough sets: Theoretical aspects of reasoning about data. Kluwer Academic Print on Demand.
Pawlak, Z., and A. Skowron. 2007a. Rough sets and boolean reasoning. Information Sciences 177:41–73.
———. 2007b. Rough sets: Some extensions. Information Sciences 177:28–40.
———. 2007c. Rudiments of rough sets. Information Sciences 177:3–27.
Peters, James F. 2009. Tolerance near sets and image correspondence. International Journal of Bio-Inspired Computation 1(4):239–245.
———. 2010. Corrigenda and addenda: Tolerance near sets and image correspondence. International Journal of Bio-Inspired Computation 2(5). In press.
Peters, James F., and Piotr Wasilewski. 2009. Foundations of near sets. Information Sciences 179:3091–3109. doi:10.1016/j.ins.2009.04.018, in press.
Poincaré, H. 1913. Mathematics and science: Last essays, trans. by J.W. Bolduc. N.Y.: Kessinger Pub.
Polkowski, L. 1999. Approximate mathematical morphology. Rough set approach. In Rough Fuzzy Hybridization: A New Trend in Decision-Making.
Polkowski, L.T. 1993. Mathematical morphology of rough sets. Bulletin of the Polish Academy of Sciences: Mathematics 41(3):241.
Serra, J. 1983. Image analysis and mathematical morphology. Academic Press, Inc., Orlando, FL, USA.
Skowron, A., and J. Stepaniuk. 1996. Tolerance approximation spaces. Fundamenta Informaticae 27(2-3):245–253.
Sossinsky, A.B. 1986. Tolerance space theory and some applications. Acta Applicandae Mathematicae: An International Survey Journal on Applying Mathematics and Mathematical Applications 5(2):137–167.
Steen, L.A., and J.A. Seebach. 1995. Counterexamples in topology. Courier Dover Publications.
Stell, J.G. 2007. Relations in mathematical morphology with applications to graphs and rough sets. In Spatial Information Theory, vol. 4736 of LNCS, 438–454. Springer.
Zeeman, E.C. 1962. The topology of the brain and visual perception. In Topology of 3-Manifolds and Related Topics (Proc. The Univ. of Georgia Institute, 1961), ed. M.K. Fort, Jr., 240–256. Prentice Hall.
5
Rough Hybrid Scheme: An application of breast cancer imaging

Aboul Ella Hassanien
Cairo University, Egypt

Hameed Al-Qaheri
Kuwait University

Ajith Abraham
Norwegian University of Science and Technology

5.1 Introduction . . . . . . . . . . 5–1
5.2 Fuzzy sets, rough sets and neural networks: Brief Introduction . . . . . . . . . . 5–3
    Fuzzy Sets • Rough sets • Neural networks • Create gray-level co-occurrence matrix from image
5.3 Rough Hybrid Approach . . . . . . . . . . 5–5
    Pre-processing: Intensity Adjustment through Fuzzy histogram hyperbolization algorithm • Clustering and Feature Extraction: Modified fuzzy c-mean clustering algorithm and Gray level co-occurrence matrix • Rough sets analysis • Rough Neural Classifier
5.4 Results and Discussion . . . . . . . . . . 5–11
5.5 Conclusion . . . . . . . . . . 5–12
Bibliography . . . . . . . . . . 5–14

5.1 Introduction
Breast carcinoma is a leading cause of death for women throughout the world. It is the second or third most common malignancy among women in developing countries (Rajendra, Ng, Y. H. Chang, and Kaw, 2008). The incidence of breast cancer is increasing globally and the disease remains a significant public health problem. Statistics from the National Cancer Institute of Canada show that the lifetime probability of a woman developing breast cancer is one in nine, with a lifetime probability of one in 27 of death due to the disease; moreover, about 385,000 of the 1.2 million women diagnosed with breast cancer each year are in Asia (Organization, 2005). Because only localized cancer is deemed to be treatable and curable, as opposed to metastasized cancer, early detection of breast cancer is of utmost importance. Mammography is, at present, the best available tool for early detection of breast cancer. However, the sensitivity of screening mammography is influenced by image quality and the radiologist's level of expertise. Contrary to masses and calcifications, the presence of architectural distortion is usually not accompanied by a site of increased density in mammograms. The detection of architectural distortion is performed by a radiologist through the identification of subtle signs of abnormality, such as the presence of spiculations and distortion of the normally oriented texture pattern of the breast.
Mammography is currently the gold standard method to detect early breast cancer before it becomes clinically palpable. The use of mammography results in a 25% to 30% decreased mortality rate in screened women compared with controls after 5 to 7 years (Nystrom, Andersson, Bjurstam, Frisell, Nordenskjold, and Rutqvist, 2002). Breast screening aims to detect breast cancers at the very early stage (before lymph node dissemination). Randomized trials of mammographic screening have provided strong evidence that early diagnosis and treatment of breast cancer reduce breast cancer mortality (Nystrom et al., 2002). Breast cancer usually presents with a single feature or a combination of the following features: a mass, associated calcifications, architectural distortion, asymmetry of architecture, breast density or duct dilation, and skin or nipple changes (Nystrom et al., 2002; Rajendra et al., 2008). In fact, a large number of mammogram image analysis systems have been employed for assisting physicians in the early detection of breast cancers on mammograms (Guo, Shao, and Ruiz, 2009; Ikedo, Morita, Fukuoka, Hara, Lee, Fujita, Takada, and Endo, 2009). The earlier a tumor is detected, the better the prognosis. Usually, a breast cancer detection system starts with preprocessing, which includes digitization of the mammograms with different sampling and quantization rates. Then, the regions of interest selected from the digitized mammogram are enhanced. The segmentation process is designed to find suspicious areas and to separate them from the background; these areas will be used for extracting features of suspicious regions. In the feature selection process, the features of suspicious areas are extracted and selected, and suspicious regions are classified into two classes: cancer or non-cancer (Aboul Ella, Ali, and Hajime, 2004; Aboul Ella, 2003; Setiono, 2000; Rajendra et al., 2008; Ikedo et al., 2009; Maglogiannis, Zafiropoulos, and Anagnostopoulos, 2007).
Rough set theory (Aboul Ella et al., 2004; Hirano and Tsumoto, 2005; Pawlak, 1982) is a fairly new intelligent technique that has been applied to the medical domain. It is used for the discovery of data dependencies, the evaluation of the importance of attributes, the discovery of patterns in data, the reduction of redundant objects and attributes, and the search for minimal subsets of attributes. Moreover, it is used for the extraction of rules from databases. One advantage of rough sets is the creation of readable if-then rules. Such rules have the potential to reveal new patterns in the data material. Other approaches like case based reasoning and decision trees (Aboul Ella, 2003; Ślęzak, 2000; Aboul Ella, 2009) are also widely used to solve data analysis problems. Each of these techniques has its own properties and features, including the ability to find important rules and information that could be useful for data classification. Unlike other intelligent systems, rough set analysis requires no external parameters and uses only the information present in the given data. The combination or integration of distinct methodologies can be done in any form: by a modular integration of two or more intelligent methodologies, which maintains the identity of each methodology; by integrating one methodology into another; or by transforming the knowledge representation in one methodology into another form of representation characteristic of another methodology. Neural networks and rough sets are widely used for classification and rule generation (Greco, Inuiguchi, and Slowinski, 2006; Aboul Ella, 2007; Henry and Peters, 1996; Peters, Liting, and Ramanna, 2001; Peters, Skowron, Liting, and Ramanna, 2000; Sandeep and Rene, 2006; Aboul Ella, 2009).
Instead of solving the problem using a single intelligent technique such as neural networks, rough sets, or fuzzy image processing alone, the approach proposed in this chapter is to integrate the three computational intelligence techniques into a hybrid classification method, so as to reduce their weaknesses and build on their strengths. An application of breast cancer imaging has been chosen to test the ability and accuracy of the hybrid approach in classifying breast cancer images into two outcomes: malignant cancer or benign cancer. This chapter thus introduces a rough hybrid approach to detecting and classifying breast cancer images into two outcomes: cancer or non-cancer.
This chapter is organized as follows: Section 5.2 gives a brief mathematical background on fuzzy sets, rough sets and neural networks. Section 5.3 discusses the proposed rough hybrid scheme in detail. Experimental analysis and discussion of the results are described in Section 5.4. Finally, conclusions are presented in Section 5.5.
5.2 Fuzzy sets, rough sets and neural networks: Brief Introduction
Recently, various intelligent techniques and approaches have been applied to handle the different challenges posed by data analysis. The main constituents of intelligent systems include fuzzy logic, neural networks, genetic algorithms, and rough sets. Each of them contributes a distinct methodology for addressing problems in its domain, in a cooperative rather than a competitive manner. The result is a more intelligent and robust system providing a human-interpretable, low cost, sufficiently accurate solution, as compared to traditional techniques. This section provides a brief introduction to fuzzy sets, rough sets and neural networks.
5.2.1 Fuzzy Sets
Professor Lotfi Zadeh (Zadeh, 1965) introduced the concept of fuzzy logic to represent vagueness in linguistics and, further, to implement and express human knowledge and inference capability in a natural way. Fuzzy logic starts with the concept of a fuzzy set. A fuzzy set is a set without a crisp, clearly defined boundary; it can contain elements with only a partial degree of membership. A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The input space is sometimes referred to as the universe of discourse. Let X be the universe of discourse and x be a generic element of X. A classical set A ⊆ X is defined as a collection of elements or objects x ∈ X such that each x can either belong or not belong to the set A. By defining a characteristic function (or membership function) on each element x in X, a classical set A can be represented by a set of ordered pairs (x, 0) or (x, 1), where 1 indicates membership and 0 non-membership. Unlike the conventional set mentioned above, a fuzzy set expresses the degree to which an element belongs to a set. Hence the characteristic function of a fuzzy set is allowed to take values between 0 and 1, denoting the degree of membership of an element in a given set. If X is a collection of objects denoted generically by x, then a fuzzy set A in X is defined as a set of ordered pairs:

A = {(x, µ_A(x)) | x ∈ X}    (5.1)
µ_A(x) is called the membership function of the linguistic variable x in A, which maps X to the membership space M, M = [0, 1]. When M contains only the two points 0 and 1, A is crisp and µ_A(x) is identical to the characteristic function of a crisp set. Triangular and trapezoidal membership functions are the simplest membership functions, formed using straight lines. Some of the other shapes are Gaussian, generalized bell, sigmoidal and polynomial based curves.
The adoption of the fuzzy paradigm is desirable in image processing because of the uncertainty and imprecision present in images, due to noise, image sampling, lighting variations and so on. Fuzzy theory provides a mathematical tool to deal with this imprecision and ambiguity in an elegant and efficient way. Fuzzy techniques can be applied to different phases of the segmentation process; additionally, fuzzy logic allows one to represent the knowledge about the given problem in terms of linguistic rules with meaningful variables, which is the most natural way to express and interpret information. Fuzzy image processing (Aboul Ella et al., 2004; Kerre and Nachtegael, 2000; Nachtegael, Van-Der-Weken, Van-De-Ville, Kerre, Philips, and Lemahieu, 2001; Sandeep and Rene, 2006; Sushmita and Sankar, 2005; Rosenfeld, 1983) is the collection of all approaches that understand, represent and process images, their segments and their features as fuzzy sets. An image I of size M × N with L gray levels can be considered as an array of fuzzy singletons, each having a membership value denoting its degree of brightness relative to some brightness levels.
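As a tiny illustration of a membership function (the breakpoints below are arbitrary choices, not values from the chapter), a triangular membership function evaluated at a few gray levels can be written as follows.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b; returns a value in [0, 1]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# membership of a few gray levels in a hypothetical "gray" fuzzy set peaking at 128
for g in (0, 64, 128, 192, 255):
    print(g, round(triangular(g, 50, 128, 200), 3))
```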
5.2.2 Rough sets
Due to space limitations we provide only a brief explanation of the basic framework of rough set theory, along with some of the key definitions. A more comprehensive review can be found in sources such as (Polkowski, 2002). Rough set theory provides a novel approach to knowledge description and to the approximation of sets. Rough set theory was introduced by Pawlak during the early 1980s (Pawlak, 1982) and is based on an approximation-space-based approach to classifying sets of objects. In rough set theory, feature values of sample objects are collected in what are known as information tables. Rows of such a table correspond to objects and columns correspond to object features. Let O, F denote a set of sample objects and a set of functions representing object features, respectively. Assume that B ⊆ F, x ∈ O. Further, let x/∼B denote the class x/∼B = {y ∈ O | ∀φ ∈ B, φ(x) = φ(y)}, i.e., x ∼B y (the description of x matches the description of y). Rough set theory defines three regions based on the equivalence classes induced by the feature values: the lower approximation B_*X, the upper approximation B^*X and the boundary BND_B(X). A lower approximation of a set X contains all equivalence classes x/∼B that are proper subsets of X, and the upper approximation B^*X contains all equivalence classes x/∼B that have objects in common with X, while the boundary BND_B(X) is the set B^*X \ B_*X, i.e., the set of all objects in B^*X that are not contained in B_*X. Any set X with a non-empty boundary is only roughly known relative to B, i.e., X is an example of a rough set. The indiscernibility relation ∼B (also written as Ind_B) is a mainstay of rough set theory. Informally, ∼B is a set of all classes of objects that have matching descriptions. Based on the selection of B (i.e., the set of functions representing object features), ∼B is an equivalence relation that partitions the set of objects O into classes (also called elementary sets (Pawlak, 1982)). The set of all classes in a partition is denoted by O/∼B (also by O/Ind_B). The set O/Ind_B is called the quotient set. Affinities between objects of interest in the set X ⊆ O and classes in a partition can be discovered by identifying those classes that have objects in common with X. Approximation of the set X begins by determining which elementary sets x/∼B ∈ O/∼B are subsets of X.
5.2.3 Neural networks
Neural networks (NN) are an Artificial Intelligence (AI) methodology inspired by the composition of the human brain and made up of a wide network of interconnected processors. The basic parts of every NN are the processing elements, connections, weights, transfer functions, as well as the learning and feedback laws.
Throughout most of this work the classical multi-layer feed-forward network, as described in (Henry and Peters, 1996), is utilized. The most commonly used learning algorithm is back-propagation. The signals flow from neurons in the input layer to those in the output layer, passing through hidden neurons organized by means of one or more hidden layers. By a sigmoidal excitation function for a neuron we understand a mapping of the form:

f(x) = 1 / (1 + e^(−βx))    (5.2)

where x represents the weighted sum of the inputs to a given neuron and β is a coefficient called the gain, which determines the slope of the function. Let Icn_i, Ocn_j and w_ij be the input to neuron i, the output from neuron j, and the weight of the connection between i and j, respectively. We put:

Icn_i = Σ_{j=1}^{n} w_ij Ocn_j    (5.3)

Ocn_i = f(Icn_i)    (5.4)
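Equations (5.2)–(5.4) describe the usual forward pass through one neuron of a fully connected layer; a minimal Python sketch with made-up weights and inputs is shown below.

```python
import math

def sigmoid(x, beta=1.0):
    """Eq. (5.2): sigmoidal excitation with gain beta."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def neuron_output(weights, inputs, beta=1.0):
    """Eqs. (5.3)-(5.4): weighted sum of incoming outputs, passed through the sigmoid."""
    icn = sum(w * o for w, o in zip(weights, inputs))
    return sigmoid(icn, beta)

# hypothetical neuron with three incoming connections
print(round(neuron_output([0.4, -0.2, 0.7], [1.0, 0.5, 0.25], beta=2.0), 4))
```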
5.2.4 Create gray-level co-occurrence matrix from image
Statistically, texture is a unity of local variabilities and spatial correlations. The gray level co-occurrence matrix (GLCM) is one of the best known texture analysis methods; it estimates image properties related to second-order statistics. Each entry (i, j) in the GLCM corresponds to the number of occurrences of the pair of gray levels i and j which are a distance d apart in the original image. In order to estimate the similarity between different gray level co-occurrence matrices, Haralick (Haralick, 1979) proposed 14 statistical features extracted from them. To reduce the computational complexity, only some of these features were selected; in this chapter we use energy, entropy, contrast and inverse difference moment. For further reading see, e.g., (Aboul Ella, 2007).
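A minimal Python sketch of the GLCM and the four selected features is given below; the tiny test image, the displacement of one pixel to the right and the 4-level quantization are assumptions made for the example, and the feature formulas follow the standard Haralick-style definitions rather than any specific formulation from the chapter.

```python
import numpy as np

def glcm(image, dx=0, dy=1, levels=4):
    """Count co-occurrences of gray levels (i, j) at displacement (dx, dy), then normalize."""
    P = np.zeros((levels, levels))
    H, W = image.shape
    for r in range(H):
        for c in range(W):
            r2, c2 = r + dx, c + dy
            if 0 <= r2 < H and 0 <= c2 < W:
                P[image[r, c], image[r2, c2]] += 1
    return P / P.sum()

def haralick_features(P):
    i, j = np.indices(P.shape)
    energy   = np.sum(P ** 2)                             # angular second moment
    entropy  = -np.sum(P[P > 0] * np.log2(P[P > 0]))      # disorder of the texture
    contrast = np.sum(((i - j) ** 2) * P)                 # local gray-level variation
    idm      = np.sum(P / (1.0 + (i - j) ** 2))           # inverse difference moment (homogeneity)
    return energy, entropy, contrast, idm

if __name__ == "__main__":
    img = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [2, 2, 3, 3],
                    [2, 2, 3, 3]])        # hypothetical 4-level test image
    P = glcm(img)
    for name, v in zip(("energy", "entropy", "contrast", "IDM"), haralick_features(P)):
        print(f"{name}: {v:.3f}")
```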
5.3 Rough Hybrid Approach
In this section, an application of breast cancer imaging has been chosen, and a hybridization scheme that combines the advantages of fuzzy sets, rough sets and neural networks, in conjunction with statistical feature extraction techniques, has been applied to test its ability and accuracy in detecting and classifying breast cancer images into two outcomes: cancer or non-cancer. The architecture of the proposed rough hybrid approach is illustrated in Figure 5.1. It comprises four fundamental building phases. In the first phase, a preprocessing algorithm based on fuzzy image processing is presented; it is adopted to improve the quality of the images and to make the segmentation and feature extraction phases more reliable, and it contains several sub-processes. In the second phase, a modified version of the standard fuzzy c-mean clustering algorithm is proposed to initialize the segmentation; then the set of features relevant to the region of interest is extracted, normalized and represented in a database as vector values. The third phase is rough set analysis; it is done by computing the minimal number of necessary attributes and their significance and by generating a set of rules. Finally, a rough neural network is designed to discriminate different regions of interest in order to separate them into malignant and benign cases. These four phases are described in detail in the following sections, along with the steps involved and the characteristic features of each phase.
FIGURE 5.1: (Please see color insert) Fuzzy rough hybrid scheme
Algorithm 1 Fuzzy-based histogram hyperbolization

Step-1: Parameter initialization
  Set the shape of the membership function (triangular)
  Set the value of the fuzzifier β such that β = −0.75µ + 1.5

Step-2: Fuzzify the data
  for (i = 0; i < height; i++) do
    for (j = 0; j < width; j++) do
      if data[i][j] < 100 then
        FuzzyData[i][j] = 0
      end if
      if (data[i][j] >= 100) & (data[i][j] <= 200) then
        FuzzyData[i][j] = (0.01 * data[i][j]) − 1
      end if
      if (data[i][j] > 200) & (data[i][j] <= 255) then
        FuzzyData[i][j] = 1
      end if
    end for
  end for

Step-3: Modify the membership values
  Set ModificationBeta = 2
  for (i = 0; i
5.3.1 Pre-processing: Intensity Adjustment through Fuzzy histogram hyperbolization algorithm
In this phase, fuzzy image processing techniques have been applied to enhance the contrast of the whole image and to enhance the edges surrounding the region of interest (Aboul Ella and Dominik, 2006). Gray level modification is one of the most popular methods of image enhancement because it is simple to implement and fast to compute. But since the selection of a suitable mathematical function for the gray level transformation depends on the specific grayness properties of the image, it is necessary to develop techniques for automatic selection of an appropriate function. In recent years, many researchers have applied fuzzy set theory to develop new techniques for contrast improvement. These are based on mapping the gray levels into a fuzzy plane using a membership transformation function. The aim is to generate an image of higher contrast than the original image by giving a larger weight to the gray levels that are closer to the mean gray level of the image than to those that are farther from the mean. We present a fuzzy-based histogram hyperbolization algorithm to enhance the edges surrounding the region of interest. It starts with an initialization of the image parameters phase, followed by a fuzzification of the gray levels phase (i.e., membership values such as dark, gray and bright), then a gray level modification phase and, finally, a generation of new gray levels phase. The main steps of the fuzzy-based histogram hyperbolization algorithm are given in Algorithm 1.
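The chapter's own Algorithm 1 is only partially reproduced above. As an illustration of the overall idea, the Python sketch below implements one common form of fuzzy histogram hyperbolization; the min-max fuzzification, the use of the fuzzifier β = −0.75µ + 1.5 as the membership exponent, and the hyperbolic mapping to new gray levels are assumptions of this sketch, not a verbatim transcription of the algorithm.

```python
import numpy as np

def fuzzy_histogram_hyperbolization(img, beta=None):
    """Contrast enhancement by fuzzy histogram hyperbolization (illustrative sketch)."""
    img = img.astype(float)
    L = 256
    g_min, g_max = img.min(), img.max()
    mu = (img - g_min) / max(g_max - g_min, 1e-9)        # Step 2: fuzzify gray levels into [0, 1]
    if beta is None:
        beta = -0.75 * mu.mean() + 1.5                   # Step 1: fuzzifier from the mean membership
    mu_mod = mu ** beta                                  # Step 3: modify membership values
    out = (L - 1) / (np.exp(-1.0) - 1.0) * (np.exp(-mu_mod) - 1.0)   # Step 4: new gray levels
    return np.clip(out, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    low_contrast = rng.integers(90, 160, size=(64, 64)).astype(np.uint8)   # hypothetical dull image
    enhanced = fuzzy_histogram_hyperbolization(low_contrast)
    print("input range:", low_contrast.min(), low_contrast.max())
    print("output range:", enhanced.min(), enhanced.max())
```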
5.3.2 Clustering and Feature Extraction: Modified fuzzy c-mean clustering algorithm and Gray level co-occurrence matrix
The standard fuzzy c-means objective function for partitioning a data set y = {y_1, ..., y_N} into c clusters is given by:

J = Σ_{i=1}^{c} Σ_{k=1}^{N} (u_ik)^p ||y_k − v_i||²    (5.5)
where v_i are the prototypes of the clusters and the array u_ik = U represents a partition matrix. The parameter p is a weight exponent determining the amount of fuzziness in the resulting classification. It has the effect of reducing the squared distance error by an amount that depends on the observation's membership in the cluster. As p decreases, partitions that minimize J become increasingly crisp; conversely, higher values of p tend to soften memberships and partitions become more blurred. Generally p should be selected experimentally. The FCM objective function is minimized when high membership values are assigned to pixels whose intensities are close to the centroid of their particular class, and low membership values are assigned when the pixel data is far from the centroid. The FCM algorithm is computationally expensive and sensitive to noise. To solve these problems, we present a modified version of the fuzzy c-mean clustering algorithm (refer to (Aboul Ella, 2007) for more details and the algorithm steps).
As for feature extraction, to reduce the computational complexity only some features were selected: energy, entropy, contrast and inverse difference moment. Energy, also called Angular Second Moment, is a measure of the textural uniformity of an image. Energy reaches its highest value when the gray level distribution has either a constant or a periodic form.
A homogeneous image contains very few dominant gray tone transitions, so the normalized co-occurrence matrix for such an image has few entries of large magnitude, resulting in a large value of the energy feature. In contrast, if the normalized co-occurrence matrix contains a large number of small entries, the energy feature has a smaller value. The second feature is entropy, which measures the disorder of an image and achieves its largest value when all elements of the normalized co-occurrence matrix are equal. When the image is not texturally uniform, many GLCM elements have very small values, which implies that the entropy is very large. Therefore, entropy is inversely proportional to GLCM energy. The third feature is contrast, which is a difference moment of the normalized co-occurrence matrix and measures the amount of local variation in an image. The last feature is the inverse difference moment, which measures image homogeneity. This parameter achieves its largest value when most of the occurrences in the GLCM are concentrated near the main diagonal. The inverse difference moment is inversely proportional to GLCM contrast [for more details, the reader can see (Aboul Ella, 2007)].
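The four selected features can be computed directly from a normalized co-occurrence matrix. The sketch below assumes a single horizontal offset of one pixel and a coarse quantization to 8 gray levels; these choices, and the function names, are illustrative rather than the authors' exact settings.

```python
import numpy as np

def glcm(image, levels=8):
    """Gray level co-occurrence matrix for a horizontal offset of one pixel, normalized."""
    q = (image.astype(np.float64) / 256.0 * levels).astype(int)   # quantize gray levels
    q = np.clip(q, 0, levels - 1)
    P = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        P[a, b] += 1
    return P / P.sum()

def glcm_features(P):
    i, j = np.indices(P.shape)
    eps = 1e-12
    return {
        "energy":   np.sum(P ** 2),                       # angular second moment
        "entropy": -np.sum(P * np.log(P + eps)),
        "contrast": np.sum((i - j) ** 2 * P),
        "idm":      np.sum(P / (1.0 + (i - j) ** 2)),     # inverse difference moment
    }

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
print(glcm_features(glcm(img)))
```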
5.3.3 Rough sets analysis
One way to construct a simpler model computed from data, easier to understand and with more predictive power, is to create a minimal set of rules. Some condition values may be unnecessary in a decision rule produced directly from the database. Such values can then be eliminated to create a more comprehensible minimal rule preserving essential information. The analysis proceeds in three stages:

• Pre-processing stage. This stage includes tasks such as the addition and computation of extra variables, assignment of decision classes, data cleansing, completeness and correctness checks, attribute creation, attribute selection and discretization.

• Analysis and rule generating stage. This stage includes the generation of preliminary knowledge, such as the computation of object reducts from data, the derivation of rules from reducts, and the rule evaluation and prediction processes.

• Classification and prediction stage. This stage utilizes the rules generated in the previous stage to classify new cases.

The computation of the core and reducts from a decision table is a way of selecting relevant features (Bazan, Nguyen, Nguyen, Synak, and Wróblewski, 2000; Starzyk, Dale, and Sturtz, 1981). It is a global method in the sense that the resultant reducts represent the minimal sets of features which are necessary to maintain the same classificatory power given by the original and complete set of attributes. A more direct way of selecting relevant features is to assign a measure of relevance to each attribute and choose the attributes with the highest values. Based on the reduct system, we generate the list of rules that will be used to build the rough neural classifier model for new objects (Aboul Ella and Dominik, 2006).
5.3.4 Rough Neural Classifier
Rough neural networks (Henry and Peters, 1996; Peters et al., 2001, 2000) used in this study consist of one input layer, one output layer and one hidden layer. The input layer neurons accept input from the external environment. The outputs from input layer neurons are fed to the hidden layer neurons. The hidden layer neurons feed their output to the output layer
neurons, which send their output to the external environment. The number of hidden neurons is determined by the following inequality (Chiang and Braun, 2004; Hu, Lin, and Han, 2004):

N_{hn} \le \frac{N_{ts} \cdot T_e \cdot N_f}{N_f + N_o}    (5.6)
N_{hn} is the number of hidden neurons, N_{ts} is the number of training samples, T_e is the tolerance error, N_f is the number of attributes, and N_o is the number of outputs. The output of a rough neuron is a pair of upper and lower bounds, while the output of a conventional neuron is a single value. The rough neuron was introduced in 1996 by Lingras (Henry and Peters, 1996). It is defined relative to an upper bound (U_n) and a lower bound (L_n), and inputs are assessed relative to these boundary values. A rough neuron has three types of connections: (1) input-output connections to U_n, (2) input-output connections to L_n, and (3) a connection between U_n and L_n. A rough neuron R_n is thus a pair R_n = (U_n, L_n), where U_n and L_n are the upper rough neuron and the lower rough neuron, respectively.
DEFINITION 5.1
Let (I_{rLn}, O_{rLn}) be the input/output of a lower rough neuron and (I_{rUn}, O_{rUn}) be the input/output of an upper rough neuron. The input/output of the lower/upper rough neurons is calculated as follows:

I_{rLn} = \sum_{j=1}^{n} w_{Lnj} O_{nj}    (5.7)

I_{rUn} = \sum_{j=1}^{n} w_{Unj} O_{nj}    (5.8)

O_{rLn} = \min(f(I_{rLn}), f(I_{rUn}))    (5.9)

O_{rUn} = \max(f(I_{rLn}), f(I_{rUn}))    (5.10)

The output of the rough neuron (O_{rn}) is computed as follows:

O_{rn} = \frac{O_{rUn} - O_{rLn}}{\mathrm{average}(O_{rUn}, O_{rLn})}    (5.11)
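Definition 5.1 translates almost line by line into code. The sketch below is a forward pass through a single rough neuron, assuming a sigmoid transfer function f; the weights and input values are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rough_neuron_output(inputs, w_lower, w_upper):
    """Forward pass of a rough neuron Rn = (Un, Ln) following Eqs. 5.7-5.11."""
    i_lower = np.dot(w_lower, inputs)                   # Eq. 5.7
    i_upper = np.dot(w_upper, inputs)                   # Eq. 5.8
    o_lower = min(sigmoid(i_lower), sigmoid(i_upper))   # Eq. 5.9
    o_upper = max(sigmoid(i_lower), sigmoid(i_upper))   # Eq. 5.10
    return (o_upper - o_lower) / np.mean([o_upper, o_lower])   # Eq. 5.11

x = np.array([0.2, 0.7, 0.1])
print(rough_neuron_output(x, w_lower=np.array([0.1, 0.4, 0.3]),
                          w_upper=np.array([0.5, 0.9, 0.2])))
```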
The basic structure of the rough neural network is given in (Aboul Ella and Dominik, 2006). Classification error estimation and convergence are handled in the backward runs of the rough neural network. We estimate an error function for every neuron in a layer, the output of which is propagated further based on the inverse transfer function and the weights on the links (Sandeep and Rene, 2006; Aboul Ella and Dominik, 2006). A rule importance measure RI was used to evaluate the quality of the generated rules. It is defined by:

RI = \frac{\tau_r}{\rho_r}    (5.12)
TABLE 5.1 Contrast enhancement results for three test mammograms (images 1–3): original image, histogram equalization (HE), fuzzy histogram hyperbolization (FHH), and fuzzy rule based enhancement.
where τ_r is the number of times a rule appears in all reducts and ρ_r is the number of reduct sets. The quality of rules is related to the corresponding reduct(s) generating rules that cover the largest part of the universe U. Covering U with more general rules implies a smaller rule set. The rule importance criteria introduced in (Aboul Ella and Dominik, 2006) were used to study the importance of the rules.
5.4 Results and Discussion
In this section, the results of all processes using the proposed hybridization technique are discussed. To evaluate the visual performance of the algorithm, a number of images containing masses from the Mammographic Image Analysis Society (MIAS) database were selected (MIA, 2003). Table 5.1 illustrates the contrast enhancement results: (a) images 1, 2 and 3 show the original images; (b) the HE images show the histogram equalization (HE) result; (c) the FHH images show the fuzzy histogram hyperbolization (FHH) result; (d) the fuzzy rule images show the fuzzy rule based result. Table 5.2 depicts the S-FCM and M-FCM visual clustering results with different initialization parameters: the weight parameter of the cost function ranges from 0.001 to 0.0000001 and the number of clusters from three to eight. From the obtained results, we observe that both algorithms give equally good results with a higher number of clusters and a small weight of the cost function. The average segmentation error of the introduced algorithm is about 3.9837%, which indicates that it is reasonably robust.
TABLE 5.2 FCM segmentation results for image 4 with different numbers of clusters: M-FCM with 3, 5, 6 and 7 clusters, and S-FCM with 7 clusters.
TABLE 5.3 Rules and classification accuracy: (a) numbers of generated rules, (b) classification accuracy.
Table 5.3 shows the number of generated rules before and after the pruning process together with the overall classification accuracy. From Table 5.3(a) we can observe that the number of rules generated by all algorithms is very large; it exceeds the number of objects, which makes classification unacceptably slow. Therefore, it is necessary to prune the rules during their generation. Table 5.3(b) shows that the accuracy of the rough hybrid approach is much better than that of neural networks, rough sets, support vector machines and decision trees.
5.5 Conclusion
In this chapter, an application to breast cancer imaging has been chosen, and a rough hybridization scheme that combines the advantages of fuzzy sets, rough sets and neural networks, in conjunction with statistical feature extraction techniques, has been applied to test its ability and accuracy in detecting and classifying breast cancer images into two
outcomes: cancer or non-cancer. Algorithms based on fuzzy image processing are first applied to enhance the contrast of the original image. Then the features characterizing the underlying texture of the regions of interest are derived from the gray-level co-occurrence matrix. A rough set approach is used for attribute selection and rule generation. Finally, a rough neural network is designed to discriminate the different regions of interest and to separate them into malignant and benign cases. Experimental results show that the introduced scheme is very successful, with high detection and classification accuracy reaching over 98%.
Bibliography

MIA. 2003. The Mammographic Image Analysis Society: Mini Mammography Database. http://www.wiau.man.ac.uk/services/mias/miasmini.htm.
Aboul Ella, Hassanien. 2003. Classification and feature selection of breast cancer data based on decision tree algorithm. International Journal of Studies in Informatics and Control 12(1):33–39.
———. 2007. Fuzzy-rough hybrid scheme for breast cancer detection. Image and computer vision 25(2):172–183.
———. 2009. Intelligence techniques for prostate ultrasound image analysis. International Journal of Hybrid Intelligence System 6(4):155–167.
Aboul Ella, Hassanien, J.M. Ali, and N. Hajime. 2004. Detection of spiculated masses in mammograms based on fuzzy image processing. In Proceedings of the 7th international conference on artificial intelligence and soft computing (icaisc2004), lnai, springer, vol. 3070, 1002–1007.
Aboul Ella, Hassanien, and Ślęzak Dominik. 2006. Rough neural intelligent approach for image classification: A case of patients with suspected breast cancer. International Journal of Hybrid Intelligent Systems 3(4):205–218.
Bazan, J., H.S. Nguyen, S.H. Nguyen, P. Synak, and J. Wróblewski. 2000. Rough set algorithms in classification problem. In Rough set methods and applications, 49–88. Physica Verlag.
Chiang, F., and R. Braun. 2004. Intelligent failure domain prediction in complex telecommunication networks with hybrid rough sets and adaptive neural nets. In 3rd international information and telecommunication technologies symposium, i2ts2004. Sao Carlos Federal University.
Greco, S., M. Inuiguchi, and R. Slowinski. 2006. Fuzzy rough sets and multiple-premise gradual decision rules. International Journal of Approximate Reasoning 41(2):179–211.
Guo, Qi, Jiaqing Shao, and Virginie Ruiz. 2009. Characterization and classification of tumor lesions using computerized fractal-based texture analysis and support vector machines in digital mammograms. International Journal of Computer Assisted Radiology and Surgery 4(1):11–25.
Haralick, R.M. 1979. Statistical and structural approaches to texture. Photogrammetric Engineering and Remote Sensing 49(1):55–64.
Henry, C., and J. F. Peters. 1996. Rough neural networks. In Proc. of the 6th int. conf. on information processing and management of uncertainty in knowledge-based systems ipmu96, 1445–1450.
Hirano, S., and S. Tsumoto. 2005. Rough representation of a region of interest in medical images. International Journal of Approximate Reasoning 40(1-2):23–34.
Hu, X., T.Y. Lin, and J. Han. 2004. A new rough sets model based on database systems. Fundamenta Informaticae 59(2-3):135–152.
Ikedo, Yuji, Takako Morita, Daisuke Fukuoka, Takeshi Hara, Gobert Lee, Hiroshi Fujita, Etsuo Takada, and Tokiko Endo. 2009. Automated analysis of breast parenchymal patterns in whole breast ultrasound images: preliminary experience. International Journal of Computer Assisted Radiology and Surgery 4(3):299–306.
Kerre, E., and M. Nachtegael. 2000. Fuzzy techniques in image processing: Techniques and applications. In Studies in fuzziness and soft computing, vol. 52.
Maglogiannis, Ilias, Elias Zafiropoulos, and Ioannis Anagnostopoulos. 2007. An intelligent system for automated breast cancer diagnosis and prognosis using svm based classifiers. Applied Intelligence 30(1):24–36.
Nachtegael, M., M. Van-Der-Weken, D. Van-De-Ville, D. Kerre, W. Philips, and I. Lemahieu. 2001. An overview of classical and fuzzy-classical filters for noise reduction. In 10th international ieee conference on fuzzy systems fuzz-ieee'2001, 3–6.
Nystrom, L., I. Andersson, N. Bjurstam, J. Frisell, B. Nordenskjold, and L. E. Rutqvist. 2002. Long-term effects of mammography screening: updated overview of the swedish randomised trials. Lancet 359:909–919.
Organization, World Health. 2005. International agency for research on cancer, biennial report.
Pawlak, Z. 1982. Rough sets. Int. J. Computer and Information Sci. 11:341–356.
Peters, J.F., H. Liting, and S. Ramanna. 2001. Rough neural computing in signal analysis. Computational Intelligence 17(3):493–513.
Peters, J.F., A. Skowron, H. Liting, and S. Ramanna. 2000. Towards rough neural computing based on rough membership functions: Theory and application. In Rsctc'2000, lnai, springer, 611–618.
Polkowski, L. 2002. Rough sets: mathematical foundations. Physica-Verlag.
Rajendra, Acharya U, E. Y. K. Ng, J. Yang, Y. H. Chang, and G. J. L. Kaw. 2008. Computer-based identification of breast cancer using digitized mammograms. J Med Syst 32:499–507.
Rosenfeld, A. 1983. On connectivity properties of grayscale pictures. Pattern Recognition 16:47–50.
Sandeep, Chandana, and V. Mayorga Rene. 2006. Rough adaptive neuro-fuzzy inference system. International Journal of Computational Intelligence 3(4):289–295.
Setiono, R. 2000. Generating concise and accurate classification rules for breast cancer diagnosis. Artificial Intelligence in Medicine 18(3):205–219.
Ślęzak, D. 2000. Various approaches to reasoning with frequency-based decision reducts: a survey. In Rough sets in soft computing and knowledge discovery: New developments, ed. T.Y. Lin, L. Polkowski, S. Tsumoto. Physica-Verlag.
Starzyk, J.A., N. Dale, and K. Sturtz. 1981. A mathematical foundation for improved reduct generation in information systems. Knowledge and Information Systems Journal 2(2):131–147.
Sushmita, M., and K. Pal Sankar. 2005. Fuzzy sets in pattern recognition and machine intelligence. Fuzzy Sets and Systems 156(3):381–386.
Zadeh, L.A. 1965. Fuzzy sets. Information and Control 8:338–353.
6
Applications of Fuzzy Rule-based Systems in Medical Image Understanding

Wojciech Tarnawski, Wroclaw University of Technology, Poland
Gerald Schaefer, School of Engineering & Applied Science, Aston University, U.K.
Tomoharu Nakashima, College of Engineering, Osaka Prefecture University, Japan
Lukasz Miroslaw, Wroclaw University of Technology, Poland

6.1 Introduction
6.2 Fundamentals of fuzzy rule-based classification system
6.3 Fuzzy rule generation
Fuzzy rule generation with grade of certainty and weighted input patterns • Weighted fuzzy classification • Weighted fuzzy classifier with integrated learning • Optimisation of fuzzy rules base • Fuzzy rule generation with clustering and learning by examples • Fuzzy clustering of the input feature space • Generation of continuous membership functions and class labeling • Generation and optimisation of fuzzy rule base by learning from examples
6.4 Applications
Breast cancer diagnosis based on histopathology • Breast cancer classification based on thermograms • Diagnosis of precancerous and cancerous lesions in contact laryngoscopy
6.5 Conclusions
Bibliography

6.1 Introduction
Image understanding is a process representing the complex interaction between a computer vision system and one or more digital images. According to (Tsotos, 1987), given a goal or a reason for looking at a particular scene, an image understanding system should produce descriptions of both the images and the world scenes that the images represent. In terms of medical imaging, image understanding should lead to a successful interpretation of the images and ideally contribute to an accurate diagnosis. Recent research has aimed at machine learning methods that develop strategies using ad-hoc knowledge about the analysed images and their context. Among them, one of the most promising approaches to explaining human image understanding is rule-based symbolic processing. Here, rules are extracted through learning from examples or directly from expert knowledge. Many image understanding applications involve tasks such as image segmentation and edge detection that extract significant information from an image, which then often represents the input to a
pattern classification procedure. A set of rules may be used for image feature extraction (e.g., in image segmentation) as well as for classifying the extracted features (e.g., for image recognition) into a set of relevant image descriptors. While in the past fuzzy rule-based systems have been applied mainly to control problems (Sugeno, 1985; Lee, 1990), recently they have also been used in pattern recognition tasks (Nozaki, Ishibuchi, and Tanaka, 1996; Klir and Yuan, 1995; Grabisch, 1996; Grabisch and Nicolas, 1994; Ishibuchi and Nakashima, 1999b,a; Grabisch and Dispot, 1992; Ishibuchi, Nozaki, and Tanaka, 1992; Tarnawski and Cichosz, 2008; Tarnawski, Fraczek, Krecicki, and Jelen, 2008b; Tarnawski, Fraczek, Jelen, Krecicki, and Zalesska-Krecicka, 2008a). A fuzzy rule base consists of a set of fuzzy If-Then rules which, together with an inference engine, a fuzzifier, and a defuzzifier, form a fuzzy rule-based system. The role of the fuzzifier is to map inputs related to crisp image features to fuzzy subsets by applying appropriate membership functions. In rule-based systems, inference (reasoning) is understood as the final unique assignment of an object under consideration to a specified class. For fuzzy classification systems, this assessment corresponds to a defuzzification process, also called fuzzy reasoning, which chooses the class with the highest membership degree.

One might ask the question: what are the advantages of fuzzy rules over crisp rules for image understanding problems? In image understanding tasks the antecedents and the consequents of an If-Then rule are often represented in the form of fuzzy rules. The reason for this is that real images usually contain noisy or imprecise information. Image object attributes such as "rather dark", "well contrasted", "highly patterned", or spatial relationships between image objects described as "close to", defy a precise definition and are hence better modelled by fuzzy sets.

In this chapter we show how fuzzy rule-based systems can be successfully employed in medical image understanding tasks. We first provide an introductory section which covers the fundamentals of fuzzy rule-based classification systems. The following sections focus on several problem-oriented methodologies for fuzzy rule generation. We group them into two parts, where the first is concerned with weighted fuzzy rule-based systems which allow additional adjustment through weighted input patterns, and the second comprises fuzzy clustering and learning by examples. In the first strategy the antecedent part of the rules is initialized manually, while in the second, membership functions reflecting the distribution of the input training data are obtained by fuzzy clustering. In both approaches the consequent part is determined from the given training patterns, but in two completely different ways. The approach based on weighted fuzzy rule-based systems requires the full training data set, i.e. the input and output of every element in the training set; therefore, this approach represents a supervised generation of fuzzy rule bases. In the second approach, we perform automatic labelling of input patterns with class labels, and in some cases we require the system to propose the optimal number of classes of the input feature space; this leads to an unsupervised generation of rules. Both methods are task-dependent and the choice between them depends on the form of the training data used for rule generation.
This chapter provides an overview of current trends in fuzzy rule-based systems with a special emphasis on medical image understanding. Original methods developed by the authors serve as examples of computer-aided diagnosis in medical imaging, in particular breast cancer diagnosis from digitised images of fine needle aspirates and from thermograms, and the diagnosis of precancerous and cancerous lesions by contact laryngoscopy. Experimental results confirm the efficacy of the presented fuzzy rule base approaches.
6.2 Fundamentals of fuzzy rule-based classification system
Let us define the fuzzy classification task for image understanding as follows (Hoeppner, Klawonn, and Runkler, 1999). We have n real variables (image features) x_1, ..., x_n with domains X_i = [a_i, b_i], a_i ≤ b_i, i = 1, 2, ..., n, a finite set \mathcal{C} of M classes and a partial mapping

class : X_1 \times \cdots \times X_n \to \mathcal{C}    (6.1)

that assigns classes to some, but not necessarily to all, vectors (x_1, ..., x_n) ∈ X_1 × · · · × X_n. The aim is to find a fuzzy classifier that solves this classification problem. A fuzzy classifier is based on a finite set R of rules of the form R_j ∈ R, j = 1, 2, ..., N:

Rule R_j: IF x_1 is A_j^{(1)} and ... and x_n is A_j^{(n)} THEN class is C    (6.2)

where C ∈ \mathcal{C} is one of the classes and A_j^{(1)}, ..., A_j^{(n)} are antecedent fuzzy sets described by fuzzy membership functions \mu_j^{(1)}, ..., \mu_j^{(n)}. In order to keep the notation clear, we incorporate these functions \mu_j^{(i)}, i = 1, 2, ..., n directly in the rules:

Rule R_j: IF x_1 is \mu_j^{(1)} and ... and x_n is \mu_j^{(n)} THEN class is C    (6.3)
In real image understanding tasks one would replace these by suitable linguistic values like "rather dark", "well contrasted", "highly patterned", etc., and associate these linguistic values with corresponding fuzzy membership functions. Fig. 6.1 shows a simplified block diagram of a fuzzy rule-based system (Yager and Filev, 1994) that realises the process of fuzzy reasoning. The fuzzy result at the output of the fuzzy rule base indicates the degree to which an input pattern from X_1 × ... × X_n satisfies our decision criteria. The defuzzification problem defines the strategy of using the fuzzy result in the selection of one representative class of the set \mathcal{C}.
FIGURE 6.1: Block diagram of fuzzy rule-based classification system
Fuzzy reasoning is realizable via a number of strategies such as max-min reasoning, max-matching, max-accumulated matching, and centroid defuzzification (Bezdek J.C., 1992; Zimmerman, 1991). Here, however, we restrict our attention to the simplest max-min strategy, i.e., we evaluate the conjunction in the rules by the minimum and aggregate the results of the rules by the maximum. Therefore, we define

\mu_{R_j}(x_1, ..., x_n) = \min_{i \in \{1,...,n\}} \{ \mu_{R_j}^{(i)}(x_i) \}    (6.4)

as the degree to which the antecedents of rule R_j are satisfied, and

\mu_{C_k}^{(R)}(x_1, ..., x_n) = \max \{ \mu_{R_j}(x_1, ..., x_n) \mid C = C_k \}    (6.5)
is the degree to which the vector (x_1, ..., x_n) is assigned to class C_k ∈ \mathcal{C}, k = 1, ..., M. The defuzzification, the second and final process, which provides an assignment of a unique (crisp) class to a given vector (x_1, ..., x_n), is carried out by the following mapping:

R(x_1, ..., x_n) =
\begin{cases}
C_k & \text{if } \mu_{C_k}^{(R)}(x_1, ..., x_n) > \mu_{C_l}^{(R)}(x_1, ..., x_n) \;\; \forall C_l \in \mathcal{C}, C_l \neq C_k \\
\text{unknown class} \notin \mathcal{C} & \text{otherwise}
\end{cases}    (6.6)
If there are two or more classes that are assigned the maximal degree by the rules, then we do not classify it and assign it to an unknown class.
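The max-min reasoning of Eqs. 6.4-6.6 can be written compactly as follows. The rule encoding (a list of per-feature membership functions plus a consequent class) and the triangular membership helper are illustrative assumptions, not part of the formal definition above.

```python
def triangular(a, b, c):
    """Return a triangular membership function with support [a, c] and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Each rule: ([membership function per feature], consequent class)
rules = [
    ([triangular(0.0, 0.2, 0.5), triangular(0.5, 0.8, 1.0)], "benign"),
    ([triangular(0.4, 0.7, 1.0), triangular(0.0, 0.3, 0.6)], "malignant"),
]

def classify(x, rules):
    degrees = {}
    for mus, cls in rules:
        activation = min(mu(xi) for mu, xi in zip(mus, x))         # Eq. 6.4
        degrees[cls] = max(degrees.get(cls, 0.0), activation)       # Eq. 6.5
    best = max(degrees.values())
    winners = [c for c, d in degrees.items() if d == best]
    return winners[0] if len(winners) == 1 and best > 0 else "unknown"   # Eq. 6.6

print(classify([0.25, 0.75], rules))
```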
6.3 Fuzzy rule generation
The process of generating fuzzy If-Then rules, also called system learning, consists of two steps: specification of the antecedent part, and determination of a consequent class. Various approaches have been proposed for the automatic generation of rules (Ishibuchi and Nakashima, 1999b,a; Grabisch and Dispot, 1992). In this section, we describe two approaches: the first is related to fuzzy rule generation with a grade of certainty and weighted input patterns, while the second performs rule generation through fuzzy clustering with learning by examples (Wang L., 1991). For image understanding problems, it is common that a rather large number of fuzzy rules are produced. Therefore, we often want to optimise the generated fuzzy rule base to make the recognition system more practical as well as to speed up the process of reasoning. This optimisation of the rule system can be achieved with or without reducing the number of rules in the base. For every generated rule we assign a special value, called the grade of certainty or soundness degree, that can be used for optimisation of the rule base. We also present a hybrid fuzzy classification system which incorporates a genetic algorithm to optimise the rule base. Fuzzy rule-base optimisation can also be realised through rule splitting, and removal or weighting of rules. All these techniques allow us to generate comprehensible image understanding systems with high classification performance.
6.3.1 Fuzzy rule generation with grade of certainty and weighted input patterns
In here, we present a strategy where each generated rule is assigned a grade of certainty. The rule defined in Eq. 6.3 is then modified into

Rule R_j: IF x_1 is \mu_j^{(1)} and ... and x_n is \mu_j^{(n)} THEN class is C with CF_j, j = 1, ..., N    (6.7)

where CF_j describes the grade of certainty of the j-th rule. Let us assume that m training patterns x_p = (x_{p1}, ..., x_{pn}), p = 1, ..., m are given for an n-dimensional M-class pattern classification problem. The consequent class C_j ∈ \mathcal{C} and the grade of certainty CF_j of the j-th fuzzy If-Then rule R_j are determined during the following two steps:

1. Calculate β_C(j) for class C as

β_C(j) = \sum_{x_p \in C} \mu_j(x_p),    (6.8)

where

\mu_j(x_p) = \mu_j^{(1)}(x_{p1}) \cdot ... \cdot \mu_j^{(n)}(x_{pn}),    (6.9)

and \mu_j^{(i)}(\cdot) is the membership function of fuzzy set A_j^{(i)}. Here, we use triangular fuzzy sets as in Fig. 6.2.

2. Find \hat{C} that has the maximum value of β_C(j):

β_{\hat{C}}(j) = \max_{1 \le k \le M} \{ β_{C_k}(j) \}.    (6.10)

FIGURE 6.2: Triangular fuzzy membership function (membership value plotted against attribute value)
If two or more classes take the maximum value, the consequent class C_j of rule R_j cannot be determined. In this case, C_j = ∅. Thus, each fuzzy If-Then rule can have only a single consequent class. If a single class \hat{C} takes the maximum value, let C_j be the class \hat{C}. The grade of certainty CF_j is determined as

CF_j = \frac{β_{\hat{C}}(j) - \bar{β}}{\sum_{C} β_C(j)}    (6.11)

with

\bar{β} = \frac{\sum_{C \neq \hat{C}} β_C(j)}{M - 1}.    (6.12)
Using this rule generation procedure, we can generate N fuzzy If-Then rules defined by Eq. 6.7. After both the consequent class C_j and the grade of certainty CF_j are determined for all N rules, a new pattern x = (x_1, ..., x_n) can be classified by the following procedure of fuzzy reasoning:

1. Calculate α_C(x) for each class C ∈ \mathcal{C} as

α_C(x) = \max \{ \mu_j(x) \cdot CF_j \mid C_j = C \}    (6.13)

2. Find the class C' that has the maximum value of α_C(x):

α_{C'}(x) = \max_{1 \le k \le M} \{ α_{C_k}(x) \}.    (6.14)
If two or more classes take the maximum value, then the classification of x is rejected (i.e. x is left as an unclassifiable pattern); otherwise x is assigned to the class C'.
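A sketch of the two-step consequent determination (Eqs. 6.8-6.12) and the subsequent reasoning (Eqs. 6.13-6.14) is given below. It assumes that the compatibility values µ_j(x_p) have already been computed for every rule and training pattern; array shapes and names are illustrative.

```python
import numpy as np

def rule_consequent(mu_j, labels, n_classes):
    """Determine consequent class and grade of certainty CF_j for one rule.

    mu_j   : compatibility mu_j(x_p) of every training pattern with rule j
    labels : class index of every training pattern
    """
    labels = np.asarray(labels)
    beta = np.array([mu_j[labels == c].sum() for c in range(n_classes)])   # Eq. 6.8
    winners = np.flatnonzero(beta == beta.max())
    if len(winners) != 1 or beta.sum() == 0:
        return None, 0.0                          # consequent cannot be determined
    c_hat = winners[0]
    beta_bar = (beta.sum() - beta[c_hat]) / (n_classes - 1)                # Eq. 6.12
    cf = (beta[c_hat] - beta_bar) / beta.sum()                             # Eq. 6.11
    return c_hat, cf

def classify(mu_rules, consequents, cfs, n_classes):
    """Single-winner reasoning; mu_rules holds mu_j(x) for the new pattern x."""
    alpha = np.zeros(n_classes)
    for mu, c, cf in zip(mu_rules, consequents, cfs):
        if c is not None:
            alpha[c] = max(alpha[c], mu * cf)                              # Eq. 6.13
    winners = np.flatnonzero(alpha == alpha.max())
    return winners[0] if len(winners) == 1 else None                      # Eq. 6.14
```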
6.3.2 Weighted fuzzy classification
In this section, we regard pattern classification as a cost minimisation problem and introduce the concept of weighted training patterns. The idea is based on the assumption that in certain cases the misclassification of a particular input pattern incurs an extra cost. For example, in cancer diagnosis, false positive cases, i.e. diagnosing healthy individuals as cancer candidates, could be penalised more than false negatives. To reformulate the pattern classification problem as a cost minimisation problem, we introduce a weight for each training pattern. The weight of an input pattern can be viewed as the cost of its misclassification or rejection. Fuzzy If-Then rules are generated by considering the weights as well as the compatibility of the training patterns. In order to incorporate the concept of the weight, Eq. 6.8 of the fuzzy rule generation is modified to

β_C(j) = \sum_{x_p \in C} \mu_j(x_p) \cdot ω_p    (6.15)
where ω_p is the weight associated with training pattern p. We note that this fuzzy rule generation method can also be applied to the standard pattern classification problem with no pattern weights. In this case, the class and the grade of certainty are determined from training patterns by specifying a pattern weight as ω_p = 1 for p = 1, ..., m. Under the assumption that a weight is assigned to each training pattern which can be viewed as the relative importance of the patterns, we use the concept of classification/rejection cost to construct a weighted fuzzy classification. We define a cost function Cost(FS) of a fuzzy classification system FS as

Cost(FS) = \sum_{p=1}^{m} ω_p \cdot z_p(FS),    (6.16)
where m is the number of training patterns, ω_p is the weight of the training pattern x_p, and z_p(FS) is a binary variable set according to the classification result of the training pattern x_p by FS: z_p(FS) = 0 if x_p is correctly classified by FS, and z_p(FS) = 1 otherwise (i.e. x_p is misclassified or rejected). We use this cost function as well as the classification rate as performance measures. The number of generated fuzzy If-Then rules in a fuzzy classification system depends on the partition of the attributes and the dimensionality of the pattern classification problem. Since there are three fuzzy sets for each attribute, the possible number of combinations of antecedent fuzzy sets is N = 3^n, where n is the number of attributes. We use two weight assignment methods for determining the weights of patterns. In the first method, we assume it is important to correctly classify a certain class. Thus the weights of training patterns of this focussed class are specified as ω_p = 1.0, while the weights of the other training patterns are specified as ω_p = 0.5. That is, the cost of misclassifying/rejecting a training pattern from the focussed class is twice as large as that from the other classes. It is clear that different values of ω_p can also be used to put more or less emphasis on certain classes. The second weight assigning method considers the distribution of classes in a data set. The weight of a class specified by this method is large if the proportion of the class is small. Thus it is assumed that the classification of minor
classes with a small number of patterns is more important than major classes with a large number of training patterns, a situation that is often the case for medical datasets where the number of malignant cases far exceeds those of benign ones. The weight of a training pattern x_p from class C is specified by the inverse of the proportion of the class over the given training patterns as

ω_p = ω_C = \frac{1}{Z} \cdot \frac{m}{N_C}, \quad p = 1, ..., m, \; C = 1, ..., M,    (6.17)

where ω_p is the weight of the training pattern x_p that is from class C, ω_C is the weight of patterns from the class C, m is the number of given training patterns, N_C is the number of patterns from the class C, and Z is a normalisation factor that makes the maximum value of the class weights a unit value (i.e. \max_C ω_C = 1).
6.3.3 Weighted fuzzy classifier with integrated learning
Learning fuzzy If-Then rules for weighted training patterns is a strategy that adjusts the grades of certainty CF_j and can be employed to achieve improved classification performance. It is based on an incremental learning approach where the adjustment occurs whenever classification of training patterns is performed. When a training pattern is correctly classified we reinforce the grade of certainty of the fuzzy rule that is used for the classification. On the other hand, we decrease the grade of certainty of a rule if a training pattern is not successfully classified. Let us assume that we have generated fuzzy If-Then rules by the rule-generation procedure detailed by Eq. 6.8 and Eq. 6.9. We also assume that a fuzzy If-Then rule R_j is used for the classification of a training pattern x_p. That is, R_j has the maximum product of the compatibility and the grade of certainty (see Eq. 6.13). The proposed learning method adjusts the grade of certainty of R_j as

CF_j^{new} = CF_j^{old} - η \cdot ω_p \cdot CF_j^{old} \quad \text{if } x_p \text{ is misclassified}    (6.18)

and

CF_j^{new} = CF_j^{old} + η \cdot ω_p \cdot (1 - CF_j^{old}) \quad \text{if } x_p \text{ is correctly classified}    (6.19)

where ω_p is the weight of the training pattern x_p, and η (the learning rate) is a positive constant value in the interval [0, 1]. One epoch of the proposed learning method involves examining all given training patterns. Thus, there will be m adjustments of fuzzy If-Then rules after all m training patterns are examined. The learning process is summarised as follows:

1. Generate fuzzy If-Then rules from the m given training patterns by the procedure described by Eq. 6.4 and Eq. 6.5.
2. Set K = 1.
3. Set p = 1.
4. Classify x_p using the rules generated in Step 1.
5. After x_p is classified, adjust the grades of certainty using Eq. 6.18 or Eq. 6.19.
6. If p < m, let p := p + 1 and go to Step 4. Otherwise go to Step 7.
7. If K reaches a pre-specified value, stop the learning procedure. Otherwise let K := K + 1 and go to Step 3.

Note that K in the above learning procedure corresponds to the number of epochs.
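One epoch of this certainty-grade adjustment might look as follows. The sketch assumes an external classify routine that returns the index of the single winner rule (or None when the pattern is rejected); the learning rate value is illustrative.

```python
def learn_epoch(patterns, labels, weights, rules_cf, consequents, classify, eta=0.1):
    """Adjust grades of certainty for one pass over the training data (Eqs. 6.18, 6.19)."""
    for x, y, w in zip(patterns, labels, weights):
        winner = classify(x)                  # index of the single winner rule, or None
        if winner is None:
            continue
        if consequents[winner] == y:          # correctly classified: reinforce
            rules_cf[winner] += eta * w * (1.0 - rules_cf[winner])   # Eq. 6.19
        else:                                 # misclassified: weaken
            rules_cf[winner] -= eta * w * rules_cf[winner]           # Eq. 6.18
    return rules_cf
```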
6.3.4 Optimisation of fuzzy rules base
While the basic fuzzy rule based system provides a reliable and accurate classifier, it suffers - as do many other approaches - from the curse of dimensionality. In the case of our fuzzy classifier, the number of generated rules increases exponentially with the number of attributes involved and with the number of partitions used for each attribute. The resulting complexity of the generated rule base is very resource demanding, both in terms of computational complexity and in required memory allocation. We are therefore interested in arriving at a more compact classifier that affords the same classification performance while not suffering from these problems. One possibility is to apply a rule splitting step to deal with the problem of dimensionality as suggested in (Schaefer, Nakashima, Zavisek, Yokota, Drastich, and Ishibuchi, 2007). By limiting the number of attributes in each rule to 2, a much smaller rule base was developed. In this section, we take a different approach to arrive at an even more compact rule base by developing a hybrid fuzzy classification system through the application of a genetic algorithm (GA). The fuzzy If-Then rules used do not change and are still of the same form as the one given in Eq. (6.7), i.e. they contain a number of fuzzy attributes and a consequent class together with a grade of certainty. Our approach of using GAs to generate a fuzzy rule-based classification system is a Michigan style algorithm (Ishibuchi and Nakashima, 1999a) which represents each rule by a string and handles it as an individual in the population of the GA. A population consists of a pre-specified number of rules. Because the consequent class and the rule weight of each rule can be easily specified from the given training patterns they are not used in the coding of each fuzzy rule (i.e., they are not included in a string). Each rule is represented by a string using its antecedent fuzzy sets. First, the algorithm randomly generates a pre-specified number N_{rule} of rules as an initial population. Next, the fitness value of each fuzzy rule in the current population is evaluated. Let S be the set of rules in the current population. The evaluation of each rule is performed by classifying all the given training patterns by the rule set S using the single winner-based method. The winning rule receives a unit reward when it correctly classifies a training pattern. After all the given training patterns are classified by the rule set S, the fitness value fitness(R_q) of each rule R_q in S is calculated as

fitness(R_q) = NCP(R_q),    (6.20)

where NCP(R_q) is the number of correctly classified training patterns by R_q. It should be noted that the following relation holds between the classification performance NCP(R_q) of each rule R_q and the classification performance NCP(S) of the rule set S used in the fitness function:

NCP(S) = \sum_{R_q \in S} NCP(R_q).    (6.21)
The algorithm is implemented so that only a single copy is selected as a winner rule when multiple copies of the same rule are included in the rule set S. In GA optimisation problems, multiple copies of the same string usually have the same fitness value. This often leads to undesired early convergence of the current population to a single solution. In our algorithm, only a single copy can have a positive fitness value and the other copies have zero fitness which prevents the current population from being dominated by many copies of a single or few rules. Then, new rules are generated from the rules in the current population using genetic operations. As parent strings, two fuzzy If-Then rules are selected from the current population and binary tournament selection with replacement is applied. That is, two rules are randomly selected from the current population and the better rule with the higher fitness value
is chosen as a parent string. A pair of parent strings is chosen by iterating this procedure twice. From the selected pair of parent strings, two new strings are generated by a crossover operation. We use a uniform crossover operator where the crossover points are randomly chosen for each pair of parent strings. The crossover operator is applied to each pair of parent strings with a pre-specified crossover probability p_c. After new strings are generated, each symbol of the generated strings is randomly replaced with a different symbol by a mutation operator with a pre-specified mutation probability p_m. Usually the same mutation probability is assigned to every position of each string. Selection, crossover, and mutation are iterated until a pre-specified number N_{replace} of new strings are generated. Finally, the N_{replace} strings with the smallest fitness values in the current population are removed, and the newly generated N_{replace} strings are added to form a new population. Because the number of removed strings is the same as the number of added strings, every population consists of the same number of strings. That is, every rule set has the same number of rules. This generation update can be viewed as an elitist strategy where the number of elite strings is (N_{rule} - N_{replace}). The above procedures are applied to the new population again. The generation update is iterated until a pre-specified stopping condition is satisfied. In our experiments we use the total number of iterations (i.e., the total number of generation updates) as the stopping condition.

Algorithm summary

To summarise, our hybrid fuzzy rule-based classifier works as follows:

Step 1: Parameter specification. Specify the number of rules N_{rule}, the number of replaced rules N_{replace}, the crossover probability p_c, the mutation probability p_m, and the stopping condition.
Step 2: Initialisation. Randomly generate N_{rule} rules (i.e., N_{rule} strings of length n) as an initial population.
Step 3: Genetic operations. Calculate the fitness value of each rule in the current population. Generate N_{replace} rules using selection, crossover, and mutation from existing rules in the current population.
Step 4: Generation update (elitist strategy). Remove the worst N_{replace} rules from the current population and add the newly generated N_{replace} rules to the current population.
Step 5: Termination test. If the stopping condition is not satisfied, return to Step 3. Otherwise terminate the execution of the algorithm.

During the execution of the algorithm, we monitor the classification rate of the current population on the given training patterns. The rule set (i.e. population) with the highest classification rate is chosen as the final solution.
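A stripped-down version of this Michigan-style loop is sketched below, assuming rules are encoded as strings of antecedent fuzzy-set indices and that an externally supplied evaluate(population) function returns the per-rule NCP counts under single-winner classification (Eq. 6.20). All parameter values are illustrative, and the elitist update keeps the best N_rule - N_replace rules as described above.

```python
import random

def ga_rule_selection(evaluate, n_rule=30, n_replace=6, n_symbols=4, length=10,
                      p_cross=0.9, p_mut=0.05, generations=100, seed=0):
    """Michigan-style GA: each individual is one fuzzy rule (string of antecedent indices)."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_symbols) for _ in range(length)] for _ in range(n_rule)]
    best_pop, best_score = [r[:] for r in pop], -1
    for _ in range(generations):
        fitness = evaluate(pop)                        # NCP(R_q) for every rule, Eq. 6.20
        if sum(fitness) > best_score:                  # NCP(S), Eq. 6.21
            best_pop, best_score = [r[:] for r in pop], sum(fitness)
        def tournament():
            a, b = rng.randrange(n_rule), rng.randrange(n_rule)
            return pop[a] if fitness[a] >= fitness[b] else pop[b]
        children = []
        while len(children) < n_replace:
            p1, p2 = tournament()[:], tournament()[:]
            if rng.random() < p_cross:                 # uniform crossover
                for i in range(length):
                    if rng.random() < 0.5:
                        p1[i], p2[i] = p2[i], p1[i]
            for child in (p1, p2):
                child = [rng.randrange(n_symbols) if rng.random() < p_mut else g
                         for g in child]               # mutation
                children.append(child)
        # elitist generation update: drop the worst n_replace rules
        order = sorted(range(n_rule), key=lambda q: fitness[q], reverse=True)
        pop = [pop[q] for q in order[:n_rule - n_replace]] + children[:n_replace]
    return best_pop, best_score
```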
6.3.5 Fuzzy rule generation with clustering and learning by examples
This section presents how a fuzzy clustering approach can be used to generate fuzzy If-Then rules. In the approaches described in the previous section the antecedent part of the rules is initialised manually. Very often, heuristically chosen membership functions do not reflect the actual data distribution in the input space. Moreover, in image understanding tasks the amount of analysed data can be enormous and complex. According to the original tendency in fuzzy classification systems, membership functions should be subjective, in contrast to probabilities, which are objective. However, a set of heuristically chosen membership functions
are either too difficult to choose due to the lack of understanding of the human approach or cannot produce a satisfactory result. Generating membership functions from the large amount of image data by a clustering technique is one way to tackle this problem. Clustering is an unsupervised learning method, i.e. samples in the input set are unlabelled (not classified) and, in many cases, the exact number of classes is unknown (Hoeppner et al., 1999). Our task is to divide these samples into several groups according to a similarity measure or the inherent structure of the data. In order to build the membership functions from the available image data we can use a clustering technique to partition them, and then produce membership functions from the resulting clusters. Every generated cluster can be interpreted as a single class. Therefore, during an object labelling step, a set of m desired input-output data pairs (the training data set) is produced in the n-dimensional input space:

\{ x_p = (x_{p1}, ..., x_{pn}), C_k \}, \quad p = 1, ..., m, \; C_k \in \mathcal{C}, \; k = 1, ..., M    (6.22)

We have to distinguish between the training data set, which is used for clustering and building the classifier, and the test data set, which is classified without influencing the clusters. Given an input pattern from the test data represented as the image feature vector, the classifier determines its membership in all the classes (not clusters). These output values can be interpreted as the result of the classifier rules and can be considered as the "examples" for the fuzzy rule base that is created later. In the following, we describe the process of generating fuzzy rules by learning from examples. For this purpose we use a training data set (Eq. 6.22) and use the generated rules to determine a mapping (Eq. 6.1). This approach was originally proposed in (Wang and Mendel, 1991) to derive fuzzy rules for function approximation. In (Chi and Yan, 1995b,a, 1993) this idea was extended to solve problems related to image understanding. The method of generating the fuzzy rule base therefore consists of three main steps: (1) fuzzy clustering of the input feature space, (2) generation of a set of membership functions for each input in the feature space and generation of the class labeling, and (3) generation and minimisation of the fuzzy rule base by learning from examples.
6.3.6 Fuzzy clustering of the input feature space
In our methodology of fuzzy rule generation we focus on objective function-based cluster analysis whose aim is to assign data to clusters so that a given objective function is optimised. The objective function assigns a quality or error to each cluster arrangement based on the distance between data and typical representatives of the clusters called cluster prototypes. A large family of objective functions results from the following basic function (Bezdek J.C., 1992; Hoeppner et al., 1999):

J(U, V) = \sum_{x \in X} \sum_{v \in V} \mu^{\eta}(x)(v) \, d^2(x, v)    (6.23)

where d(x, v) is the distance between the datum x and the cluster prototype v, \mu(x)(v) is an M × m matrix of the membership values, m denotes the number of data patterns, M the number of clusters (classes) and η is an exponent weight factor called a fuzzifier factor. The criterion for the optimisation of the objective function is obvious: it has to be minimised. Because the values of the function J are dependent on U and V, the clustering process is related to iterative calculations of the cluster prototypes v ∈ V and the membership values \mu(x)(v) ∈ U. Based on the objective function (Eq. 6.23) several models with various distance measures and different prototypes have been developed. The most popular approach proposed by
where d(x, v) is a distance between the datum x and the cluster prototype v, µ(x)(v) is an M × m matrix of the membership values, m denotes the number of data patterns, M the number of clusters (classes) and η is an exponent weight factor called a fuzzifier factor. The criterion for the optimisation of the objective function is obvious: it has to be minimised. Because the values of function J are dependent on U and V , the clustering process is related to iterative calculations of cluster prototypes v ∈ V . and the membership values ∀x∈v µ(x)(v) ∈ U . Based on the objective function (Eq. 6.23) several models with various distance measures and different prototypes have been developed. The most popular approach proposed by
Dunn (Dunn, 1973) and Bezdek (Bezdek J.C., 1992) is known as the fuzzy c-means (FCM) algorithm. The FCM algorithm is only capable of generating spherical clouds of points, which results from employing the Euclidean metric in the calculations. This can be a significant limitation because the spherical shape does not necessarily represent the data in the optimal way. An alternative algorithm, proposed by Gustafson and Kessel (GK) (Gustafson and Kessel, 1979), has a different metric that permits the shape of clusters to be ellipsoidal, which can be better fitted to the data. Additionally, for ellipsoidal clusters the loss of information represented by a box formed by overlapping regions is lower, i.e. the box is rectangular and not square as for FCM (see Fig. 6.3). Each cluster is induced by a symmetric and positive definite matrix defining a norm of its own:

\| y \|_B = \sqrt{ y^T B y }    (6.24)
FIGURE 6.3: Illustration of the loss of information for ellipsoidal clusters (Hoeppner et al., 1999)
Using this approach the distance value d(x, v) is defined as the Mahalanobis metric

d(x, v) = \| x - v \|_B = \sqrt{ (x - v)^T B (x - v) }    (6.25)
where we assume that det(B) = 1 to avoid a minimisation of the objective function by matrices with zero or almost zero entries. The elements of the matrix B for cluster v are calculated according to

B = \sqrt[n]{\det(G_v)} \cdot G_v^{-1}    (6.26)

where G_v is an n × n diagonal matrix, the fuzzy covariance matrix, defined as

G_v = \sum_{x \in X} \mu_v^{\eta}(x) \, (x - v)(x - v)^T    (6.27)
In this concept only the cluster shapes can vary now, while the cluster size is kept constant, which in turn allows for a more intuitive partition of the data. The cluster algorithm determines prototype locations, prototype parameters and a matrix of memberships using the iterative procedure described in (Gustafson and Kessel, 1979). The membership matrix contains the membership values \mu(x_j)(v) of the j-th input pattern x_j for cluster v, and these values satisfy the following conditions:

0 \le \mu(x_j)(v) \le 1, \quad j = 1, 2, ..., m, \; v = 1, 2, ..., M

\sum_{v=1}^{M} \mu(x_j)(v) = 1, \qquad 0 < \sum_{j=1}^{m} \mu(x_j)(v) < m
If the data set is representative of the system, we can assume that additional data would cause only slight modifications of the cluster shapes. First of all, we might want to determine the memberships of all possible data. Therefore, we have to extend the discrete membership matrix to continuous membership functions.
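A compact Gustafson-Kessel iteration corresponding to Eqs. 6.23-6.27 is sketched below. A small regularisation term is added (an assumption, not part of the original formulation) to keep the fuzzy covariance matrices invertible; variable names, the stopping rule and the toy data are illustrative.

```python
import numpy as np

def gustafson_kessel(X, M=2, eta=2.0, max_iter=100, tol=1e-5, seed=0):
    """GK clustering: ellipsoidal clusters via the Mahalanobis-like norm of Eq. 6.25."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((M, m))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        Um = U ** eta
        V = Um @ X / Um.sum(axis=1, keepdims=True)               # cluster prototypes
        D = np.zeros((M, m))
        for k in range(M):
            diff = X - V[k]
            G = (Um[k, :, None] * diff).T @ diff                 # fuzzy covariance, Eq. 6.27
            G += 1e-9 * np.eye(n)                                # regularisation (added)
            B = (np.linalg.det(G) ** (1.0 / n)) * np.linalg.inv(G)    # Eq. 6.26
            D[k] = np.sqrt(np.einsum("ij,jk,ik->i", diff, B, diff))   # Eq. 6.25
        D = np.fmax(D, 1e-12)
        U_new = 1.0 / np.sum((D[:, None, :] / D[None, :, :]) ** (2.0 / (eta - 1.0)), axis=1)
        if np.max(np.abs(U_new - U)) < tol:
            return V, U_new
        U = U_new
    return V, U

X = np.vstack([np.random.randn(100, 2) * [1.0, 0.2],
               np.random.randn(100, 2) * [0.2, 1.0] + [3, 3]])
centers, memberships = gustafson_kessel(X, M=2)
print(centers)
```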
6.3.7 Generation of continuous membership functions and class labeling
It is commonly known that membership functions can often be assigned linguistic labels. In particular, in image understanding tasks we very often use attributes such as "rather colorful", "well contrasted", etc. This would make the fuzzy system easy to read and interpret by humans. But it is often very difficult to specify meaningful labels in a multi-dimensional feature space. Assigning labels is typically easier in one-dimensional domains. We therefore project the discrete membership values \mu(x_j)(v) onto the respective axes related to the image features denoted as x_1, ..., x_n. Various methods have been proposed to obtain continuous membership functions from the projected memberships (Gustafson and Kessel, 1979; Sugeno and Yasukawa, 1993; Zheru, Hong, and P., 1996). In our approach we suggest approximating each generated cluster by a hyper-ellipsoid described by its cluster prototype. The lengths of the hyper-ellipsoid axes are defined by the variance of the cluster projected on each dimension. The projection produces a triangular membership function with its peak at the corresponding cluster prototype v = \{v^{(1)}, ..., v^{(n)}\}. The size of the triangular base depends on the variances \{var^{(1)}(v), ..., var^{(n)}(v)\} of each prototype (see Fig. 6.4). In our case we describe the triangular function as a special case of the trapezoidal function Π defined as

Π(x, a, b, c, d) =
\begin{cases}
0 & x \le a \\
(x - a)/(b - a) & a < x \le b \\
1 & b < x \le c \\
(d - x)/(d - c) & c < x \le d \\
0 & x > d
\end{cases}    (6.28)
FIGURE 6.4: GK clusters projections and derived membership functions. h(i) are defined in Eq.6.29 and the lower index indicates the cluster number.
where a = v^{(i)} - h^{(i)}/2, b = c = v^{(i)}, d = v^{(i)} + h^{(i)}/2 for i = 1, 2, ..., n, and h^{(i)} defines the width of the membership function:

h^{(i)} = 4 \cdot var^{(i)}(v)    (6.29)
Since membership functions overlap each other, they are merged. In detail, two neighboring membership functions Π(x, a_{l-1}, b_{l-1}, c_{l-1}, d_{l-1}) and Π(x, a_l, b_l, c_l, d_l) are merged if the following condition is satisfied:

\frac{b_{l-1} + c_{l-1}}{2} - \frac{b_l + c_l}{2} \le l_t    (6.30)

where l_t \in \{l_1, ..., l_n\} are pre-specified thresholds defined for each input. The resulting membership function after the merging has the form Π(x, a = \min(a_l, a_{l-1}), b = \min(b_l, b_{l-1}), c = \min(c_l, c_{l-1}), d = \min(d_l, d_{l-1})). As a result of the merging process, some membership functions have trapezoidal shapes instead of triangular ones. After merging, pattern labelling is performed. Given an input feature vector from the training data set (Eq. 6.22), the resulting class C_K is assigned the label of the cluster with the highest membership value.
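The projection and merging just described can be prototyped as follows. The sketch keeps the minima of all four breakpoints when merging, exactly as written above, and uses an illustrative threshold; the trapezoidal membership follows the reconstruction of Eq. 6.28.

```python
import numpy as np

def pi_function(x, a, b, c, d):
    """Trapezoidal membership Pi(x, a, b, c, d); b == c gives the triangular case (Eq. 6.28)."""
    if x <= a or x > d:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def project_clusters(prototypes, variances, feature):
    """One (a, b, c, d) triangle per cluster on the chosen feature axis (Eq. 6.29)."""
    funcs = []
    for v, var in zip(prototypes, variances):
        h = 4.0 * var[feature]
        center = v[feature]
        funcs.append((center - h / 2.0, center, center, center + h / 2.0))
    return sorted(funcs, key=lambda f: f[1])

def merge(funcs, threshold):
    """Merge neighbouring functions whose peaks are closer than the threshold (Eq. 6.30)."""
    merged = [funcs[0]]
    for a, b, c, d in funcs[1:]:
        pa, pb, pc, pd = merged[-1]
        if abs((pb + pc) / 2.0 - (b + c) / 2.0) <= threshold:
            merged[-1] = (min(a, pa), min(b, pb), min(c, pc), min(d, pd))
        else:
            merged.append((a, b, c, d))
    return merged

protos = np.array([[0.2, 0.5], [0.25, 0.9], [0.8, 0.4]])
vars_ = np.array([[0.01, 0.02], [0.012, 0.03], [0.02, 0.01]])
print(merge(project_clusters(protos, vars_, feature=0), threshold=0.1))
```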
In image understanding tasks the system learning can be achieved in several ways. The first step is related to feature selection, which in practice is always task-dependent. Moreover, in many cases only the feature set is defined; the number of classes is difficult to estimate or is completely unknown. In these cases the learning process is "supervised" by the knowledge expert. In our approach, expert validation can be performed in two ways: (1) the expert defines the number of classes, or (2) the expert accepts the number of classes proposed by the fuzzy clustering algorithm. In the second case, the suggestion is based on cluster validity measures (Zheru et al., 1996) and the classification results on the images used for system learning. We have used the following three cluster validity measures, defined for m input patterns and M clusters (classes), where \mu(x_j)(v_k) denotes the membership of the j-th object to the k-th cluster and v_k denotes the prototype of the k-th cluster:

1. Partition coefficient

PK = \frac{1}{m} \sum_{j=1}^{m} \sum_{k=1}^{M} (\mu(x_j)(v_k))^2    (6.31)

Additionally, suppose that Ω_{M*} represents the clustering result; then the optimal choice of M* is given by \max_{M*} \{ \max_{Ω_{M*}} PK(M*) \}, M* \ge 2.

2. Separation coefficient

S = \sum_{k=1}^{M} \sum_{j=1}^{m} (\mu(x_j)(v_k))^{\eta} \left( \| x_j - v_k \|^2 - \left\| \frac{\sum_{l=1}^{m} x_l}{m} - v_k \right\|^2 \right)    (6.32)

The optimal choice of M* is given by \min_{M*} \{ \min_{Ω_{M*}} S(M*) \}.

3. Separation and compactness coefficient

CS = \frac{ \sum_{k=1}^{M} \sum_{j=1}^{p} (\mu(x_j)(v_k))^2 \| v_k - x_j \| }{ \min_{k,l} \| v_k - v_l \|^2 }    (6.33)

Here, the optimal choice of M* is given by \min_{M*} \{ \min_{Ω_{M*}} CS(M*) \}.

The clustering procedure is useful when the partition of the feature space is as crisp as possible. This means that clustering with the optimal number of clusters should make all input patterns as close to their cluster prototypes as possible, while all cluster prototypes should be separated as much as possible. In this case, the possible loss of information is relatively small.
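The three validity coefficients are direct transcriptions of Eqs. 6.31-6.33 once a membership matrix U (clusters × patterns) and prototypes V are available; the demo membership matrix below is only an illustrative stand-in for a real clustering result.

```python
import numpy as np

def partition_coefficient(U):
    """Eq. 6.31: PK, to be maximised over the number of clusters M* >= 2."""
    return np.sum(U ** 2) / U.shape[1]

def separation_coefficient(U, V, X, eta=2.0):
    """Eq. 6.32: S, to be minimised."""
    mean = X.mean(axis=0)
    d_proto = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2
    d_mean = np.linalg.norm(mean - V, axis=1) ** 2
    return float(np.sum((U ** eta) * (d_proto - d_mean[:, None])))

def compactness_separation(U, V, X):
    """Eq. 6.33: CS, to be minimised."""
    d_proto = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
    num = np.sum((U ** 2) * d_proto)
    sep = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2) ** 2
    np.fill_diagonal(sep, np.inf)
    return float(num / sep.min())

X = np.random.rand(200, 2)
V = np.array([[0.25, 0.25], [0.75, 0.75]])
d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
U = (1.0 / d) / (1.0 / d).sum(axis=0)          # illustrative membership matrix
print(partition_coefficient(U), separation_coefficient(U, V, X), compactness_separation(U, V, X))
```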
6.3.8 Generation and optimisation of fuzzy rule base by learning from examples
The next task is to generate a set of fuzzy rules from the training data defined in Eq. 6.22 and use these fuzzy rules to determine the mapping in Eq. 6.1. The method of learning by examples consists of the following three steps:

Step 1: Find the intervals for each input of the feature space. Find the domain intervals for each input by finding the crossing points between the generated continuous membership functions. This partitions each input into q regions denoted as r_1, r_2, ..., r_q and generates a crisp set of hyper-cubes in the feature space (see Fig. 6.5).
FIGURE 6.5: Illustration of definition of intervals r1 , r2 , . . . , rq for a single feature x
Step 2: Generate fuzzy rules from the given training data set. Produce a rule for each input-output data pair included in the training data by assigning the given inputs to the regions with maximum membership value. For a 2-dimensional feature vector, for example, this is processed as:

x_1^{(1)} = 0.3, x_1^{(2)} = 0.8, C_K = 1 ⇒
\mu(x_1^{(1)})(C_k = 1) = 0.65 in r_2 (max), \mu(x_1^{(2)})(C_k = 1) = 0.75 in r_4 (max) ⇒
Rule 1: IF x^{(1)} is R_2^{(1)} AND x^{(2)} is R_4^{(2)} THEN C is Class = 1
Step 3: Minimisation of fuzzy rules. As mentioned above, each data pair from the training data set generates one rule. Usually there is a large number (several thousand) of available data pairs, so it is very likely that some conflicting rules are produced. These conflicting rules have the same IF part but different THEN parts. One way to solve this problem is to assign a soundness degree SD, and then to select from the subset of conflicting rules the rule with the maximal soundness degree. Two strategies are proposed to assign a soundness degree to a rule R_j, j = 1, 2, ..., N, where N denotes the number of rules.

Strategy 1. The soundness degree SD(R_j) for the j-th rule is determined by the ratio of the number of training data pairs W_{R_j} which support the rule and the total number of patterns W_{IF_j} which have the same IF part:

SD(R_j) = \frac{W_{R_j}}{W_{IF_j}}    (6.34)

This strategy works better when a large number of training patterns is available. Using this strategy we incorporate statistical information into the fuzzy system, resulting in more reliable decisions.

Strategy 2. The soundness degree is determined by the membership grades of the inputs and outputs:

SD(R_j) = [\mu(x_j^{(1)})(v = C) \cdot ... \cdot \mu(x_j^{(n)})(v = C)] \cdot \mu(x_j)(v = C)    (6.35)

For this strategy we can also introduce the concept of weighted input patterns presented already in Section 6.3.1. The weight would be interpreted as the degree of belief in the usefulness of each data pair from the training data set. Suppose that for every input-output pattern we can assign a weight ω; then we have:

SD(R_j) = [\mu(x_j^{(1)})(v = C) \cdot ... \cdot \mu(x_j^{(n)})(v = C)] \cdot \mu(x_j)(v = C) \cdot ω    (6.36)

This strategy gives good results even when a small number of training samples is available.
FIGURE 6.6: Example of two clusters and the corresponding rule bank in two-dimensional space
After the minimisation of the fuzzy rules we create a rule bank which is later used for classification. The form of the rule bank for a 2-dimensional feature space with 2 clusters is shown in Fig. 6.6. The boxes of the bank are filled with fuzzy rules originating either from the numerical data or from linguistic rules given by the expert. For a conjunction AND defined between two features only one box of the rule bank is filled. For OR relations, all boxes in the rows or columns corresponding to the regions of the antecedent IF part are filled. To determine the mapping (Eq. 6.1) in fuzzy systems, a defuzzification method needs to be adopted. The defuzzification problem defines the strategy of using the fuzzy result, given by the degree values of Eq. 6.4 and Eq. 6.5, to guide the selection of one representative element (class) of the set \mathcal{C}.
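Steps 2 and 3 can be sketched as follows: each labelled training pair votes for one cell of the rule bank, and conflicts are resolved with the frequency-based soundness degree of Eq. 6.34 (Strategy 1). The region-assignment function is assumed to pick the region with maximal membership, as in Step 2; everything here is an illustrative sketch rather than the authors' implementation.

```python
from collections import Counter, defaultdict

def generate_rule_bank(training_pairs, region_of):
    """Learning from examples with conflict resolution via Eq. 6.34.

    training_pairs : iterable of (feature_vector, class_label)
    region_of      : region_of(feature_index, value) -> region label such as 'r2'
                     (assumed to pick the region with maximal membership)
    """
    votes = defaultdict(Counter)              # antecedent cell -> class vote counts
    for x, cls in training_pairs:
        cell = tuple(region_of(i, xi) for i, xi in enumerate(x))
        votes[cell][cls] += 1
    rule_bank = {}
    for cell, counter in votes.items():
        cls, count = counter.most_common(1)[0]
        sd = count / sum(counter.values())    # soundness degree, Eq. 6.34
        rule_bank[cell] = (cls, sd)
    return rule_bank

# toy example with two features split into two regions each
region_of = lambda i, v: "r1" if v < 0.5 else "r2"
pairs = [([0.2, 0.8], 1), ([0.3, 0.9], 1), ([0.7, 0.1], 2), ([0.6, 0.2], 2), ([0.7, 0.9], 1)]
bank = generate_rule_bank(pairs, region_of)
for cell, (cls, sd) in bank.items():
    print(f"IF x1 is {cell[0]} AND x2 is {cell[1]} THEN class {cls} (SD={sd:.2f})")
```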
6.4
Applications
To demonstrate the applicability of the ideas described in this chapter we present a few applications of fuzzy If-Then rule-based systems in medical image understanding. We present computer-aided diagnosis systems for breast cancer diagnosis based on histopathology (Section 6.4.1) and on thermography (Section 6.4.2), where the idea of a weighted fuzzy classifier with integrated learning
is adopted. For these two cases the system expects a complete pre-classified training set. Fuzzy rule generation with clustering and learning by examples in contact endoscopy is presented in Section 6.4.3. Computer-aided diagnosis in medical imaging can be defined as a diagnosis that is made by a physician who uses the output of a computerised analysis of medical images as a "second opinion" in the process of making diagnostic decisions. This automated second opinion is formed on the basis of medical image data. A diagnostic system related to image understanding should cover all aspects from low-level processing to high-level data recognition/interpretation. Diagnostic accuracy depends not only on the quality of data interpretation but also on data acquisition.
6.4.1
Breast cancer diagnosis based on histopathology
The first application where we evaluate the fuzzy classifier is the diagnosis of breast cancer based on images of fine needle aspirates (see Fig. 6.7 for an example image (Street, Wolberg, and Mangasarian, 1993)).
FIGURE 6.7: Image of fine needle aspirates of breast mass
Fluid samples were extracted with a fine needle from the patient's breast mass, placed on a glass slide and stained to highlight the nuclei of the constituent cells (Street et al., 1993). From the captured images, a number of features were extracted as follows. First, curve-fitting techniques were applied to detect the boundaries of the nuclei. For each nucleus 10 features were extracted, namely radius, standard deviation of gray-scale values, perimeter, area, smoothness (local variation in radius lengths), compactness, concavity, number of concave parts, symmetry and fractal dimension. The mean, standard deviation and maximum values of these features over all nuclei in the image were then calculated to provide a feature vector with 30 values (Street et al., 1993). In total the dataset comprises 569 samples, of which 357 are known to constitute benign and the remaining 212 malignant cases. Since there are two classes (i.e. benign and malignant), we examined the performance of fuzzy classifiers with three weight assignments: benign focussed (1.0 for benign patterns and 0.5 for malignant patterns), malignant focussed (1.0 for malignant patterns and 0.5 for benign patterns), and class-proportional. 10-fold cross validation was performed, where the given data set is divided into ten subsets and each subset is used in turn as the test set while the other nine subsets are used as training patterns. The experimental results, expressed in terms of
classification rate and total cost, are listed in Tables 6.1 and 6.2, which also provide results for a conventional fuzzy rule-based classifier as described in Section 6.3.1. We note that the performance of the conventional method is constant as it does not consider the weights of training patterns. Table 6.1 shows the performance for the training patterns and Table 6.2 that for the test dataset. From these we can see that in two of the three cases there is a clear improvement, both in terms of overall cost (the main aim of the proposed classifier) and in classification performance. For both the benign-focussed and class-proportional cases the cost is more than halved compared to a standard fuzzy classifier. In turn, the classification rate is improved from 89.98% to 92.97% and 94.38%, respectively.
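As a rough illustration of this evaluation protocol, the sketch below shows how the three weighting schemes and the weighted misclassification cost could be computed around a 10-fold split. It is an assumption-laden sketch rather than the authors' code: the label encoding, the inverse-frequency form of the class-proportional weights and the `train_and_predict` callback are all hypothetical.

```python
import numpy as np

def pattern_weights(labels, scheme):
    """Weight assignment for the three schemes (assumed encoding: 0 = benign, 1 = malignant)."""
    w = np.ones(len(labels))
    if scheme == "benign":            # benign focussed: benign 1.0, malignant 0.5
        w[labels == 1] = 0.5
    elif scheme == "malignant":       # malignant focussed: malignant 1.0, benign 0.5
        w[labels == 0] = 0.5
    elif scheme == "proportional":    # class-proportional (inverse-frequency variant, an assumption)
        for c in (0, 1):
            w[labels == c] = 1.0 - np.mean(labels == c)
    return w

def ten_fold_evaluation(X, y, weights, train_and_predict, seed=0):
    """10-fold cross validation; the cost of a misclassified pattern is taken to be its weight."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 10)
    correct, total_cost = 0, 0.0
    for k in range(10):
        test = folds[k]
        train = np.hstack([folds[i] for i in range(10) if i != k])
        pred = train_and_predict(X[train], y[train], weights[train], X[test])
        correct += int(np.sum(pred == y[test]))
        total_cost += float(np.sum(weights[test][pred != y[test]]))
    return 100.0 * correct / len(y), total_cost
```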
TABLE 6.1: Experimental results for training patterns on breast cancer diagnosis.

                        Classification rate (%)          Cost
                        Proposed    Conventional         Proposed    Conventional
Benign focussed         92.26       89.28                2.74        6.04
Malignant focussed      72.37       89.28                7.89        3.12
Class-proportional      93.86       89.28                2.88        6.05

TABLE 6.2: Experimental results for test patterns on breast cancer diagnosis.

                        Classification rate (%)          Cost
                        Proposed    Conventional         Proposed    Conventional
Benign focussed         92.97       89.98                24          57
Malignant focussed      72.58       89.98                78          28.5
Class-proportional      94.38       89.98                26.31       57
The experimental results for the weighted classifier with integrated learning described in Section 6.3.3 are presented in Table 6.3. Except for the malignant focussed case, the total cost of the weighted classifier with learning is now lower in all cases compared to the classifier without a learning strategy.
TABLE 6.3: Classification results on breast cancer dataset after learning.

                        Classification rate (%)            Cost
                        With learning   Conventional       With learning   Conventional
Benign focussed         89.99           89.98              45              57
Malignant focussed      88.06           89.98              37              28.5
Class-proportional      91.74           89.98              39.65           57

6.4.2
Breast cancer classification based on thermograms
The second application for computer-aided diagnosis of breast cancer is related to thermal medical imaging. This method uses a camera with sensitivities in the infrared to provide a picture of the temperature distribution of the human body or parts thereof. It is a
non-invasive, non-contact, passive, radiation-free technique that can also be used in combination with anatomical investigations based on x-rays and three-dimensional scanning techniques such as CT and MRI, and often reveals problems when the anatomy is otherwise normal. It is well known that the radiance from human skin is an exponential function of the surface temperature, which in turn is influenced by the level of blood perfusion in the skin. Thermal imaging is hence well suited to pick up changes in blood perfusion which might occur due to inflammation, angiogenesis or other causes. Asymmetrical temperature distributions as well as the presence of hot and cold spots are known to be strong indicators of an underlying dysfunction (Uematsu, 1985). Despite earlier, less encouraging studies, which were based on low-capability and poorly calibrated equipment, infrared imaging has been shown to be well suited to the task of detecting breast cancer, in particular when the tumor is in its early stages or in dense tissue (Anbar, Milescu, Naumov, Brown, Button, Carly, and AlDulaimi, 2001; Head, Wang, Lipari, and Elliott, 2000). Early detection is important as it provides significantly higher chances of survival (Ng and Sudarshan, 2001), and in this respect infrared imaging outperforms the standard method of mammography, which can detect tumors only once they exceed a certain size. Tumors that are still small, on the other hand, can be identified using thermography: as cancer cells have a high metabolic activity, this leads to an increase in local temperature which can be picked up in the infrared. In this section, we perform breast cancer detection based on thermography, using a series of statistical features extracted from the thermograms coupled with a fuzzy rule-based classification system for diagnosis. The features stem from a comparison of the left and right breast areas and quantify the bilateral differences encountered. Following this asymmetry analysis the features are fed to a fuzzy classification system. The approach presented in Section 6.3.1 is used for generating fuzzy If-Then rules based on a training set of known cases. Experimental results on a set of nearly 150 cases show the proposed system to work well, accurately classifying about 80% of cases, a performance that is comparable to other imaging modalities such as mammography.

Thermograms for breast cancer diagnosis are usually taken based on a frontal view and/or some lateral views. In our work we restrict our attention to frontal view images (an example is shown in Fig. 6.8). As has been shown earlier, an effective approach to automatically detect cancer cases is to study the symmetry between the left and right breast (Qi, Snyder, Head, and Elliott, 2000). In the case of cancer the tumor will recruit blood vessels, resulting in hot spots and a change in vascular pattern, and hence an asymmetry between the temperature distributions of the two breasts, whereas symmetry typically identifies healthy subjects. We therefore follow this approach and segment the areas corresponding to the left and right breast from the thermograms. We then convert the breast regions to a polar co-ordinate representation as this simplifies the calculation of several of the features that we employ. A series of statistical features is then calculated, all of which are aimed at providing indications of symmetry between the regions of interest (i.e. the two breasts). In the following we briefly characterise the features we employ.
Basic statistical features Clearly the simplest feature to describe a temperature distribution such as those encountered in thermograms is to calculate its statistical mean. As we are interested in symmetry features we calculate the mean for both sides and use the absolute value of the difference of the two. Similarly we calculate the standard temperature deviation and use the absolute difference as a feature. Furthermore we employ the absolute differences of the median temperature and the 90-percentile as further descriptors. Moments
Image moments are defined as

m_{pq} = \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} x^p y^q g(x, y)    (6.37)
where x and y define the pixel location and N and M the image size. We utilise moments m01 and m10 which essentially describe the centre of gravity of the breast regions. Histogram features Histograms record the frequencies of certain temperature ranges of the thermograms. In our work we construct normalised histograms of both regions of interest. As features we use the cross-correlation between the two histograms. From the difference histogram (i.e. the difference between the two histograms) we compute the absolute value of its maximum, the number of bins exceeding a certain threshold (0.01 in our experiments), the number of zero crossings, energy and the difference of the positive and negative parts of the histogram. Cross co-occurrence matrix Co-occurrence matrices have been widely used in texture recognition tasks (Haralick, 1979) and can be defined as

\gamma^{(k)}_{T_i,T_j}(I) = \Pr_{p_1 \in I_{T_i},\, p_2 \in I} [\, p_2 \in I_{T_j},\ |p_1 - p_2| = k \,]    (6.38)

with

|p_1 - p_2| = \max(|x_1 - x_2|, |y_1 - y_2|)    (6.39)

where T_i and T_j denote two temperature values and (x_k, y_k) denote pixel locations. In other words, given any temperature T_i in the thermogram, \gamma gives the probability that a pixel at distance k away is of temperature T_j. In order to arrive at an indication of asymmetry between the two sides we adopted this concept and derived what we call a cross co-occurrence matrix defined as

\gamma^{(k)}_{T_i,T_j}(I(1), I(2)) = \Pr_{p_1 \in I(1)_{T_i},\, p_2 \in I(2)} [\, p_2 \in I(2)_{T_j},\ |p_1 - p_2| = k \,]    (6.40)
i.e. temperature values from one breast are related to temperatures of the second side. From this matrix we can extract several features (Haralick, 1979). The ones we are using are

Homogeneity   G = \sum_k \sum_l \frac{\gamma_{k,l}}{1 + |k - l|}    (6.41)

Energy   E = \sum_k \sum_l \gamma_{k,l}^2    (6.42)

Contrast   C = \sum_k \sum_l |k - l| \, \gamma_{k,l}    (6.43)

and Symmetry   S = 1 - \sum_k \sum_l |\gamma_{k,l} - \gamma_{l,k}|    (6.44)

We further calculate the first four moments m_1 to m_4 of the matrix

m_p = \sum_k \sum_l (k - l)^p \, \gamma_{k,l}    (6.45)
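A minimal sketch of how the descriptors of Eqs. 6.41-6.45 could be computed from a (cross) co-occurrence matrix is given below; building the matrix itself (counting pixel pairs at distance k between the two sides) is omitted, and the normalisation assumption is ours, not taken from the text.

```python
import numpy as np

def cross_cooccurrence_features(gamma):
    """Descriptors of Eqs. 6.41-6.45 from a square (cross) co-occurrence matrix
    gamma, assumed normalised so that its entries sum to one."""
    k = np.arange(gamma.shape[0])[:, None]
    l = np.arange(gamma.shape[1])[None, :]
    homogeneity = float(np.sum(gamma / (1.0 + np.abs(k - l))))           # Eq. 6.41
    energy      = float(np.sum(gamma ** 2))                              # Eq. 6.42
    contrast    = float(np.sum(np.abs(k - l) * gamma))                   # Eq. 6.43
    symmetry    = 1.0 - float(np.sum(np.abs(gamma - gamma.T)))           # Eq. 6.44
    moments     = [float(np.sum(((k - l) ** p) * gamma)) for p in (1, 2, 3, 4)]  # Eq. 6.45
    return [homogeneity, energy, contrast, symmetry] + moments           # the 8 co-occurrence features
```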
Mutual information The mutual information MI between two distributions can be calculated from the joint entropy H of the distributions and is defined as

MI = H_L + H_R + H    (6.46)

with

H_L = -\sum_k p_L(k) \log_2 p_L(k)
H_R = -\sum_l p_R(l) \log_2 p_R(l)    (6.47)
H = \sum_k \sum_l p_{LR}(k,l) \log_2 p_{LR}(k,l)

and

p_{LR}(k,l) = x_{k,l} / \sum_{k,l} x(k,l)
p_L(k) = \sum_l p_{LR}(k,l)    (6.48)
p_R(l) = \sum_k p_{LR}(k,l)
and is employed as a further descriptor. Fourier analysis As the last feature descriptors we calculate the Fourier spectrum and use the difference of the absolute values of the ROI spectra. The features we adopt are the maximum of this difference and the distance of this maximum from the centre. To summarise, we characterise each breast thermogram using the following set of features: 4 basic statistical features, 8 histogram features, 8 cross co-occurrence features, mutual information and 2 Fourier descriptors. We further apply a Laplacian filter to enhance the contrast and calculate another subset of features from the resulting images. In total we end up with 38 descriptors per breast thermogram which describe the asymmetry between the two sides. We normalise each feature to the interval [0;1] to arrive at comparable units between descriptors.

For our experiment we gathered a dataset of 146 thermograms for which the correct diagnosis (i.e. malignant or benign) is known. It should be noted that this dataset is significantly larger than those used in previous studies (e.g. (Qi et al., 2000)). For all thermograms we calculate a feature vector of length 38 as outlined above. We then train the fuzzy classifier explained in the previous section using this data to obtain a classifier that is capable of distinguishing cancer patients from healthy individuals. As a first test we wish to examine how well the classifier is able to separate the two classes and hence train the classification system on all data available (i.e. on all 146 cases) and then test it on all samples, that is, for this experiment the training and test data are identical. We experiment with different numbers of fuzzy partitions per attribute, from 2 to 15, and show the results in Table 6.4 in terms of classification rate, i.e. the percentage of correctly classified patterns. Looking at Table 6.4 we see that in general classification performance increases with the number of fuzzy partitions used. A classification rate of above 90% is reached only when partitioning the attribute values into 9 or more intervals; the best performance is achieved with 15 partitions, resulting in a classification rate close to 98%. We notice that even though the classifiers are tested on the same data that was used for training we do not achieve perfect classification. This suggests that we indeed have a challenging data set to deal with, as the two classes cannot even be separated by the non-linear division our fuzzy classifier is capable of.
FIGURE 6.8: Example of a thermogram of a breast cancer patient (malignant)
TABLE 6.4: Results of breast cancer thermogram classification on training data.

# fuzzy partitions    classification rate [%]
2                     80.82
3                     82.19
4                     84.25
5                     84.93
6                     84.93
7                     89.04
8                     88.36
9                     90.41
10                    91.78
11                    92.47
12                    92.47
13                    97.26
14                    94.52
15                    97.95
While results on training data provide us with some basic indication of the classification performance, only validation on unseen test data will provide real insight into the generalisation capabilities of a classifier, as classification performance on such unseen patterns is normally lower than on previously encountered training samples. We therefore perform standard 10-fold cross-validation on the dataset, where the patterns are split into 10 disjoint sets and the classification performance on each such set, based on training the classifier with the remaining 90% of samples, is evaluated in turn for all 10 combinations. We restrict our attention to classifiers with partition sizes of 10 or more, as only those achieved good enough classification performance on the training data.
TABLE 6.5: Results of breast cancer thermogram classification on test data based on 10-fold cross validation.

# fuzzy partitions    CR from (Schaefer et al., 2007) [%]    CR hybrid fuzzy approach [%]
10                    78.05                                  80.27
11                    76.57                                  79.18
12                    77.33                                  80.89
13                    78.05                                  77.74
14                    79.53                                  79.25
15                    77.43                                  78.90
The results obtained for fuzzy classification with and without the optimisation procedure described in Section 6.3.4 are listed in Table 6.5. Here we applied a genetic algorithm to generate an optimised rule base of 100 rules; the crossover probability was set to 0.9 and the mutation probability to 0.1. It can be seen that a correct classification rate of about 80% is achieved, which is comparable to that achieved by other techniques for breast cancer diagnosis, with mammography typically providing about 80%, ultrasonography about 70%, MRI systems about 75% and DOBI (optical systems) reaching about 80% diagnostic accuracy (Zavisek and Drastich, 2005). We can therefore conclude that the presented approach is indeed useful as an aid for the diagnosis of breast cancer and should prove even more powerful when coupled with another modality such as mammography.
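The optimisation procedure of Section 6.3.4 is not reproduced in this chapter; purely as an illustration of the kind of search involved, the following generic sketch selects a fixed-size rule base with a genetic algorithm using the crossover and mutation probabilities quoted above. The chromosome encoding, the tournament selection and the fitness callback are assumptions, not the authors' method.

```python
import numpy as np

def ga_rule_selection(candidate_rules, fitness, n_rules=100, pop_size=50,
                      generations=200, p_cross=0.9, p_mut=0.1, seed=0):
    """Generic GA sketch for selecting a rule base of n_rules fuzzy rules.
    fitness(rule_subset) must score a candidate rule base (e.g. training accuracy);
    its definition, like the rest of the encoding, is hypothetical."""
    rng = np.random.default_rng(seed)
    # a chromosome is a vector of indices into the candidate rule list
    pop = [rng.choice(len(candidate_rules), size=n_rules, replace=False)
           for _ in range(pop_size)]

    def score(chrom):
        return fitness([candidate_rules[i] for i in chrom])

    for _ in range(generations):
        scores = [score(c) for c in pop]
        # binary tournament selection
        parents = [pop[max(rng.integers(pop_size, size=2), key=lambda i: scores[i])]
                   for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a.copy(), b.copy()
            if rng.random() < p_cross:                     # one-point crossover
                cut = int(rng.integers(1, n_rules))
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
            for chrom in (a, b):
                if rng.random() < p_mut:                   # replace one rule at random
                    chrom[rng.integers(n_rules)] = rng.integers(len(candidate_rules))
                children.append(chrom)
        pop = children
    best = max(pop, key=score)
    return [candidate_rules[i] for i in best]
```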
6.4.3
Diagnosis of precancerous and cancerous lesions in contact laryngoscopy
The third application we present deals with the diagnosis of laryngeal pathology using image data acquired from contact endoscopy. Contact endoscopy (CE) is an in vivo technique (Hamou, 1979; Wardrop, Sim, and McLaren, 2000) of obtaining detailed images of living epithelia. It exploits a modified glass rod lens endoscope which is placed on the tissue surface. Contact endoscopy images depict the cell organisation (see Fig. 6.9, left column) and the critical examination of this biological material is important to understand both normal and pathological biological processes. In many cases, this examination depends on the physician's level of suspicion regarding the malignancy of the lesion. In this step the physician must visually assess various morphometric characteristics of image objects which are visible as cell nuclei. These features help to decide whether the abnormality is likely to be malignant or benign, and to determine the recommended course of action, i.e. whether to repeat the screening, advise a follow-up or perform a biopsy. Until now no cell nuclei classification system in CE imaging has been published. The main problem with diagnosis using CE images is connected with the intuitive description of the image object attributes (see Tab. 6.6). Intuitive feature descriptions such as "rather large size of the cell nuclei", "highly deformed shape of nuclei", "high density of cell nuclei" or "cell nuclei are grouped very closely", used in histopathological evaluations, defy the precise description of the morphometric attributes, such as object size or shape coefficient, used by the system for the description of the image objects. For these reasons the intuitive expert description can be very well modelled by fuzzy sets. Because of the large number of input patterns and the imprecise histological evaluation we decided to apply the method of rule generation with fuzzy clustering. The aim of our endeavours was the design and implementation of a prototype imaging system. The functionality of the system is the following: (1) acquisition of image data for contact endoscopy, (2) processing and interpretation of endoscopic image data and (3) visualisation of results. The nuclei classification results may be treated as additional quantitative and qualitative factors for computer-aided cancer diagnosis of the larynx and may be used for selecting suitable areas of tissue prior to biopsy. The classification results were verified by statistical analysis to assess the potential of the proposed ideas. Computerised detection of objects (object segmentation) in contact endoscopy images aims at segmenting cell nuclei or cells including the nucleus. In this case, image segmentation can be understood as a process of grouping image pixels into significant regions denoting the nuclei or cells. The segmentation method used in the proposed system is described in (Tarnawski and Kurzynski, 2007) and consists of image enhancement based on a nonlinear diffusion process, followed by a modified watershed segmentation step. The segmentation result is represented as a binary image with the detected nuclei represented by white pixels.
Our image understanding system works in two modes: learning mode and classification mode. In the learning phase, the system is supervised by an expert/physician. The task here is to generate a set of fuzzy rules using fuzzy clustering and learning by examples as described in Section 6.3.5, for which training data are required. In our case only the input values, e.g. object features such as the object field and its shape coefficient, are available from the training data. The outputs, represented by class labels assigned to every object, are generated during clustering, where the clustering results in the system learning phase should additionally be accepted by the expert. On the authority of histopathological findings (the "gold standard"), contact endoscopic images were classified by pathologists into the following classes: tumor (SCC), severe dysplasia (SD), and mild dysplasia (MD). A control group, i.e. normal epithelium (NE), included the images captured from unchanged epithelium. The results related to the four specified histological classes are summarised in Table 6.6 and example images from each group are depicted in Fig. 6.9 as well as in (Tarnawski et al., 2008a).
TABLE 6.6: Description of CE images in the four analysed groups of histopathological findings.

Image Type | Histopathological results | Description of the contact endoscopy image characteristics
SCC | Squamous cell cancer | Disordered, crowded cells, enlarged nuclei, highly variable nuclear morphology, increased nucleus to cytoplasm ratio
SD  | Severe dysplasia     | Enlarged nuclei, crowded cells with moderately variable nuclear morphology, increased nucleus to cytoplasm ratio
MD  | Mild dysplasia       | Moderate changes in the size of the selected nuclei, nuclei spatial relationships slightly abnormal, non-uniform shape of the nuclei
NE  | Normal epithelium    | Regular cell formation, uniform round nuclei with slight size variation
For every group, the automatic method of cell nuclei detection described in (Tarnawski and Kurzynski, 2007) was performed. Morphometric analysis was carried out on 26,260 cell nuclei in total. Special software designed for the aim of this study helped to calculate 13 morphometric parameters for each nucleus:
• area and perimeter
• area to convex area ratio and perimeter to convex perimeter ratio
• length, width and length to width ratio (aspect ratio)
• elongateness coefficient
• feret shape coefficient
• blair-bliss shape coefficient
• ratio of nucleus perimeter to the circle perimeter of the same size
• nuclei density index (based on the partition of the image space using multiple grids of different size, also called multi-resolution grids). The value of this index is defined as the combination of interpolated and weighted superposition of the multi-resolution values of the nuclei density function. It is calculated for every nucleus (for detailed description of this density function see (Tarnawski and Cichosz, 2008)). This index takes decimal values from zero to one and describes local nuclei distribution and density in the image.
The area and perimeter of the nuclei are expressed in image pixels. All the remaining parameters, except for the nuclei density index, are shape parameters. All of them are scalar quantities and take decimal values in the range 0-3. Variation analysis selected the four best discriminative parameters for the specified histology evaluations as: area, nuclei density index, elongateness coefficient, and area to convex area ratio.

A training set, formed from half of the analysed CE images (about 10,000 cell nuclei), was used in the classification phase based on cluster analysis. During this phase, three classes of nuclei were found. The rule bank for nuclei classification was created using learning by examples as described in Section 6.3.5. The second half of the nuclei served as the test set for system testing. For the minimisation of fuzzy rules we used Strategy 1 with the soundness degree defined by Eq. 6.34 for every generated rule. In this system, we define the following centroid defuzzification formula to determine the output class for each input pattern:

C = \frac{\sum_{j=1}^{N} IF_{R_j} \, C_{R_j}}{\sum_{j=1}^{N} IF_{R_j}}    (6.49)

where N is the number of rules, C_{R_j} is the class number generated by rule R_j (C_{R_j} = 0, 1, . . . , M) and IF_{R_j} is defined as IF_{R_j} = \prod_{t=1}^{4} \mu_{R_j}(x^{(t)}), where \mu_{R_j}(x^{(t)}) denotes the membership grade of the t-th feature in the fuzzy regions that the j-th rule occupies.

Nuclei classification results using the generated fuzzy rule base are shown in Fig. 6.9 in the right column. The classification with clustering was linguistically described by the pathologist from abnormal (class 1) to normal nuclei (class 3). While classes 1 and 3 are easily distinguishable, class 2 should be considered an intermediate nuclei category. Since it is more similar to class 1 (pathological cases) than to class 3 (normal), it was considered an abnormal category. The number of nuclei belonging to class 1 on NE images was low (in the range 0 to 3) and the area occupied by this type of nuclei was less than 1% of the total nuclei area. Therefore, these nuclei were excluded from further analysis.

The last step was the verification of the hypothesis that the amount of area occupied by nuclei from every class is characteristic for each type of laryngeal lesion. The results of variation analysis showed that the input parameters related to the amount of area occupied by every category of nuclei on the analysed CE image originate from different distributions. Mean values of the area occupied by every category of nuclei in relation to the specified histological evaluations are presented in Fig. 6.10. The most significant finding is the low number of nuclei of class 1 (most pathological) on the NE images. This finding increases the diagnostic accuracy between malignant lesions (SCC, SD) and precancerous or normal cases. Our results may also suggest that a diagnosis of carcinoma or severe dysplasia can definitely be made when the nuclei in class 1 (i.e. highly pathological) cover more than 5% of the total nuclei area, and when nuclei in class 2 (i.e. moderately pathological) cover more than 40% of the total nuclei area.

Concluding, the observations from our research have direct implications. Specifically, we have confirmed that practical assessment of nuclear morphometry for the diagnosis of laryngeal lesions by contact endoscopy is possible and can be improved by a fuzzy rule-based system. Another advantage of contact endoscopy is that it can indicate the appropriate tissue area for biopsy.
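Referring back to the centroid defuzzification of Eq. 6.49 above, a minimal sketch of the classification step might look as follows; the data structures for the rule bank and the membership functions are hypothetical placeholders for the learned rule base, not the system's actual interfaces.

```python
import numpy as np

def classify_pattern(features, rule_bank, membership):
    """Centroid defuzzification of Eq. 6.49.  rule_bank is a list of
    (regions, class_number) pairs and membership(t, region, value) returns the
    membership grade of the t-th feature in the given fuzzy region."""
    numerator, denominator = 0.0, 0.0
    for regions, c_rj in rule_bank:
        # IF_Rj: product of the membership grades of the four selected features
        if_rj = np.prod([membership(t, regions[t], features[t])
                         for t in range(len(features))])
        numerator += if_rj * c_rj
        denominator += if_rj
    return numerator / denominator if denominator > 0 else None
```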
As a valuable contribution, the method proposed herein may aid the clinician especially in the initial phase of the learning curve.
6.5
Conclusions
In general, in order to define a fuzzy rule-based system for image understanding tasks after pre-processing, feature selection and extraction the following steps have to be taken:
1. Construction of the training data set.
2. Generation of a suitable set of membership functions, manually or from the training patterns.
3. Generation of fuzzy rules with a learning technique from training data or from the set of clusters obtained from a clustering algorithm.
4. Optimisation of the fuzzy rule base to make the recognition system more practical and to speed up the process of reasoning.
5. A defuzzification scheme to perform the crisp classification.

Two techniques for the generation of fuzzy rules from a numerical training data set were presented in this chapter. The first approach produces a rule base for a fuzzy classifier with weighted training patterns. The weight of an input pattern can be viewed as the cost of its misclassification. Fuzzy rules are then generated by considering the weights and the compatibility of training patterns. In medical diagnosis of cancer, false negatives (diagnosing people with cancer as healthy) could be penalized more than false positives (diagnosing healthy individuals as having a tumor). A labelled training set is required, i.e. input patterns represented as feature vectors and outputs as class labels. The input membership functions needed for the antecedent part of the fuzzy rules are also defined in advance. Two applications related to computer-aided diagnosis of breast cancer demonstrated that this weighted fuzzy classifier is capable of providing a high classification accuracy. We also presented a learning method that is based on incremental learning principles, where a special value called a grade of certainty is assigned to each generated rule. Proper adjustment of the grades of certainty has been shown to lead to improved classification performance and reduced overall costs.

The second presented approach, based on clustering and learning by examples, is "less supervised" as we do not need a completely labelled training data set. Moreover, we can ask the system for suggestions about the optimal number of classes, although it should be noted that optimal does not always mean better classification. The clustering approach generates an optimal feature space partition using a cluster validity measure. We used this approach for computer-aided diagnosis in contact endoscopy imaging because of the huge number of input patterns, often described in an intuitive way by the expert. Alas, the system performance is very much dependent on the partition of the input and output domains, and for this reason the clustering results should be verified by the expert during rule generation. In this way the system is able to generate a set of membership functions which actually reflects the real data distribution. Another problem usually present in image understanding tasks is that a large number of fuzzy rules is generated due to the large number of training samples. This can be partially solved by limiting the number of clusters in the clustering procedure to obtain a smaller set of fuzzy rules, or by rule minimisation strategies.
FIGURE 6.9: Input CE images of specified laryngeal lesions: squamous cells (1st row), severe dysplasia (2nd row), mild dysplasia (3rd row), normal epithelium (4th row) with corresponding image analysis results (right column)
FIGURE 6.10: Mean values of area occupied by each class of nuclei in relation to the histological diagnosis (SCC - squamous cell cancer; SD - severe dysplasia; MD - mild dysplasia; NE - normal epithelium)
Bibliography

Anbar, N., L. Milescu, A. Naumov, C. Brown, T. Button, C. Carly, and K. AlDulaimi. 2001. Detection of cancerous breasts by dynamic area telethermometry. IEEE Engineering in Medicine and Biology Magazine 20(5).
Bezdek, J.C., and S.K. Pal. 1992. Fuzzy models for pattern recognition. New York: IEEE Press.
Chi, Z., and H. Yan. 1993. Map image segmentation based on thresholding and fuzzy rules. Electronic Letters 29(21):1841–1843.
———. 1995a. Handwritten numeral recognition using self-organizing maps and fuzzy rules. Pattern Recognition 28(1):59–66.
———. 1995b. Image segmentation using fuzzy rules derived from k-means clusters. Journal of Electronic Imaging 4(2):199–206.
Dunn, J.C. 1973. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. Cybernetics and Systems 3:95–104.
Grabisch, M. 1996. The representation of importance and interaction of features by fuzzy measures. Pattern Recognition Letters 17:567–575.
Grabisch, M., and F. Dispot. 1992. A comparison of some methods of fuzzy classification on real data. In 2nd Int. Conference on Fuzzy Logic and Neural Networks, 659–662.
Grabisch, M., and J.-M. Nicolas. 1994. Classification by fuzzy integral: performance and tests. Fuzzy Sets and Systems 65(2/3):255–271.
Gustafson, E.E., and W.C. Kessel. 1979. Fuzzy clustering with a fuzzy covariance matrix. In Proceedings of the IEEE Conference on Decision and Control, 761–766.
Hamou, J.E. 1979. Microendoscopy and contact endoscopy. Tech. Rep., International Patent PCT/FR80/0024; US Patent 4,385,810, Paris, Washington D.C.
Haralick, R.M. 1979. Statistical and structural approaches to texture. Proceedings of the IEEE 67(5):786–804.
Head, J.F., F. Wang, C.A. Lipari, and R.L. Elliott. 2000. The important role of infrared imaging in breast cancer. IEEE Engineering in Medicine and Biology Magazine 19:52–57.
Hoeppner, F., F. Klawonn, and T. Runkler. 1999. Fuzzy cluster analysis, methods for classification, data analysis and image recognition. John Wiley & Sons Ltd.
Ishibuchi, H., and T. Nakashima. 1999a. Improving the performance of fuzzy classifier systems for pattern classification problems with continuous attributes. IEEE Trans. on Industrial Electronics 46(6):1057–1068.
———. 1999b. Performance evaluation of fuzzy classifier systems for multi-dimensional pattern classification problems. IEEE Trans. Systems, Man and Cybernetics - Part B: Cybernetics 29:601–618.
Ishibuchi, H., K. Nozaki, and H. Tanaka. 1992. Distributed representation of fuzzy rules and its application to pattern classification. Fuzzy Sets and Systems 52(1):21–32.
Klir, G.J., and B. Yuan. 1995. Fuzzy sets and fuzzy logic. Prentice-Hall.
Lee, C.C. 1990. Fuzzy logic in control systems: Fuzzy logic controller part I and part II. IEEE Trans. Systems, Man and Cybernetics 20:404–435.
Ng, E.Y.K., and N.M. Sudarshan. 2001. Numerical computation as a tool to aid thermographic interpretation. Journal of Medical Engineering and Technology 25(2):53–60.
Nozaki, K., H. Ishibuchi, and H. Tanaka. 1996. Adaptive fuzzy rule-based classification systems. IEEE Trans. Fuzzy Systems 4(3):238–250.
Qi, H., W.E. Snyder, J.F. Head, and R.L. Elliott. 2000. Detecting breast cancer from infrared images by asymmetry analysis. In 22nd IEEE Int. Conference on Engineering in Medicine and Biology.
Schaefer, G., T. Nakashima, M. Zavisek, Y. Yokota, A. Drastich, and H. Ishibuchi. 2007. Thermography breast cancer diagnosis by fuzzy classification of statistical image features. In IEEE Int. Conference on Fuzzy Systems, 1096–1100.
Street, W.N., W.H. Wolberg, and O.L. Mangasarian. 1993. Nuclear feature extraction for breast tumor diagnosis. In SPIE International Symposium on Electronic Imaging: Science and Technology, vol. 1905, 861–870.
Sugeno, M. 1985. An introductory survey of fuzzy control. Information Science 30(1/2):59–83.
Sugeno, M., and T. Yasukawa. 1993. A fuzzy-logic-based approach to qualitative modeling. IEEE Trans. Fuzzy Systems 1(1):7–31.
Tarnawski, W., and J. Cichosz. 2008. Fuzzy rule-based system for diagnosis of laryngeal pathology based on contact endoscopy images. In Information Technologies in Medicine, ed. J. Kacprzyk, E. Pietka, and J. Kawa, vol. 47 of Advances in Soft Computing, 217–224. Berlin: Springer Verlag.
Tarnawski, W., M. Fraczek, M. Jelen, T. Krecicki, and M. Zalesska-Krecicka. 2008a. The role of computer-assisted analysis in the evaluation of nuclear characteristics for the diagnosis of precancerous and cancerous lesions by contact laryngoscopy. Advances in Medical Sciences 53(2):221–227.
Tarnawski, W., M. Fraczek, T. Krecicki, and M. Jelen. 2008b. Fuzzy rule-based classification system for computer-aided diagnosis in contact endoscopy imaging. In Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology. New York, U.S.A.
Tarnawski, W., and M. Kurzynski. 2007. An improved diffusion driven watershed algorithm for image segmentation of cells. Journal of Medical Informatics & Technologies 11:213–220.
Tsotos, J.K. 1987. Image understanding. In Encyclopedia of Artificial Intelligence, ed. Stuart C. Shapiro. New York: John Wiley & Sons.
Uematsu, S. 1985. Symmetry of skin temperature comparing one side of the body to the other. Thermology 1(1):4–7.
Wang, L., and J.M. Mendel. 1991. Generating fuzzy rules by learning from examples. In Proceedings of the IEEE International Symposium on Intelligent Control, 263–268. Arlington, Virginia, U.S.A.
Wardrop, P., S. Sim, and K. McLaren. 2000. Contact endoscopy of the larynx: A quantitative study. Journal of Laryngology and Otology 114(6):437–440.
Yager, R.R., and D.P. Filev. 1994. Essentials of fuzzy modeling and control. John Wiley & Sons Inc.
Zavisek, M., and A. Drastich. 2005. Thermogram classification in breast cancer detection. In 3rd European and Biological Conference, 1727–1783.
Zheru, Ch., Y. Hong, and Tuan P. 1996. Fuzzy algorithms: With applications to image processing and pattern recognition. World Scientific Publishing.
Zimmerman, H.J. 1991. Fuzzy set theory and its applications. Boston: Kluwer.
7
Near Set Evaluation And Recognition (NEAR) System

7.1 Introduction  7–1
7.2 Near sets  7–2
    Perceptual Tolerance relation • Nearness Measure
7.3 Perceptual Image Processing  7–7
7.4 Probe functions  7–8
    Average greyscale value • Normalized RGB • Shannon's entropy • Pal's entropy • Edge based probe functions
7.5 Equivalence class frame  7–12
7.6 Tolerance class frame  7–14
7.7 Segmentation evaluation frame  7–15
7.8 Near image frame  7–17
7.9 Feature display frame  7–18
7.10 Conclusion  7–18
Bibliography  7–19

Christopher Henry
Computational Intelligence Laboratory, Electrical & Computer Engineering, Rm. E2-390 EITC Bldg., 75A Chancellor's Circle, University of Manitoba, Winnipeg R3T 5V6 Manitoba Canada

7.1
Introduction
Near sets introduced in (Peters, 2007b,c; Henry and Peters, 2009d), elaborated in (Peters and Wasilewski, 2009; Peters, 2009c, 2010; Peters and Wasilewski, 2010) and their applications (Peters, 2009b,c; Peters and Ramanna, 2009; Henry and Peters, 2007, 2008; Hassanien, Abraham, Peters, Schaefer, and Henry, 2009; Peters, Shahfar, Ramanna, and Szturm, 2007a; Peters and Ramanna, 2007; Gupta and Patnaik, 2008; Henry and Peters, 2009c; Fashandi, Peters, and Ramanna, 2009; Meghdadi, Peters, and Ramanna, 2009; Peters and Puzio, 2009) grew out of a generalization of the approach to the classification of objects proposed by Z. Pawlak during the early 1980s (see, e.g., (Pawlak, 1981, 1982), elaborated in (Pawlak and Skowron, 2007a,b,c)), E. Orlowska's suggestion that approximation spaces are the formal counterpart of perception or observation (Orlowska, 1982), and a study of the nearness of objects (Peters, Skowron, and Stepaniuk, 2006, 2007b). This chapter introduces the NEAR system (available for downloading for free at (Peters, 2009a)), an application implemented to demonstrate and visualize concepts from near set theory reported in (Henry and Peters, 2007; Peters, 2007a,c; Henry and Peters, 2008; Peters, 2008; Peters and Ramanna, 2009; Peters, 2009b; Henry and Peters, 2009c; Peters and Wasilewski, 2009; Peters, 2009c; Hassanien et al., 2009; Henry and Peters, 2009b). The NEAR system implements a Multiple Document Interface (MDI) (see, e.g., Fig. 7.1) where each separate processing task is performed in its own child frame. The objects (in the
near set sense) in this system are subimages of the images being processed and the probe functions (features) are image processing functions defined on the subimages. The system was written in C++ and was designed to facilitate the addition of new processing tasks and probe functions∗. Currently, the system performs five major tasks: displaying equivalence and tolerance classes for an image; performing segmentation evaluation; measuring the nearness of two images; and displaying the output of processing an image using an individual probe function. This chapter is organized as follows: Section 7.2 gives some background on near set theory, and Section 7.3 demonstrates the application of near set theory to images. Finally, Sections 7.5-7.8 describe the operation of the GUI.
FIGURE 7.1: NEAR system GUI.
7.2
Near sets
Near set theory focuses on sets of perceptual objects with matching descriptions. The discovery of near sets begins with the selection of probe functions that provide a basis for describing and discerning affinities between sample objects. A probe function is a realvalued function representing features of physical objects. Specifically, let O represent the set of all objects. The description of an object x ∈ O is given by φB (x) = (φ1 (x), φ2 (x), . . . , φi (x), . . . , φl (x)),
∗ Parts of the Graphical User Interface (GUI) were inspired by the GUI reported in (Christoudias, Georgescu, and Meer, 2002) and the wxWidgets example in (wxWidgets, 2009).
where l is the length of the description, and each φi : O −→ R in B is a probe function that represents a feature used in the description of the object x (Pavel, 1983). Furthermore, a set F contains all the probe functions used to describe an object x, with B ⊆ F. Next, a perceptual information system S can be defined as S = ⟨O, F, {Val_φi}_{φi∈F}⟩, where F is the set of all possible probe functions that take as their domain objects in O, and {Val_φi}_{φi∈F} is the value range of a function φi ∈ F. For simplicity, a perceptual system is abbreviated as ⟨O, F⟩ when the range of the probe functions is understood. It is the notion of a perceptual system that is at the heart of the following definitions.

Definition 1 Perceptual Indiscernibility Relation (Peters, 2009c). Let ⟨O, F⟩ be a perceptual system. For every B ⊆ F the indiscernibility relation ∼B is defined as follows:

∼B = {(x, y) ∈ O × O : ‖φB(x) − φB(y)‖ = 0},

where ‖·‖ represents the l2 norm. If B = {φ} for some φ ∈ F, instead of writing ∼{φ}, we write ∼φ. Defn. 1 is a refinement of the original indiscernibility relation given by Pawlak in 1981 (Pawlak, 1981). Using the indiscernibility relation, objects with matching descriptions can be grouped together, forming granules of highest object resolution determined by the probe functions in B. This gives rise to an elementary set (also called an equivalence class)

x/∼B = {x′ ∈ O | x′ ∼B x},

defined as a set where all objects have the same description. Similarly, a quotient set is the set of all elementary sets, defined as

O/∼B = {x/∼B | x ∈ O}.    (7.1)
Defn. 1 provides the framework for comparisons of sets of objects by introducing a concept of nearness within a perceptual system. Sets can be considered near each other when they have "things" in common. In the context of near sets, the "things" can be quantified by granules of a perceptual system, i.e., the elementary sets. The simplest example of nearness between sets sharing "things" in common is the case when two sets have indiscernible elements. This idea leads to the definition of a weak nearness relation.

Definition 2 Weak Nearness Relation (Peters, 2009c) Let ⟨O, F⟩ be a perceptual system and let X, Y ⊆ O. A set X is weakly near to a set Y within the perceptual system ⟨O, F⟩ (X ⋈F Y) iff there are x ∈ X and y ∈ Y and there is B ⊆ F such that x ∼B y. If sets X, Y are defined within the context of a perceptual system as in Defn. 2, then X, Y are weakly near each other. An example of Defn. 2 is given in Fig. 7.2, where the grey lines represent equivalence classes. The sets X and Y are weakly near each other in Fig. 7.2 because they both share objects belonging to the same equivalence class.
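As a small illustration (not part of the NEAR system's C++ code base), elementary sets and the quotient set of Eq. 7.1 can be obtained by grouping objects with identical descriptions, e.g.:

```python
from collections import defaultdict

def quotient_set(objects, probe_functions):
    """Elementary sets and the quotient set of Eq. 7.1: objects whose
    descriptions phi_B(x) match exactly fall into the same class."""
    classes = defaultdict(list)
    for x in objects:
        description = tuple(phi(x) for phi in probe_functions)   # phi_B(x)
        classes[description].append(x)
    return list(classes.values())                                # O / ~B
```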
7.2.1
Perceptual Tolerance relation
When dealing with perceptual objects (especially, components in images), it is sometimes necessary to relax the equivalence condition of Defn. 1 to facilitate observation of associations in a perceptual system. This variation is called a perceptual tolerance relation that defines yet another form of near sets (Peters and Ramanna, 2009; Peters, 2009b,c).
FIGURE 7.2: Example of Defn. 2.
A tolerance space-based approach to perceiving image resemblances hearkens back to the observation about perception made by Ewa Orlowska in 1982 (Orlowska, 1982) (see, also, (Orlowska, 1985)), i.e., classes defined in an approximation space serve as a formal counterpart of perception. The term tolerance space was coined by E.C. Zeeman in 1961 in modeling visual perception with tolerances (Zeeman, 1962). A tolerance space is a set X supplied with a binary relation ≅ (i.e., a subset ≅ ⊂ X × X) that is reflexive (for all x ∈ X, x ≅ x) and symmetric (i.e., for all x, y ∈ X, x ≅ y implies y ≅ x), but transitivity of ≅ is not required. The tolerance is directly related to the exact idea of closeness or resemblance (i.e., being within some tolerance) in comparing objects. The basic idea is to find objects such as images that resemble each other with a tolerable level of error. Sossinsky (Sossinsky, 1986) observes that the main idea underlying tolerance theory comes from Henri Poincaré (Poincaré, 1913). Physical continua (e.g., measurable magnitudes in the physical world of medical imaging (Hassanien et al., 2009)) are contrasted with the mathematical continua (real numbers), where almost solutions are common and a given equation has no exact solutions. An almost solution of an equation (or a system of equations) is an object which, when substituted into the equation, transforms it into a numerical 'almost identity', i.e., a relation between numbers which is true only approximately (within a prescribed tolerance) (Sossinsky, 1986). Equality in the physical world is meaningless, since it can never be verified either in practice or in theory. Hence, the basic idea in a tolerance space view of images is to replace the indiscernibility relation in rough sets (Pawlak, 1982) with a tolerance relation. The use of image tolerance spaces in this work is directly related to recent work on tolerance spaces (see, e.g., (Hassanien et al., 2009; Peters, 2009c,b; Peters and Ramanna, 2009; Bartol, Miró, Pióro, and Rosselló, 2004; Gerasin, Shlyakhov, and Yakovlev, 2008; Schroeder and Wright, 1992; Shreider, 1970; Skowron and Stepaniuk, 1996; Zheng, Hu, and Shi, 2005)). Formally, the Perceptual Tolerance Nearness Relation is defined in Defn. 3.

Definition 3 Perceptual Tolerance Nearness Relation (Peters, 2009c) Let ⟨O, F⟩ be a perceptual system and let ε ∈ R. For every B ⊆ F the tolerance relation ≅B is defined as follows:

≅B,ε = {(x, y) ∈ O × O : ‖φB(x) − φB(y)‖ ≤ ε}.

If B = {φ} for some φ ∈ F, instead of ≅{φ} we write ≅φ. Further, for notational convenience, we will write ≅B instead of ≅B,ε, with the understanding that ε is inherent to the definition of the tolerance relation. As in the case with the perceptual indiscernibility relation, a tolerance class can be defined
as

x/≅B = {y ∈ O | y ≅B x′ ∀ x′ ∈ x/≅B}.    (7.2)

Note, Defn. 3 covers O instead of partitioning it because an object can belong to more than one class. As a result, Eq. 7.2 is called a tolerance class instead of an elementary set. In addition, each pair of objects x, y in a tolerance class x/≅B must satisfy the condition ‖φ(x) − φ(y)‖ ≤ ε. Next, a quotient set for a given tolerance relation is the set of all tolerance classes and is defined as

O/≅B = {x/≅B | x ∈ O}.

Notice that the tolerance relation is a generalization of the perceptual indiscernibility relation given in Defn. 1 (obtained by setting ε = 0). As a result, Defn. 2 can be redefined with respect to the tolerance relation ≅B,ε∗. The following simple example highlights the need for a tolerance relation as well as demonstrates the construction of tolerance classes from real data. Consider the 20 objects in Table 7.1 where |φ(xi)| = 1. Letting ε = 0.1 gives the following tolerance classes:

X/≅B = { {x1, x8, x10, x11}, {x1, x9, x10, x11, x14}, {x2, x7, x18, x19}, {x3, x12, x17}, {x4, x13, x20}, {x4, x18}, {x5, x6, x15, x16}, {x5, x6, x15, x20}, {x6, x13, x20} }

Observe that each object in a tolerance class satisfies the condition ‖φB(x) − φB(y)‖ ≤ ε, and that almost all of the objects appear in more than one class. Moreover, there would be twenty classes if the indiscernibility relation were used, since there are no two objects with matching descriptions.
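For the single-feature case above, the tolerance classes can be found by sliding a window of width ε over the sorted feature values and keeping only the maximal windows; the sketch below reproduces the nine classes listed above for ε = 0.1. It is only an illustration for one-dimensional descriptions (the general multi-feature case needs a different construction) and is not the NEAR system's implementation.

```python
def tolerance_classes_1d(values, eps=0.1):
    """Tolerance classes for a single probe function: every pair of objects in a
    class differs by at most eps.  For one-dimensional descriptions the maximal
    classes are maximal groups of objects lying inside a window of width eps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = []
    for i in range(len(order)):
        k = i
        while k + 1 < len(order) and values[order[k + 1]] - values[order[i]] <= eps:
            k += 1
        cls = frozenset(order[i:k + 1])
        # windows are produced left to right, so a non-maximal window can only
        # be contained in the most recently kept one
        if not classes or not cls <= classes[-1]:
            classes.append(cls)
    return classes

# objects x1..x20 of Table 7.1 (index 0 corresponds to x1)
phi = [.4518, .9166, .1398, .7972, .6281, .6943, .9246, .3537, .4722, .4523,
       .4002, .1910, .7476, .4990, .6289, .6079, .1869, .8489, .9170, .7143]
print([sorted(i + 1 for i in c) for c in tolerance_classes_1d(phi)])
```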
TABLE 7.1: Tolerance Class Example

xi    φ(x)      xi     φ(x)      xi     φ(x)      xi     φ(x)
x1    .4518     x6     .6943     x11    .4002     x16    .6079
x2    .9166     x7     .9246     x12    .1910     x17    .1869
x3    .1398     x8     .3537     x13    .7476     x18    .8489
x4    .7972     x9     .4722     x14    .4990     x19    .9170
x5    .6281     x10    .4523     x15    .6289     x20    .7143

7.2.2
Nearness Measure
An l2 norm-based Nearness Measure (NM) is useful in discerning resemblances between images (Henry and Peters, 2009a,b; Peters, 2009b,c, 2010; Peters and Ramanna, 2009), and can be defined between two sets X and Y using Defn. 2 (Hassanien et al., 2009; Henry and Peters, 2009a,b). Let Z = X ∪ Y and let the notation

[z/≅B]X = {z ∈ z/≅B | z ∈ X},

denote the portion of the tolerance class z/≅B that belongs to X, and similarly, use the notation

[z/≅B]Y = {z ∈ z/≅B | z ∈ Y},

to denote the portion that belongs to Y. Further, let the sets X and Y be weakly near each other using Defn. 2. Then, a NM between X and Y is given in (Henry and Peters, 2009a,b) by

NM_{≅B}(X, Y) = \Bigl( \sum_{z/≅B \in Z/≅B} |z/≅B| \Bigr)^{-1} \sum_{z/≅B \in Z/≅B} |z/≅B| \, \frac{\min(|[z/≅B]_X|, |[z/≅B]_Y|)}{\max(|[z/≅B]_X|, |[z/≅B]_Y|)}    (7.3)

∗ The relations were treated separately in the interest of clarity.
The idea behind Eq. 7.3 is that similar sets should contain a similar number of objects in each tolerance class. Thus, for each tolerance class obtained from Z = X ∪Y , Eq. 7.3 counts the number of objects that belong to X and Y and takes the ratio (as a proper fraction) of their cardinalities. Furthermore, each ratio is weighted by the total size of the tolerance class (thus giving importance to the larger classes) and the final result is normalized by dividing by the sum of all the cardinalities. The range of Eq. 7.3 is in the interval [0,1], where a value of 1 is obtained if the sets are equivalent and a value of 0 is obtained if they have no elements in common. As an example of the degree of nearness between two sets, consider Fig. 7.3 in which each image consists of two sets of objects, X and Y . Each color in the figures corresponds to an elementary set where all the objects in the class share the same description. The idea behind Eq. 7.3 is that the nearness of sets in a perceptual system is based on the cardinality of the classes that they share. Thus, the sets in Fig. 7.3a are closer (more near) to each other in terms of their descriptions than the sets in Fig. 7.3b.
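A direct transcription of Eq. 7.3 is sketched below; it assumes the tolerance classes of Z = X ∪ Y have already been computed (e.g. with a routine like the one in Section 7.2.1) and that classes and sets are represented as Python sets of object identifiers.

```python
def nearness_measure(tolerance_classes, X, Y):
    """Nearness measure of Eq. 7.3.  tolerance_classes are the tolerance classes
    of Z = X | Y; X, Y and every class are sets of object identifiers."""
    total = sum(len(c) for c in tolerance_classes)
    if total == 0:
        return 0.0
    weighted = 0.0
    for c in tolerance_classes:
        in_x, in_y = len(c & X), len(c & Y)     # |[z]_X| and |[z]_Y|
        if max(in_x, in_y) > 0:
            weighted += len(c) * min(in_x, in_y) / max(in_x, in_y)
    return weighted / total
```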
(7.3a) Sample high degree of nearness
(7.3b) Sample low degree of nearness
FIGURE 7.3: (See color insert) Examples of degree of nearness between two sets: (a) High degree of nearness, and (b) low degree of nearness (Henry and Peters, 2009a).
7.3
Perceptual Image Processing
Near set theory can be easily applied to images; for example, define an RGB image as f = {p1, p2, . . . , pT}, where pi = (c, r, R, G, B)^T, c ∈ [1, M], r ∈ [1, N], R, G, B ∈ [0, 255], and M, N respectively denote the width and height of the image, and M × N = T. Further, define a square subimage as fi ⊂ f with the following conditions:

f1 ∩ f2 . . . ∩ fs = ∅,
f1 ∪ f2 . . . ∪ fs = f,    (7.4)
where s is the number of subimages in f . The approach taken in the NEAR system is to restrict all subimages to be square except when doing so violates Eq. 7.4. For example, the images in the Berkeley Segmentation Dataset (Martin, Fowlkes, Tal, and Malik, 2001) often have the dimension 321 × 481. Consequently, a square subimage size of 25 will produce 6240 square subimages, 96 subimages of size 1 × 5, 64 subimages of size 5 × 1 and 1 subimage consisting of a single pixel. Next, O can be defined as the set of all subimages, i.e., O = {f1 , . . . , fs }, and F is a set of functions that operate on images (see Section 7.4 for examples of probe functions used in the NEAR system or (Marti, Freixenet, Batlle, and Casals, 2001) for other examples). Once the set B has been selected, the elementary sets are simply created by grouping all objects with the same description, and the quotient set is made up of all the elementary sets. Finally, a simple example of these concepts is given in Fig. 7.4 where the left image contains an octagon with a radius of 100 pixels located at the centre of the 400 × 400 image, and the right image contains the elementary sets obtained using B = {φavg (fs )} and a subimage size of 10 × 10.
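A possible way of producing the subimage partition of Eq. 7.4, including the smaller edge strips mentioned above, is sketched here (the coordinate conventions are an assumption):

```python
def square_subimages(width, height, p):
    """Partition an image into p x p subimages as in Eq. 7.4; the leftover strips
    along the right and bottom borders become the smaller non-square subimages
    mentioned above (returned as (column, row, block width, block height))."""
    blocks = []
    for r0 in range(0, height, p):
        for c0 in range(0, width, p):
            blocks.append((c0, r0, min(p, width - c0), min(p, height - r0)))
    return blocks

# e.g. a 481 x 321 Berkeley image with p = 5 gives square 5 x 5 blocks plus edge strips
subimages = square_subimages(481, 321, 5)
```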
(7.4a)
(7.4b)
FIGURE 7.4: Example of near set theory in the context of image processing: (a) Original image, and (b) elementary sets obtained from (a) using φavg (fs ).
Observe that three elementary sets are obtained in Fig. 7.4b, namely, the blue background, the orange octagon interior, and the green squares along the diagonals. The green squares are created by subimages that contain both black and white pixels (in the original image) and are located only on the diagonals due to the subimage size and shape, and the position and radius of the octagon. All other subimages are uniformly white or black. Thus, we are presented with perceptual information in the form of three equivalence classes when restricted to only being able to describe the original image with the probe function B = {φavg(fs)} and a subimage size of 10 × 10. This example clearly demonstrates that
perceptual information obtained from the application of near set theory is represented by the elementary sets (formed by the grouping of objects with similar descriptions), and the information gained is always presented with respect to the probe functions contained in B.
7.4
Probe functions
This section describes the probe functions used in the NEAR system, and gives example NEAR system output images processed using these probe functions.
7.4.1
Average greyscale value
Conversion from an RGB image to greyscale is accomplished using Magick++, the object-orientated C++ API to the ImageMagick image-processing library (Magick++, 2009). First, an RGB image is converted to greyscale using

Gr = 0.299R + 0.587G + 0.114B,    (7.5)

and then the values are averaged over each subimage. An example is given in Fig. 7.5.
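A sketch of this probe function using NumPy arrays instead of Magick++ (an assumption made purely for illustration) is:

```python
import numpy as np

def average_greyscale(rgb, p=10):
    """Average greyscale probe function: greyscale conversion of Eq. 7.5 followed
    by averaging over each p x p subimage (rgb is an H x W x 3 array)."""
    grey = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    h, w = grey.shape
    values = {}
    for r0 in range(0, h, p):
        for c0 in range(0, w, p):
            values[(r0, c0)] = float(grey[r0:r0 + p, c0:c0 + p].mean())
    return values   # top-left corner of each subimage -> phi_avg
```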
(7.5a)
(7.5b) average greyscale over subimages of size 5 × 5
(7.5c)
FIGURE 7.5: Example of average greyscale probe function: (a) Original image (Weber, 1999), (b) average greyscale over subimages of size 5 × 5, and (c) average greyscale over subimages of size 10 × 10.
7.4.2
Normalized RGB
The normalized RGB value is a feature described in (Marti et al., 2001), and the formula is given by

N_X = \frac{X}{R_T + G_T + B_T},

where the values R_T, G_T, and B_T are respectively the sums of the R, G, B components of the pixels in each subimage, and X ∈ {R_T, G_T, B_T}. See Fig. 7.6 for an example using this probe function. Note, these images were produced by finding the normalized value and multiplying it by 255.
7.4.3
Shannon’s entropy
(7.6a)
(7.6b)
(7.6c)
FIGURE 7.6: Example of normalized RGB probe function: (a) Original image (Martin et al., 2001), (b) normalized R over subimages of size 5 × 5, and (c) normalized R over subimages of size 10 × 10.
Shannon introduced entropy (also called information content) as a measure of the amount of information gained by receiving a message from a finite codebook of messages (Pal and Pal, 1991). The idea was that the gain of information from a single message is proportional to the probability of receiving the message. Thus, receiving a message that is highly unlikely gives more information about the system than a message with a high probability of transmission. Formally, let the probability of receiving a message i of n messages be pi, then the information gain of a message can be written as

\Delta I = \log(1/p_i) = -\log(p_i),    (7.6)

and the entropy of the system is the expected value of the gain and is calculated as

H = -\sum_{i=1}^{n} p_i \log(p_i).
This concept can easily be applied to the pixels of a subimage. First, the subimage is converted to greyscale using Eq. 7.5. Then, the probability of the occurrence of grey level i can be defined as pi = hi /Ts , where hi is the number of pixels that take the specific grey level i in the subimage, and Ts is the total number of pixels in the subimage. Information content provides a measure of the variability of the pixel intensity levels within the image and takes on values in the interval [0, log2 L], where L is the number of grey levels in the image. A value of 0 is produced when an image contains all the same intensity levels, and the highest value occurs when each intensity level occurs with equal frequency (Seemann, 2002). An example of this probe function is given in Fig. 7.7. Note, these images were formed by multiplying the value of Shannon's entropy by 32 since L = 256 (thus giving a maximum value of 8).
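A minimal sketch of this probe function applied to a single greyscale subimage is shown below (NumPy is assumed; the function name and the 256-level default are illustrative choices, not NEAR system identifiers).

```python
import numpy as np

def shannon_entropy(grey_block, levels=256):
    """Shannon entropy H = -sum_i p_i log2(p_i) of a greyscale subimage.

    grey_block : 2-D array of integer grey levels in [0, levels - 1]
    Returns a value in [0, log2(levels)] (8 for 256 grey levels).
    """
    hist = np.bincount(grey_block.astype(int).ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]                       # 0 * log(0) is taken to be 0
    return float(-(p * np.log2(p)).sum())

block = np.random.randint(0, 256, size=(10, 10))
h = shannon_entropy(block)
print(h, h * 32)   # the * 32 scaling maps the maximum value 8 near 256 for display
```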
7.4.4 Pal's entropy
FIGURE 7.7: Example of Shannon's entropy applied to images: (a) Original image (Martin et al., 2001), (b) Shannon's entropy applied to subimages of size 5 × 5, and (c) Shannon's entropy applied to subimages of size 10 × 10.

Work in (Pal and Pal, 1991, 1992) shows that Shannon's definition of entropy has some limitations: it is undefined when pi = 0; in practice the information gain tends to lie at the limits of the interval [0, 1]; and, statistically speaking, a better measure of ignorance is 1 − pi rather than 1/pi (Pal and Pal, 1991). As a result, a new definition of entropy can be given with the following desirable properties:

P1: ∆I(pi ) is defined at all points in [0, 1].
P2: lim_{pi →0} ∆I(pi ) = ∆I(pi = 0) = k1 , where k1 > 0 and finite.
P3: lim_{pi →1} ∆I(pi ) = ∆I(pi = 1) = k2 , where k2 > 0 and finite.
P4: k2 < k1 .
P5: With increase in pi , ∆I(pi ) decreases exponentially.
P6: ∆I(p) and H, the entropy, are continuous for 0 ≤ p ≤ 1.
P7: H is maximum when all pi 's are equal, i.e., H(p1 , . . . , pn ) ≤ H(1/n, . . . , 1/n).
With these in mind, (Pal and Pal, 1991) defines the gain in information from an event as ∆I(pi ) = e^(1−pi ), which gives a new measure of entropy,

H = ∑_{i=1}^{n} pi e^(1−pi ).
An example of Pal's entropy is given in Fig. 7.8. Note, these images were formed by first converting the original image to greyscale, calculating the entropy for each subimage, and multiplying this value by 94 (since the maximum of H is e^(1−1/256) ≈ 2.71).
FIGURE 7.8: Example of Pal’s entropy applied to images: (a) Original image (Martin et al., 2001), (b) Pal’s entropy applied to subimages of size 5 × 5, and (c) Pal’s entropy applied to subimages of size 10 × 10.
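A matching sketch for Pal's exponential entropy of a single subimage follows; as with the previous sketches it is a NumPy illustration rather than the NEAR system's C++ code, and the display scaling factor 94 is simply the value quoted above.

```python
import numpy as np

def pal_entropy(grey_block, levels=256):
    """Pal's exponential entropy H = sum_i p_i * e^(1 - p_i) of a greyscale subimage.

    The gain of information for an event of probability p_i is e^(1 - p_i),
    which, unlike -log(p_i), is defined on all of [0, 1].
    """
    hist = np.bincount(grey_block.astype(int).ravel(), minlength=levels)
    p = hist / hist.sum()
    return float((p * np.exp(1.0 - p)).sum())

block = np.random.randint(0, 256, size=(10, 10))
h = pal_entropy(block)
# The maximum value e^(1 - 1/256) ~ 2.71 occurs when all grey levels are equally
# likely; the chapter scales by 94 so that this maximum maps near 255 for display.
print(h, h * 94)
```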
7.4.5 Edge based probe functions
The edge based probe functions integrated in the NEAR system incorporate an implementation of Mallat's multiscale edge detection method based on wavelet theory (Mallat and Zhong, 1992). The idea is that edges in an image occur at points of sharp variation in pixel intensity. Mallat's method calculates the gradient of a smoothed image using wavelets, and defines edge pixels as those that have locally maximal gradient magnitudes in the direction of the gradient. Formally, define a 2-D smoothing function θ(x, y) such that its integral over x and y is equal to 1, and such that it converges to 0 at infinity. Using the smoothing function, one can define the functions

ψ^1(x, y) = ∂θ(x, y)/∂x   and   ψ^2(x, y) = ∂θ(x, y)/∂y,

which are, in fact, wavelets given the properties of θ(x, y) mentioned above. Next, the dilation of a function ξ by a scaling factor s is defined as

ξ_s(x, y) = (1/s²) ξ(x/s, y/s).

Thus, the dilation by s of ψ^1 and ψ^2 is given by

ψ_s^1(x, y) = (1/s²) ψ^1(x/s, y/s)   and   ψ_s^2(x, y) = (1/s²) ψ^2(x/s, y/s).

Using these definitions, the wavelet transform of f(x, y) ∈ L²(ℝ²) at the scale s is given by

W_s^1 f(x, y) = (f ∗ ψ_s^1)(x, y)   and   W_s^2 f(x, y) = (f ∗ ψ_s^2)(x, y),

which can also be written as

( W_s^1 f(x, y), W_s^2 f(x, y) )ᵀ = s ( ∂/∂x (f ∗ θ_s)(x, y), ∂/∂y (f ∗ θ_s)(x, y) )ᵀ = s ∇(f ∗ θ_s)(x, y).

Finally, edges can be detected by calculating the modulus and angle of the gradient vector, defined respectively as

M_s f(x, y) = sqrt( |W_s^1 f(x, y)|² + |W_s^2 f(x, y)|² )   and   A_s f(x, y) = argument( W_s^1 f(x, y) + i W_s^2 f(x, y) ),
and then finding the modulus maxima, defined as pixels with modulus greater than the two neighbours in the direction indicated by A_s f(x, y) (see (Mallat and Zhong, 1992) for specific implementation details). Examples of Mallat's edge detection method obtained using the NEAR system are given in Fig. 7.9.

Edge present

This probe function simply returns true if there is an edge pixel contained in the subimage (see, e.g., Fig. 7.10).

Number of edge pixels

This probe function returns the total number of pixels in a subimage belonging to an edge (see, e.g., Fig. 7.11).
FIGURE 7.9: (See color insert) Example of NEAR system edge detection using Mallat's method: (a) Original image, (b) edges obtained from (a), (c) original image, and (d) edges obtained from (c).
FIGURE 7.10: Example of edge present probe function: (a) Edges obtained from Fig. 7.5a, (b) Application to image with subimages of size 5 × 5, and (c) Application to image with subimages of size 10 × 10.
Edge orientation
This probe function returns the average orientation of subimage pixels belonging to an edge (see, e.g., Fig. 7.12).
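The sketch below illustrates the three edge based probe functions, assuming a binary edge map and an angle map are already available. For brevity it uses a plain finite-difference gradient with a threshold as a stand-in edge detector, not Mallat's wavelet method used by the NEAR system, and the orientation probe naively averages angles without handling wrap-around.

```python
import numpy as np

def gradient_edges(grey, threshold):
    """Rough stand-in edge detector: finite-difference gradient modulus and angle,
    thresholded into a binary edge map (NEAR uses Mallat's wavelet method instead)."""
    gy, gx = np.gradient(grey.astype(float))
    modulus = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)
    return modulus > threshold, angle

def edge_present(edge_block):
    """True if the subimage contains at least one edge pixel."""
    return bool(edge_block.any())

def edge_pixel_count(edge_block):
    """Total number of edge pixels in the subimage."""
    return int(edge_block.sum())

def edge_orientation(edge_block, angle_block):
    """Average orientation of the subimage pixels that lie on an edge."""
    if not edge_block.any():
        return 0.0
    return float(angle_block[edge_block].mean())

grey = np.random.randint(0, 256, size=(40, 40))
edges, angles = gradient_edges(grey, threshold=30.0)
block, ablock = edges[:10, :10], angles[:10, :10]
print(edge_present(block), edge_pixel_count(block), edge_orientation(block, ablock))
```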
7.5 Equivalence class frame
FIGURE 7.11: Example of number of edge pixels probe function: (a) Original image, (b) Application to image with subimages of size 5 × 5, and (c) Application to image with subimages of size 10 × 10.
FIGURE 7.12: Example of average orientation probe function: (a) Original image, (b) Application to image with subimages of size 5 × 5, and (c) Application to image with subimages of size 10 × 10.
This frame calculates equivalence classes using the Indiscernibility relation of Defn. 1, i.e., given an image X, it will calculate X/∼B where the objects are subimages of X. See Section 7.3 for an explanation of the theory used to obtain these results. A sample calculation using this frame is given in Fig. 7.13 and was obtained by the following steps:

1. Click Load Image button.
2. Select number of features (maximum allowed is four).
3. Select features (see Section 7.4 for a list of probe functions).
4. Select window size. The value is taken as the square root of the area for a square subimage, e.g., a value of 5 creates a subimage of 25 pixels.
5. Click Run.

The result is given in Fig. 7.13, where the bottom left window contains an image of the equivalence classes where each colour represents a single class. The bottom right window is used to display equivalence classes by clicking in any of the three images.
FIGURE 7.13: Sample run of the equivalence class frame using a window size of 5 × 5 and B = {φNormG , φHShannon }.
The coordinates of the mouse click determine the equivalence class that is displayed. The results may be saved by clicking on the save button.
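The grouping step underneath this frame can be sketched as follows: objects (subimages) whose descriptions match exactly fall into the same elementary set. This is only an illustrative NumPy/Python sketch, not the NEAR system's C++ implementation; the rounding tolerance is an assumption added so that floating-point descriptions can compare equal.

```python
import numpy as np
from collections import defaultdict

def equivalence_classes(descriptions, decimals=6):
    """Group objects (subimage indices) whose feature descriptions match exactly.

    descriptions : (n_objects, n_features) array of probe-function values
    decimals     : rounding applied so floating-point descriptions can match
    Returns a list of index arrays, one per elementary set (equivalence class).
    """
    classes = defaultdict(list)
    for idx, desc in enumerate(np.round(descriptions, decimals)):
        classes[tuple(desc)].append(idx)
    return [np.array(v) for v in classes.values()]

# Toy example: 6 subimages described by two probe functions
desc = np.array([[0.2, 0.5], [0.2, 0.5], [0.9, 0.1],
                 [0.9, 0.1], [0.2, 0.5], [0.4, 0.4]])
for c in equivalence_classes(desc):
    print(c)          # e.g. [0 1 4], [2 3], [5]
```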
7.6 Tolerance class frame
This frame calculates tolerance classes using the Tolerance relation of Defn. 3, i.e., given an image X, it will calculate X/≅B where the objects are subimages of X. This approach is similar to the one given in Section 7.3, with the exception that Defn. 1 is replaced with Defn. 3. A sample calculation using this frame is given in Fig. 7.14 and was obtained by the following steps:

1. Click Load Image button.
2. Select number of features (maximum allowed is four).
3. Select features (see Section 7.4 for a list of probe functions).
4. Select window size. The value is taken as the square root of the area for a square subimage, e.g., a value of 5 creates a subimage of 25 pixels.
5. Select ε, a value in the interval [0, 1].
6. Click Run.

The result is given in Fig. 7.14, where the left side is the original image and the right side is used to display the tolerance classes. Since the tolerance relation does not partition an image, the tolerance classes are displayed upon request. For instance, by clicking on either of the two images, all the tolerance classes are displayed that are within ε of the subimage containing the coordinates of the mouse click. Further, the subimage containing the mouse click is coloured black.
FIGURE 7.14: Sample run of the tolerance class frame using a window size of 10 × 10, B = {φNormG , φHShannon }, and ε = 0.05.
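The ε-neighbourhood query behind the display described above can be sketched as below. Note that this is only the set of subimages within ε of a clicked subimage, not the full enumeration of maximal preclasses (tolerance classes), which is a more involved computation; the function and variable names are illustrative assumptions.

```python
import numpy as np

def tolerance_neighbourhood(descriptions, index, eps):
    """Indices of all subimages whose descriptions lie within eps (L2 norm)
    of the description of the subimage at `index`.

    This is the eps-neighbourhood shown when clicking in the frame, not the
    full set of maximal preclasses.
    """
    d = np.linalg.norm(descriptions - descriptions[index], axis=1)
    return np.flatnonzero(d <= eps)

desc = np.random.rand(100, 2)    # e.g. normalized green value + entropy per subimage
print(tolerance_neighbourhood(desc, index=0, eps=0.05))
```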
7.7 Segmentation evaluation frame
This frame performs segmentation evaluation using perceptual morphology as described in (Henry and Peters, 2008, 2009c), where the evaluation is labelled the Near Set Index (NSI). Briefly, the NSI uses perceptual morphology (a form of morphological image processing based on traditional mathematical morphology (Henry and Peters, 2009c)) to evaluate the quality of an image segmentation. As is given in (Henry and Peters, 2009c), the perception-based dilation is defined as

A ⊕ B = {x/∼B ∈ B | x/∼B ∩ A ≠ ∅},

and the perception-based erosion is defined as

A ⊖ B = ⋃_{x/∼B ∈ B} {x/∼B ∩ A},

where the set A ⊆ O is selected such that it has some a priori perceptual meaning associated with it, i.e., this set has definite meaning in a perceptual sense outside of the probe functions in B. Furthermore, the structuring element B is the quotient set given in Eq. 7.1, i.e., B = O/∼B ∗. As was reported in (Henry and Peters, 2009c), the quotient set is used as the SE in perceptual morphology, since it contains the perceptual information necessary to augment the set A in a perceptually meaningful way. This perceptual information is in the form of elementary sets (collections of objects with the same descriptions) since we perceive
∗ The quotient set is being relabelled only to be notationally consistent with traditional mathematical morphology.
objects by the features that describe them and that people tend to grasp not single objects, but classes of them (Orlowska, 1982). For instance, given a set of probe functions B, and an image A, this frame can perform the perceptual erosion or dilation using B = O/∼B as the SE. Also, the NSI is calculated if perceptual erosion was selected. A sample calculation using this frame is given in Fig. 7.15 and was obtained by the following steps:
FIGURE 7.15: Sample run of the segmentation evaluation frame using a window size of 2 × 2, and B = {φNormG , φHShannon }.
1. Click Load Image & Segment button.
2. Select an image and click Open.
3. Select segmentation image and click Open. The image should contain only one segment; the segment must be white (255, 255, 255) and the background must be black (0, 0, 0). The image is displayed in the top frame, while the segment is displayed in the bottom right (make sure this is the case).
4. Select number of features (maximum allowed is four).
5. Select features (see Section 7.4 for a list of probe functions).
6. Select window size. The value is taken as the square root of the area for a square subimage, e.g., a value of 5 creates a subimage of 25 pixels.
7. Click Erode to perform perceptual erosion and segmentation evaluation. Click Dilate to perform perceptual dilation (no evaluation takes place during dilation).

The result is given in Fig. 7.15, where the bottom left window contains an image of the equivalence classes where each colour represents a different class. The bottom right window contains either the segment's erosion or dilation. Clicking on any of the three images will display the equivalence class containing the mouse click in the bottom right image.
The NSI is also displayed on the left-hand side.
7.8 Near image frame
This frame is used to calculate the nearness of two images using the nearness measure from Eq. 7.3 defined in Section 7.2. A sample calculation using this frame is given in Fig. 7.16 and was obtained by the following steps:
FIGURE 7.16: Sample run of the near image frame using a window size of 10 × 10, B = {φNormG , φHShannon }, and ε = 0.05.
1. Click Load Images button and select two images.
2. Select number of features (maximum allowed is four).
3. Select features (see Section 7.4 for a list of probe functions).
4. Select window size. The value is taken as the square root of the area for a square subimage, e.g., a value of 5 creates a subimage of 25 pixels.
5. Select ε, a value in the interval [0, 1].
6. Click Run.

The result is given in Fig. 7.16, where the left side contains the first image and the right side contains the second image. Clicking in either of the two images will display the tolerance classes from both images that are near to the subimage selected by the mouse click. The subimage matching the coordinates of the mouse click is coloured black, and all subimages that are near to the black subimage are displayed using a different colour for each class. The NM is also displayed on the left-hand side.
7.9 Feature display frame
This frame is used to display the output of processing an image with a specific probe function. A sample calculation using this frame is given in Fig. 7.17 and was obtained by the following steps:
FIGURE 7.17: Sample run of the feature display frame.
1. Click Load Image button and select an image.
2. Select features (see Section 7.4 for a list of probe functions).
3. Select probe function.
4. Click Display feature.
7.10 Conclusion
This chapter has presented details on the NEAR system available for download at (Peters, 2009a). Specifically, it has presented background on near set theory, introduced some useful features in image processing, and systematically discussed all functions of the NEAR system. This tool has proved to be vital in the study of near set theory. By design, the system is modular and easily adaptable, as can be seen by the varied results reported in (Henry and Peters, 2007; Peters, 2007a,c; Henry and Peters, 2008; Peters, 2008; Peters and Ramanna, 2009; Peters, 2009b; Henry and Peters, 2009c; Peters and Wasilewski, 2009; Peters, 2009c, 2010; Hassanien et al., 2009; Henry and Peters, 2009b). Future work will focus on improvements for measuring image similarity, as well as the ability to compare images from databases for use in image retrieval.
Bibliography Bartol, W., J. Mir´ o, K. Pi´ oro, and F. Rossell´o. 2004. On the coverings by tolerance classes. Information Sciences 166(1-4):193–211. Christoudias, C., B. Georgescu, and P. Meer. 2002. Synergism in low level vision. In Proceedings of the 16th international conference on pattern recognition, vol. 4, 150–156. Quebec City. Fashandi, H., J. F. Peters, and S. Ramanna. 2009. l2 norm lenght-based image similarity measures: Concrescence of image feature histogram distances 178–185. Gerasin, S. N., V. V. Shlyakhov, and S. V. Yakovlev. 2008. Set coverings and tolerance relations. Cybernetics and System Analysis 44(3):333–340. Gupta, S., and K. Patnaik. 2008. Enhancing performance of face recognition systems by using near set approach for selecting facial features. Journal of Theoretical and Applied Information Technology 4(5):433–441. Hassanien, A. E., A. Abraham, J. F. Peters, G. Schaefer, and C. Henry. 2009. Rough sets and near sets in medical imaging: A review. IEEE Transactions on Information Technology in Biomedicine 13(6):955–968. Digital object identifier: 10.1109/TITB.2009.2017017. Henry, C., and J. F. Peters. 2007. Image pattern recognition using approximation spaces and near sets. In Proceedings of the eleventh international conference on rough sets, fuzzy sets, data mining and granular computer (rsfdgrc 2007), joint rough set symposium (jrs07), lecture notes in artificial intelligence, vol. 4482, 475–482. ———. 2008. Near set index in an objective image segmentation evaluation framework. In Proceedings of the geographic object based image analysis: Pixels, objects, intelligence, to appear. University of Calgary, Alberta. ———. 2009a. Near set evaluation and recognition (near) system. Tech. Rep., Computational Intelligence Laboratory, University of Manitoba. UM CI Laboratory Technical Report No. TR-2009-015. ———. 2009b. Perception based image classification. Tech. Rep., Computational Intelligence Laboratory, University of Manitoba. UM CI Laboratory Technical Report No. TR-2009-016. ———. 2009c. Perceptual image analysis. International Journal of Bio-Inspired Computation 2(2):to appear. Henry, C., and J.F. Peters. 2009d. http://en.wikipedia.org/wiki/Near sets.
Magick++. 2009. Imagemagick image-processing library. www.imagemagick.org. Mallat, S., and S. Zhong. 1992. Characterization of signals from multiscale edges. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(7):710–732. Marti, J., J. Freixenet, J. Batlle, and A. Casals. 2001. A new approach to outdoor scene description based on learning and top-down segmentation. Image and Vision Computing 19:1041–1055.
Martin, D., C. Fowlkes, D. Tal, and J. Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the 8th international conference on computer visison, vol. 2, 416–423. Meghdadi, A. H., J. F. Peters, and S. Ramanna. 2009. Tolerance classes in measuring image resemblance. Intelligent Analysis of Images & Videos, KES 2009, Part II, Knowledge-Based and Intelligent Information and Engineering Systems, LNAI 5712 127–134. ISBN 978-3-64-04591-2, doi 10.1007/978-3-642-04592-9 16. Orlowska, E. 1982. Semantics of vague concepts. applications of rough sets. Tech. Rep. 469, Institute for Computer Science, Polish Academy of Sciences. ———. 1985. Semantics of vague concepts. In Foundations of logic and linguistics. problems and solutions, ed. G. Dorn and P. Weingartner, 465–482. London/NY: Plenum Pres. Pal, N. R., and S. K. Pal. 1991. Entropy: A new definition and its applications. IEEE Transactions on Systems, Man, and Cybernetics 21(5):1260 – 1270. ———. 1992. Some properties of the exponential entropy. Information Sciences 66: 119–137. Pavel, M. 1983. “shape theory” and pattern recognition. Pattern Recognition 16(3): 349–356. Pawlak, Z. 1981. Classification of objects by means of attributes. Tech. Rep. PAS 429, Institute for Computer Science, Polish Academy of Sciences. ———. 1982. Rough sets. International Journal of Computer and Information Sciences 11:341–356. Pawlak, Z., and A. Skowron. 2007a. Rough sets and boolean reasoning. Information Sciences 177:41–73. ———. 2007b. Rough sets: Some extensions. Information Sciences 177:28–40. ———. 2007c. Rudiments of rough sets. Information Sciences 177:3–27. Peters, J. F. 2007a. Classification of objects by means of features. In Proceedings of the ieee symposium series on foundations of computational intelligence (ieee scci 2007), 1–8. Honolulu, Hawaii. ———. 2007b. Near sets. general theory about nearness of objects. Applied Mathematical Sciences 1(53):2609–2629. ———. 2007c. Near sets. special theory about nearness of objects. Fundamenta Informaticae 75(1-4):407–433. ———. 2008. Classification of perceptual objects by means of features. International Journal of Information Technology & Intelligent Computing 3(2):1 – 35. ———. 2009a. Computational intelligence laboratory. Http://wren.ece.umanitoba.ca/.
Near Set Evaluation And Recognition (NEAR) System ———. 2009b. Discovery of perceptually near information granules. In Novel developements in granular computing: Applications of advanced human reasoning and soft computation, ed. J. T. Yao, in press. Hersey, N.Y., USA: Information Science Reference. ———. 2009c. Tolerance near sets and image correspondence. International Journal of Bio-Inspired Computation 1(4):239–245. ———. 2010. Corrigenda and addenda: Tolerance near sets and image correspondence. International Journal of Bio-Inspired Computation 2(5). in press. Peters, J. F., and S. Ramanna. 2007. Feature selection: A near set approach. In Ecml & pkdd workshop in mining complex data, 1–12. Warsaw. ———. 2009. Affinities between perceptual granules: Foundations and perspectives. In Human-centric information processing through granular modelling, ed. A. Bargiela and W. Pedrycz, 49–66. Berline: Springer-Verlag. Peters, J. F., S. Shahfar, S. Ramanna, and T. Szturm. 2007a. Biologically-inspired adaptive learning: A near set approach. In Frontiers in the convergence of bioscience and information technologies. Korea. Peters, J. F., A. Skowron, and J. Stepaniuk. 2006. Nearness in approximation spaces. In Proc. concurrency, specification & programming, 435–445. Humboldt Universit¨at. ———. 2007b. Nearness of objects: Extension of approximation space model. Fundamenta Informaticae 79(3-4):497–512. Peters, J. F., and P. Wasilewski. 2009. Foundations of near sets. Information Sciences. An International Journal 179:3091–3109. Digital object identifier: doi:10.1016/j.ins.2009.04.018. ———. 2010. Tolerance space view of what we see. poincar´e’s physical continuum and zeeman’s visual perception. Mathematical Intelligencer, submitted. Peters, J.F., and L. Puzio. 2009. Image analysis with anisotropic wavelet-based nearness measures. International Journal of Computational Intelligence Systems 3(2): 1–17. Poincar´e, H. 1913. Mathematics and science: Last essays, trans. by j. w. bolduc. N. Y.: Kessinger. Schroeder, M., and M. Wright. 1992. Tolerance and weak tolerance relations. Journal of Combinatorial Mathematics and Combinatorial Computing 11:123–160. Seemann, T. 2002. Digital image processing using local segmentation. Ph.d. dissertation, School of Computer Science and Software Engineering, Monash University. Shreider, Yu. A. 1970. Tolerance spaces. Cybernetics and System Analysis 6(12): 153–758. Skowron, A., and J. Stepaniuk. 1996. Tolerance approximation spaces. Fundamenta Informaticae 27(2-3):245–253.
Sossinsky, A. B. 1986. Tolerance space theory and some applications. Acta Applicandae Mathematicae: An International Survey Journal on Applying Mathematics and Mathematical Applications 5(2):137–167. Weber, M. 1999. Leaves dataset. Url: www.vision.caltech.edu/archive.html. wxWidgets. 2009. wxwidgets cross-platform gui library v2.8.9. www.wxwidgets.org. Zeeman, E. C. 1962. The topology of the brain and the visual perception. In Topology of 3-manifolds and selected topics, ed. K. M. Fort, 240–256. New Jersey: Prentice Hall. Zheng, Z., H. Hu, and Z. Shi. 2005. Tolerance relation based granular space. Lecture Notes in Computer Science 3641:682–691.
8 Perceptual Systems Approach to Measuring Image Resemblance

Amir H. Meghdadi
Computational Intelligence Laboratory, University of Manitoba

James F. Peters
Computational Intelligence Laboratory, University of Manitoba

8.1 Introduction
8.2 Perceptual Systems, Feature Based Relations and Near Sets
    Perceptual Systems • Perceptual Indiscernibility and Tolerance Relations • Nearness Relations and Near Sets
8.3 Analysis and Comparison of Images Using Tolerance Classes
    Tolerance Overlap Distribution nearness measure (TOD)
8.4 Summary and Conclusions
Bibliography

8.1 Introduction
Image resemblance is viewed as a form of nearness between sets of perceptual objects as originally proposed in near set theory in (Peters, 2007b,c) and further elaborated in (Peters, 2009, 2010; Peters and Wasilewski, 2009), where a nearness relation is shown to be a tolerance relation on the family of near sets in a perceptual system. The idea of using tolerance relations (Sossinsky, 1986) in formalizing the concept of perceptual resemblance between images was introduced in 2008 (see, e.g., (Peters, 2008b) and elaborated in (Peters, 2009, 2010; Peters and Wasilewski, 2009)) as a result of a collaboration with Z. Pawlak in 2002 on describing the nearness of perceived objects (Pawlak and Peters, 2002,2007). In this approach, images are considered to be non-empty sets of perceptual objects O (pixels or subimages). Moreover, a set F of probe functions is used to describe the objects by extracting some of their perceivable features. The pair ⟨O, F⟩ is called a perceptual information system (Peters and Ramanna, 2009). Near set theory grew out of a generalization of the rough set approach (Pawlak, 1981a,b) in describing the affinities between sample objects. The perceptual basis of near set theory was inspired by Orlowska's suggestion that approximation spaces are the formal counterpart of perception or observation (Orlowska and Pawlak, 1984).
This concept has been described in (Peters, 2008c) as follows: “Our mind identifies relationships between object features to form perceptions of sensed objects. Our senses gather the information from the objects we perceive and map sensations to values assimilated by the mind. Thus, our senses can be likened to perceptual probe functions in the form of a mapping of stimuli from objects in our environment to sensations (values used by the mind to perceive objects)”.
8.2 Perceptual Systems, Feature Based Relations and Near Sets
In this section, formal definitions of perceptual systems, tolerance and nearness relations are provided.

TABLE 8.1: Perceptual System Symbols

Symbol         Interpretation
O              Set of perceptual objects
F              Set of probe functions
ℝ              Set of real numbers
∼B             Indiscernibility relation
≅B,ε           Tolerance relation
x/∼B           = {y ∈ X | y ∼B x}
O/∼B           = {x/∼B | x ∈ O}
⋈B             Nearness relation
⋈B (weak)      Weak nearness relation

Symbol         Interpretation
X              Sample X ⊆ O
B              Sample B ⊆ F
φ ∈ B          Probe φ : O → ℝ
∼B             Weak indiscernibility relation
A ⊂ ≅B,ε       ∀x, y ∈ A, x ≅B,ε y (preclass)
x/≅B,ε         x in a maximal preclass (tolerance class)
X/∼B           = {x/∼B | x ∈ X}, quotient set
⋈B,ε           Tolerance nearness relation
⋈B,ε (weak)    Weak tolerance nearness relation

8.2.1 Perceptual Systems
DEFINITION 8.1 Perceptual System  A perceptual system ⟨O, F⟩ is a real-valued, total∗, deterministic information system where O is a non-empty set of perceptual objects, while F is a countable set of probe functions.
A perceptual object (x ∈ O) represents something in the physical world that can be perceived with our senses (for example a pixel in an image, or an image in a set of images). Usually, we are dealing with a set of objects X ⊆ O (for example an image that consists of pixels or subimages). A probe function φ(x) is a real-valued
∗ A perceptual system is total inasmuch as each probe function φ maps O to a single real value.
function representing a feature of the physical object x. A set of probe functions F = {φ1 , φ2 , ..., φl } can be defined to extract all the feature-values for each object x. However, not all the probe functions (features) may be used all the time. The set B ⊆ F represents the probe functions in use. This approach to representation and comparison of feature values by probe functions started with the introduction of near sets (see (Peters, 2007a) and (Peters, 2008a)). Probe functions provide a basis for describing and discerning affinities between sample objects in the context of what is known as a perceptual system. This represents a departure from the partial functions known as attributes, defined in terms of a column of values in an information system table in rough set theory.

Example 8.1
Perceptual Subimages (pixel windows). An image can be partitioned into subimages viewed as perceptual objects. Each subimage has feature values that are the result of visual perception, i.e., how we visualize a subimage (e.g., its colour, texture, spatial orientation). Figure 8.1, for example, shows an image of size 255 × 255 pixels divided into pixel windows (subimages) of size 85 × 85 pixels. For simplicity, the size of the subimages is very large here. In practice, the size of subimages is much smaller, resulting in a higher number of subimages (see figure 8.2 for example). Therefore, the image can be represented with a set (O) of 9 perceptual objects as follows: O = {x1 , x2 , ..., x9 }
(8.1)
Different probe functions can be defined to extract feature values of an image or subimage. Average gray level, image entropy, texture and color information in each subimage are some examples. For practical use, several feature values are needed to represent an image. However, in some examples of this chapter, only the average gray value and sometimes entropy have been used. Moreover, feature values have been normalized between 0 and 1 in cases where more than one probe function is used. Figure 8.1 shows the image as well as all the subimages, where the gray level of each pixel is replaced with the average gray level of the subimage containing the pixel. By way of illustration, the average gray levels are shown in each subimage in Fig. 8.1.
8.2.2 Perceptual Indiscernibility and Tolerance Relations
Indiscernibility and tolerance relations are defined in order to establish and measure affinities between pairs of perceptual objects in a perceptual system ⟨O, F⟩. These relations are subsets of O × O. The indiscernibility relation is a key concept in approximation spaces in rough set theory. A perceptual indiscernibility relation is defined as follows.
FIGURE 8.1: An image and its 9 subimages shown with their average gray levels

DEFINITION 8.2 Perceptual Indiscernibility Relation (Peters, 2010)  Let ⟨O, F⟩ be a perceptual system. Let φ ∈ B, x, y ∈ O and let
φB (x) = (φ1 (x), . . . , φi (x), . . . , φL (x)) denote a description of object x containing feature values represented by φi ∈ B. A perceptual indiscernibility relation ∼B is defined relative to B as follows:

∼B = {(x, y) ∈ O × O : ∥φB (x) − φB (y)∥2 = 0},   (8.2)

where ∥·∥2 denotes the L2 (Euclidean) norm. The set of all perceptual objects in O that are indiscernible relative to an object x ∈ O is called an equivalence class, denoted by x/∼B . This form of indiscernibility relation, introduced in (Peters, 2009), is a variation of the very useful relation introduced by Z. Pawlak in 1981 (Pawlak, 1981a). Note that all of the elements in x/∼B have matching descriptions, i.e., the objects in x/∼B are indiscernible from each other. Then, by definition,

∀x ∈ O,  x/∼B = {y ∈ O | y ∼B x}.   (8.3)

FIGURE 8.2: An example of an image partitioned into 961 subimages

The indiscernibility relation partitions the set O to form the quotient set O/∼B , the set of all equivalence classes x/∼B :

O/∼B = {x/∼B | x ∈ O},   (8.4)

⋃_{x∈O} x/∼B = O,   (8.5)

∀x, y ∈ O,  (x/∼B ) ∩ (y/∼B ) = ∅  whenever x/∼B ≠ y/∼B .   (8.6)
Example 8.2
Figure 8.3a shows an image of size 256 × 256 pixels and its subimages. Let ⟨O, F⟩ be a perceptual system where O denotes the set of 25 × 25 subimages.
Let B = {φ1 (x)} ⊆ F be the set containing only one probe function φ1 , where φ1 (x) = gray(x) is the average gray scale value of subimage x. The two marked subimages are perceptually indiscernible with respect to their gray level values and hence they belong to the same equivalence class. The equivalence class is shown both individually (c) and on top of the image where the rest of the image is blurred (d).

FIGURE 8.3: An image and one of its equivalence (left) and tolerance (right) classes. Each half shows (a) the original image, (b) a subimage in the covering, (c) the class on a white background, and (d) the class marked on the image.

The perceptual indiscernibility relation can also be defined in a weak sense as follows.

DEFINITION 8.3 Perceptual Weak Indiscernibility Relation (Peters and Wasilewski, 2009; Peters, 2009)  Let ⟨O, F⟩ be a perceptual system. Let B = {φ1 , φ2 , ..., φl } and x, y ∈ O. A perceptual weak indiscernibility relation ∼B is defined relative to B as follows:
∼B = {(x, y) ∈ O × O | ∃ φi ∈ B, ∥φi (x) − φi (y)∥2 = 0}.   (8.7)
Tolerance Relations and Tolerance Classes
The concept of the indiscernibility relation can be generalized to the tolerance relation, which is very important in near set theory (Peters, 2009, 2010; Peters and Wasilewski, 2009; Peters and Ramanna, 2009; Peters and Puzio, 2009; Meghdadi, Peters, and Ramanna, 2009). Tolerance relations emerge in the transition from the concept of equality to almost equality when comparing objects by their feature values.

DEFINITION 8.4 Tolerance Relation  A tolerance relation ζ ⊆ X × X on a set X, in general, is a binary relation that is reflexive and symmetric but not necessarily transitive (Sossinsky, 1986):

1. ζ ⊂ X × X,
2. ∀x ∈ X, (x, x) ∈ ζ,
3. ∀x, y ∈ X, (x, y) ∈ ζ ⇒ (y, x) ∈ ζ.

Moreover, the notation x ζ y can be used as an abbreviation of (x, y) ∈ ζ. The set X supplied with the binary relation ζ is called a tolerance space and is denoted Xζ . The term tolerance space was originally coined by E. C. Zeeman in (Zeeman, 1962). A perceptual tolerance relation is defined in the context of perceptual systems as follows, where ≅B,ε is used instead of ζ to denote the tolerance relation.

DEFINITION 8.5 Perceptual Tolerance Relation (Peters, 2009, 2010)  Let ⟨O, F⟩ be a perceptual system and let ε ∈ ℝ (the set of all real numbers). For every B ⊆ F the perceptual tolerance relation ≅B,ε is defined as follows:
≅B,ε = {(x, y) ∈ O × O : ∥φB (x) − φB (y)∥2 ≤ ε},   (8.8)
where φB (x) = [φ1 (x) φ2 (x) ... φl (x)]ᵀ is a feature-value vector representing an object description obtained using all of the probe functions in B, and ∥·∥2 is the L2 norm (Lp norm in general (Jänich, 1984)). A tolerance relation ≅B,ε defines a covering on the set O of perceptual objects, resulting in a set of tolerance classes and tolerance blocks (Bartol, Mir, Piro, and Rossell, 2004; Schroeder and Wright, 1992). In this work, the definition of tolerance class in (Bartol et al., 2004) has been used, as follows; it is similar to the definition of equivalence class in (8.3).
DEFINITION 8.6 Tolerance Preclass  A set A ⊂ ≅B,ε is a preclass in ≅B,ε if, and only if, ∀x, y ∈ A, x ≅B,ε y.
DEFINITION 8.7 Tolerance Class  A set A ⊂ ≅B,ε is a tolerance class in ≅B,ε if, and only if, A is a maximal preclass,
where ≅B,ε is defined in (8.8) with respect to a given set of probe functions B. Let x/≅B,ε denote a maximal preclass containing x. Tolerance classes can overlap, and hence the set of all tolerance classes is a covering of O, denoted by O/≅B,ε . Recall that a cover is a family of subsets of O whose union is O and whose members' intersections are not necessarily empty.

O/≅B,ε = {x/≅B,ε | x ∈ O},   (8.9)

⋃_{x∈O} x/≅B,ε = O.   (8.10)
Example 8.3
Figure 8.3b shows the image in example 8.2 and its subimages. Let ⟨O, F⟩ be a perceptual system where O denotes the set of 25 × 25 subimages. The image is divided into 100 subimages of size 25 × 25 and can be represented as the set X = O of all 100 subimages. Let B = {φ1 (x)} ⊆ F where φ1 (x) = gray(x) is the average gray scale value of the pixels in subimage x, normalized between 0 and 1. Let ε = 0.1. A subimage x has been selected, and the marked subimages in the figure belong to a tolerance class that is represented by the selected subimage, because their gray level values are close to the gray value of subimage x within the tolerance level ε.

Example 8.4
The simple image in example 8.1 (Figure 8.1) is considered here again. For each given subimage, the corresponding tolerance class has been obtained by finding all the subimages whose average gray scale values lie within the tolerance range (ε = 0.1) of the average gray value of the given subimage. Figure 8.4 shows all the tolerance classes, calculated for all x ∈ O, after removing the redundant classes. Note that x1 /≅B,ε = x7 /≅B,ε = x8 /≅B,ε , x2 /≅B,ε = x3 /≅B,ε = x5 /≅B,ε and x6 /≅B,ε = x9 /≅B,ε , and hence there are 4 tolerance classes in total. The set of all tolerance classes is a covering of O and is written O/≅B = {x/≅B | x ∈ O} = {x4 /≅B,ε , x5 /≅B,ε , x8 /≅B,ε , x9 /≅B,ε }.

Tolerance Matrix
In order to demonstrate a tolerance space and all its tolerance classes, a tolerance matrix is defined here to show the tolerance relation between pairs of perceptual objects. Each row in a tolerance matrix represents one tolerance class and each column represents one perceptual object (subimage). Corresponding to each probe
function φk ∈ B, a tolerance matrix TMk = [tij ] is defined as in equation 8.11, where the element tij of the matrix is zero if subimages xi and xj do not belong to the same tolerance class defined by xi :

tij = φk (xj ) if xj ∈ xi /≅B , and tij = 0 otherwise.   (8.11)

FIGURE 8.4: (a) The image covering; (b) all classes O/≅B = {x/≅B | x ∈ O}, displayed on a white background and marked on the image (the four tolerance classes are the ones represented by subimages 5, 9, 4 and 8).
Subsequently, an Ordered Tolerance Matrix (OTM) is defined by removing identical rows in the tolerance matrix (identical tolerance classes) and sorting the remaining rows (classes) by the average value of φk (x) among the perceptual objects of each tolerance class.
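A small sketch of equation 8.11 and of the OTM construction is given below, using NumPy and a single probe function; the feature values are taken from the 9-subimage example of the following Example 8.5, and the assumption that all feature values are positive (0 marks non-membership, as in Eq. 8.11) is made explicit in the code.

```python
import numpy as np

def tolerance_matrix(phi, eps):
    """Tolerance matrix TM (Eq. 8.11) for a single probe function phi.

    phi : 1-D array of (positive) feature values, one per subimage
    t[i, j] = phi[j] if |phi[i] - phi[j]| <= eps, else 0, so row i lists the
    feature values of the subimages in the tolerance class defined by x_i.
    """
    member = np.abs(phi[:, None] - phi[None, :]) <= eps
    return np.where(member, phi[None, :], 0.0)

def ordered_tolerance_matrix(tm):
    """OTM: drop identical rows and sort the remaining rows (classes) by the
    average feature value of the objects in each class."""
    unique_rows = np.unique(tm, axis=0)
    means = [row[row > 0].mean() if (row > 0).any() else 0.0 for row in unique_rows]
    return unique_rows[np.argsort(means)]

# Average grey values of the 9 subimages, with eps = 0.1 on the 0-255 scale
phi = np.array([162., 79., 67., 140., 81., 117., 162., 153., 115.])
tm = tolerance_matrix(phi, eps=0.1 * 255)
print(ordered_tolerance_matrix(tm))
```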
Example 8.5

In example 8.4, there are 9 perceptual objects (equation 8.1) and 4 tolerance classes. The tolerance matrix TM defined for the only probe function φ1 (average gray value between 0 and 255) will be a 9 × 9 matrix, with 4 non-redundant rows:

TMφ1 = {tij } =
⎡ 162    0    0  140    0    0  162  153    0 ⎤
⎢   0   79   67    0   81    0    0    0    0 ⎥
⎢   0   79   67    0   81    0    0    0    0 ⎥
⎢ 162    0    0  140    0  117  162  153  115 ⎥
⎢   0   79   67    0   81    0    0    0    0 ⎥
⎢   0    0    0  140    0  117    0    0  115 ⎥
⎢ 162    0    0  140    0    0  162  153    0 ⎥
⎢ 162    0    0  140    0    0  162  153    0 ⎥
⎣   0    0    0  140    0  117    0    0  115 ⎦   (8.12)
The element tij of the tolerance matrix is set to the average gray value of the jth subimage if the ith and jth subimages belong to the same tolerance class that is defined
by the ith subimage (tij = 0 otherwise). The TM matrix can be displayed as an image, as shown in Figure 8.5.
FIGURE 8.5: The tolerance matrix displayed as an image
The tolerance classes have average gray values of 65, 150, 159, 175 and 184, respectively.

OTM =
⎡   0   79   67    0   81    0    0    0    0 ⎤
⎢   0    0    0  140    0  117    0    0  115 ⎥
⎢ 162    0    0  140    0  117  162  153  115 ⎥
⎣ 162    0    0  140    0    0  162  153    0 ⎦   (8.13)
The OTM matrix can be shown as an image that displays the gray level values and the spatial information of all the tolerance classes.
FIGURE 8.6: The ordered tolerance matrix displayed as an image
In Figures 8.8 to 8.10, examples of ordered tolerance matrices are shown for more images. The images are from the Berkeley segmentation dataset (Martin, Fowlkes, Tal, and Malik, 2001). Each image is 321 × 481 in size and is divided into 160 subimages of size 30 × 30. The only probe function used is the average gray level value between 0 and 255. The value of ε in equation 8.8 is equal to 20, where the gray values have not been normalized.
8.2.3 Nearness Relations and Near Sets
Nearness relations were introduced in the context of a perceptual system ⟨O, F⟩ by James Peters (Peters and Wasilewski, 2009) after the introduction of near set theory in 2007 (see (Peters, 2007c), (Peters, 2007b) and (Peters and Wasilewski, 2009)). These relations are defined between sets of perceptual objects. Therefore, a nearness relation R is a subset of P(O) × P(O).

DEFINITION 8.8 Weak Nearness Relation  Let ⟨O, F⟩ be a perceptual system and let X, Y ⊆ O. A set X is weakly near to a set Y within the perceptual system ⟨O, F⟩ if and only if the following condition is satisfied: ∃ x ∈ X, y ∈ Y, B ⊆ F such that x ∼B y. Consequently, the weak nearness relation is defined on P(O) as follows:

{(X, Y) ∈ P(O) × P(O) | X is weakly near to Y}.   (8.14)
Example 8.6
Figure 8.7 shows two images and their corresponding subimages (25 × 25 pixels each). Let X and Y denote the sets of all subimages in image 1 and image 2, respectively. Let O = X ∪ Y be the set of all subimages in the two images. Let ⟨O, F⟩ be a perceptual system and let B ⊆ F with B = {φ1 (x)}, where φ1 (x) = gray(x) is the gray scale value of subimage x. Images X and Y are then weakly near to each other because they have elements x ∈ X and y ∈ Y with matching descriptions (x ∼B y).

DEFINITION 8.9 Nearness Relation  Let ⟨O, F⟩ be a perceptual system and let X, Y ⊆ O. A set X is near to a set Y within the perceptual system ⟨O, F⟩ if and only if the following condition is satisfied (Peters and Wasilewski, 2009): there exist x ∈ X, y ∈ Y, A, B ⊆ F, f ∈ F, and there exist classes A ∈ O/∼A , B ∈ O/∼B , C ∈ O/∼C such that A, B ⊆ C. Consequently, the nearness relation is defined on P(O) as follows:

{(X, Y) ∈ P(O) × P(O) | X is near to Y}.   (8.15)
FIGURE 8.7: Two images and sample subimages with matching descriptions

DEFINITION 8.10 Weak Tolerance Nearness Relation  Let ⟨O, F⟩ be a perceptual system, let X, Y ⊆ O and let ε ∈ ℝ. A set X is perceptually almost near to a set Y in a weak sense within the perceptual system ⟨O, F⟩ if and only if the following condition is satisfied: ∃ x ∈ X, ∃ y ∈ Y, ∃ B ⊆ F such that x ≅B,ε y. Consequently, the weak tolerance nearness relation is defined on P(O) as follows:

{(X, Y) ∈ P(O) × P(O) | X is perceptually almost near to Y in a weak sense}.   (8.16)
DEFINITION 8.11 Near Sets  Let ⟨O, F⟩ be a perceptual system and let X ⊆ O. A set X is a near set iff there is Y ⊆ O such that X is near to Y (Defn. 8.9). The family of near sets of a perceptual system is denoted by NearF (O).
DEFINITION 8.12 Tolerance Near Sets  Let ⟨O, F⟩ be a perceptual system and let X ⊆ O. A set X is a tolerance near set iff there is Y ⊆ O such that X is perceptually almost near to Y (Defn. 8.10). The family of tolerance near sets of a perceptual system is denoted by NearF̃ (O).
8.3 Analysis and Comparison of Images Using Tolerance Classes
Tolerance classes, as described in section 8.2.2, can be viewed as structural elements in representing an image. The motivation for using tolerance classes in perceptual image analysis is the conjecture that visual perception in humans is performed at the class level rather than the pixel level.
For a single image, tolerance classes can be calculated as shown in section 8.2.2 and displayed in a tolerance matrix or ordered tolerance matrix, as shown in the simple example of Figures 8.5 and 8.6. The main approach in this chapter for comparing pairs of images is to compare the corresponding tolerance classes. The size of the tolerance classes, the overlap between tolerance classes and their distributions, for example, can be used to quantitatively compare the tolerance classes of images.
FIGURE 8.8: An image covering and the ordered tolerance matrix (30 × 30 subimages)
FIGURE 8.9: An image covering and the ordered tolerance matrix (30 × 30 subimages)
FIGURE 8.10: An image covering and the ordered tolerance matrix (30 × 30 subimages)
8.3.1 Tolerance Overlap Distribution nearness measure (TOD)
A similarity measure is proposed here based on a statistical comparison of the overlaps between tolerance classes at each subimage. The proposed method is as follows. Suppose X, Y ⊆ O are two images (sets of perceptual objects). The sets of all tolerance classes for images X and Y are given below and form a covering for each image:

X/≅B,ε = {x/≅B,ε | x ∈ X},   (8.17)

Y/≅B,ε = {y/≅B,ε | y ∈ Y}.   (8.18)
Subsequently, the set of all tolerance classes that overlap at a given object (subimage) x is denoted Ω_{X/≅B,ε}(x) and is defined as follows:

Ω_{X/≅B,ε}(x) = {z/≅B,ε ∈ X/≅B,ε | x ∈ z/≅B,ε}.   (8.19)
Consequently, the normalized number of tolerance classes in X/≅B,ε that overlap at x is denoted ω and defined as follows:

ω_{X/≅B,ε}(x) = |Ω_{X/≅B,ε}(x)| / |X/≅B,ε|.   (8.20)

Similarly, ω_{Y/≅B,ε}(y) is defined for every subimage y ∈ Y. Assuming that the set of probe functions B and the value of ε are known, we use the simplified notation ΩX (x) and ωX (x) for the set X/≅B,ε , and the notation ΩY (y) and ωY (y) for the set Y/≅B,ε . Let {b1 , b2 , ..., bNb } be the discrete bins used in calculating the histograms of ωX (x) and ωY (y), where x ∈ X and y ∈ Y. The empirical distribution function (histogram) of ωX (x) at bin value bj is written HωX (bj ) and is defined as the number of subimages x whose value of ωX (x) belongs to the jth bin. The cumulative distribution function is then defined as follows:

CHωX (bj ) = ∑_{i=1}^{j} HωX (bi ).   (8.21)
CHωY (bj ) is similarly defined for image Y. The Tolerance Overlap Distribution (TOD) nearness measure is defined by taking the sum of differences between the cumulative histograms, as in equation 8.22, where γ is a scaling factor and is set here to 0.6:

TOD = 1 − ( ∑_{j=1}^{Nb} |CHωX (bj ) − CHωY (bj )| )^γ.   (8.22)
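A minimal sketch of Eq. 8.22 is given below. It takes the per-subimage overlap values ω as input (computing ω itself requires the tolerance classes of each image, which is the expensive step and is not shown), and it assumes the cumulative histograms are normalized so that each CDF ends at 1, a detail the chapter does not spell out; the bin count and the random test data are illustrative.

```python
import numpy as np

def tod(omega_x, omega_y, n_bins=20, gamma=0.6):
    """Tolerance Overlap Distribution nearness measure (Eq. 8.22).

    omega_x, omega_y : arrays of normalized overlap counts omega in [0, 1],
                       one value per subimage of images X and Y.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    hx, _ = np.histogram(omega_x, bins=bins)
    hy, _ = np.histogram(omega_y, bins=bins)
    chx = np.cumsum(hx) / hx.sum()        # CH_omega_X(b_j), normalized to end at 1
    chy = np.cumsum(hy) / hy.sum()        # CH_omega_Y(b_j)
    return 1.0 - np.abs(chx - chy).sum() ** gamma

# Toy usage with made-up overlap distributions
rng = np.random.default_rng(0)
print(tod(rng.random(600), rng.random(600)))
```

The same difference-of-CDFs pattern is reused for the histogram similarity measure (HSM) defined later in this section, only with gray-level histograms instead of overlap histograms.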
The proposed method is compared with the tolerance nearness measure (tNM), a previously developed method by C. Henry and J. Peters (Hassanien, Abraham, Peters, Schaefer, and Henry, 2009; Peters and Wasilewski, 2009; Henry and Peters, 2009b), and also with a simple histogram-based method that compares the cumulative histograms of gray level values. The results of the comparison are presented in example 8.7.

Tolerance nearness measure: tNM
The tolerance nearness measure (tNM) is based on the idea that if one considers the union of two similar images as the set of perceptual objects, then the tolerance classes should contain almost equal numbers of subimages from each image. tNM between two images (Henry and Peters, 2009b) is defined as follows. Suppose X and Y are the sets of perceptual objects (subimages) in image 1 and image 2. Let z/≅B,ε denote a maximal preclass containing z. Z = X ∪ Y is the set of all perceptual objects in the union of the images, and for each z ∈ Z the tolerance class is given as:

z/≅B,ε = {s ∈ Z | ∥φB (z) − φB (s)∥2 ≤ ε}.   (8.23)
The part of the tolerance class that is a subset of X is denoted [z/≅B,ε]⊆X and, similarly, the part of the tolerance class that is a subset of Y is denoted [z/≅B,ε]⊆Y . Therefore:

[z/≅B,ε]⊆X = {x ∈ z/≅B,ε | x ∈ X} ⊆ z/≅B,ε ,   (8.24)

[z/≅B,ε]⊆Y = {y ∈ z/≅B,ε | y ∈ Y} ⊆ z/≅B,ε ,   (8.25)

z/≅B,ε = [z/≅B,ε]⊆X ∪ [z/≅B,ε]⊆Y .   (8.26)
Subsequently, the tolerance nearness measure is defined as the weighted average of the closeness between the cardinality (size) of the sets [z/≅B,ε]⊆X and [z/≅B,ε]⊆Y , where the cardinality of z/≅B,ε is used as the weighting factor:

tNM = ( 1 / ∑_{z/≅B,ε} |z/≅B,ε| ) × ∑_{z/≅B,ε} [ ( min(|[z/≅B,ε]⊆X|, |[z/≅B,ε]⊆Y|) / max(|[z/≅B,ε]⊆X|, |[z/≅B,ε]⊆Y|) ) × |z/≅B,ε| ].   (8.27)
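A sketch of Eq. 8.27 follows, assuming the tolerance classes of Z = X ∪ Y have already been computed (again the costly step, not shown here) and are passed in as lists of object indices, with indices below n_x belonging to image X; the representation and names are illustrative choices, not the authors' implementation.

```python
import numpy as np

def tnm(classes, n_x):
    """Tolerance nearness measure (Eq. 8.27).

    classes : list of index arrays, each a tolerance class of Z = X union Y,
              where indices < n_x belong to image X and the rest to image Y.
    """
    weights, ratios = [], []
    for c in classes:
        in_x = np.count_nonzero(c < n_x)      # |[z] subset of X|
        in_y = len(c) - in_x                  # |[z] subset of Y|
        if max(in_x, in_y) == 0:
            continue
        weights.append(len(c))
        ratios.append(min(in_x, in_y) / max(in_x, in_y))
    weights = np.array(weights, dtype=float)
    return float((weights * np.array(ratios)).sum() / weights.sum())

# Toy example: three classes over 10 subimages of X (indices 0-9) and 10 of Y
classes = [np.array([0, 1, 2, 10, 11]), np.array([3, 4, 12, 13, 14]), np.array([5, 15])]
print(tnm(classes, n_x=10))   # ~0.72 for this toy data
```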
Histogram similarity measure: HSM
For the sake of comparison, a third similarity measure is also defined here to compare the distributions (histograms) of gray scale values in two images. The histogram similarity measure is defined, similarly to equation 8.22, in terms of the absolute differences between the CDFs of the gray scale values in the two images:

HSM = 1 − ( ∑_{j=1}^{Nb} |CHgX (bj ) − CHgY (bj )| )^γ,   (8.28)
8–15
where gX (x) and gY (y) are the (average) gray level values of (subimages) pixels x ∈ X and y ∈ Y . {b1 , b2 , ..., bNb } are the bins in calculating the histograms of gX (x) and gY (y) and CHgX (bj ) and CHgY (bj ) are cumulative distribution functions (empirical histograms). Example 8.7
Sample images and their tolerance classes are shown in Figure 8.12. The number of overlapping tolerance classes at each subimage, ωX (x)) (or ωY (y)) is plotted versus the index of subimage, x (or y). The empirical CDFs for ωX (x) and ωY (y) are shown in Figure 8.11. TOD and tNM nearness measures are shown in table 8.2 for different values of p × p subimage size (p = 10, 20) and epsilon (ε = 0.01, 0.05, 0.1, 0.2). Different sets (B1 and B2 ) of probe function have been used. B1 = {φ1 } and B2 = {φ1 , φ2 } where φ1 represents average gray value of a subimage and φ2 represent the entropy of subimage. HSM measure is also calculated using the gray level values of all the pixels in each image using equation 8.28. #Tol-X and #Tol-Y represent the number of tolerance classes for images X and Y, respectively. The results are shown in table 8.2 and plotted in figure 8.13.
Empirical histograms / Distributions
Cumulative histograms / CDFs
200
1
1st Histogram HX(bi)
180
Y
i
160 Cumulative histogram values
Histogram values
2nd CDF CHY(bi) |CHX Ŧ CHY|
0.8
140 120 100 80 60
0.7 0.6 0.5 0.4 0.3
40
0.2
20
0.1
0 0
1st CDF CHX(bi)
0.9
2nd Histogram H (b )
0.2
0.4 0.6 Histogram Bins: bi
0.8
1
0 0
0.2
0.4 0.6 Histogram Bins: bi
0.8
1
FIGURE 8.11: CDF of the number of overlaps between tolerance classes for ωX (x) and ωY (y)
Similarity between groups of conceptually different images
The similarity measures discussed in section 8.3.1, can be calculated for each pair of images in an image database to reveal structures in the set of images and possibly be used for clustering or classification of images or image retrieval. In order to
Rough Fuzzy Image Analysis
8–16 Image X
Image Y
Ordered tolerance classes in X
Ordered tolerance classes in Y 30
47
222
215
y (subimages in Y)
x (subimages in X)
ω (y)
1
0.5
Y
X
ω (x)
1
0 0
200 400 x (subimages in X)
0.5
0 0
600
200 400 y (subimages in Y)
600
FIGURE 8.12: Sample images, their ordered tolerance matrices and plot of the number of overlaps TABLE 8.2
Similarity measure calculated for different values of tolerance level () and subimage size (p)
p
#Tol-X B1 B2
#Tol-Y B1 B2
10 10 10 10 20 20 20 20
TOD B1 B2
HSM B1 B2
tNM B1 B2
0.01 0.05 0.10 0.20
425 401 399 384
526 592 603 614
423 411 425 371
534 606 620 602
0.96 0.91 0.88 0.83
0.98 0.90 0.88 0.85
0.86 0.86 0.86 0.86
0.86 0.86 0.86 0.86
0.74 0.86 0.87 0.91
0.45 0.68 0.73 0.88
0.01 0.05 0.10 0.20
86 96 92 90
124 133 132 124
78 102 100 86
126 134 136 120
0.94 0.91 0.87 0.83
0.97 0.92 0.89 0.84
0.86 0.86 0.86 0.86
0.86 0.86 0.86 0.86
0.58 0.74 0.78 0.88
0.34 0.59 0.72 0.89
visually demonstrate the similarity between pairs of images, a similarity matrix demonstration scheme is used here in which the similarity between pairs of images in a set of N images is shown as a symmetric N × N matrix. Let I = {I1 , I2 , ..., IN }
Perceptual Systems Approach to Measuring Image Resemblance
(8.13a)
8–17
(8.13b)
FIGURE 8.13: (Please see color insert) tNM, HSM and TOD for different values of tolerance level, subimage size p = 10, 20
be a set of images. Let sij is one kind of similarity (nearness) measure between image Ii and image Ij . The similarity matrix is SM = {sij } and is graphically shown with a square of N × N picture elements (cells) where each picture element is shown with a gray scale brightness value of gr = sij . Full similarity (identical) is shown with 1 (white) and the complete dissimilarity is shown with 0 (black). If the image database can be classified into subsets of conceptually similar images, it is expected that similarity is high (bright) within images of a subset and low (dark) between subsets.
FIGURE 8.14: Selected images from Caltech image database
Example 8.8
Figure 8.14 shows 9 sample images randomly selected from Caltech vision group
Rough Fuzzy Image Analysis
8–18 Similarity Matrix − Sim1 − TOD
Similarity Matrix − Sim2 − tNM
1 airplane1.jpg−[1]
1 airplane1.jpg−[1]
0.9 airplane2.jpg−[2] 0.8 airplane3.jpg−[3]
0.9 airplane2.jpg−[2] 0.8
0.7
airplane3.jpg−[3]
face_0009.jpg−[4]
0.6
face_0009.jpg−[4]
0.6
face_0073.jpg−[5]
0.5
face_0073.jpg−[5]
0.5
face_0095.jpg−[6]
0.4
face_0095.jpg−[6]
0.4
image_0008.jpg−[7]
0.3
0.7
0.3
image_0008.jpg−[7]
0.2
image_0011.jpg−[8]
0.2
image_0011.jpg−[8]
0.1 image_0021.jpg−[9]
0.1 image_0021.jpg−[9]
0 1
2
3
4
5
6
7
8
9
0 1
2
3
4
5
6
7
8
9
database (Computational Vision Group at Caltech, 2009). The set consists of 3 subsets of airplane, face and leaf images. The similarity matrices for TOD and tNM are shown in figure 8.15, where B = {φ1 , φ2 } is the set of probe functions, consisting of φ1 (the average gray level) and φ2 (the entropy) of subimages. Feature values have been normalized between 0 and 1 and ε = 0.1.

FIGURE 8.15: TOD and tNM similarity matrices for the images in Figure 8.14
FIGURE 8.16: Selected images from USC-SIPI dataset
Example 8.9
As an example, 50 images in 5 different groups are randomly selected from the USC-SIPI image database (USC Signal and Image Processing Institute, 2009), shown in figure 8.16, where each group of images is shown in one row. TOD and tNM are
calculated for each possible pair of images, where B = {φ1 , φ2 } consists of the normalized gray level and entropy probe functions and ε = 0.1. Images are 256 by 256 pixels in size and the subimage size is 10 by 10 pixels. Figure 8.17 displays the TOD and tNM similarity matrices, where the similarity between images of the same group is generally higher.

FIGURE 8.17: Matrices of TOD and tNM similarity measures between pairs of images (ε = 0.1, p = 10)
8.4 Summary and Conclusions
In a perceptual system approach to discovering image similarities, images are considered as sets of points (pixels) with measurable features such as colour, i.e., points with feature values that we can perceive. Inspired by rough set theory, near set theory provides a framework for describing affinities between sets of perceptual objects and thus can be used to define description-based similarity measures between images. In this framework, tolerance spaces can be used to establish a realistic concept of nearness between sets of objects where equalities hold only approximately within a tolerance level of permissible deviation. This tolerance space form of near sets provides a foundation for modeling human perception in a physical continuum. Therefore, a perceptual image processing approach is rapidly growing out of the near set theory in recent years. See (Peters, 2009, 2010; Henry and Peters, 2009b; Peters and Wasilewski, 2009) and (Henry and Peters, 2009a), for example. In this chapter, two novel measures of image similarity based on tolerance spaces have been introduced and studied. The first nearness measure (tNM) was previously published in (Henry and Peters, 2008) and the second measure (TOD) was first proposed by the authors in (Meghdadi et al., 2009) and further elaborated and tested here. Both measures were tested on sample pairs of images and their dependence on the method parameters such as tolerance level (ε) and perceptual subimage size was studied. In order to evaluate the measures, nearness within and between groups
of perceptually similar images was calculated. Preliminary results show that very simple probe functions, such as the average gray level value and image entropy, can be successful in classifying one category of images against the rest (for example, images of airplanes in the case of TOD and images of leaves in the case of tNM in example 8.8). In future work, additional probe functions, refinements of the existing measures, and the introduction of new nearness measures will be considered as means of strengthening the image retrieval and image classification methods described in this chapter.
Bibliography

Bartol, W., J. Miró, K. Pióro, and F. Rosselló. 2004. On the coverings by tolerance classes. Information Sciences 166(1-4):193–211. doi:10.1016/j.ins.2003.12.002.

Computational Vision Group at Caltech. 2009. Image archives of the Computational Vision Group at Caltech. http://www.vision.caltech.edu/archive.html.

Hassanien, Aboul Ella, Ajith Abraham, James F. Peters, Gerald Schaefer, and Christopher Henry. 2009. Rough sets and near sets in medical imaging: A review. IEEE Transactions on Information Technology in Biomedicine 13(6):955–968. doi:10.1109/TITB.2009.2017017.

Henry, C., and J. F. Peters. 2008. Near set index in an objective image segmentation evaluation framework. In Geographic Object Based Image Analysis: Pixels, Objects, Intelligence, 1–6. University of Calgary, Alberta.

———. 2009a. Perception based image classification. Tech. Rep., Computational Intelligence Laboratory, University of Manitoba. UM CI Laboratory Technical Report No. TR-2009-016.

Henry, Christopher, and James F. Peters. 2009b. Perceptual image analysis. International Journal of Bio-Inspired Computation 2(2): to appear.

Jänich, K. 1984. Topology. Berlin: Springer-Verlag.

Martin, D., C. Fowlkes, D. Tal, and J. Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In 8th Int'l Conf. on Computer Vision, vol. 2, 416–423.

Meghdadi, Amir H., James F. Peters, and Sheela Ramanna. 2009. Tolerance classes in measuring image resemblance. In Intelligent Analysis of Images & Videos, KES 2009, Part II, LNAI 5712, 127–134. Santiago, Chile: Springer. ISBN 978-3-642-04591-2, doi:10.1007/978-3-642-04592-9_16.

Orlowska, Ewa, and Zdzislaw Pawlak. 1984. Representation of nondeterministic information. Theoretical Computer Science 29(1-2):27–39.

Pawlak, Zdzislaw. 1981a. Classification of objects by means of attributes. Tech. Rep. PAS 429, Institute for Computer Science, Polish Academy of Sciences.

———. 1981b. Rough sets. International J. Comp. Inform. Science 11:341–356.

Pawlak, Zdzislaw, and James Peters. 2002, 2007. Jak blisko (How near). Systemy Wspomagania Decyzji I:57, 109. ISBN 83-920730-4-5.
Peters, James, and Sheela Ramanna. 2009. Affinities between perceptual granules: Foundations and perspectives. In Human-Centric Information Processing Through Granular Modelling, 49–66. Berlin: Springer. ISBN 978-3-540-92916-1.

Peters, James F. 2007a. Classification of objects by means of features. In Proc. IEEE Symposium Series on Foundations of Computational Intelligence (IEEE SCCI 2007), 1–8. Honolulu, Hawaii.

———. 2007b. Near sets. General theory about nearness of objects. Applied Mathematical Sciences 1(53):2609–2629.

———. 2007c. Near sets. Special theory about nearness of objects. Fundamenta Informaticae 75:407–433.

———. 2008a. Classification of perceptual objects by means of features. Int. J. of Info. Technology & Intell. Computing 3(2):1–35.

———. 2009. Tolerance near sets and image correspondence. International Journal of Bio-Inspired Computation 1(4):239–245.

———. 2010. Corrigenda and addenda: Tolerance near sets and image correspondence. International Journal of Bio-Inspired Computation 2(5). In press.

Peters, James F., and Piotr Wasilewski. 2009. Foundations of near sets. Information Sciences 179:3091–3109. doi:10.1016/j.ins.2009.04.018.

Peters, J.F. 2008b. Notes on perception. Computational Intelligence Laboratory Seminar.

———. 2008c. Notes on tolerance relations. Computational Intelligence Laboratory Seminar. See J.F. Peters, "Tolerance near sets and image correspondence", Int. J. of Bio-Inspired Computation 1(4), 2009, 239–245.

Peters, J.F., and L. Puzio. 2009. Image analysis with anisotropic wavelet-based nearness measures. International Journal of Computational Intelligence Systems 3(2):1–17.

Schroeder, M., and M. Wright. 1992. Tolerance and weak tolerance relations. Journal of Combinatorial Mathematics and Combinatorial Computing 11:123–160.

Sossinsky, A. B. 1986. Tolerance space theory and some applications. Acta Applicandae Mathematicae: An International Survey Journal on Applying Mathematics and Mathematical Applications 5(2):137–167.
USC Signal and Image Processing Institute. 2009. USC Signal and Image Processing Institute image database. http://sipi.usc.edu/database.

Zeeman, E. C. 1962. The topology of the brain and visual perception. Topology of 3-Manifolds 3:240–248.
9 From Tolerance Near Sets to Perceptual Image Analysis
Shabnam Shahfar University of Manitoba
Amir H. Meghdadi University of Manitoba
James F. Peters University of Manitoba
9.1 Introduction
9.2 Perceptual Systems
9.3 Perceptual Indiscernibility and Tolerance Relations
9.4 Near Sets
9.5 Three Tolerance Near Set-based Nearness Measures for Image Analysis and Comparison: Tolerance Cardinality Distribution Nearness Measure (TCD) • Tolerance Overlap Distribution Nearness Measure (TOD) • Tolerance Nearness Measure (tNM)
9.6 Perceptual Image Analysis System
9.7 Conclusion
Bibliography

9.1
Introduction
The problem considered in this chapter is how to find and measure the similarity between two images. The image correspondence problem is a central and important area of research in computer vision. To solve the image correspondence problem, a biologically inspired approach using near sets and tolerance classes is proposed in this chapter. The proposed method is developed in the context of perceptual systems (Peters and Ramanna, 2008), where each image or parts of an image are considered as perceptual objects (Peters and Wasilewski, 2009). "A perceptual object is something presented to the senses or knowable by human mind" (Peters and Wasilewski, 2009; Murray, Bradley, Craigie, and Onions, 1933). The perceptual system approach presented here is inspired by the early 1980s work of Z. Pawlak (Pawlak, 1981) on the classification of objects by means of attributes and of E. Orlowska (Orlowska, 1982) on approximate spaces as formal counterparts of perception and observation. It has been shown in (Peters, 2007b) that near sets are a generalization of rough sets (Pawlak and Skowron, 2007; Polkowski, 2002). Near sets provide a good basis for the classification of perceptual objects. Near sets are disjoint sets that have matching descriptions to some degree (Henry and Peters, 2009b). One set X is considered near another set Y in the case where there is at least one x ∈ X with a description that matches the description of some y ∈ Y (Peters, 2007c,b). The proposed approach in this chapter also benefits from the idea of tolerance classes introduced by Zeeman (Zeeman, 1961). Tolerance relations are viewed
as good models of how one perceives, how one sees. Tolerance relations are also considered as a basis for studying similarities between visual perceptions (Zeeman, 1961; Peters, 2008c). In this chapter, first, the formal definitions of perceptual systems, equivalence and tolerance relations, and near set theory will be reviewed in sections 9.2, 9.3, and 9.4, respectively. Then, the new nearness measure called tolerance cardinality distribution measure (TCD) will be introduced in section 9.5. More details of the implementation of a tolerance-based perceptual image analysis system will be reviewed in section 9.6. Finally, section 9.7 will conclude the chapter.
9.2
Perceptual systems
A perceptual system is a real-valued, total, deterministic information system. Such a system consists of a set of perceptual objects and a set of probe functions representing object features (Peters and Wasilewski, 2009; Peters, 2007b,a; Peters and Ramanna, 2008). A perceptual object (x ∈ O) is something presented to the senses or knowable by the human mind (Murray et al., 1933). For example, a pixel or a group of pixels in an image can be perceived as a perceptual object. Features of an object such as color, entropy, texture, contour, spatial orientation, etc., can be represented by probe functions. A probe function can be thought of as a model for a sensor. A probe function φ(x) is a real-valued function representing a feature of the physical object x. A set of probe functions F = {φ1, φ2, ..., φl} can be defined to generate all the features for each object x, where φi : O → ℝ. However, not all of the probe functions (features) in F are used all of the time. The set B ⊆ F represents the probe functions in use. Example 9.1
Perceptual images & subimages. An image in a set of images can be considered as a perceptual object. Another example of a perceptual object is a pixel or a group of pixels inside a specific image. Different regions in an image can also be considered as perceptual objects. For example, an image can be divided into different regions (subimages), and each of these subimages can be considered as a perceptual object. Figure 9.1 shows an example image. This image has been divided into square subimages (windows) of the same size. The group of pixels belonging to each window (subimage) forms a perceptual object. The set of probe functions, or different features, for the perceptual objects represented in an image can include the average gray level, color, entropy, edge, texture, and spatial orientation of the pixels of that perceptual object in the image. For instance, in the example of figure 9.1, we use only the average gray level of the pixels in each subimage (window) as a probe function. Hence, the sets F and B are, e.g., F = {φ1, φ2, ..., φl} = {avg gray level, color, entropy, edge, texture, spatial orientation} and B = {φ1} = {avg gray level of the pixels in each subimage (window)}. Figure 9.1d shows the feature values of the perceptual objects in Figure 9.1b: the value of the pixels in each perceptual object (window) has been replaced with the average gray scale value of all of the pixels in that specific region.
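As a concrete illustration of Example 9.1, the following minimal Python sketch splits a gray scale image into p × p subimages and evaluates the average gray level probe function on each of them; the function names and the use of NumPy are our own choices and not part of the chapter.

    import numpy as np

    def subimages(img, p):
        # Split a 2-D gray scale image into non-overlapping p x p windows
        # (perceptual objects); any remainder at the borders is discarded.
        rows, cols = img.shape[0] // p, img.shape[1] // p
        return [img[r * p:(r + 1) * p, c * p:(c + 1) * p]
                for r in range(rows) for c in range(cols)]

    def phi_avg_gray(subimg):
        # Probe function phi_1: average gray level of a subimage, scaled to [0, 1].
        return float(subimg.mean()) / 255.0

    # Usage: one feature value per perceptual object of a 256 x 256 test image.
    img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
    features = [phi_avg_gray(x) for x in subimages(img, p=10)]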
(9.1a) Original image (Walnut)
(9.1b) Each subimage is a perceptual object
(9.1c) Gray scale image
(9.1d) Avg gray scale used as a feature value
FIGURE 9.1: Images and perceptual objects

This approach to the representation and comparison of feature values by probe functions started with the introduction of near sets (Peters, 2007a, 2008b). Probe functions provide a basis for describing and discerning affinities between sample objects in the context of a perceptual information system. This approach is a generalization of the concept of attributes in the approximation spaces of rough set theory (Peters, 2008d; Meghdadi, 2009).

Definition 9.2.1 Perceptual System. A perceptual system ⟨O, F⟩ is a real-valued, total, deterministic information system, where O is a non-empty set of perceptual objects, while F is a countable set of probe functions.
9.3
Perceptual Indiscernibility and Tolerance Relations
"The exact idea of closeness or of 'resembling', or of 'being within tolerance' is universal enough to appear quite naturally in almost any mathematical setting. It is especially natural in mathematical applications: practical problems, more often than not, deal with approximate input data and only require viable results - results with a tolerable level of error." (Sossinsky, 1986)

In this section, indiscernibility and tolerance relations are defined. The indiscernibility relation, introduced by Z. Pawlak (Pawlak, 1981), is a key concept in approximation spaces in rough set theory. Indiscernibility and tolerance relations are important and useful in defining measures to compare affinities between pairs of perceptual objects in a perceptual system, for example to compare perceptual images (Peters and Ramanna, 2008). The term tolerance space was introduced by E.C. Zeeman in 1961 in modelling visual perception with tolerances (Zeeman, 1961). A tolerance space is a set X supplied with a binary relation ≃ (i.e., a subset ≃ ⊂ X × X) that is reflexive (for all x ∈ X, x ≃ x) and symmetric (for all x, y ∈ X, x ≃ y implies y ≃ x), but transitivity of ≃ is not required.

Definition 9.3.1 Perceptual Indiscernibility Relation. Let ⟨O, F⟩ be a perceptual system. For every B ⊆ F, the indiscernibility relation ∼B is defined as follows:

    ∼B = {(x, y) ∈ O × O | ∀φ ∈ B, ‖φ(x) − φ(y)‖ = 0}.                     (9.1)
If B = {φ} for some φ ∈ F, instead of ∼{φ} we write ∼φ.

Definition 9.3.2 Perceptual Weak Indiscernibility Relation. Let ⟨O, F⟩ be a perceptual system. For every B ⊆ F, the weak indiscernibility relation ≃B is defined as follows:

    ≃B = {(x, y) ∈ O × O | ∃φi ∈ B, ‖φi(x) − φi(y)‖ = 0}.

If B = {φ} for some φ ∈ F, instead of ≃{φ} we write ≃φ. The set of all perceptual objects in O that are indiscernible from an object x ∈ O is called an equivalence class and is written x/∼B. Note that all the elements in an equivalence class are indiscernible from each other.

Definition 9.3.3 Tolerance Relation. A tolerance relation ζ ⊆ X × X on a set X is, in general, a binary relation that is reflexive and symmetric but not necessarily transitive (Sossinsky, 1986):
1. ζ ⊂ X × X,
2. ∀x ∈ X, (x, x) ∈ ζ,
3. ∀x, y ∈ X, (x, y) ∈ ζ ⇒ (y, x) ∈ ζ.
The basic idea in a tolerance view of images is to replace the indiscernibility relation of rough sets with a tolerance relation.
(9.2a) original image (Lake)
(9.2b) gray scale image
(9.2c) Covering,p=20
(9.2d) Covering,p=40
(9.2e) Covering,p=60
(9.2f) Covering,p=100
FIGURE 9.2: (See color insert for Figure 9.2a and 9.2c) Coverings with different window sizes
Definition 9.3.4 Perceptual Tolerance Relation
Let ⟨O, F⟩ be a perceptual system and let ε ∈ ℝ (the set of all real numbers). For every B ⊆ F, the perceptual tolerance relation ≅B,ε is defined as follows:

    ≅B,ε = {(x, y) ∈ O × O : ‖φB(x) − φB(y)‖ ≤ ε},                         (9.2)

where φB(x) = [φ1(x) φ2(x) ... φl(x)]^T is a feature-value vector obtained using all the probe functions in B and ‖·‖ is the L2 norm (Lp norm in general). If B = {φ} for some φ ∈ F, instead of ≅{φ} we write ≅φ. Further, for notational convenience, we will write ≅B instead of ≅B,ε with the understanding that ε is inherent to the definition of the tolerance relation. Let A ⊆ O. A is a preclass if ∀x, y ∈ A, x ≅B y. A is a tolerance class if, and only if, A is a maximal preclass. Let x/≅B denote a tolerance class containing x. A covering of X ⊂ O is the union of the tolerance classes in ≅B. Let p in Figure 9.3 denote a window size. Figures 9.2 and 9.3 show examples of an image and its covering with different window sizes.
(9.3a) Original Image
(9.3b) Grayscale Image
(9.3c) Covering, p=20
(9.3d) Covering, p=40
(9.3e) Covering, p=60
(9.3f) Covering, p=100
FIGURE 9.3: Coverings with different window sizes
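To make the perceptual tolerance relation of Definition 9.3.4 concrete, the short Python sketch below computes, for each perceptual object, the set of objects whose feature-value vectors lie within ε of it in the L2 norm of eq. 9.2. These neighbourhoods are preclasses containing the object; extracting the maximal preclasses (the tolerance classes proper) would additionally require a clique-style search, which is omitted here. The function name and the NumPy-based representation are our own.

    import numpy as np

    def tolerance_neighbourhoods(features, eps):
        # features: one row of probe-function values phi_B(x) per perceptual object.
        F = np.asarray(features, dtype=float).reshape(len(features), -1)
        diff = F[:, None, :] - F[None, :, :]       # pairwise feature differences
        dist = np.sqrt((diff ** 2).sum(axis=2))    # pairwise L2 distances (eq. 9.2)
        # indices of all objects y with ||phi_B(x) - phi_B(y)|| <= eps, for each x
        return [np.flatnonzero(dist[i] <= eps) for i in range(len(F))]

    # Usage with the average gray level features of the previous sketch:
    # classes = tolerance_neighbourhoods(features, eps=0.1)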
9.4
Near Sets
It has been shown (Peters and Ramanna, 2008; Peters and Wasilewski, 2009; Peters, 2008a; Henry and Peters, 2007) that near sets, which are a generalization of rough sets (Polkowski, 2002), provide a good basis for the classification of perceptual objects. Sets of perceptual objects where two or more of the objects have matching descriptions are called near sets (Peters, 2007c). The basic idea in the near set approach to object recognition is to compare object descriptions. Sample perceptual objects x, y ∈ O, x ≠ y, are near each other if, and only if, x and y have similar descriptions.
Definition 9.4.1 Nearness Relation. Let ⟨O, F⟩ be a perceptual system and let X, Y ⊆ O. A set X is near to a set Y within the perceptual system ⟨O, F⟩ (X ⋈F Y) iff there are A, B ⊆ F and f ∈ F and there are A ∈ O/∼A, B ∈ O/∼B, C ∈ O/∼f such that A, B ⊆ C. If the perceptual system is understood, then we say briefly that a set X is near to a set Y.

Definition 9.4.2 Weak Nearness Relation. Let ⟨O, F⟩ be a perceptual system and let X, Y ⊆ O. A set X is weakly near to a set Y within the perceptual system ⟨O, F⟩ (X ⋈F Y, in the weak sense) iff there are x ∈ X and y ∈ Y and there is B ⊆ F such that x ∼B y; in this case we say briefly that the set X is weakly near to the set Y.
TABLE 9.1    Relation Symbols

Symbol          Interpretation
O               Set of perceptual objects,
X               X ⊆ O, set of sample objects,
F               A set of probe functions,
B               B ⊆ F,
ℝ               The reals,
φ               φ ∈ B, where φi : O → ℝ, probe function,
∼B              {(x, y) | φ(x) = φ(y) ∀φ ∈ B}, indiscernibility relation,
≃B              {(x, y) ∈ O × O | ∃φi ∈ B, φi(x) = φi(y)}, weak indiscernibility relation,
≅B,ε            Perceptual tolerance relation,
x/∼B            x/∼B = {y ∈ X | y ∼B x} (class),
O/∼B            O/∼B = {x/∼B | x ∈ O}, quotient set,
⋈B              Nearness relation symbol,
⋈B (weak)       Weak nearness relation symbol,
⋈B,ε            Weak tolerance nearness relation symbol.
Definition 9.4.3 Weak Tolerance Nearness
Let ⟨O, F⟩ be a perceptual system and let X, Y ⊆ O, ε ∈ ℝ. The set X is perceptually near to the set Y within the perceptual system ⟨O, F⟩ (X ⋈F Y) iff there exist x ∈ X, y ∈ Y and there is a φ ∈ F, ε ∈ ℝ, such that x ≅B y. If the perceptual system is understood, then we say briefly that a set X is perceptually near to a set Y in the weak tolerance sense of nearness.

Definition 9.4.4 Tolerance Near Sets. Let ⟨O, F⟩ be a perceptual system and let X ⊆ O. A set X is a tolerance near set iff there is Y ⊆ O such that X ⋈F Y. The family of near sets of a perceptual system ⟨O, F⟩ is denoted by NearF(O).
9.5
Three Tolerance Near Set-based Nearness Measures for Image Analysis and Comparison
The term tolerance space was introduced by E.C. Zeeman in 1961 in modelling visual perception with tolerances. Tolerance relations are viewed as good models of how one perceives, how one sees (Zeeman, 1961). Tolerance relations are also considered as a basis for studying similarities between objects (Peters, 2008c). The basic idea behind using tolerance classes
is to relax the equivalence relations needed in the mathematical world and extend them to indiscernibility-type relations where "almost" solutions are possible. In this section, a new similarity measure based on tolerance classes and near sets for the comparison and analysis of images will be introduced. The new measure, called the tolerance cardinality distribution nearness measure (TCD), is based on the statistical distribution of the size (cardinality) of the tolerance classes in each image. The idea behind it is that if images are similar to each other, they should have corresponding tolerance classes with more or less the same size and hence similar size distribution functions. The histogram comparison approach in the definition of the TCD nearness measure was inspired by the TOD nearness measure proposed in (Meghdadi, Peters, and Ramanna, 2009). However, TCD is fundamentally different from TOD. While TCD calculates and compares the distribution of the sizes of tolerance classes, TOD only measures the overlap between tolerance classes, and the sizes of tolerance classes have no direct impact on TOD.
9.5.1
Tolerance Cardinality Distribution Nearness Measure (TCD)
A similarity measure is proposed here based on the statistical distribution of the size (cardinality) of the tolerance classes in each image. The size of each tolerance class is defined as the number of perceptual objects (subimages) in that tolerance class. The definition of TCD is based on the basic idea that if images are similar to each other, they should have corresponding tolerance classes with "almost" the same size and hence similar size distribution functions.

Let x/≅B,ε, y/≅B,ε denote tolerance classes in the coverings of X, Y defined by ≅B,ε. Next, let c(x/≅B,ε) and c(y/≅B,ε) represent the normalized cardinalities of the tolerance classes x/≅B,ε and y/≅B,ε, respectively, i.e., the cardinalities computed in (9.3) and (9.4):

    c(x/≅B,ε) = |x/≅B,ε| / |X|,                                            (9.3)

    c(y/≅B,ε) = |y/≅B,ε| / |Y|.                                            (9.4)

Suppose that {b1, b2, ..., bNb} is a set of discrete bins for the calculation of the histograms of c(x/≅B,ε) and c(y/≅B,ε), where x ∈ X and y ∈ Y. The histogram (or empirical distribution function) of c(x/≅B,ε) at bin value bj is written HcX(bj) and is defined as the number of tolerance classes whose number of subimages (size) belongs to the j-th bin. The cumulative distribution function is then defined as follows:

    CHcX(bj) = Σ_{i=1}^{j} HcX(bi).                                        (9.5)

CHcY(bj) is similarly defined for image Y. The Tolerance Cardinality Distribution (TCD) nearness measure is defined by taking the sum of differences between the cumulative histograms, as defined in equation 9.6:

    TCD = 1 − Σ_{j=1}^{Nb} |CHcX(bj) − CHcY(bj)|.                          (9.6)
The histogram comparison approach in the definition of the TCD nearness measure was inspired by the TOD nearness measure proposed in (Meghdadi et al., 2009). However, TCD is
fundamentally different from TOD. While TCD calculates and compares the distribution of the sizes of tolerance classes, TOD only measures the overlap between tolerance classes, and the sizes of the tolerance classes in an image covering defined by a tolerance relation have no direct impact on TOD. When the number of features (probe functions) used in a system increases, the overlaps between tolerance classes become smaller and smaller. In this case, TOD will not work well, since it is based on the distribution of overlaps between tolerance classes. TCD, on the other hand, does not depend on overlaps between tolerance classes and is not hampered by a large number of features (probe functions).
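The following Python sketch illustrates one possible implementation of the TCD measure of eqs. 9.3–9.6. The function name, the NumPy representation, the normalisation of the histograms, and the extra division by the number of bins (which keeps the value inside [0, 1]) are our own choices; the chapter leaves these details implicit.

    import numpy as np

    def tcd(classes_x, classes_y, n_x, n_y, n_bins=10):
        # classes_x / classes_y: lists of tolerance classes (index sequences)
        # in the coverings of images X and Y; n_x / n_y: numbers of subimages.
        c_x = np.array([len(c) for c in classes_x], dtype=float) / n_x   # eq. 9.3
        c_y = np.array([len(c) for c in classes_y], dtype=float) / n_y   # eq. 9.4
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        h_x = np.histogram(c_x, bins=bins)[0] / max(len(c_x), 1)
        h_y = np.histogram(c_y, bins=bins)[0] / max(len(c_y), 1)
        ch_x, ch_y = np.cumsum(h_x), np.cumsum(h_y)                      # eq. 9.5
        # eq. 9.6, with an extra 1/n_bins scaling so that TCD stays in [0, 1]
        return 1.0 - np.abs(ch_x - ch_y).sum() / n_bins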
9.5.2
Tolerance Overlap Distribution nearness measure (TOD)
The tolerance overlap distribution nearness measure (TOD), introduced by Meghdadi and Peters in (Meghdadi et al., 2009), is based on a statistical comparison between tolerance classes at each subimage. TOD between two images (Meghdadi et al., 2009) is defined as follows. Suppose X, Y ⊆ O are two images (sets of perceptual objects), and X/≅B,ε and Y/≅B,ε are the sets of all tolerance classes for images X and Y. The set of all overlapping tolerance classes corresponding to each object (subimage) x is named ΩX/≅B,ε(x) and is defined as follows:

    ΩX/≅B,ε(x) = {z/≅B,ε ∈ X/≅B,ε | x ∈ z/≅B,ε}.                           (9.7)

Consequently, the normalized number of tolerance classes in X/≅B,ε which overlap at x is named ωX/≅B,ε(x) and is defined as follows:

    ωX/≅B,ε(x) = |ΩX/≅B,ε(x)| / |X/≅B,ε|.                                  (9.8)

Similarly, the normalized number of overlapping tolerance classes at every subimage y ∈ Y is denoted by ωY/≅B,ε(y). Assuming that the set of probe functions B and the value of ε are known, we use the simplified notation ΩX(x) and ωX(x) for the set X/≅B,ε and the notation ΩY(y) and ωY(y) for the set Y/≅B,ε. Let {b1, b2, ..., bNb} be discrete bins in the calculation of the histograms of ωX(x) and ωY(y), where x ∈ X and y ∈ Y. The empirical distribution function (histogram) of ωX(x) at bin value bj is written HωX(bj) and is defined as the number of subimages x with a value of ωX(x) that belongs to the j-th bin. The cumulative distribution function is then defined as follows:

    CHωX(bj) = Σ_{i=1}^{j} HωX(bi).                                        (9.9)

CHωY(bj) is similarly defined for image Y. The Tolerance Overlap Distribution (TOD) nearness measure is defined by taking the sum of differences between the cumulative histograms, as defined in equation 9.10, where γ is a scaling factor:

    TOD = 1 − γ Σ_{j=1}^{Nb} |CHωX(bj) − CHωY(bj)|.                        (9.10)
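A possible Python sketch of the TOD computation of eqs. 9.7–9.10 is given below. The function name and the NumPy representation are ours; the histograms are normalised so that the cumulative distributions end at 1, and γ is treated as a simple multiplicative scaling factor, which is our reading of eq. 9.10 rather than a detail confirmed by the source.

    import numpy as np

    def tod(classes_x, classes_y, n_x, n_y, n_bins=10, gamma=1.0):
        # classes_x / classes_y: tolerance classes (index sequences) of X and Y;
        # n_x / n_y: numbers of subimages in X and Y.
        def omega(classes, n_objects):
            counts = np.zeros(n_objects)
            for cls in classes:                      # eq. 9.7: classes overlapping
                counts[np.asarray(cls)] += 1         # at each object
            return counts / max(len(classes), 1)     # eq. 9.8: normalised counts
        w_x, w_y = omega(classes_x, n_x), omega(classes_y, n_y)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        h_x = np.histogram(w_x, bins=bins)[0] / n_x
        h_y = np.histogram(w_y, bins=bins)[0] / n_y
        ch_x, ch_y = np.cumsum(h_x), np.cumsum(h_y)       # eq. 9.9
        return 1.0 - gamma * np.abs(ch_x - ch_y).sum()    # eq. 9.10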
9.5.3
Tolerance Nearness Measure (tNM)
The tolerance nearness measure (tNM), introduced in (Henry, 2009), is based on the idea that if two images are similar, the tolerance classes in the union of those images will contain a similar number of subimages from each image. tNM between two images (Henry, 2009; Henry and Peters, 2009a) is based on definition 9.4.2 and is defined as follows. Suppose X and Y are sets of perceptual objects (for example, two images), and Z = X ∪ Y is the union of X and Y. Let [z/≅B,ε]⊆X and [z/≅B,ε]⊆Y denote the portions of the elementary set z/≅B,ε that belong to X and to Y, respectively. Then,

    [z/≅B,ε]⊆X ≜ {x ∈ z/≅B,ε | x ∈ X} ⊆ z/≅B,ε,                            (9.11)

    [z/≅B,ε]⊆Y ≜ {y ∈ z/≅B,ε | y ∈ Y} ⊆ z/≅B,ε,                            (9.12)

    z/≅B,ε = [z/≅B,ε]⊆X ∪ [z/≅B,ε]⊆Y.                                      (9.13)

The tolerance nearness measure (tNM) is defined as the weighted average of the closeness between the cardinality (size) of the set [z/≅B,ε]⊆X and the cardinality of [z/≅B,ε]⊆Y, where the cardinality of z/≅B,ε is used as the weighting factor:

    tNM = ( 1 / Σ_{z/≅B,ε} |z/≅B,ε| ) × Σ_{z/≅B,ε} [ min(|[z/≅B,ε]⊆X|, |[z/≅B,ε]⊆Y|) / max(|[z/≅B,ε]⊆X|, |[z/≅B,ε]⊆Y|) ] × |z/≅B,ε|.     (9.14)
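The sketch below is one way to compute tNM (eq. 9.14) in Python, given tolerance classes computed on the union Z = X ∪ Y and a boolean mask recording which objects of Z come from X. The function name and the data layout are our own.

    import numpy as np

    def tnm(classes_z, in_x):
        # classes_z: tolerance classes (index sequences) of Z = X union Y;
        # in_x[i] is True when object i of Z belongs to image X, False for Y.
        in_x = np.asarray(in_x, dtype=bool)
        num, den = 0.0, 0.0
        for cls in classes_z:
            cls = np.asarray(cls)
            nx = int(in_x[cls].sum())            # |[z]_X|, eq. 9.11
            ny = len(cls) - nx                   # |[z]_Y|, eq. 9.12
            if max(nx, ny) == 0:
                continue
            num += (min(nx, ny) / max(nx, ny)) * len(cls)   # weighted closeness
            den += len(cls)                                  # total weight, eq. 9.14
        return num / den if den else 0.0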
9.6
Perceptual Image Analysis System
This section explains some implementation details of the perceptual image processing system discussed in the previous sections. The system has been implemented using C++/.NET (Visual Studio 2008), combining both managed and unmanaged C++ coding techniques. The object-oriented structure of the program makes it easier to make further changes and improvements to the program, and the choice of the C++ language rather than Matlab makes the program much faster. Figure 9.4 shows a view of the system. The GUI has different parts that are designed for different purposes. Image panel 1 and image panel 2 are two of the most important panels; they display the two images that will be compared and processed. Image panel 1 shows the first image and image panel 2 shows the second image. The images can be selected by clicking on the 'select image' buttons. The gray scale version of each image is shown in the second image box of each panel. The third image box in each panel shows the covering (all tolerance classes) for the selected image; this is displayed after the user clicks the 'Calculate Tolerance Classes' button. In the case that more than two images need to be processed, and for the sake of simplicity, the user can choose a folder on their computer and give its path to the system using the dialog box and browse button at the top of the 'Image database' panel. The system will then perform the comparison on each pair of images in the specified folder. Another advantage of this system, from a user-friendliness point of view, is the option to have up to 12 different images loaded into the system at the same time and to choose which ones are being compared and processed at the moment. The image
FIGURE 9.4: GUI of the ’perceptual images’ system
database panel at the left side offers these user-friendly options to the user. Feature selection is also a part of this panel. Figure 9.5 shows an example of two different images and their coverings. The other two panels at the bottom are responsible for choosing the analysis methods. In the analysis panel the user can choose between different image comparison methods such as tNM, TOD, and TCD. The results are shown in the results panel. Figure 9.6 shows another example of the images created by the GUI. First, the original images are loaded into the GUI. Then, by choosing gray scale as a feature, the images are converted to gray scale, and the tolerance classes (covering) are produced. The user can choose the window size and epsilon as well as which measures are to be calculated. Tables 9.2, 9.3, and 9.4 show the results of comparing the different images in Figures 9.1, 9.2, 9.5, and 9.6 with the three different measures. The set consists of 8 different images, which are numbered from 1 to 8 for simplicity. Each column of the tables shows the results of the pairwise comparison of one specific image with the rest of the images. Table 9.2 shows the results of the TOD nearness measure, while Table 9.3 presents the results for the tNM nearness measure. Finally, the results of the TCD nearness measure are included in Table 9.4. Figures 9.7, 9.8, and 9.9 show three example plots for the comparison of images using the three nearness measures TOD, tNM, and TCD. In each plot the x-axis indexes the images by their numbers, and the y-axis shows the values of the different measures. Figure 9.7 shows the results of comparing image 1 (Lena) to all the other images. As can be seen on the plot, at number 3 for example, all the measures indicate that images 1 and 3 are very different, while the similarity of image 1 with image 2 (number 2 on the x-axis) is very high, which is close to what is expected. On the other hand, for images 5 and 6, the results of tNM are significantly different from the two other measures TOD and TCD. tNM shows that images 5 and 6 are quite different from image 1, which is closer to the expected result. Hence, tNM gives a more accurate result in this case. Figure 9.8 shows another example where tNM works better than TOD and TCD. However, since the results of TOD and TCD depend on
many factors, such as γ, one cannot make the general conclusion that tNM is always better than TOD and TCD. The plots in Figure 9.9, for example, show the results of comparing image 8 with the rest of the images, where all three measures are consistent with each other.
(9.5a) original image
(9.5b) Covering,p=20
(9.5c) original image
(9.5d) Covering,p=20
FIGURE 9.5: GUI examples
TABLE 9.2    Image comparison results using TOD measure

Images      1-Barbara  2-Lena  3-Leaves  4-Walnut  5-Leaf 1  6-Leaf 2  7-Lake  8-Water
1-Barbara   1.00       0.78    0.53      0.74      0.67      0.68      0.72    0.84
2-Lena      0.78       1.00    0.60      0.86      0.78      0.80      0.82    0.76
3-Leaves    0.53       0.60    1.00      0.64      0.65      0.65      0.66    0.51
4-Walnut    0.74       0.86    0.64      1.00      0.80      0.83      0.89    0.71
5-Leaf 1    0.67       0.78    0.65      0.80      1.00      0.91      0.82    0.67
6-Leaf 2    0.68       0.80    0.65      0.83      0.91      1.00      0.85    0.68
7-Lake      0.72       0.82    0.66      0.89      0.82      0.85      1.00    0.68
8-Water     0.84       0.76    0.51      0.71      0.67      0.68      0.68    1.00

TABLE 9.3    Image comparison results using tNM measure

Images      1-Barbara  2-Lena  3-Leaves  4-Walnut  5-Leaf 1  6-Leaf 2  7-Lake  8-Water
1-Barbara   1.00       0.77    0.45      0.70      0.26      0.25      0.56    0.64
2-Lena      0.77       1.00    0.51      0.61      0.21      0.21      0.61    0.62
3-Leaves    0.45       0.51    1.00      0.54      0.11      0.12      0.76    0.33
4-Walnut    0.70       0.61    0.54      1.00      0.15      0.16      0.70    0.47
5-Leaf 1    0.26       0.21    0.11      0.15      1.00      0.90      0.14    0.47
6-Leaf 2    0.25       0.21    0.12      0.16      0.90      1.00      0.14    0.45
7-Lake      0.56       0.61    0.76      0.70      0.14      0.14      1.00    0.41
8-Water     0.64       0.62    0.33      0.47      0.47      0.45      0.41    1.00

9.7
Conclusion
A new similarity measure for perceptual comparison and analysis of images has been introduced in this chapter. The proposed similarity measure called tolerance cardinality
FIGURE 9.6: GUI example. (9.6a) Original image (Leaf 1) (9.6b) Original image (Leaf 2) (9.6c) Original image (Water) (9.6d)–(9.6f) Coverings, p=60 (9.6g) Original image (Leaves) (9.6h) Covering, p=60

TABLE 9.4
Image comparison results using TCD measure
Images      1-Barbara  2-Lena  3-Leaves  4-Walnut  5-Leaf 1  6-Leaf 2  7-Lake  8-Water
1-Barbara   1.00       0.92    0.71      0.89      0.86      0.81      0.87    0.97
2-Lena      0.92       1.00    0.78      0.96      0.94      0.89      0.94    0.90
3-Leaves    0.71       0.78    1.00      0.82      0.82      0.83      0.83    0.69
4-Walnut    0.89       0.96    0.82      1.00      0.95      0.89      0.98    0.87
5-Leaf 1    0.86       0.94    0.82      0.95      1.00      0.94      0.95    0.84
6-Leaf 2    0.81       0.89    0.83      0.89      0.94      1.00      0.89    0.80
7-Lake      0.87       0.94    0.83      0.98      0.95      0.89      1.00    0.85
8-Water     0.97       0.90    0.69      0.87      0.84      0.80      0.85    1.00
distribution nearness measure (TCD) is based on the difference between the statistical distributions of the size (cardinality) of the tolerance classes in each image. The size of each tolerance class is defined as the number of perceptual objects (subimages) in that tolerance class. TCD is fundamentally different from tNM, proposed in (Henry, 2009), and TOD, proposed in (Meghdadi et al., 2009). While TCD calculates and compares the distribution of the sizes of tolerance classes, TOD only measures the overlap between tolerance classes, and the sizes of tolerance classes have no direct impact on TOD. By contrast, tNM compares the sizes of tolerance classes in the coverings of pairs of images. The results of comparing different images with these three nearness measures are shown in this chapter. The results show that all three measures work well. However, there are
FIGURE 9.7: Plots showing the comparison of image 1 versus the other images with the different measures
FIGURE 9.8: Plots showing the comparison of image 5 versus the other images with the different measures
FIGURE 9.9: Plots showing the comparison of image 8 versus the other images with the different measures
some cases where tNM gives a more accurate nearness measurement than TOD and TCD. This can be explained by the fact that the measurements computed with TOD and TCD depend on many factors, such as γ, and their values vary with different parameter adjustments. The preliminary results for the TCD similarity measure are shown in this chapter and are compared with the measurements computed with the tNM and TOD nearness measures. However, further studies are needed to gain a better understanding of TCD and TOD, especially when the number of features (probe functions) used in the system increases. In that case, the overlaps between tolerance classes become smaller, and TOD will not work quite as well, since it is based on the distribution of overlaps between tolerance classes. TCD, on the other hand, does not depend on overlaps between tolerance classes and will not be undermined by a large number of features (probe functions).
Bibliography

Henry, C., and J. F. Peters. 2007. Image pattern recognition using approximation spaces and near sets. In Proc. of the Eleventh International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2007), Joint Rough Set Symposium (JRS07), Lecture Notes in Artificial Intelligence, vol. 4482, 475–482.

———. 2009a. Perception based image classification. Tech. Rep., Computational Intelligence Laboratory, University of Manitoba. UM CI Laboratory Technical Report No. TR-2009-016.

Henry, C., and J.F. Peters. 2009b. Near sets. Wikipedia.

Henry, Christopher. 2009. Perceptual image analysis. International Journal of Bio-Inspired Computation 2(2): to appear.

Meghdadi, Amir H. 2009. A perceptual system approach to image analysis: Tolerance classes and lattices. Tech. Rep., Computational Intelligence Laboratory, University of Manitoba, E1-526.

Meghdadi, Amir H., James F. Peters, and Sheela Ramanna. 2009. Tolerance classes in measuring image resemblance. In Intelligent Analysis of Images & Videos, KES 2009, Part II, LNAI 5712, 127–134. Springer.

Murray, J. A., H. Bradley, W. Craigie, and C. Onions. 1933. The Oxford English Dictionary. London: Oxford University Press.

Orlowska, E. 1982. Semantics of vague concepts. Applications of rough sets. Tech. Rep. 469, Institute for Computer Science, Polish Academy of Sciences.

Pawlak, Z. 1981. Classification of objects by means of attributes. Tech. Rep. PAS 429, Institute for Computer Science, Polish Academy of Sciences.

Pawlak, Z., and A. Skowron. 2007. Rudiments of rough sets. Information Sciences 177:3–27.

Peters, J. F. 2007a. Classification of objects by means of features. In Proceedings of the IEEE Symposium Series on Foundations of Computational Intelligence (IEEE SCCI 2007), 1–8. Honolulu, Hawaii.

———. 2007b. Near sets. General theory about nearness of objects. Applied Mathematical Sciences 1(53):2609–2629.

———. 2007c. Near sets. Special theory about nearness of objects. Fundamenta Informaticae 75(1-4):407–433.

———. 2008a. Discovery of perceptually near information granules. In Novel Developments in Granular Computing: Applications of Advanced Human Reasoning and Soft Computation, ed. J. T. Yao, to appear. Hershey, N.Y., USA: Information Science Reference.

Peters, J. F., and P. Wasilewski. 2009. Foundations of near sets. Information Sciences. An International Journal 179:3091–3109. doi:10.1016/j.ins.2009.04.018.
Peters, James F. 2008b. Classification of perceptual objects by means of features. Int. J. of Info. Technology & Intell. Computing 3(2):1–35.

———. 2008c. Notes on set-theoretic relations. Tech. Rep., Computational Intelligence Laboratory Seminar, University of Manitoba, E1-526.

———. 2008d. Notes on tolerance relations. Tech. Rep., Computational Intelligence Laboratory Seminar, University of Manitoba, E1-526.

Peters, James F., and Sheela Ramanna. 2008. Affinities between perceptual granules: Foundations and perspectives. In Human-Centric Information Processing Through Granular Modelling, ed. A. Bargiela and W. Pedrycz, 409–436. Springer-Verlag.

Polkowski, L. 2002. Rough Sets: Mathematical Foundations. Springer, Heidelberg.

Sossinsky, A. 1986. Tolerance space theory and some applications. Acta Applicandae Mathematicae: An International Survey Journal on Applying Mathematics and Mathematical Applications 5(2):137–167.

Zeeman, E.C. 1961. The topology of the brain and the visual perception. In K.M. Fort, ed., Topology of 3-Manifolds and Selected Topics, 240–256. New Jersey: Prentice Hall.
10 Image Segmentation: A Rough-set Theoretic Approach

Milind M. Mushrif, Yeshwantrao Chavan College of Engineering, Nagpur
Ajoy K. Ray, Indian Institute of Technology, Kharagpur

10.1 Introduction
10.2 Rough-set theory and properties
10.3 The Concept of Histon: Construction of Histon • Roughness measure
10.4 Segmentation Method: Selection of peak and threshold values • Region merging
10.5 Experimental Results
10.6 Summary
Bibliography

10.1
Introduction
Image segmentation is a critical, yet essential task in many applications. Segmentation subdivides an image into a set of homogeneous and meaningful regions, such that the pixels in each partitioned region possess an identical set of properties or attributes and the union of any two adjacent regions is non-homogeneous (Gonzalez and Woods, 2002). The set of properties may include gray levels, color, contrast, spectral values, or textural properties. Color image segmentation has gained paramount importance in recent times, largely due to the availability of inexpensive digital cameras, the increasing computational power of computers, and the decreasing cost of computation. The applications of color image segmentation include medical image diagnostics, video object segmentation, object-based video compression, object detection from remotely sensed images, and many more. A number of different approaches to color image segmentation exist in the literature. They can be broadly classified into histogram based, edge based, region based, clustering, graph theoretic, rule-based or knowledge driven, and combinations of these techniques (Aghbari and Al-Haj, 2006; Cheng and Li, 2003; Acharya and Ray, 2005). Though there are a large number of segmentation algorithms available, there is no algorithm that can be considered to be good for all images (Pal and Pal, 1993). One of the most widely used techniques for image segmentation is histogram based thresholding, which assumes that homogeneous objects in the image manifest themselves as clusters. These methods do not need any a-priori information about the image (Liu and Yang, 1994), but they do not take into account the spatial correlation of the same or similar valued elements. However, real-world images usually have strong correlation among neighboring pixels. The fuzzy
homogeneity approach (Cheng, Chen, Chiu, and Xu, 1998) and the histon based approach (Mohabey and Ray, 2000) exploit this correlation to improve the quality of segmentation. The concept of histon, introduced by Mohabey and Ray (Mohabey and Ray, 2000), is an encrustation of the histogram that visualizes the multi-dimensional color information in an integrated fashion. The concept has found applicability in boundary region analysis problems. The histon encapsulates the fundamentals of color image segmentation in a rough-set theoretic sense and provides a direct means of segregating a pool of inhomogeneous regions into its components. In this chapter, we present a new technique for color image segmentation using a rough-set theoretic approach. The roughness index, obtained by correlating the histon with the upper approximation and the histogram with the lower approximation of a rough set, has been used as a basis for segmentation (Mushrif and Ray, 2008). In the next section we present the basic concepts of rough set theory and some important properties of rough sets. In section 10.3, we describe the concept of histon and the calculation of the roughness measure. Section 10.4 describes the segmentation algorithm, experimental results are given in section 10.5, and concluding remarks follow in section 10.6.
10.2
Rough-set theory and properties
Rough set theory, introduced by Z. Pawlak (Pawlak, 1991), represents a new mathematical approach to vagueness and uncertainty. The theory is especially useful in the discovery of patterns in data in real-life applications such as medical diagnosis (Tanaka, Ishibuchi, and Shigenaga, 1992), pharmacology, industry (Szladow and Ziarko, 1992), image analysis (Pal, Shankar, and Mitra, 2005), and others. Rough set theory provides a possibilistic approach towards classification and the extraction of knowledge from a data set. It supports granularity in knowledge and is concerned with understanding knowledge, finding means of representing knowledge, and automating the process of extracting information from knowledge bases. Rough set theory addresses the issue of indiscernibility and is a formal framework for the automated transformation of data into knowledge. The knowledge is primarily defined by the ability of the system to classify data or objects. Thus, it is necessarily connected with the variety of classification patterns related to specific parts of the real or abstract world, called the universe of discourse. In this section, we introduce some preliminary concepts of rough-set theory that are relevant to this chapter.

Given a finite set U ≠ ∅ (the universe) of objects, any subset X ⊆ U of the universe is called a concept or a category in U, and any family of concepts in U is referred to as abstract knowledge about U. Categories lead to the classification or partition of a certain universe U, i.e., into families C = {X1, X2, . . . , Xn} such that Xi ⊆ U, Xi ≠ ∅, Xi ∩ Xj = ∅ for i ≠ j, i, j = 1, 2, . . . , n, and ∪i Xi = U. A knowledge base is a relational system K = (U, R), where U ≠ ∅ and R is a family of equivalence relations over U. If P ⊆ R and P ≠ ∅, then ∩P is also an equivalence relation, denoted by IND(P) and known as the indiscernibility relation over P. Moreover,

    [x]IND(P) = ∩_{R∈P} [x]R.                                              (10.1)

Thus, U/IND(P), or simply U/P, denotes the knowledge associated with the family of equivalence relations P, called the P-basic knowledge about U in the knowledge base K. The equivalence classes of IND(P) are called basic categories of the knowledge P. The P-basic
categories are the fundamental building blocks of the knowledge, and the family of all categories in the knowledge base K = (U, R) are known as K-categories. In the universe of discourse U, where X ⊆ U and R is an equivalence relation, X is said to be R-definable or R-exact if X is a union of some R-basic categories; otherwise X is R-undefinable, R-inexact, or R-rough. The R-exact sets are those sets of the universe U which can be exactly defined in the knowledge base K, and R-rough sets are those subsets which cannot be defined in this knowledge base. However, R-rough sets can be defined approximately by employing two exact sets, referred to as the lower and the upper approximation. The lower and upper approximations are defined as follows:

    R̲X = ∪ {Y ∈ U/IND(R) : Y ⊆ X},                                         (10.2)

    R̄X = ∪ {Y ∈ U/IND(R) : Y ∩ X ≠ ∅}.                                     (10.3)

The set R̲X, also known as the R-lower approximation of X, is the set of all elements of U which can be classified as elements of X with certainty using the knowledge R. The set R̄X, also known as the R-upper approximation of X, is the set of elements of U which can possibly be classified as elements of X, employing the knowledge R. Obviously, the difference of the two sets yields the set of elements which lie around the boundary. The set BNR(X) = R̄X − R̲X is called the R-boundary of X or the R-borderline region of X. This is the set of elements which cannot be classified either to X or to −X using the knowledge R. The borderline region represents the inexactness of the set X with respect to the knowledge R: the greater the borderline region of the set, the greater its inexactness. This idea can be expressed more precisely by the accuracy measure defined as

    αR(X) = |R̲X| / |R̄X|,   for X ≠ ∅,                                      (10.4)

where |·| is the cardinality operator. The accuracy measure captures the degree of completeness of the knowledge about the set X. We can also define a measure expressing the degree of inexactness of the set X, called the roughness measure or roughness index of X (the R-roughness of X), given by

    ρR(X) = 1 − αR(X).                                                      (10.5)

Obviously, 0 ≤ ρR(X) ≤ 1 for every R and X ⊆ U. If ρR(X) = 0, the borderline region of X is empty and the set X is R-definable, i.e., X is crisp or precise with respect to the knowledge R; otherwise, the set X has a non-empty R-borderline region and is therefore R-undefinable, i.e., X is rough or vague with respect to the knowledge R. Thus any rough set has a non-empty boundary region (Pawlak, 2004), as depicted in Figure 10.1.
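For readers who prefer a computational view, the following short Python sketch evaluates eqs. 10.2–10.5 for a set X given the partition of the universe into equivalence classes; the function name and data representation are our own.

    def rough_approximation(partition, X):
        # partition: equivalence classes (R-basic categories) of the universe;
        # X: the set to be approximated.
        X = set(X)
        lower = set().union(*(set(Y) for Y in partition if set(Y) <= X))   # eq. 10.2
        upper = set().union(*(set(Y) for Y in partition if set(Y) & X))    # eq. 10.3
        accuracy = len(lower) / len(upper) if upper else 1.0               # eq. 10.4
        return lower, upper, 1.0 - accuracy                                # eq. 10.5

    # Example: partition = [{0, 1}, {2, 3}, {4, 5}], X = {1, 2, 3}
    # -> lower = {2, 3}, upper = {0, 1, 2, 3}, roughness = 0.5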
10.3
The Concept of Histon
The histon (Mohabey and Ray, 2000) presents a new means for the visualization of color information for the evaluation of similar color regions in an image. It also presents a method for the segregation of the elements at the boundary, which can be applied in the process of image segmentation. A histon is basically a contour plotted on top of the existing histograms of the primary color components red, green, and blue, in such a manner that the collection of all points falling under the similar color sphere of a predefined radius, called the expanse, belongs to one
FIGURE 10.1: Rough approximation set
single value. The similar color sphere is the region in RGB color space such that all the colors falling in that region can be classified as one color. The concept of histon aids in the visualization of the spatial correlation of similarly colored segments in the image. For every intensity value on the base histogram, the number of pixels encapsulated in the similar color sphere is evaluated and the count is then added to the value of the histogram at that particular intensity value. This computation is carried out for all the intensity values, leading to the formation of the histon.
10.3.1
Construction of Histon
Consider a color image I, of size M × N, consisting of three primary components, red R, green G, and blue B. The histogram of the image for each of the R, G, and B components can be computed as follows:

    hi(g) = Σ_{m=1}^{M} Σ_{n=1}^{N} δ(I(m, n, i) − g),   for 0 ≤ g ≤ L − 1 and i ∈ {R, G, B},        (10.6)

where δ(·) is the Dirac impulse function and L is the total number of intensity levels in each of the color components. The value of each bin is the number of image pixels having intensity g. Consider a P × Q neighborhood around a pixel I(m, n) and let I(p, q) be a pixel in the neighborhood. The Euclidean distance between I(m, n) and I(p, q) is given by

    d(I(m, n), I(p, q)) = √( Σ_{i∈{R,G,B}} (I(m, n, i) − I(p, q, i))² ),                             (10.7)

and the total distance between all the pixels in the neighborhood and the pixel I(m, n) is then given by

    dT(m, n) = Σ_{p∈P} Σ_{q∈Q} d(I(m, n), I(p, q)).                                                  (10.8)
The pixels in the neighborhood fall in the sphere of similar color if the distance dT(m, n) is less than the expanse. We define a matrix X of size M × N such that an element X(m, n) is given by

    X(m, n) = 1 if dT(m, n) < expanse, and X(m, n) = 0 otherwise.                                    (10.9)

The mathematical formulation of the histon can now be given as (Mushrif and Ray, 2008)

    Hi(g) = Σ_{m=1}^{M} Σ_{n=1}^{N} (1 + X(m, n)) δ(I(m, n, i) − g),   for 0 ≤ g ≤ L − 1 and i ∈ {R, G, B}.   (10.10)
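The construction of eqs. 10.6–10.10 can be sketched in Python as follows; the 3 × 3 neighbourhood, the skipping of border pixels, and the function name are our own simplifications.

    import numpy as np

    def histogram_and_histon(img, expanse=100.0, L=256):
        # img: (M, N, 3) uint8 RGB image.
        img = img.astype(float)
        M, N, _ = img.shape
        hist = np.zeros((L, 3))
        histon = np.zeros((L, 3))
        for m in range(1, M - 1):
            for n in range(1, N - 1):
                # total colour distance to the 3 x 3 neighbours (eqs. 10.7, 10.8)
                nb = img[m - 1:m + 2, n - 1:n + 2, :]
                dT = np.sqrt(((nb - img[m, n, :]) ** 2).sum(axis=2)).sum()
                similar = 1.0 if dT < expanse else 0.0        # X(m, n), eq. 10.9
                for i in range(3):
                    g = int(img[m, n, i])
                    hist[g, i] += 1.0                         # eq. 10.6
                    histon[g, i] += 1.0 + similar             # eq. 10.10
        return hist, histon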
10.3.2
Roughness measure
The histogram and the histon can be correlated with the concept of approximation space in rough set theory. The histogram value at the g-th intensity is the set of pixels which definitely belong to the class of intensity g and, therefore, can be considered as the lower approximation, while the histon value at the g-th intensity represents all the pixels which possibly belong to the same segment or a class of similar color value and, therefore, may be considered as the upper approximation. The roughness index of the set at each intensity may be defined as

    ρi(g) = 1 − |hi(g)| / |Hi(g)|,   for 0 ≤ g ≤ L − 1 and i ∈ {R, G, B}.                            (10.11)
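A direct sketch of eq. 10.11, operating on the histogram and histon arrays returned by the previous sketch, might look as follows; treating empty bins as having zero roughness is our own choice, not one spelled out in the chapter.

    import numpy as np

    def roughness_index(hist, histon):
        # hist, histon: (L, 3) arrays of histogram and histon values (eq. 10.11).
        ratio = np.divide(hist, histon, out=np.ones_like(hist, dtype=float),
                          where=histon > 0)
        return 1.0 - ratio        # roughness rho_i(g), values in [0, 1]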
The roughness measure is always in the range [0, 1]. The object regions in the image are more or less homogeneous, so that there is very little variation in the pixel intensities in these regions. The number of pixels belonging to the similar color sphere is large and therefore the roughness is higher in a homogeneous object region. Near the object boundaries, on the other hand, the homogeneity is lower and the variation in pixel intensities is greater; the number of pixels in the similar color sphere is small, resulting in a smaller value of the roughness measure. The roughness index, when evaluated at every intensity value, thus develops crests and troughs just as a conventional histogram does. It exploits the correlation among neighboring pixels better than the conventional histogram or the histon, and is therefore better suited for the segmentation of natural color images. In a histogram based segmentation scheme, the peaks in the histogram represent the different regions and the valleys represent the boundaries between those regions. In a similar fashion, the peaks and valleys of the graph of roughness index versus intensity can also be used to segregate different regions in the image. The roughness index based segmentation scheme has some distinct advantages over the histogram based and histon based segmentation schemes, such as:

• In the conventional histogram, as well as in the histon, small but important segments may not be detected due to insignificant peak and valley points. Since the roughness index, defined by eq. (10.11), is a ratio, the peak and valley points are significant even if the segment is small. Due to this property, the roughness index based segmentation technique detects such small but significant regions, which results in better segmentation performance.
• Since the peaks in the roughness index based method occur exactly at the intensities where the number of similar intensity pixels is large compared to the number of same intensity pixels, the color of every segmented region is closer to the color of the corresponding region in the original image.
10.4
Segmentation Method
The segmentation process is divided into three stages, as shown in Figure 10.2. For segmentation we have considered the RGB color space. In the first stage, the histogram and the histon of the R, G, and B components of the image are computed. For computing the histon, the selection of two parameters is very important: the neighborhood and the expanse. The neighborhood is the window that decides which pixels are involved in the computation. For example, if a 3 × 3 neighborhood is selected, then the number of pixels used in the computation of the histon will be 3³ − 1 = 26. The expanse is the radius of the similar color sphere. After sufficient experimentation we found that selecting a 3 × 3 neighborhood and an expanse value of 100 gives the best results. The roughness index is then obtained for the R, G, and B components of the image, for every intensity value, using equation (10.11).
FIGURE 10.2: Block diagram of the proposed method
10.4.1
Selection of peak and Threshold values
Like the histogram, the roughness index over all intensities gives the global distribution of uniform regions in the image, and every peak represents a uniform region. In the second stage, we determine the peaks and valleys in the graph of the roughness index and apply thresholds to the image. The selection of correct peak and valley points is very important for achieving good segmentation results. The criterion used for the selection of significant peaks is based on the distance between two significant peaks and the height of the peak. Experimentally, we have found the following two criteria for obtaining the significant peaks:
1. A peak is significant if the height of the peak is greater than 20% of the average value of the roughness index over all the pixel intensities.
2. A peak is significant if the distance between two peaks is greater than 10.
After the significant peaks are selected, the valleys are obtained by finding the minimum values between every two peaks.
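The peak and valley selection of this subsection can be sketched in Python as below; the exact scanning order and tie-breaking are our own choices and are not specified in the chapter.

    import numpy as np

    def significant_peaks(rho, height_factor=0.20, min_distance=10):
        # rho: 1-D roughness index over intensities for one colour component.
        threshold = height_factor * rho.mean()
        peaks = []
        for g in range(1, len(rho) - 1):
            is_local_max = rho[g] >= rho[g - 1] and rho[g] >= rho[g + 1]
            if is_local_max and rho[g] > threshold:            # criterion 1
                if not peaks or g - peaks[-1] > min_distance:  # criterion 2
                    peaks.append(g)
        # valleys: minimum of rho between each pair of consecutive peaks
        valleys = [int(np.argmin(rho[a:b])) + a for a, b in zip(peaks[:-1], peaks[1:])]
        return peaks, valleys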
10.4.2
Region Merging
Region merging is the final stage in the process of segmentation. Obtaining clusters on the basis of peaks and valleys usually results in over-segmentation: many small regions are generated and some of the regions may contain very few pixels. Such small regions must be merged with the closest large regions. The region merging is carried out in two steps. In the first step, we check the size of the regions. The regions with a number of pixels less than some predefined threshold are merged with the nearest regions. The process is repeated until the number of pixels in each region is greater than the threshold. Experimentally, we found that a threshold of 0.1% of the total number of pixels in the image is appropriate. In the second step, we check for the color similarity between two regions. Two regions are merged if the color difference between them is less than a predefined threshold. The process is repeated until the distance between any two regions in the image is greater than the predefined distance. Here also, we find experimentally that a distance of 20 is an appropriate threshold for region merging.
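The two-step merging can be sketched as follows; note that, for brevity, this version approximates the "nearest region" by the region with the most similar mean colour and ignores spatial adjacency, which is a simplification of the procedure described above.

    import numpy as np

    def merge_regions(labels, img, min_frac=0.001, color_thresh=20.0):
        # labels: (M, N) integer map of initial regions; img: (M, N, 3) colour image.
        labels = labels.copy()
        min_size = min_frac * labels.size
        while True:
            ids = np.unique(labels)
            means = {r: img[labels == r].mean(axis=0) for r in ids}
            sizes = {r: int((labels == r).sum()) for r in ids}
            merged = False
            for r in ids:
                others = [s for s in ids if s != r]
                if not others:
                    break
                dists = {s: np.linalg.norm(means[r] - means[s]) for s in others}
                nearest = min(dists, key=dists.get)
                # step 1: small region, or step 2: colour-similar regions
                if sizes[r] < min_size or dists[nearest] < color_thresh:
                    labels[labels == r] = nearest
                    merged = True
                    break
            if not merged:
                return labels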
10.5
Experimental Results
In this section, we present the experimental results of the roughness index based segmentation algorithm. Qualitative and quantitative comparisons of the algorithm with the histogram based and the histon based segmentations are given here. Several images obtained from the Berkeley database (Martin, Fowlkes, Tal, and Malik, 2001) were used for experimentation. Figures 10.3 to 10.8 display the visual comparison of the roughness index based algorithm with the histogram based segmentation and the histon based segmentation. In spite of the significant advancement in image segmentation techniques, segmentation is still an ill-defined problem, as there is no unique ground-truth segmentation of an image against which the output of an algorithm may be compared (Unnikrishnan, Pantofaru, and Hebert, 2005, 2007). The evaluation of segmentation algorithms has thus far been largely subjective: the effectiveness of an algorithm is judged only on intuition, and results are given in the form of a few segmented images. To address the problem of the quantitative evaluation of segmentation algorithms, several measures have been proposed in the literature, including the Global Consistency Error (GCE), the Local Consistency Error (LCE) (Martin, 2002), the Probabilistic Rand Index (PRI) (Unnikrishnan et al., 2005), and PSNR (Makrogiannis, Economou, and Fotopoulos, 2005). We used PSNR and PRI for the evaluation of the segmentation performance of our algorithm. The two measures are briefly presented here.

The PSNR measure. The PSNR measure between the image I and the segmented image S is given by

    PSNR(I, S) = 10 log10( (p × 255² × r × c) / ( Σ_{i=1}^{r} Σ_{j=1}^{c} Σ_{k=1}^{p} [I(i, j, k) − S(i, j, k)]² ) ),        (10.12)

where r, c, and p are the number of rows, columns, and color components of the image, respectively. The PSNR represents the region homogeneity of the final partitioning (Makrogiannis et al., 2005). The higher the value of PSNR, the better the segmentation.
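Eq. 10.12 translates directly into a few lines of Python; the function name is ours, and the segmented image S is assumed to be the original image with each region replaced by its representative colour.

    import numpy as np

    def psnr(original, segmented):
        # original, segmented: (r, c, p) arrays (eq. 10.12).
        diff = original.astype(float) - segmented.astype(float)
        mse_sum = (diff ** 2).sum()
        if mse_sum == 0:
            return float("inf")   # identical images
        r, c, p = original.shape
        return 10.0 * np.log10(p * 255.0 ** 2 * r * c / mse_sum)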
FIGURE 10.3: Segmentation comparison. (a) Test image 22013, (b) histogram based segmentation, (c) histon based segmentation, (d) roughness index based segmentation.
The Probabilistic Rand Index (PRI)
The Probabilistic Rand Index (PRI) proposed by Unnikrishnan and Hebert (Unnikrishnan et al., 2005) is a generalization of the Rand Index (RI) (Rand, 1971), which allows comparison of a test segmentation with multiple ground-truth images through soft non-uniform weighting of pixel pairs as a function of the variability in the ground-truth set. The RI counts the fraction of pairs of pixels whose labels are consistent between the computed segmentation and the ground truth. The PRI is an extension of the RI which averages the result across all manual segmentations of a given image. Consider an image $I = \{I_1, I_2, \ldots, I_N\}$ consisting of N pixels. Let the set of manual segmentations (ground-truth images) of the image I be $\{S_1, S_2, \ldots, S_K\}$ and let $S_t$ be the test segmentation which is to be compared with the manually labeled set. Let the label of a point $I_i$ be denoted by $l_i^{S_t}$ in the segmentation $S_t$ and by $l_i^{S_k}$ in the manually segmented image $S_k$.
FIGURE 10.4: Segmentation comparison. (a) Test image 92059, (b) histogram based segmentation, (c) histon based segmentation, (d) roughness index based segmentation.
The Probabilistic Rand Index (PRI) is defined as
\[
PRI\left(S_t, \{S_k\}\right) = \frac{1}{\binom{N}{2}} \sum_{\substack{i,j \\ i < j}} \left[ c_{ij}\, p_{ij} + (1 - c_{ij})(1 - p_{ij}) \right] \tag{10.13}
\]
where $c_{ij}$ denotes the event of a pair of pixels i and j having the same label in the test segmentation $S_t$, $c_{ij} = \mathbb{I}\left(l_i^{S_t} = l_j^{S_t}\right)$, and $p_{ij}$ denotes the ground-truth probability of the event $l_i = l_j$, estimated as $p_{ij} = \frac{1}{K}\sum_{k=1}^{K} \mathbb{I}\left(l_i^{S_k} = l_j^{S_k}\right)$. This measure takes values in [0, 1], where 0 means $S_t$ and $\{S_1, S_2, \ldots, S_K\}$ have no similarities and 1 means all segmentations are identical.
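A direct (and deliberately unoptimized) sketch of the PRI of Eq. (10.13); for real images the O(N^2) pair loop would normally be replaced by a contingency-table formulation or restricted to a random subsample of pixel pairs:

```python
import numpy as np
from itertools import combinations

def probabilistic_rand_index(test_labels, ground_truths):
    """PRI of Eq. (10.13) for a test segmentation against K manual segmentations.

    All label maps are flattened to 1-D arrays of per-pixel labels.
    """
    st = np.asarray(test_labels).ravel()
    gts = [np.asarray(g).ravel() for g in ground_truths]
    n = st.size
    total = 0.0
    for i, j in combinations(range(n), 2):
        c_ij = float(st[i] == st[j])
        p_ij = np.mean([float(g[i] == g[j]) for g in gts])
        total += c_ij * p_ij + (1.0 - c_ij) * (1.0 - p_ij)
    return total / (n * (n - 1) / 2.0)
```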
FIGURE 10.5: Segmentation comparison. (a) Test image 97017, (b) histogram based segmentation, (c) histon based segmentation, (d) roughness index based segmentation.
One can make several observations from the visual and the quantitative comparisons. The colors of the segmented regions look more natural in the case of the roughness index based thresholding algorithm. In the case of the test image 22013 of Figure 10.3, the sky and water regions in the roughness index based segmentation, see Figure 10.3d, look more natural in comparison with those regions in the histogram based and histon based segmentations of Figures 10.3b and 10.3c. In the case of the test image 92059 of Figure 10.4, regions like the river and the boat are well segmented in the roughness index based segmentation as compared to the histogram based and histon based segmentations. Consider the test image 271031 of Figure 10.8. It can be observed that the Sun is clearly and neatly segmented by the proposed method, see Figure 10.8d, and not so neatly segmented by the histogram based and the histon based segmentations of Figures 10.8b and 10.8c. The quantitative comparison of these images is given in Table 10.1. It can be observed, from the PSNR and PRI values, that the roughness index based segmentation results are better than those of the other two segmentation methods. The number of clusters after merging is nearly the same as in the case of the histogram based segmentation. The computational efficiency of the algorithm depends on several factors, such as the computational complexity of the histon, obtaining the threshold values, and the region merging process. The major computational cost is involved in the region merging process, and the computation time is directly proportional to the ratio of the number of clusters before and after merging. Table 10.1 also shows the computation time of all the algorithms. It can be observed that whenever the ratio of the number of clusters before merging to the number after merging is high, the computation time is higher. This means that the additional burden involved in the computation of the histon does not add significantly to the computational cost of the complete segmentation process.
10.6 Summary
FIGURE 10.6: Segmentation comparison. (a) Test image 140055, (b) histogram based segmentation, (c) histon based segmentation, (d) roughness index based segmentation.
A rough set theoretic thresholding algorithm has been presented here for the segmentation of color images. The histon, introduced in Section 10.3, presents a new means for the visualization of color information in an image by exploiting the correlation among the pixels in all three color planes. The roughness index, obtained by correlating the histon with the upper approximation and the histogram with the lower approximation of a rough set, has been employed for obtaining the optimum threshold values to yield better segmentation performance. The visual and the quantitative comparisons demonstrate the effectiveness of the algorithm.
FIGURE 10.7: Segmentation comparison. (a) Test image 172032, (b) histogram based segmentation, (c) histon based segmentation, (d) roughness index based segmentation.
FIGURE 10.8: (Please see color insert for Figures 10.8a and 10.8d) Segmentation comparison. (a) Test image 271031, (b) histogram based segmentation, (c) histon based segmentation, (d) roughness index based segmentation.

TABLE 10.1 Comparison of the results of the proposed approach with the results of the histogram-based thresholding method

Image    Thresholding method   Clusters before merging   Clusters after merging   CPU time (sec)   PSNR (dB)   PRI
22013    Histogram             101                       15                       2.60             21.60       0.7327
22013    Histon                152                       13                       3.95             22.07       0.7449
22013    Roughness measure     153                       14                       5.60             24.71       0.7550
92059    Histogram             75                        5                        2.23             20.34       0.6677
92059    Histon                79                        6                        2.31             20.75       0.6652
92059    Roughness measure     57                        6                        3.70             23.86       0.6717
97017    Histogram             84                        10                       2.29             22.84       0.7251
97017    Histon                85                        9                        2.50             23.19       0.7325
97017    Roughness measure     87                        10                       3.38             23.30       0.7910
140055   Histogram             189                       14                       4.88             22.57       0.7500
140055   Histon                201                       11                       5.14             23.65       0.7436
140055   Roughness measure     92                        13                       2.51             24.22       0.7779
172032   Histogram             112                       12                       2.56             25.35       0.8502
172032   Histon                116                       11                       2.61             25.40       0.8502
172032   Roughness measure     116                       12                       3.01             25.73       0.8718
271031   Histogram             22                        8                        0.936            22.17       0.7008
271031   Histon                22                        8                        0.936            22.87       0.7107
271031   Roughness measure     22                        4                        1.809            25.90       0.7300
Bibliography

Acharya, T., and A. K. Ray. 2005. Image processing: Principles and applications. Hoboken, New Jersey: Wiley Interscience.
Aghbari, Z. A., and R. Al-Haj. 2006. Hill-manipulation: An effective algorithm for color image segmentation. Image and Vision Computing 24(8):894–903.
Cheng, H. D., C. H. Chen, H. H. Chiu, and H. Xu. 1998. Fuzzy homogeneity approach to multilevel thresholding. IEEE Trans. Image Processing 7(7):1084–1088.
Cheng, H. D., and J. Li. 2003. Fuzzy homogeneity and scale-space approach to color image segmentation. Pattern Recognition 36:1545–1562.
Gonzalez, R. C., and R. E. Woods. 2002. Digital image processing. Delhi, India: Pearson Education.
Liu, J., and Y. Yang. 1994. Multiresolution color image segmentation. IEEE Trans. Pattern Anal. and Machine Intell. 16(7):689–700.
Makrogiannis, S., G. Economou, and S. Fotopoulos. 2005. A region dissimilarity relation that combines feature-space and spatial information for color image segmentation. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 35(1):44–53.
Martin, D. 2002. An empirical approach to grouping and segmentation. Ph.D. thesis, Univ. of California, Berkeley.
Martin, D., C. Fowlkes, D. Tal, and J. Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Eighth Int. Conf. on Computer Vision (ICCV'01), vol. 2.
Mohabey, A., and A. K. Ray. 2000. Fusion of rough set theoretic approximations and FCM for color image segmentation. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics 2, 1529–1534.
Mushrif, M. M., and A. K. Ray. 2008. Color image segmentation: Rough-set theoretic approach. Pattern Recognition Letters 29:483–493.
Pal, N. R., and S. K. Pal. 1993. A review on image segmentation techniques. Pattern Recognition 26(9):1277–1294.
Pal, S. K., B. U. Shankar, and P. Mitra. 2005. Granular computing, rough entropy and object extraction. Pattern Recognition Letters 26(16):2509–2517.
Pawlak, Z. 1991. Rough sets: Theoretical aspects of reasoning about data. London, UK: Kluwer Academic Publishers.
———. 2004. Some issues on rough sets. Transactions on Rough Sets I, LNCS 3100:1–58.
Rand, W. M. 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(336):846–850.
Szladow, A. J., and W. P. Ziarko. 1992. Knowledge based process control using rough sets. In Intelligent Decision Support: Handbook of Applications and Advances of Rough Sets Theory, 49–59. Kluwer Academic Publishers.
Tanaka, H., H. Ishibuchi, and T. Shigenaga. 1992. Fuzzy inference system based on rough sets and its application to medical diagnosis. In Intelligent Decision Support: Handbook of Applications and Advances of Rough Sets Theory, 111–117. Kluwer Academic Publishers.
Unnikrishnan, R., C. Pantofaru, and M. Hebert. 2005. A measure for objective evaluation of image segmentation algorithms. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Workshop on Empirical Evaluation Methods in Computer Vision, 34–41.
———. 2007. Toward objective evaluation of image segmentation algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):929–944.
11 Rough Fuzzy Measures in Image Segmentation and Analysis

Dariusz Malyszko, Bialystok University of Technology
Jaroslaw Stepaniuk, Bialystok University of Technology

11.1 Introduction: Motivation • Chapter Organization
11.2 Clustering Based Image Segmentation: Overview of Segmentation Methods • Image Clustering
11.3 Image Segmentation Evaluation: Segmentation Quality Measures • Cluster Validity Measures
11.4 RECA Algorithms: Rough Set Theory • Generalized Rough Set Theory • Crisp–Crisp Distance RECA • Fuzzy–Crisp Difference RECA • Fuzzy–Fuzzy Threshold RECA • Fuzzy–Fuzzy Difference RECA
11.5 Experimental Setup: Images • Populations • Standard Indices - SI • Parameters for RECA Measures
11.6 Experimental Results: The Correlation between Standard and Rough Measures • The Best Solutions • Population Indices
Conclusions
Acknowledgments
Bibliography

11.1 Introduction
Image segmentation is an extremely important routine with many applications in image processing (Gonzales and Woods, 2002; Haralick and Shapiro, 1985). The issue of correctly defining, creating and evaluating image segmentations still remains unsolved, and most often the solution to these problems depends on the predefined goal. High quality and robustness of image segmentation routines are of considerable importance in practical applications and, at the same time, in contemporary research. Image segmentation makes possible further higher-level image processing such as feature extraction, pattern recognition and classification. Despite the fact that much effort has been put into the elaboration of suitable segmentation algorithms, the problem is still open in many areas that require particular segmentation characteristics, and the continuous development of new imagery technologies makes the segmentation quality issue constantly more pressing. Image segmentation has been an active research subject in the last decades. High quality image segmentation routines are of great importance in practical applications owing their
significance to their predominant impact on further image data analysis. Segmentation is the low-level image transformation routine concerned with partitioning an image into distinct, disjoint and homogeneous regions. Clustering, or data grouping, is a key procedure in image processing and segmentation. Rough sets have been employed in image analysis routines, see for example (Borkowski and Peters, 2007). The presented research is based on combining the concepts of rough sets and entropy measures in the area of image segmentation by means of the Rough Entropy Clustering Algorithm (RECA). In previous work (Malyszko and Stepaniuk, 2008), the new algorithmic scheme RECA was proposed in the area of rough entropy (Pal, Shankar, and Mitra, 2005) based partitioning routines. The proposed rough entropy clustering algorithm incorporates the notion of rough entropy into the clustering model, taking advantage of dealing with some degree of uncertainty in the analyzed data. Given a predefined number of clusters, lower and upper cluster approximations are associated with each cluster. Image points that are close to only one cluster contribute their impact by increasing that cluster's lower and upper approximation values; otherwise they distribute their impact uniformly over the upper approximations of the clusters they are close to. After the lower and upper approximations have been determined for all clusters, their roughness and the rough entropy value are calculated. On the basis of the entropy maximization principle, the best segmentation is achieved for the maximal entropy value. For this purpose, an evolutionary population of separate solutions is maintained, each solution containing a predefined number of cluster prototypes. For each solution, the respective rough entropy measure is calculated and, subsequently, new populations are created from parental solutions with high values of this fitness measure. Additionally, an extension of the standard Crisp–Crisp Difference RECA (CCD RECA) algorithm into the fuzzy domain has been elaborated in the form of the fuzzy RECA algorithms: Fuzzy–Crisp Difference (FCD) RECA, Fuzzy–Fuzzy Threshold (FFT) RECA and Fuzzy–Fuzzy Difference (FFD) RECA. In the fuzzy RECA algorithm setting, the impact of each image point on the upper approximations of sufficiently close clusters is not constant and depends on its distance from these clusters. Upper cluster approximations are increased by a fuzzy measure for all image points that are sufficiently close to more than one cluster center relative to the distance threshold εdist or the fuzzy threshold εfuzz.
11.1.1 Motivation
The present research deals with an in-depth analysis of the performance of the RECA algorithmic schemes relative to the algorithm parameters defining the boundary regions of the final image clustering and the type of fuzziness. An important factor in the RECA algorithm setting is the type of fuzziness employed in the algorithm. This experimental research has concentrated on the different types of fuzziness, their parameters, and the different rough entropy measures generated by means of these parameters. The calculated entropy measures are subsequently compared with image segmentation quality indices. On this basis some conclusions can be drawn about which type of fuzziness parameters is best correlated with the other quality measures. In the experiments, a selected number of algorithm parameters have been chosen. During the experimental phase, the relevant RECA routines have been employed in the CCD RECA, FCD RECA, FFT RECA and FFD RECA settings. Finally, the image clustering quality has been assessed by means of the β-index, the mean squared error measure as employed in the k-means algorithm and other standard validity indices such as the Dunn index and the Davies-Bouldin index. The interdependence of these standard and rough factors has been assessed, answering the question of how the rough entropy measure is related to other segmentation quality indices. The next important factors in the assessment of the RECA algorithm impact are the inherent characteristics of the imagery. The above mentioned experiments have been carried out on several
selected imagery types, giving some general notion about the correlation between the image data type, the rough entropy measure, the fuzzy and rough parameters and other distinct segmentation quality measures. The application of the presented research material relates to a better insight into the rough entropy notion, which should improve the understanding of this new concept in the context of data analysis and give way to its practical incorporation into other data analysis frameworks. Additionally, the experimental results describe the correlation between the selected image quality assessment indices. This type of information is valuable during the construction and implementation of image segmentation and analysis systems.
11.1.2 Chapter Organization
Section 11.2 presents general information concerning segmentation and clustering algorithms. Section 11.4 presents the RECA methods of image segmentation, together with a description, in Section 11.3, of the selected validation measures used in the evaluation experiments. The experimental setup is presented in Section 11.5 and the experimental results in Section 11.6. Finally, conclusions are drawn with a summary of the obtained results.
11.2 Clustering Based Image Segmentation

11.2.1 Overview of Segmentation Methods
The segmentation operation is an essential and extremely important preprocessing step in the majority of image analysis based routines, such as computer vision, with practical applications ranging from object extraction and detection to change detection, monitoring and identification tasks. After the preprocessing stage of the image handling routines, with for example noise removal, smoothing, and sharpening of image contrast, follows the image segmentation step, and subsequently more specific, high-level analysis is performed, such as depicting objects and regions and the final interpretation of the image or scene. In almost all areas, the quality of the segmentation step determines the quality of the final image analysis output. The segmentation process is defined as an operation of partitioning the image into non-overlapping regions such that each region exhibits homogeneous properties while the union of two adjacent regions is not homogeneous with respect to intensity, color, texture or other relevant features. Additionally, other conditions are most often imposed simultaneously, for example that region interiors should be simple and without many small holes, and that each segment's boundaries should be comparatively simple and spatially accurate. Image segmentation routines are divided into histogram-based routines, edge-based routines, region-merge routines, clustering routines and combinations of the above. An exhaustive overview of segmentation methods is available in (Fu and Mui, 1981) and of image segmentation evaluation methods in (Zhang, 1996). Edge detection approaches to image segmentation deal with discovering image locations where sharp changes in grey level or color are detected. The main difficulty in this type of algorithm is maintaining the continuity of the detected edges: segments always have to be enclosed by continuous edges, but usually disconnected or isolated edges within areas with more detail have to be combined by using specialized heuristics. Region growing or merging presents an approach to image segmentation where large continuous regions or segments are detected first and afterwards small regions are subjected to a merging operation by use of a homogeneity criterion. Region growing and merging routines are sequential in nature, dependent upon the order in which regions grow or merge. Furthermore, algorithms based on combining
edge-based and region-based techniques are capable of exploiting the complementary nature of edge and region information. Additionally, many segmentation techniques make use of particular data analysis approaches such as neural networks, fuzzy computing, evolutionary computing, multiscale resolution techniques and morphological analysis. Some segmentation methods are based on unique frameworks, such as active contour models, active shape models and watersheds. The active shape model (ASM) presents a particular structure for finding the object boundary in images. In this framework, different image features and search strategies are applied, which subsequently create a vast range of ASM algorithms. Clustering with a rough entropy based segmentation quality measure, the subject of this chapter, falls into this class of framework-based segmentation approaches.
11.2.2 Image Clustering
Image clustering algorithms are widespread, high quality procedures applied in many areas of image processing and image analysis. The high robustness of general data clustering schemes has been successfully incorporated in many image segmentation routines. Data clustering consists in partitioning data objects into disjoint groups such that each group internally consists of similar objects while objects from different groups are mutually as dissimilar as possible in the given data set. Data clustering algorithms are most often divided into hierarchical and partitioning routines. In the case of hierarchical routines, the clustered data are arranged into groups that create a tree-like hierarchy. Objects from a given group are similar to each other relative to a predefined similarity measure, as opposed to objects from different groups. Most often the data groups or clusters are part of a hierarchical level structure that can be considered as a grouping of the data at different granularity or similarity levels. On the other hand, partitioning clustering most often depends on partitioning the data into a predefined number of clusters on the basis of the optimization of some objective function that reflects internal data similarity. In the group of data partitioning clustering methods there are many methods capable of creating high quality clustering solutions. Most prominent in this group are the k-means clustering schemes, together with numerous modifications and different data analysis frameworks such as fuzzy or rough k-means algorithms. The necessity of improving the basic clustering functionality of, for example, k-means algorithms stems from the fact that this type of algorithm most often determines solutions that are local optima of the objective function. The rough entropy framework proposed recently in (Malyszko and Stepaniuk, 2008) has been aimed at incorporating the notion of uncertainty into the clustering process and taking advantage of rough set data analysis methods in the clustering setting.
11.3 Image Segmentation Evaluation

11.3.1 Segmentation Quality Measures
In supervised approaches to segmentation and classification routines, the definition of segmentation quality by measures such as accuracy and precision is quite straightforward by means of ground-truth ideal images. On the other hand, for clustering routines this kind of indispensable segmentation quality assessment is not always feasible, because a ground-truth image is not present or is difficult to obtain. In the present study, the following segmentation quality measures were taken into account as cluster validity indices. Data clustering as a data grouping routine (together with other grouping algorithms such as, for example, data thresholding) presents an unsupervised process that finally requires some sort of quality evaluation of the generated data groups or clusters. This requirement can be satisfied by using
cluster validity indices (Rutkowski, 2008). In general, three distinctive approaches to cluster validity are possible. The first approach relies on external criteria that investigate the existence of some predefined structure in the clustered data set. The second approach makes use of internal criteria, and the clustering results are evaluated by quantities describing the data set, such as the proximity matrix. Approaches based on internal and external criteria make use of statistical tests, and their disadvantage is a high computational cost. The third approach makes use of relative criteria and relies on finding the best clustering scheme that meets certain assumptions and requires predefined input parameter values. The most commonly used indices are the Dunn index, the Davies-Bouldin index, the S_Dbw index and the Quantization Error.
11.3.2 Cluster Validity Measures
Quantitative Measure: β-index
The β-index measures the ratio of the total variation to the within-class variation. Define $n_i$ as the number of objects in the i-th ($i = 1, \ldots, k$) cluster of the segmented image, $X_{ij}$ as the value of the j-th data object ($j = 1, \ldots, n_i$) in cluster i, and $\overline{X}_i$ as the mean of the $n_i$ values of the i-th cluster. The β-index is defined in the following way
\[
\beta\text{-index} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(X_{ij} - \overline{X}\right)^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(X_{ij} - \overline{X}_i\right)^2}
\]
where $\overline{X}$ represents the mean value of the attributes of all objects of the universe. This index defines the ratio of the total variation to the within-class variation. In this context, an important observation is that the β-index value increases as the number k of cluster centers increases.

Davies-Bouldin index
The Davies-Bouldin index minimizes the average similarity between clusters. It is defined as the ratio of the sum of within-cluster scatter to between-cluster separation, and the objective is to minimize this index. The Davies-Bouldin index is defined as follows:
\[
DB = \frac{1}{k}\sum_{i=1}^{k} \max_{j=1,\ldots,k,\; j \neq i}\left(\frac{diam(C_i) + diam(C_j)}{d(C_i, C_j)}\right)
\]
where $d(u, w)$ represents the Euclidean distance between u and w and $diam(C)$ is the diameter of a cluster, which can be defined as
\[
diam(C) = \max_{u, w \in C} d(u, w).
\]

Dunn index
The Dunn index is a well known validity index, proposed by (Bezdek and Pal, 1995), that recognizes compact and well separated clusters by means of considering five different measures of distance between clusters and three different measures of cluster diameter. The value of the Dunn index should be maximized and is defined in the following way
\[
D = \min_{i=1,\ldots,k}\left\{ \min_{j=i+1,\ldots,k}\left( \frac{dist(C_i, C_j)}{\max_{a=1,\ldots,k} diam(C_a)} \right) \right\}
\]
where $dist(C_k, C_p)$ represents the dissimilarity function between two clusters $C_k$ and $C_p$, calculated as
\[
dist(C_k, C_p) = \min_{u \in C_k,\; w \in C_p} d(u, w).
\]

Mean Squared Error
The mean squared error between all data objects assigned to a cluster and the cluster center $\overline{X}_i$ is the objective function minimized during the k-means clustering procedure and, at the same time, in numerous partitioning clustering schemes. The formula is given as follows
\[
MSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(X_{ij} - \overline{X}_i\right)^2,
\]
where $\overline{X}_i$ denotes the center of the cluster $C_i$ and $X_{ij}$ the object with index j among the objects assigned to the cluster $C_i$.

Within-Class and Between-Class Variance
The within-class variance is defined in the following way
\[
wVar = \sum_{i=1}^{k}\sum_{j=1}^{n_i} p_i \left(X_{ij} - \overline{X}_i\right)^2,
\]
and the between-class variance is defined as follows
\[
cVar = \sum_{i=1}^{k} p_i \left(\overline{X} - \overline{X}_i\right)^2.
\]
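The validity measures of this subsection can be computed with a few lines of NumPy; the sketch below takes the cluster means as the cluster centers, as in k-means, and is written for small data sets (the pairwise diameter and set-distance loops are quadratic):

```python
import numpy as np

def validity_indices(data, labels):
    """beta-index, Davies-Bouldin, Dunn, MSE, wVar and cVar for one clustering.

    data: (N, d) array of feature vectors; labels: (N,) assignments in 0..k-1.
    """
    k = int(labels.max()) + 1
    clusters = [data[labels == i] for i in range(k)]
    means = np.array([c.mean(axis=0) for c in clusters])
    probs = np.array([len(c) / len(data) for c in clusters])
    grand = data.mean(axis=0)

    within = sum(((c - means[i]) ** 2).sum() for i, c in enumerate(clusters))
    beta = ((data - grand) ** 2).sum() / within               # beta-index
    mse = within                                              # k-means objective
    w_var = sum(p * ((c - means[i]) ** 2).sum()
                for i, (c, p) in enumerate(zip(clusters, probs)))
    c_var = float(np.sum(probs * ((means - grand) ** 2).sum(axis=1)))

    diam = [max((np.linalg.norm(u - w) for u in c for w in c), default=0.0)
            for c in clusters]
    db = np.mean([max((diam[i] + diam[j]) / np.linalg.norm(means[i] - means[j])
                      for j in range(k) if j != i) for i in range(k)])
    set_dist = lambda a, b: min(np.linalg.norm(u - w) for u in a for w in b)
    dunn = min(set_dist(clusters[i], clusters[j]) / max(diam)
               for i in range(k) for j in range(i + 1, k))
    return dict(beta=beta, DB=db, Dunn=dunn, MSE=mse, wVar=w_var, cVar=c_var)
```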
11.4 RECA Algorithms

11.4.1 Rough Set Theory
Information granules (Zadeh, 1997; Pedrycz, Skowron, and Kreinovich, 2008; Stepaniuk, 2008) are viewed as linked collections of objects (data points, in particular) drawn together by the criteria of indistinguishability, similarity or functionality. Information granules and the ensuing process of information granulation are a vehicle of abstraction leading to the emergence of high-level concepts. A granule is most often defined as a closely coupled group or clump of objects (for example pixels in the image processing setting) in the examined space that is interpreted as an indivisible entity because of its indistinguishable character, similarity, proximity or functionality. The granulation process basically consists in, and subsequently results in, the compression and summarization of information. In the last decades, rough set theory has attracted growing attention as a robust mathematical framework for granular computing. Rough set theory was introduced by (Pawlak, 1991) in the 1980s, creating a comprehensive platform for discovering hidden patterns in data, with extensive applications in data mining. It has recently emerged as an important mathematical tool for managing the uncertainty that arises from granularity in the domain of discourse, i.e., from the indiscernibility between objects in a set. The intention is to approximate a rough (imprecise) concept in the domain of discourse by a pair of exact concepts, called the lower and upper approximations. These exact concepts are determined by an indiscernibility relation on the domain, which, in turn, may be induced by a given set of attributes ascribed to the objects of the domain. The lower approximation is the set of
objects definitely belonging to the vague concept, whereas the upper approximation is the set of objects possibly belonging to it. An information system is a pair (U, A), where U represents a non-empty finite set called the universe and A a non-empty finite set of attributes. Let $B \subseteq A$ and $X \subseteq U$. Taking into account these two sets, it is possible to approximate the set X using only the information contained in B by constructing the lower and upper approximations of X. Let $IND(B) \subseteq U \times U$ be defined by
\[
(x, y) \in IND(B) \iff a(x) = a(y) \text{ for all } a \in B.
\]
An approximation space is a pair $AS_B = (U, IND(B))$. For $X \subseteq U$, the sets
\[
LOW(AS_B, X) = \{x \in U : [x]_B \subseteq X\}, \qquad UPP(AS_B, X) = \{x \in U : [x]_B \cap X \neq \emptyset\},
\]
where $[x]_B$ denotes the equivalence class of the object x relative to B (the equivalence relation IND(B)), are called the B-lower and B-upper approximations of X in U. It is possible to express numerically the roughness $R(AS_B, X)$ of a set X with respect to B (Pawlak, 1991) by the assignment
\[
R(AS_B, X) = 1 - \frac{card\left(LOW(AS_B, X)\right)}{card\left(UPP(AS_B, X)\right)}.
\]
In this way, a roughness value of the set X equal to 0 means that X is crisp with respect to B, and conversely, if $R(AS_B, X) > 0$ then X is rough (i.e., X is vague with respect to B). Detailed information on rough set theory is provided in (Pawlak, 1991; Pawlak and Skowron, 2007; Stepaniuk, 2008).
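A toy illustration of these definitions, assuming a small information system given as a dictionary of attribute vectors; the object and attribute values are invented for the example:

```python
def approximations(universe, attrs, X):
    """B-lower and B-upper approximations and the roughness of X.

    universe: list of objects; attrs: dict mapping each object to a tuple of
    attribute values over B; X: a subset of the universe (a set).
    """
    # equivalence classes of IND(B): objects with identical attribute vectors
    classes = {}
    for obj in universe:
        classes.setdefault(attrs[obj], set()).add(obj)
    lower = set().union(*[c for c in classes.values() if c <= X])
    upper = set().union(*[c for c in classes.values() if c & X])
    roughness = 1.0 - len(lower) / len(upper) if upper else 0.0
    return lower, upper, roughness

# four pixels described by a single quantised grey level
U = ["p1", "p2", "p3", "p4"]
B = {"p1": (0,), "p2": (0,), "p3": (1,), "p4": (2,)}
low, upp, r = approximations(U, B, {"p1", "p3"})
# low == {"p3"}, upp == {"p1", "p2", "p3"}, r == 1 - 1/3
```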
11.4.2 Generalized Rough Set Theory
Let U be a non-empty set of objects called the universal set and P(U) be the power set of U, as described in the previous subsection. A binary relation R on U is referred to as reflexive if for all $x \in U$, xRx. The relation R is referred to as symmetric if for all $x, y \in U$, xRy implies yRx. The relation R is referred to as transitive if for all $x, y, z \in U$, xRy and yRz imply xRz. The relation R is an equivalence relation if it is reflexive, symmetric and transitive.

Parameterized approximation space
In (Skowron and Stepaniuk, 1996; Stepaniuk, 2008) a parameterized approximation space has been defined as a system $AS_{\#,\$} = (U, I_{\#}, \nu_{\$})$ where
• U is a non-empty set of objects,
• $I_{\#}: U \rightarrow P(U)$ is an uncertainty function,
• $\nu_{\$}: P(U) \times P(U) \rightarrow [0, 1]$ is a rough inclusion function,
and #, $ describe parameter indices (#, $ may be omitted if the context is clear). In this way, each predefined number of cluster centers defines for each data object a different uncertainty function $I_{\#}: U \rightarrow P(U)$. A detailed description is given in (Stepaniuk, 2008).
Now, let R represent an arbitrary relation on U. With respect to R, it is possible to define the left and right neighborhoods of an element x in U in the following way:
\[
I_R^{Left}(x) = \{y \mid yRx\}, \qquad I_R^{Right}(x) = \{y \mid xRy\}.
\]
In this convention, if R is symmetric, then $I_R^{Left}(x) = I_R^{Right}(x)$. The left (or right) neighborhood $I_R^{Left}(x)$ (or $I_R^{Right}(x)$) becomes an equivalence class containing x if R is an equivalence relation. If R is a tolerance relation, then we obtain $I_R^{Left} = I_R^{Right}$ and the approximation space $(U, (I_R^{Left}, I_R^{Right}), \nu)$ is reduced to $(U, I_R^{Right}, \nu)$. For an arbitrary relation R, by replacing the equivalence class with the right neighborhood, the operators UPP and LOW from P(U) to itself are defined by
\[
LOW(AS_R, X) = \{x \mid I_R^{Right}(x) \subseteq X\}, \qquad UPP(AS_R, X) = \{x \mid I_R^{Right}(x) \cap X \neq \emptyset\};
\]
analogously to the definitions from the previous subsection, $LOW(AS_R, X)$ is called a lower approximation of X and $UPP(AS_R, X)$ an upper approximation of X. The pair $(LOW(AS_R, X), UPP(AS_R, X))$ is referred to as a rough set based on R. The set $LOW(AS_R, X)$ consists of those elements whose right neighborhoods are contained in X, and the set $UPP(AS_R, X)$ consists of those elements whose right neighborhoods have a non-empty intersection with X. Clearly, in the case of R being an equivalence relation, these definitions are equivalent to the original rough set definitions. In the context of the proposed rough entropy framework, objects from the universe U are assigned to the clusters represented by cluster centers, and with each cluster lower and upper cluster approximations are associated. In contrast to established solutions based on an equivalence relation, a new binary relation R has been introduced that describes, for a given point x, the clusters or cluster centers related to x: the object x is in relation R with a cluster center C.

Generalized Lower and Upper Approximations
The binary relation that has been presented induces generalized lower and upper approximations according to the definitions given in this subsection. The generalized lower approximation of a cluster contains objects that are located in the nearest neighborhood (according to predefined criteria) of this cluster center and only of this cluster center. The generalized upper approximation contains objects that are close to this cluster center but additionally may belong to one or more other clusters. The generalized lower and upper approximations of a given cluster will in general not be the same, but such a situation is possible and represents crisp clustering, with objects assigned uniquely to one cluster.

Similarity Measures
Similarity measures are based on a distance threshold, a fuzzy distance threshold and a fuzzy threshold.

Roughness
Most often, the roughness value $R(AS_B, X)$ is defined numerically in the way described in the previous subsection. In the rough entropy clustering setting, however, this is not the only alternative. During roughness calculation other measures are possible besides a counting
measure. These measures include summing up the fuzzy membership values of all objects from the given set, in our case the objects of the (generalized) lower and upper approximations. Another choice concerns the different types of fuzzy membership functions.

Probabilistic and Non-Probabilistic Entropy
During the calculation of the rough entropy value (Pal et al., 2005), many types of entropy can be taken into account, for example the Shannon entropy, the Renyi entropy or the Tsallis entropy. All these entropies are generally probabilistic entropies, meaning that the probabilities of all possible states sum to unity. In the presented solution, this condition has not been imposed and the roughness values are not normalized to a probability distribution. In the case of a probabilistic distribution with n possible states, the entropy attains its maximal value when all states have equal probability 1/n. In the rough entropy framework, the roughness value of each cluster is contained in the interval [0, 1]. In this manner, the total rough entropy is obtained by summing up all partial rough entropies, and it attains its maximum when all partial entropies attain their maxima. The partial (cluster) entropy $-\frac{e}{2} R \log(R)$ reaches its maximum for a roughness value equal to 1/e. The rough entropy framework therefore searches for an optimal clustering solution whose boundary region is approximately equal to 1/3.
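A quick numeric check of the claim that the partial cluster entropy is maximized at a roughness of 1/e; the e/2 scaling follows the rough entropy formula used in the algorithm listings below, and any positive constant factor leaves the position of the maximum unchanged:

```python
import numpy as np

r = np.linspace(1e-6, 1.0, 100001)
partial = -(np.e / 2.0) * r * np.log(r)     # partial (cluster) rough entropy term
print(r[np.argmax(partial)], 1.0 / np.e)    # both are approximately 0.3679
```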
11.4.3 Crisp–Crisp Distance RECA
The standard Crisp–Crisp Distance RECA algorithm, as proposed in (Malyszko and Stepaniuk, 2008), computes the lower and upper approximations for the given cluster centers and uses the cardinalities of these two sets in the calculation of the roughness and, further, of the rough entropy clustering measure. The general Rough Entropy Clustering Algorithm flow is presented in Algorithm 1, and the general rough measure calculation routine in Algorithm 2. In all presented algorithms, the lower and upper cluster approximations should be set to zero before the calculations. For each data point $x_i$, the distance to the closest cluster $C_l$ is denoted as $d_{min}^{dist} = d(x_i, C_l)$, and the approximations are increased by the value 1 for every cluster $C_m$ that satisfies the condition
\[
|d(x_i, C_m) - d_{min}^{dist}| \leq \epsilon_{dist}.
\]
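A compact sketch of the crisp–crisp measure described above and in Algorithm 2; the e/2 entropy scaling and the guard against empty upper approximations are reconstructions, not part of the original pseudocode:

```python
import numpy as np

def ccd_reca_measure(points, centers, eps_dist):
    """Crisp-crisp distance RECA rough entropy for one set of cluster centers.

    points: (N, d) array, centers: (k, d) array, eps_dist: distance threshold.
    The closest cluster receives +1 on both approximations; every other cluster
    whose distance differs from the minimal one by at most eps_dist receives +1
    on its upper approximation only.
    """
    k = len(centers)
    lower = np.zeros(k)
    upper = np.zeros(k)
    for x in points:
        d = np.linalg.norm(centers - x, axis=1)
        closest = int(np.argmin(d))
        lower[closest] += 1.0
        upper[closest] += 1.0
        for m in range(k):
            if m != closest and abs(d[m] - d[closest]) <= eps_dist:
                upper[m] += 1.0
    roughness = 1.0 - lower / np.maximum(upper, 1e-12)
    terms = np.zeros(k)
    mask = roughness > 0
    terms[mask] = roughness[mask] * np.log(roughness[mask])
    return float(-(np.e / 2.0) * terms.sum())
```

In the evolutionary search of Algorithm 1, this value serves as the fitness of one candidate set of cluster prototypes.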
11.4.4 Fuzzy–Crisp Difference RECA
In the Fuzzy–Crisp Difference RECA setting, for a given point, the lower (or the lower and upper) approximation values are incremented not by 1 but by the point's cluster membership value. In this way, the fuzzy concept of belonging to overlapping classes has been incorporated. Taking fuzzy membership values into account during the lower and upper approximation calculation should more precisely handle the imprecise information that imagery data consists of. The fuzzy membership value $\mu_{C_l}(x_i) \in [0, 1]$ of the data point $x_i \in U$ in the cluster $C_l$ (equivalently $X_l$) is given as
Algorithm 1: General RECA Algorithm Flow
Data: input image
Result: optimal threshold values
Create a population X of Size random N-level solutions (chromosomes)
repeat
    forall chromosomes of X do
        calculate their rough entropy measure values (the RECA fuzzy rough entropy measure)
    end
    create a mating pool Y from the parental population X
    apply selection, cross-over and mutation operators to the population Y
    replace the population X with the population Y
until termination criteria are met
Algorithm 2: Crisp–Crisp Difference RECA Approximations
foreach data object xi do
    determine the closest cluster center center(Cl) for xi
    increment Lower(Cl) by 1.0
    increment Upper(Cl) by 1.0
    foreach cluster Ck whose distance to xi differs from d(xi, Cl) by less than εdist do
        increment Upper(Ck) by 1.0
    end
end
for l = 1 to number of data clusters do
    roughness(l) = 1 − Lower(l) / Upper(l)
Rough entropy = 0
for l = 1 to number of data clusters do
    Rough entropy = Rough entropy − (e/2) · roughness(l) · log(roughness(l))
\[
\mu_{C_l}(x_i) = \frac{d(x_i, C_l)^{-2/(\mu - 1)}}{\sum_{j=1}^{k} d(x_i, C_j)^{-2/(\mu - 1)}} \tag{11.1}
\]
where the real number µ represents the fuzzifier value, which should be greater than 1.0, and $d(x_i, C_l)$ denotes the distance between the data object $x_i$ and the cluster (center) $C_l$. First, the distances between the analyzed data point and all cluster centers are computed. After the distance calculations, the data object is assigned to the lower and upper approximations of the closest cluster (center), at distance $d(x_i, C_l)$. Additionally, if the difference between the distance to another cluster center and the distance $d(x_i, C_l)$ is less than the predefined distance threshold εdist, the data object is additionally assigned to that cluster's approximations. The approximations are increased by the fuzzy membership value of the given data object to the cluster center. The Fuzzy–Crisp RECA algorithm flow is the same as presented in Algorithms 1 and 2, with the exception of the lower and upper approximation calculation, which follows the steps presented in Algorithm 3. For each data point $x_i$, the distance to the closest cluster $C_l$ is denoted as $d_{min}^{dist} = d(x_i, C_l)$, and the approximations are increased by this data point's fuzzy membership value $\mu_{C_m}(x_i)$ for every cluster $C_m$ that satisfies the condition:
Algorithm 3: Fuzzy–Crisp Difference RECA Approximations
foreach data object xi do
    determine the closest cluster center center(Cl) for xi
    increment Lower(Cl) by the fuzzy membership value of xi
    increment Upper(Cl) by the fuzzy membership value of xi
    foreach cluster Ck whose distance to xi differs from d(xi, Cl) by no more than εdist do
        increment Upper(Ck) by the fuzzy membership value of xi
    end
end
for l = 1 to number of data clusters do
    roughness(l) = 1 − Lower(l) / Upper(l)
Fuzzy rough entropy = 0
for l = 1 to number of data clusters do
    Fuzzy rough entropy = Fuzzy rough entropy − (e/2) · roughness(l) · log(roughness(l))
\[
|d(x_i, C_m) - d_{min}^{dist}| \leq \epsilon_{dist}.
\]
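The fuzzy membership of Eq. (11.1), used by the FCD, FFT and FFD variants, can be sketched as follows; the small constant guards against a zero distance when a point coincides with a center:

```python
import numpy as np

def fuzzy_memberships(x, centers, mu=2.5):
    """Memberships of the point x in all clusters according to Eq. (11.1)."""
    d = np.linalg.norm(np.asarray(centers, dtype=float) - np.asarray(x, dtype=float), axis=1)
    d = np.maximum(d, 1e-12)
    w = d ** (-2.0 / (mu - 1.0))
    return w / w.sum()
```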
11.4.5 Fuzzy–Fuzzy Threshold RECA
In Fuzzy–Fuzzy Threshold RECA, the membership values of the analyzed data point to all cluster centers are computed. Next, the data point is assigned to the lower and upper approximations of the closest cluster, and its fuzzy membership value to this cluster center center(C) is remembered. Additionally, if the fuzzy membership value to another cluster center (or centers) is greater than the predefined fuzzy threshold εfuzz, the data object is assigned to that cluster's upper approximation (or approximations). For each data point $x_i$, the approximations are increased by this data point's membership value $\mu_{C_m}(x_i)$ for the clusters $C_m$ that satisfy the condition
\[
\mu_{C_m}(x_i) \geq \epsilon_{fuzz}.
\]
11.4.6 Fuzzy–Fuzzy Difference RECA
In the Fuzzy–Fuzzy Difference RECA setting, the membership values of the analyzed object to all cluster centers are computed. Afterwards, the data object is assigned to the lower and upper approximations of the closest cluster, with the fuzzy distance to the closest cluster $C_l$ remembered as $d_{min}^{fuzz} = d(x_i, C_l)$. Additionally, if the difference between the fuzzy distance to another cluster and the fuzzy distance $d_{min}^{fuzz}$ to the closest cluster is less than the predefined threshold εfuzz, the data object is assigned to that cluster's approximations. The lower and upper approximation calculation follows the steps presented in Algorithm 5. For each data point $x_i$, the approximations are increased by this data point's fuzzy membership value for the clusters $C_m$ that satisfy the condition
\[
\mu_{C_m}(x_i) - d_{min}^{fuzz} \leq \epsilon_{fuzz}.
\]
The approximations are increased by the fuzzy membership value of the given data object to the cluster center.
Algorithm 4: Fuzzy–Fuzzy Threshold RECA Approximations
foreach data object xi do
    determine the closest cluster center center(Cl) for xi
    increment Lower(Cl) by the fuzzy membership value of xi
    increment Upper(Cl) by the fuzzy membership value of xi
    foreach cluster Ck with fuzzy membership µCk(xi) ≥ εfuzz do
        increment Upper(Ck) by the fuzzy membership value of xi
    end
end
for l = 1 to number of data clusters do
    roughness(l) = 1 − Lower(l) / Upper(l)
Fuzzy rough entropy = 0
for l = 1 to number of data clusters do
    Fuzzy rough entropy = Fuzzy rough entropy − (e/2) · roughness(l) · log(roughness(l))
Algorithm 5: Fuzzy–Fuzzy Difference RECA Approximations
foreach data object xi do
    determine the closest cluster Cl for xi
    increment Lower(Cl) and Upper(Cl) by µCl(xi)
    foreach cluster Cm ≠ Cl with |d(xi, Cm) − d(xi, Cl)| ≤ εfuzz do
        increment Upper(Cm) by µCm(xi)
    end
end
for l = 1 to number of data clusters do
    roughness(l) = 1 − Lower(l) / Upper(l)
Fuzzy rough entropy = 0
for l = 1 to number of data clusters do
    Fuzzy rough entropy = Fuzzy rough entropy − (e/2) · roughness(l) · log(roughness(l))
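The four variants differ only in how a point selects the clusters whose upper approximations it increments. One plausible consolidated reading of the conditions stated in Sections 11.4.3–11.4.6, reusing the fuzzy_memberships sketch above and following the membership-difference form of the FFD condition given in the text, is:

```python
import numpy as np

def upper_clusters(x, centers, variant, eps_dist=50.0, eps_fuzz=0.15, mu=2.5):
    """Indices of the clusters whose upper approximation the point x increments.

    The closest cluster (also counted in its lower approximation) is always
    included; the remaining clusters are selected by the condition of the
    chosen variant: 'CCD', 'FCD', 'FFT' or 'FFD'.
    """
    d = np.linalg.norm(np.asarray(centers, dtype=float) - np.asarray(x, dtype=float), axis=1)
    memb = fuzzy_memberships(x, centers, mu)
    closest = int(np.argmin(d))
    selected = {closest}
    for m in range(len(centers)):
        if m == closest:
            continue
        if variant in ("CCD", "FCD") and abs(d[m] - d[closest]) <= eps_dist:
            selected.add(m)                        # distance-difference condition
        elif variant == "FFT" and memb[m] >= eps_fuzz:
            selected.add(m)                        # fuzzy-threshold condition
        elif variant == "FFD" and abs(memb[m] - memb[closest]) <= eps_fuzz:
            selected.add(m)                        # fuzzy-difference condition
    return selected
```

CCD then increments the selected upper approximations by 1, while the fuzzy variants increment them by the corresponding membership values.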
11.5 Experimental Setup

11.5.1 Images
Image types
The images taken into the experiments consisted of the satellite image presented in Figure 11.1 and three nature images from the Berkeley image database (Martin, Fowlkes, Tal, and Malik, 2001) presented in Figure 11.2.

Low, medium and high dimensional context
In the experiments the images have been analyzed in low, medium and high dimensional contexts. The low dimensional data consisted of 2D gray-level images of the satellite build-up region. The medium dimensional data consisted of the satellite build-up region in the Red, Green and Blue channels and the three selected Berkeley database images in the Red, Green and Blue channels. The high dimensional images are 9D images with the base R, G and B channels and their max and min windowed versions, as shown in Figure 11.2, giving a total of 9 channels for each image.
FIGURE 11.1: Satellite build-up image in three bands: (a) Red, (b) Green and (c) Blue bands.

FIGURE 11.2: Berkeley dataset images: (a) 27059 image, (b) 78004 image, (c) 86000 image.
11.5.2 Populations
In the experiments, four independent algorithm-based populations have been created, listed in the subsequent paragraphs. For each population, 300 iterations have been gathered. In the case of CCD RECA, FCD RECA, FFT RECA and FFD RECA, 5 populations have been created. The selection of the RECA algorithms has been motivated by the requirement of obtaining solutions with high values of the rough fuzzy measures.

Population CCD - CCD RECA
The algorithm has been run 5 times; each time the population size was 30 solutions, iterated 300 times for each population. The distance threshold has been set to 50.0. The details are given in Table 11.1 and the algorithm flow is described in Algorithm 2.

Population FCD - FCD RECA
The algorithm has been run 5 times; each time the population size was 30 solutions, iterated 300 times for each population. The distance threshold εdist has been set to 50.0 and the fuzzifier µ to 2.5. The other parameters are given in Table 11.1 and the algorithm flow is described in Algorithm 3.

Population FFT - FFT RECA
The algorithm has been run 5 times; each time the population size was 30 solutions, iterated 300 times for each population. The fuzzy threshold εfuzz has been set to 0.15. The other parameters are given in Table 11.2 and the algorithm procedure in Algorithm 4.

Population FFD - FFD RECA
The algorithm has been run 5 times; each time the population size was 30 solutions, iterated 300 times for each population. The fuzzy threshold εfuzz has been set to 0.015. The parameters are given in Table 11.2 and the algorithm has been implemented according to Algorithm 5.

In the RECA algorithmic setting, as previously mentioned, the relevant parameters have been introduced as described in Table 11.3. The presented parameters are the distance threshold εdist, the fuzzy threshold εfuzz, the fuzzifier value µ and the entropy parameter α.
11.5.3 Standard Indices - SI
In the experiments, the following standard indices have been considered and calculated for the generated populations: the Dunn index, the DB index, KMEANS (kM), the Otsu index, the Turi index, the BETA index (β-index), the within-class variance (wVar) and the between-class variance (cVar). The description of the above indices has been presented in Section 11.3.
11.5.4 Parameters for RECA Measures
The rough entropy parameters applied in the algorithms described in Section 11.4 depend on the selected algorithm type and make it possible to adjust the parameters to the concrete segmentation task, which most often is influenced by the imagery type, resolution, dimensional context and other features. In order to make the present research and algorithm evaluation as informative as possible, each rough entropy algorithm has been assigned relevant parameters that varied: the distance threshold parameter is valid only for the CCD and FCD RECA algorithms, and the fuzzy threshold for the FFT and FFD RECA algorithms. The rough entropy parameters considered in the case of CCD RECA and FCD RECA are described in Table 11.1. Each of these two algorithms has been performed with five independent sets of parameters. In the paper, these parameter sets are referred to as the R1, . . . , R5 sets for the CCD RECA algorithm and as F1, . . . , F5 for the FCD RECA algorithm. In the case of the FFT and FFD RECA algorithms the relevant parameters are presented in Table 11.2. In the same way as for the previous two algorithms, the parameter sets for FFT RECA
are described as the T1, . . . , T5 parameter sets and the FFD RECA parameters are denoted as D1, . . . , D5. Additionally, the rough entropy values calculated on the basis of the above-mentioned rough entropy parameters, referred to as rough entropy measures, determine the rough indices RI, whose values are optimized by the evolutionary algorithm in the search for optimal clustering segmentation solutions as described in Algorithm 1. On the other hand, the rough indices are compared with standard clustering validity measures, giving an in-depth notion of the internal rough entropy properties.

TABLE 11.1 CCD RECA and FCD RECA measures - distance threshold εdist, fuzzy threshold εfuzz and fuzzifier µ

CCD RECA              FCD RECA
P     εdist           P     εdist   µ
R1    10              F1    50      1.5
R2    20              F2    50      2.0
R3    30              F3    50      2.5
R4    40              F4    50      3.0
R5    50              F5    50      3.5

11.6 Experimental Results

11.6.1 The Correlation between Standard and Rough Measures
Correlation values have been calculated between the standard validation indices SI and the rough indices RI independently for all four populations: the CCD RECA population, the FCD RECA population, the FFT RECA population and the FFD RECA population, as described in Subsection 11.5.2. In the comparison, in the first part the satellite build-up area image has been analyzed; in the second part the nature image 78004 in low and medium dimensions is presented; and finally the results for the 27059 nature image in the high dimensional context with 9 bands in low, medium and high resolutions are gathered.

Satellite buildup area
The satellite buildup area image has been considered in the low dimensional setting (bands RG, RB and BG) and in the medium dimensional setting (RGB bands). The index correlations for the four selected populations are presented in Table 11.4 and Table 11.5. Results for the selected low dimensional setting and the medium dimensional context are presented in Table 11.4 for the standard validity indices correlation and in Table 11.5 for the rough indices correlation. In the class of standard validity indices, the Dunn index has been maximally correlated with the Turi index, and the Davies-Bouldin index has achieved the best correlation with the β-index, wVar and cVar indices. The β-index has been most often correlated with the wVar and cVar indices.

TABLE 11.2 FFT RECA and FFD RECA measures - fuzzy threshold εfuzz and fuzzifier µ

FFT RECA                    FFD RECA
P     εfuzz   µ             P     εfuzz    µ
T1    0.3     1.5           D1    0.3      2.5
T2    0.3     2.0           D2    0.03     2.5
T3    0.3     2.5           D3    0.01     2.5
T4    0.3     3.0           D4    0.001    2.5
T5    0.3     3.5           D5    0.0001   2.5
TABLE 11.3 Parameters for RECA measures during the evolutionary algorithm phase - distance threshold εdist, fuzzy threshold εfuzz and fuzzifier µ

Name         εdist   εfuzz    µ      α
CCD RECA     50      -        -      1.0
FCD RECA     50      -        2.5    1.0
FFT RECA     -       0.15     2.5    1.0
FFD RECA     -       0.015    2.5    1.0
TABLE 11.4 Standard indices correlations in RECA populations for the satellite buildup area image

Band  Index     CCD RECA     FCD RECA     FFT RECA      FFD RECA
RG    Dunn      Turi -0.45   Turi -0.65   Turi -0.45    Turi -0.47
RG    DB        cVar -0.58   BETA -0.45   cVar -0.58    cVar -0.52
RG    β-index   wVar -0.81   wVar -0.86   wVar -0.82    wVar -0.82
RB    Dunn      Turi -0.54   Turi -0.67   Turi -0.50    Turi -0.55
RB    DB        Dunn -0.43   BETA -0.50   KMEANS 0.59   wVar 0.57
RB    β-index   wVar -0.93   wVar -0.90   wVar -0.85    wVar -0.84
BG    Dunn      Turi -0.48   Turi -0.51   Turi -0.51    Turi -0.46
BG    DB        Turi 0.48    wVar 0.52    wVar 0.57     wVar 0.55
BG    β-index   cVar 0.84    wVar -0.86   wVar -0.78    wVar -0.79
RGB   Dunn      Turi -0.49   Turi -0.50   Turi -0.49    DB -0.67
RGB   DB        wVar 0.35    wVar 0.40    wVar 0.47     Dunn -0.67
RGB   β-index   kM -0.90     wVar -0.90   wVar -0.85    wVar -0.87
TABLE 11.5 Rough indices correlations in RECA populations for the satellite buildup image

Band  Index     CCD RECA    FCD RECA    FFT RECA    FFD RECA
RG    Dunn      D3 0.04     R3 0.23     D3 0.04     D2 0.05
RG    DB        R4 -0.51    R3 -0.43    R4 -0.51    R4 -0.45
RG    β-index   F4 0.90     F4 0.70     F4 0.90     F4 0.85
RB    Dunn      R4 0.24     R5 0.36     R3 0.33     R3 0.06
RB    DB        R4 -0.39    R4 -0.37    R4 -0.49    R5 -0.48
RB    β-index   F4 0.83     R5 0.70     R5 0.88     F4 0.82
BG    Dunn      R5 0.23     R5 0.36     R5 0.42     R5 0.17
BG    DB        R5 -0.29    R5 -0.36    R4 -0.43    R5 -0.45
BG    β-index   F4 0.87     R3 0.86     R4 0.90     R3 0.89
RGB   Dunn      R5 0.23     R3 0.23     D2 0.04     D2 0.03
RGB   DB        R4 -0.33    R3 -0.25    R4 -0.42    D2 -0.10
RGB   β-index   F4 0.86     R4 0.73     R5 0.85     R1 0.88
In the class of rough entropy validity indices, the Dunn index has been maximally correlated with the CCD and FFD RECA measures, but the correlations with FFD are comparatively low. In the case of the Davies-Bouldin index, a predominantly high (negative) correlation has been observed for the CCD RECA measures. The best correlation for the β-index has been achieved with the CCD RECA and FCD RECA measures. The comparison shows that the β-index is the best correlated with the rough indices, the DB index is also considerably correlated, and the Dunn index correlation with the rough indices is the lowest.

Generic Berkeley database images - medium dimensions
The generic nature image 78004 has been considered in the medium dimensional setting:
RGB bands. Results for the selected medium dimensional context are presented in Table 11.6 for the standard validity indices correlation and in Table 11.7 for the rough indices correlation.

TABLE 11.6 Standard indices correlations in RECA populations for Berkeley generic images in medium resolution

Image   Index     CCD RECA     FCD RECA     FFT RECA     FFD RECA
27059   Dunn      DB -0.58     DB -0.57     DB -0.54     DB -0.58
27059   DB        Dunn -0.58   Dunn -0.57   Dunn -0.54   Dunn -0.58
27059   β-index   kM -0.91     kM -0.90     wVar -0.88   kM -0.88
78004   Dunn      DB -0.64     DB -0.55     DB -0.62     DB -0.60
78004   DB        Dunn -0.64   Dunn -0.55   Dunn -0.62   Dunn -0.60
78004   β-index   wVar -0.88   wVar -0.80   wVar -0.83   wVar -0.80
86000   Dunn      DB -0.70     DB -0.68     DB -0.66     DB -0.61
86000   DB        Dunn -0.70   Dunn -0.68   Dunn -0.66   Dunn -0.61
86000   β-index   kM -0.95     kM -0.94     kM -0.93     wVar -0.91
In the class of standard validity indices, the Dunn index has been most frequently maximally correlated with the DB index, and vice versa. The β-index has been predominantly correlated with the kM and wVar indices. In the class of rough entropy validity indices, the Dunn index has been maximally correlated with the CCD RECA measures. In the case of the Davies-Bouldin index, a predominantly high (negative) correlation has been observed for the CCD RECA measures. The best correlation for the β-index has been achieved with the CCD and FCD RECA measures. The correlations for the β-index are the highest, the correlation for the DB index is moderate, and the correlation for the Dunn index is comparatively the lowest.

TABLE 11.7 Rough indices correlations in RECA populations for Berkeley generic images in medium resolution

Image   Index     CCD RECA    FCD RECA    FFT RECA    FFD RECA
27059   Dunn      R5 0.18     R5 0.36     R5 0.52     R5 0.33
27059   DB        R5 -0.11    R4 -0.25    R5 -0.16    R5 -0.06
27059   β-index   R4 0.80     R4 0.83     R4 0.84     R4 0.86
78004   Dunn      R5 0.27     R5 0.36     R5 0.23     R5 0.01
78004   DB        D3 -0.10    D2 -0.13    T5 -0.16    D2 -0.14
78004   β-index   F5 0.86     F5 0.89     R5 0.88     R5 0.90
86000   Dunn      R4 0.22     R5 0.39     R5 0.40     R5 0.36
86000   DB        R5 -0.19    R5 -0.34    R5 -0.33    D3 -0.10
86000   β-index   F5 0.68     R4 0.70     R4 0.85     R5 0.89
Generic 27059 image - 9 bands - high dimensional context
For the generic nature image, the results for the high dimensional setting in high, medium and low resolutions are presented in Table 11.8 for the standard validity indices correlation and in Table 11.9 for the rough indices correlation. In the class of standard validity indices, the Dunn index has been unanimously maximally correlated with the DB index, and vice versa. The β-index has been predominantly correlated with the kM index.
TABLE 11.8 Standard indices correlations in RECA populations for the generic 27059 image - 9 bands

Resolution   Index     CCD RECA     FCD RECA     FFT RECA     FFD RECA
9 - High     Dunn      DB -0.59     DB -0.65     DB -0.58     DB -0.61
9 - High     DB        Dunn -0.59   Dunn -0.65   Dunn -0.58   Dunn -0.64
9 - High     β-index   kM -0.97     kM -0.95     kM -0.96     kM -0.95
9 - Medium   Dunn      DB -0.60     DB -0.65     DB -0.59     DB -0.65
9 - Medium   DB        Dunn -0.60   Dunn -0.65   Dunn -0.59   Dunn -0.65
9 - Medium   β-index   kM -0.96     kM -0.95     kM -0.96     kM -0.95
9 - Low      Dunn      DB -0.60     DB -0.65     DB -0.76     DB -0.66
9 - Low      DB        Dunn -0.63   Dunn -0.65   Dunn -0.76   Dunn -0.66
9 - Low      β-index   kM -0.96     kM -0.95     kM -0.96     kM -0.95
In the class of rough entropy validity indices, the Dunn index has been maximally correlated with various RECA measures. In the case of the Davies-Bouldin index, the highest (negative) correlation has been observed predominantly for the FCD and FFT RECA measures. The best correlations for the β-index are achieved for the CCD and FCD RECA measures. The comparison shows that the β-index correlations are the highest of the three selected indices and that the correlations for the DB index are comparatively the lowest.

TABLE 11.9 Rough indices correlations in RECA populations for the generic 27059 image - 9 bands

Resolution   Index     CCD RECA    FCD RECA    FFT RECA    FFD RECA
9 - High     Dunn      R5 0.10     T5 0.00     F5 0.08     T5 0.02
9 - High     DB        F5 -0.04    T5 0.00     T4 -0.05    T5 -0.09
9 - High     β-index   F2 0.67     R4 0.72     R5 0.69     R4 0.72
9 - Medium   Dunn      R5 0.12     F5 -0.01    R5 0.11     T5 0.04
9 - Medium   DB        T5 -0.02    T5 0.00     F4 -0.14    T5 -0.11
9 - Medium   β-index   F2 0.69     R5 0.71     R4 0.72     R4 0.74
9 - Low      Dunn      R5 0.12     R4 0.02     F4 0.28     T5 0.01
9 - Low      DB        T5 -0.04    T5 0.00     T4 -0.30    T5 -0.10
9 - Low      β-index   R4 0.71     R4 0.74     F3 0.78     R4 0.71
Standard and rough index correlation - summary

The comparison performed for three datasets in low, medium and high dimensions suggests that the rough indices tend to be highly correlated with the β-index values. This correspondence is the most pronounced across all data dimensions, but it decreases slowly as the data dimensionality increases. For the Dunn and DB indices, a moderate correlation has been observed for the low dimensional datasets, and it tends to decrease for medium and high dimensionality.
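To make the tabulated values concrete, the following is a minimal sketch of how a single correlation entry in Tables 11.6-11.9 could be computed, assuming the per-population values of a standard validity index and a rough measure are available as NumPy arrays; the array names and numbers below are purely illustrative placeholders, not values from the experiments.

```python
import numpy as np

# Hypothetical per-population index values from a RECA run (placeholders only).
beta_index = np.array([22.4, 25.1, 27.8, 30.2, 33.7])   # beta-index per population
r4_measure = np.array([0.41, 0.46, 0.52, 0.55, 0.61])   # rough R4 measure per population

# Pearson (linear) correlation between the two series; the off-diagonal entry
# of the 2x2 correlation matrix is the kind of value reported in the tables.
correlation = np.corrcoef(beta_index, r4_measure)[0, 1]
print(f"corr(beta-index, R4) = {correlation:.2f}")
```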
11.6.2
The Best Solutions
In the subsequent tables, the best solutions from the rough populations CCD RECA, FCD RECA, FFT RECA and FFD RECA are presented. The best solutions are considered relative to the Dunn index, the Davies-Bouldin index and the β-index. In Table 11.10 the best standard indices values for the satellite build-up area image have been presented. The satellite build-up area image has been segmented in the low dimensional context (bands RG, RB, BG) and the medium dimensional context (bands RGB).
TABLE 11.10 Best indices values - population number for satellite build-up image

Name | CCD RECA              | FCD RECA              | FFT RECA              | FFD RECA
     | Dunn  DB      β-index | Dunn  DB      β-index | Dunn  DB      β-index | Dunn  DB      β-index
RG   | 1.58  113.96  33.70   | 1.40  114.37  34.20   | 1.56  117.73  32.90   | 1.65  124.50  28.60
RB   | 1.45  128.78  26.38   | 1.09  137.70  25.24   | 1.45  143.00  24.70   | 1.37  135.50  25.77
BG   | 1.63  84.36   42.99   | 1.55  84.45   45.90   | 1.58  85.00   41.28   | 1.84  83.99   39.76
RGB  | 1.46  121.95  27.20   | 1.35  125.44  22.37   | 1.48  128.58  23.63   | 1.63  122.35  25.72
The best values for the β-index have been achieved for the CCD RECA populations in all examined cases. The best values of the Dunn index have been most often observed for the FFT populations, except for the RB bands. High values for the DB index are achieved for the FFT and FFD populations. In Table 11.11 the best standard indices values for three selected images from the Berkeley database have been presented. The images have been segmented in the medium dimensional context (bands RGB).

TABLE 11.11 Best indices values for images from the Berkeley image dataset in medium resolution
Name  | CCD RECA            | FCD RECA            | FFT RECA            | FFD RECA
      | Dunn  DB    β-index | Dunn  DB    β-index | Dunn  DB    β-index | Dunn  DB    β-index
27059 | 1.38  5.47  17.32   | 1.27  6.47  17.47   | 1.26  5.14  17.01   | 1.22  3.60  16.58
78004 | 1.49  3.05  45.76   | 1.82  1.76  47.47   | 1.91  1.82  42.09   | 2.02  1.72  46.58
86000 | 1.11  7.64  10.23   | 1.11  9.77  10.18   | 1.09  7.62  10.05   | 1.08  5.14  10.18
The best values for the β-index have been achieved for the CCD RECA and FCD populations. The values of the Dunn index are not uniformly distributed. High values for the DB index are achieved for the FFT and FFD populations. In Table 11.12 the best standard indices values for the generic nature area image 27059 from the Berkeley image dataset have been presented. The image has been segmented in the high dimensional context (9 bands) at three resolutions: low, medium and high. The best values for the β-index have been achieved for the CCD RECA populations in all examined cases. The best (maximal) values of the Dunn index have been most often observed for the FFT populations. The optimal (low) values for the DB index are achieved for the CCD and FFD populations. The comparison of the best solutions in the four selected populations for the three selected image datasets shows that the best solutions for the β-index values are obtained for the CCD populations, while the best values of the Dunn and DB indices are most often achieved for the FFT and FFD populations.
TABLE 11.12 Best indices values for generic 27059 image - 9 bands

Name       | CCD RECA             | FCD RECA             | FFT RECA             | FFD RECA
           | Dunn  DB     β-index | Dunn  DB     β-index | Dunn  DB     β-index | Dunn  DB     β-index
9 - High   | 0.90  48.45  6.87    | 1.03  49.79  6.24    | 0.90  50.52  5.84    | 1.59  52.81  5.64
9 - Medium | 0.98  5.69   7.29    | 0.93  4.57   6.15    | 0.99  4.77   5.76    | 1.27  2.77   5.63
9 - Low    | 0.98  6.70   6.32    | 0.98  6.16   5.77    | 0.97  5.38   5.56    | 1.81  2.05   5.07
The best solutions - summary

For the selected and analyzed datasets, the best solutions relative to the β-index have been achieved for the CCD and FCD populations. For the Dunn and DB indices, the FFT and FFD populations performed better.
11.6.3
Population Indices
In this subsection, selected rough indices from the analyzed populations are presented in graphical form. Figure 11.3 presents the values of the best rough index over all 300 populations: the average F2 rough measure, as described in Table 11.1, is given for each of the 300 populations. The shape of the graph shows a general increasing tendency.
FIGURE 11.3 Satellite build-up area image - RGB - average F2 measure during all iterated populations
Figure 11.4 presents the values of the best rough index over all 300 populations: the average β-index measure, as described in Subsection 11.3.2, is given for each of the 300 populations. The shape of the graph shows a general increasing tendency.
FIGURE 11.4 Satellite build-up area image - RGB - average β-index measure during all iterated populations
Figure 11.5 presents the correlation between the Dunn index and the R4 measure in all 300 populations; the graph shows a general positive correlation tendency. Figure 11.6 presents the correlation between the DB index and the R4 measure in all 300 populations; the graph shows a general negative correlation tendency. Figure 11.7 presents the correlation between the β-index and the R4 measure in all 300 populations; the graph shows a general positive correlation tendency. Figure 11.8 presents the correlation between the R1 and R4 measures in all 300 populations; the graph shows a general positive correlation tendency.
Conclusions

There is a growing need for effective segmentation routines capable of handling different types of imagery, suited to their characteristics and area of application. High quality image segmentation requires incorporating as much information about an image as is reasonable. Combining diverse information in a segmentation, understood as a means of improving algorithm performance, has been widely recognized and acknowledged; see for example (Gonzales and Woods, 2002) for details.
FIGURE 11.5 Satellite build-up area image - GB - correlation between Dunn index and R4 measure during all populations
FIGURE 11.6 Satellite build-up area image - RGB - correlation between DB index and R5 measure during all populations
In the present study, a detailed investigation into standard and rough entropy clustering algorithms has been performed. The chapter presents and summarizes the rough entropy framework for image segmentation systems. In order to make rough entropy measures more understandable, they have been compared to standard clustering validation indices. The experimental results suggest that it is possible to
FIGURE 11.7 Satellite build-up area image - RGB - correlation between β-index and R4 measure during all populations
FIGURE 11.8 Satellite build-up area image - GB - correlation between R1 and F4 measures during all populations
draw the conclusion that rough entropy measures are correlated with standard clustering validity indices. The rough indices proved to be highly correlated with standard cluster validity measures and cluster homogeneity measures. Rough measures are best correlated with the β-index, next with the Davies-Bouldin index, and the correlation with the Dunn index is comparatively the lowest. The β-index values tend to be highly correlated in all dimensions, whereas the Dunn
and DB correlations decrease as the data dimensionality increases. From the experimental data, the best correlations and the best solutions have been observed for the CCD and FCD RECA populations in the case of the β-index, and for the FFT and FFD RECA populations in the case of the Dunn and DB indices. Further research in the rough entropy area should help to confirm and better understand the presented tendencies. The Rough Entropy Clustering Algorithm proposed by the authors, with its different rough entropy measures, makes use of boundary region information during the clustering process and at the same time better comprehends the internal data structure. The RECA algorithmic scheme incorporates fuzziness and uncertainty into the clustering procedure, making a deeper look into the internal data structure possible. In this context, careful insight into the similarities and differences between standard data analysis techniques and the newly proposed approach seems to be crucial in order to properly explore and apply these fuzzy-rough algorithms. In order to answer the question of the advantages of fuzzy-rough data analysis in the area of image segmentation, the chapter concentrates on discovering the relations between established data clustering validation measures (indices) and rough entropy measures. In this way, the area of application of the proposed solution becomes clearly more evident, and the nature of fuzzy-rough data analysis is better understood.
Acknowledgments

The research is supported by the grants N N516 0692 35 and N N516 3774 36 from the Ministry of Science and Higher Education of the Republic of Poland. Computational experiments were performed on a cluster built by the Department of Computer Science, Bialystok University of Technology.
Bibliography

Bezdek, J. C., and N. R. Pal. 1995. Cluster validation with generalized Dunn's indices. 190–193. 2nd New Zealand Two-Stream Int. Conf. on Artificial Neural Networks and Expert Systems (ANNES '95).
Borkowski, M., and J. F. Peters. 2007. Matching 2d image segments with genetic algorithms and approximation spaces. Transactions on Rough Sets V(LNAI 4100):63–101.
Fu, S. K., and J. K. Mui. 1981. A survey on image segmentation. Pattern Recognition 13:3–16.
Gonzales, R. C., and R. E. Woods. 2002. Digital image processing. New York: Prentice Hall.
Haralick, R. M., and L. G. Shapiro. 1985. Survey: Image segmentation techniques. Computer Vision Graphics Image Processing 29:100–132.
Malyszko, D., and J. Stepaniuk. 2008. Standard and fuzzy rough entropy clustering algorithms in image segmentation. Lecture Notes in Computer Science 5306:409–418.
Martin, D., C. Fowlkes, D. Tal, and J. Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf. Computer Vision, vol. 2, 416–423.
Pal, S. K., B. U. Shankar, and P. Mitra. 2005. Granular computing, rough entropy and object extraction. Pattern Recognition Letters 26(16):2509–2517.
Pawlak, Z. 1991. Rough sets: Theoretical aspects of reasoning about data. Dordrecht, Netherlands: Kluwer Academic.
Pawlak, Z., and A. Skowron. 2007. Rudiments of rough sets. Information Sciences 177(1):3–27.
Pedrycz, W., A. Skowron, and V. Kreinovich (Eds). 2008. Handbook of granular computing. New York: John Wiley & Sons.
Rutkowski, L. 2008. Computational intelligence methods and techniques. Springer.
Skowron, A., and J. Stepaniuk. 1996. Tolerance approximation spaces. Fundamenta Informaticae 27:245–253.
Stepaniuk, J. 2008. Rough–granular computing in knowledge discovery and data mining. Springer.
Zadeh, L. A. 1997. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90:111–117.
Zhang, Y. J. 1996. A survey on evaluation methods for image segmentation. Pattern Recognition 29:1335–1346.
12 Discovering Image Similarities. Tolerance Near Set Approach

12.1 Introduction 12–1
12.2 Tolerance Near Sets 12–3
     Probe Function • Perceptual Systems • L2 norm-based Object Description • Perceptual Tolerance Relation
12.3 Resemblance Measures 12–5
     Description-based Set Intersection Nearness Measure • Tolerance Class Overlap Distribution Nearness Measure (Meghdadi, Peters, and Ramanna, 2009) • Tolerance Class Size-Based Nearness Measure (Henry and Peters, 2008) • Hausdorff Distance and Image Correspondence Measure
12.4 Illustration: Image Nearness Measures with Microfossil Images 12–9
12.5 Image Retrieval Experiments with Meghdadi Image Nearness Toolset 12–11
     tNM-measure Results • Hausdorff-measure Results
12.6 Conclusion 12–13
Acknowledgements 12–13
Bibliography 12–17

Sheela Ramanna
University of Winnipeg, Dept. Applied Computer Science, 515 Portage Ave., Winnipeg, Manitoba, R3B 2E9, Canada

12.1 Introduction
This chapter proposes an approach to detecting chains of affinities between perceptual objects contained in tolerance classes in digital image coverings. A perceptual object is something perceptible to the senses or knowable by the mind. Perceptual objects that have similar appearance are considered perceptually near each other, i.e., perceived objects that have perceived affinities or, at least, similar descriptions. Similarities between digital images are measured within the context of tolerance spaces with measurable similarities. This form of tolerance space is inspired by E.C. Zeeman's work on visual perception and H. Poincaré's work on the contrast between mathematical continua and the physical continua in a pragmatic philosophy of science that laid the foundations for tolerance spaces. Comparison of pairs of tolerance spaces that are in some sense close to each other originated in E.C. Zeeman's work on visual acuity spaces and a topological model for visual sensation. The perception of nearness or closeness that underlies tolerance near relations is rooted in M. Merleau-Ponty's work on the phenomenology of perception during the mid-1940s, and, especially, philosophical reflections on the description of perceived objects and the perception of nearness. Pairs of disjoint sets, such as sets of points representing digital images, are considered near each other to the extent that elements of tolerance classes in image coverings
have similar descriptions. Our classifications are often plainly influenced by chains of affinities. –Charles Darwin, 1859.
Related tolerance spaces have isomorphic set theories. [Let ξ, η denote tolerance relations; then] the right visual field (X, ξ) and the right visual lobe (X, η) have isomorphic set theories. –Sir E. Christopher Zeeman, 1962.
In general, the term affinity means close relationship based on a common origin or structure (Murray, Bradley, Craigie, and Onions, 1933). In this chapter, the term affinity means a close relationship between perceptual granules (particularly images) based on common description (Peters and Ramanna, 2009). In particular, this chapter considers a tolerance near set solution to the image correspondence problem (Peters, 2009, 2010), i.e., where one uses image matching strategies to establish a correspondence between one or more images. Recently, it has been shown that near sets can be used in a perception-based approach to discovering correspondences between images (see, e.g., (Henry and Peters, 2009c; Peters, 2009, 2010; Peters and Ramanna, 2009; Peters and Puzio, 2009; Henry and Peters, 2009a; Meghdadi et al., 2009)). Sets of perceptual objects where two or more of the objects have matching descriptions are called near sets. Detecting image resemblance and formulating image description are part of the more general pattern recognition process enunciated by K. Cyran and A. Mrózek in 2001 (Cyran and Mrózek, 2001). Work on a basis for near sets began in 2002, motivated by image analysis and inspired by a study of the perception of the nearness of perceptual objects carried out in cooperation with Z. Pawlak in (Pawlak and Peters, 2002,2007), and is directly related to the more general setting of rough sets (Pawlak, 1981; Pawlak and Skowron, 2007c,b,a), especially if one considers approximation spaces (Peters, Skowron, and Stepaniuk, 2007, 2006). This initial work led to the introduction of near sets (Peters, 2007b), elaborated in (Peters, 2007a, 2009, 2010; Peters and Wasilewski, 2009; Peters and Puzio, 2009; Peters and Henry, 2009; Henry and Peters, 2009b). A perception-based approach to discovering resemblances between images leads to a tolerance space form of near sets that models human perception in a physical continuum. The term tolerance space was coined by E.C. Zeeman in 1961 in modelling visual perception with tolerances (Zeeman, 1962). A tolerance space is a set X supplied with a binary relation ≃ (i.e., a subset ≃ ⊂ X × X) that is reflexive (for all x ∈ X, x ≃ x) and symmetric (for all x, y ∈ X, if x ≃ y then y ≃ x), but transitivity of ≃ is not required. For example, it is possible to define a tolerance space relative to subimages of an image. This is made possible by assuming that each image is a set of fixed points. Let O denote a set of perceptual objects (e.g., grey level subimages) and let gr(x) = average grey level of subimage x. Then the tolerance relation ≃_gr is defined as

≃_gr = {(x, y) ∈ O × O : |gr(x) − gr(y)| ≤ ε}

for some tolerance ε ∈ ℝ (reals). Then (O, ≃_gr) is a sample tolerance space. A tolerance threshold denoted by ε is directly related to the exact idea of closeness or resemblance (i.e., being within some tolerance) in comparing objects. The basic idea is to find objects, such as images, that resemble each other with a tolerable level of error. Sossinsky (Sossinsky, 1986) observes that the main idea underlying tolerance theory comes from Henri Poincaré (Poincaré, 1913). Physical continua (e.g., measurable magnitudes in the physical world of medical imaging (Hassanien, Abraham, Peters, Schaefer, and Henry, 2009)) are contrasted with the mathematical continua (real numbers), where almost solutions are
common and a given equation has no exact solutions. An almost solution of an equation (or a system of equations) is an object which, when substituted into the equation, transforms it into a numerical 'almost identity', i.e., a relation between numbers which is true only approximately (within a prescribed tolerance) (Sossinsky, 1986). Equality in the physical world is meaningless, since it can never be verified either in practice or in theory. Hence, the basic idea in a tolerance space view of images, for example, is to replace the indiscernibility relation in rough sets (Pawlak, 1982) with a tolerance relation in partitioning images into homologous regions where there is a high likelihood of overlaps, i.e., non-empty intersections between image tolerance classes. The use of image tolerance spaces in this work is directly related to recent work on tolerance spaces (see, e.g., (Bartol, Miró, Pióro, and Rosselló, 2004; Gerasin, Shlyakhov, and Yakovlev, 2008; Schroeder and Wright, 1992a; Shreider, 1970; Skowron and Stepaniuk, 1996; Zheng, Hu, and Shi, 2005)). The contribution of this chapter is twofold, namely, a proposed tolerance near set-based approach to solving the image correspondence problem and a comparison of the results obtained using several tolerance space-based image resemblance measures. This chapter has the following organization. A brief introduction to tolerance near sets is given in Sect. 12.2. This is followed in Sect. 12.3 by a presentation of four nearness measures. A report on the results of experiments with pairs of images using the Meghdadi image nearness toolset is given in Sect. 12.4.
TABLE 12.1 Nomenclature

Symbol                   | Interpretation
O, X                     | Set of perceptual objects, X, Y ⊆ O
F, B                     | Sets of probe functions, B ⊆ F
ε                        | ε ∈ ℝ (reals) such that ε ≥ 0
φ_i(x)                   | ith probe function representing a feature of x
φ_B(x)                   | (φ_1(x), ..., φ_l(x)), φ_i ∈ F, x ∈ O, description of x
≅_{B,ε}                  | {(x, y) ∈ O × O : ‖φ_B(x) − φ_B(y)‖_2 ≤ ε}, tolerance relation
Preclass A ⊂ ≅_{B,ε}     | ⇐⇒ ∀x, y ∈ A, x ≅_{B,ε} y, i.e., ‖φ_B(x) − φ_B(y)‖_2 ≤ ε
C^X_i                    | maximal preclass (tolerance class) in the cover of X defined by ≅_{B,ε}
X ∩_{≅_{B,ε}} Y          | {(x, y) ∈ X × Y : ‖φ_B(x) − φ_B(y)‖_2 ≤ ε}
C^X_i ∩_{≅_{B,ε}} C^Y_j  | {(x, y) ∈ C^X_i × C^Y_j : ‖φ_B(x) − φ_B(y)‖_2 ≤ ε}
X ⋈_{B,ε} Y              | X resembles Y

12.2 Tolerance Near Sets
This section introduces the basic notions underlying a tolerance near set approach to detecting image resemblances.
12.2.1 Probe Function
Definition 1 Probe Function. A probe function is a real-valued function representing a feature of a physical object.
Examples of probe functions are the colour, size, weight of an object. Probe functions are used to describe an object to determine the characteristics and perceptual similarity of perceivable physical objects. Perceptual information is always presented with respect to probe functions just as our senses define our perception of the world. For example, our ability to view light in the “visible spectrum” rather than infra red or microwaves spectra defines our perception of the world just as the selection of probe functions constrains the amount of perceptual information available for extraction from a set of objects.
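As an illustration of the probe functions used later in this chapter (average grey level and entropy of a greyscale subimage), the following is a minimal sketch; the function names and the use of NumPy are illustrative assumptions, not part of the original text.

```python
import numpy as np

def avg_grey(subimage: np.ndarray) -> float:
    """Probe function: average grey level of a greyscale subimage."""
    return float(subimage.mean())

def grey_entropy(subimage: np.ndarray, levels: int = 256) -> float:
    """Probe function: Shannon entropy (information content) of the grey levels."""
    hist, _ = np.histogram(subimage, bins=levels, range=(0, levels), density=True)
    p = hist[hist > 0]
    return float(-np.sum(p * np.log2(p)))
```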
12.2.2 Perceptual Systems
Definition 2 Perceptual System (Peters and Wasilewski, 2009). A perceptual system ⟨O, F⟩ is a real-valued total deterministic information system where O is a non-empty set of perceptual objects and F is a countable set of probe functions. The notion of a perceptual system admits a wide variety of different interpretations that result from the selection of sample perceptual objects contained in a particular sample space O. Examples of perceptual objects include observable organism behaviour, growth rates, soil erosion, microfossils, events containing the outcomes of experiments such as energizing a network, microscope images, MRI scans, and the results of searches for relevant web pages.
12.2.3 L2 norm-based Object Description
The description of an object x ∈ O with probe functions B is given by

φ_B(x) = (φ_1(x), φ_2(x), ..., φ_i(x), ..., φ_l(x)),

where l is the length of the description vector φ and each φ_i(x) is a probe function that describes the object x. Let d_i = φ_i(x) − φ_i(y) denote the ith vector difference relative to the descriptions φ(x), φ(y) for objects x, y ∈ O. Then let d^T and d denote the row and column vectors of description differences, respectively, i.e., d^T = (d_1 ... d_k) and d = (d_1, ..., d_k)^T. Finally, the overall distance computed using (12.1) is the L2 norm ‖d‖_2 of the vector d, i.e.,

‖d‖_2 = (d^T d)^{1/2} = sqrt( Σ_{i=1}^{k} d_i^2 ).   (12.1)

In general, ‖·‖_2 denotes the length of a vector in L2 space (Jänich, 1984).
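A minimal sketch of the object description φ_B(x) and the L2 distance in (12.1) is given below, assuming each probe function maps a perceptual object (e.g., a subimage) to a real number; `probe_functions` is a hypothetical list such as `[avg_grey, grey_entropy]` from the sketch above.

```python
import numpy as np

def description(x, probe_functions):
    """phi_B(x) = (phi_1(x), ..., phi_l(x))."""
    return np.array([phi(x) for phi in probe_functions])

def description_distance(x, y, probe_functions):
    """|| phi_B(x) - phi_B(y) ||_2, the L2 norm of the difference vector d."""
    d = description(x, probe_functions) - description(y, probe_functions)
    return float(np.sqrt(np.sum(d ** 2)))
```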
12.2.4 Perceptual Tolerance Relation
Definition 3 Perceptual Tolerance Relation (Peters, 2009). Let ⟨O, F⟩ be a perceptual system and let ε ∈ ℝ. For every B ⊆ F the tolerance relation ≅_{B,ε} is defined as follows:

≅_{B,ε} = {(x, y) ∈ O × O : ‖φ_B(x) − φ_B(y)‖_2 ≤ ε}.

For notational convenience, this relation can be written ≅_B instead of ≅_{B,ε}, with the understanding that ε is inherent to the definition of the tolerance relation.
Notice that (O, ≅_B) is an example of a tolerance space. The relation ≅_B in Defn. 3 defines a covering of O instead of partitioning O, because an object can belong to more than one class. Let A ⊂ ≅_B. A is a preclass in ≅_B if, ∀x, y ∈ A, x ≅_B y, i.e., ‖φ_B(x) − φ_B(y)‖_2 ≤ ε (Schroeder and Wright, 1992b). Of course, the idea of a preclass can be generalized to any metric space. A maximal preclass of the tolerance ≅_{B,ε} is called a class. The particular form of distance-based tolerance relation ≅_{B,ε} used in this chapter works well for image analysis, applied mathematics, engineering and biomedicine, where ≅_{B,ε} is used to express inaccuracy of measurement or what Poincaré calls an indistinguishable difference in the intensity of a sensation (Poincaré, 1913). In other words, a set A ⊂ ≅_B is a tolerance class if, and only if, A is a maximal preclass. The following simple example highlights the need for a tolerance relation as well as demonstrates the construction of tolerance classes from sample data. Let X/≅_{B,ε} denote a covering of X defined by ≅_{B,ε}. Consider the 20 objects in Table 12.2, where |φ(x_i)| = 1. Put ε = 0.1 and obtain the following tolerance classes:

X/≅_B = { {x1, x8, x10, x11}, {x1, x9, x10, x11, x14}, {x2, x7, x18, x19}, {x3, x12, x17}, {x4, x13, x20}, {x4, x18}, {x5, x6, x15, x16}, {x5, x6, x15, x20}, {x6, x13, x20} }.
Observe that each pair of objects x, y in a tolerance class satisfies the condition ‖φ(x) − φ(y)‖_2 ≤ ε, and many of the objects appear in more than one class.

TABLE 12.2 Tolerance Class Example

x_i   φ(x)    x_i   φ(x)    x_i   φ(x)    x_i   φ(x)
x1    .4518   x6    .6943   x11   .4002   x16   .6079
x2    .9166   x7    .9246   x12   .1910   x17   .1869
x3    .1398   x8    .3537   x13   .7476   x18   .8489
x4    .7972   x9    .4722   x14   .4990   x19   .9170
x5    .6281   x10   .4523   x15   .6289   x20   .7143
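The tolerance classes listed above can be recomputed mechanically: a tolerance class is a maximal preclass, i.e., a maximal clique of the graph whose edges join objects with ‖φ(x) − φ(y)‖_2 ≤ ε. The sketch below reproduces the nine classes for the data in Table 12.2 with ε = 0.1; the use of networkx for maximal-clique enumeration is an implementation convenience, not something prescribed by the chapter.

```python
import networkx as nx

# Feature values phi(x_i) from Table 12.2.
phi = {
    'x1': .4518, 'x2': .9166, 'x3': .1398, 'x4': .7972, 'x5': .6281,
    'x6': .6943, 'x7': .9246, 'x8': .3537, 'x9': .4722, 'x10': .4523,
    'x11': .4002, 'x12': .1910, 'x13': .7476, 'x14': .4990, 'x15': .6289,
    'x16': .6079, 'x17': .1869, 'x18': .8489, 'x19': .9170, 'x20': .7143,
}
eps = 0.1

# Build the tolerance graph: an edge joins x, y whenever |phi(x) - phi(y)| <= eps.
G = nx.Graph()
G.add_nodes_from(phi)
G.add_edges_from((u, v) for u in phi for v in phi
                 if u < v and abs(phi[u] - phi[v]) <= eps)

# Each maximal clique is one tolerance class; objects may belong to several classes.
for tolerance_class in nx.find_cliques(G):
    print(sorted(tolerance_class))
```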
Definition 4 Tolerance Near Sets (Peters, 2009). Let ⟨O, F⟩ be a perceptual system, where O is a set of images (sets of points), and let B ⊆ F be a set of probe functions representing image features. Let B contain probe functions used to measure features of subimages in X, Y ⊂ O. A set X is perceptually near a set Y within the perceptual system ⟨O, F⟩ (i.e., X ⋈_{F,ε} Y) iff there are x ∈ X and y ∈ Y and there is B ⊆ F such that x ≅_{B,ε} y.
12.3 Resemblance Measures
Tolerance classes can be viewed as structural elements in representing an image. The motivation for using tolerance classes in perceptual image analysis is the conjecture that visual perception in humans is performed at the class level rather than at the pixel level.
12.3.1 Description-based Set Intersection Nearness Measure
Because we want to consider the problem of comparing objects in pairs of disjoint sets, such as separate images, and, yet, we want to gather together from the disjoint sets those objects that have feature-values in common, we introduce a description-based set intersection. This problem is of interest because its solution provides a formal basis for an image resemblance measure considered in the context of tolerance near sets. To solve this problem, we introduce a description-based intersection of sets.

Definition 5 Description-Based Tolerance Class Intersection (Peters and Henry, 2009). Let ⟨O, F⟩ be a perceptual system and let X, Y ⊆ O, B ⊆ F, and let C^X_i ⊆ X/≅_B, C^Y_j ⊆ Y/≅_B, respectively. The notation C^X_i denotes x_i/≅_B, a tolerance class for x_i ∈ X, and C^Y_j denotes y_j/≅_B, a tolerance class for y_j ∈ Y. Then

C^X_i ∩_{≅_B} C^Y_j = {(x, y) ∈ C^X_i × C^Y_j : ‖φ(x) − φ(y)‖_2 ≤ ε}.
TABLE 12.3 Feature Values for a Pair of Samples

X    φ      Y    φ
x1   0.7    y1   0.8
x2   0.75   y2   0.85
Example 1 Sample Tolerance Class Intersections

ε = 0.2,                                   (12.2)
X/≅_{φ,0.2} = {{x1, x2}},                  (12.3)
Y/≅_{φ,0.2} = {{y1, y2}},                  (12.4)
C^X_1 ∩_{≅_{B,0.2}} C^Y_1 = {x1, y1},      (12.5)
C^X_1 ∩_{≅_{B,0.2}} C^Y_2 = {x1, y2},      (12.6)
C^X_2 ∩_{≅_{B,0.2}} C^Y_1 = {x2, y1},      (12.7)
C^X_2 ∩_{≅_{B,0.2}} C^Y_2 = {x2, y2}.      (12.8)
Notice, then, that this example illustrates the fact that the magnitude of C^X_i ∩_{≅_B} C^Y_j can equal the product of the sample sizes, i.e.,

|C^X_i ∩_{≅_{B,0.2}} C^Y_j| = |C^X_i| · |C^Y_j|.
Let i ∈ I_X and j ∈ I_Y be distinct indices used to identify the classes C^X_i, C^Y_j. Notice, also, that we can formulate a relationship between the description-based intersection C^X_i ∩_{≅_B} C^Y_j and the product of the class sizes |C^X_i| · |C^Y_j|.

Proposition 1

Σ_{i∈I_X, j∈I_Y} |C^X_i ∩_{≅_{B,ε}} C^Y_j|  ≤  Σ_{i∈I_X, j∈I_Y} |C^X_i| · |C^Y_j|.
A measure of the resemblance (i.e., an image nearness measure NM_{≅_B}(X, Y)) between a pair of images X, Y is then formulated in Def. 6.

Definition 6 Tolerance Class Intersection-Based Nearness Measure (Peters and Henry, 2009). Let ⟨O, F⟩ be a perceptual system and assume X, Y ⊆ O with coverings defined using the tolerance relation ≅_B. Let C^X_i, C^Y_j denote tolerance classes in X/≅_B, Y/≅_B, respectively. Then the tolerance class intersection-based nearness measure tiNM_{≅_B}(X, Y), given in (12.9), measures the degree of nearness of X, Y:

tiNM_{≅_{B,ε}}(X, Y) = ( Σ_{i∈I_X, j∈I_Y} |C^X_i ∩_{≅_{B,ε}} C^Y_j| ) / ( Σ_{i∈I_X, j∈I_Y} |C^X_i| · |C^Y_j| ).   (12.9)
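A minimal sketch of the tiNM computation in (12.9) follows, assuming the coverings of X and Y are given as lists of tolerance classes (each a list of perceptual objects) and that `eps_close(x, y)` tests ‖φ_B(x) − φ_B(y)‖_2 ≤ ε; the function and parameter names are illustrative.

```python
def ti_nearness(classes_X, classes_Y, eps_close):
    """tiNM: total size of the description-based intersections divided by the
    total product of class sizes, summed over all pairs of tolerance classes."""
    numer = 0
    denom = 0
    for CX in classes_X:
        for CY in classes_Y:
            # |C^X_i ∩_{≅} C^Y_j| counts the pairs (x, y) with close descriptions.
            numer += sum(1 for x in CX for y in CY if eps_close(x, y))
            denom += len(CX) * len(CY)
    return numer / denom if denom else 0.0
```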
12.3.2 Tolerance Class Overlap Distribution Nearness Measure (Meghdadi et al., 2009)
A tolerance class overlap distribution (TOD) nearness measure is based on a statistical comparison of the overlaps between tolerance classes at each subimage. The method is as follows. Suppose X, Y ⊆ O are two images (sets of perceptual objects). The sets of all tolerance classes for images X and Y form a covering for each image:

X/∼_{B,ε} = {x/∼_{B,ε} | x ∈ X},   (12.10)
Y/∼_{B,ε} = {y/∼_{B,ε} | y ∈ Y}.   (12.11)

Subsequently, the set of all tolerance classes overlapping at each object (subimage) x is denoted Ω_{X/∼_{B,ε}}(x) and is defined as follows:

Ω_{X/∼_{B,ε}}(x) = {z/∼_{B,ε} ∈ X/∼_{B,ε} | x ∈ z/∼_{B,ε}}.   (12.12)

Consequently, the normalized number ω of tolerance classes in X/∼_{B,ε} which are overlapping at x is defined as follows:

ω_{X/∼_{B,ε}}(x) = |Ω_{X/∼_{B,ε}}(x)| / |X/∼_{B,ε}|.   (12.13)

Similarly, the set of all overlapping tolerance classes at every subimage y ∈ Y is denoted by ω_{Y/∼_{B,ε}}(y). Assuming that the set of probe functions B and the value of ε are known, we use the more simplified notation Ω_X(x) and ω_X(x) for the set X/∼_{B,ε}, and the notations Ω_Y(y)
and ω_Y(y) for the set Y/∼_{B,ε}. Now, let F_{ω_X}(ω) and F_{ω_Y}(ω) be the empirical cumulative distribution functions (CDF) of the functions ω_X(x) and ω_Y(y), respectively, when x ∈ X and y ∈ Y. The Tolerance Overlap Distribution (TOD) nearness measure is defined as follows:

TOD = 1 − γ ∫_{ω=0}^{ω=1} ( F_{ω_X}(ω) − F_{ω_Y}(ω) ) dω.   (12.14)
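A sketch of the TOD computation as written in (12.14) is given below, assuming the normalized overlap counts ω_X(x) and ω_Y(y) have already been collected into NumPy arrays; the grid size and the interpretation of γ as a scaling constant are assumptions made for illustration.

```python
import numpy as np

def tod_measure(omega_X, omega_Y, gamma=1.0, grid_size=100):
    """TOD = 1 - gamma * integral over [0, 1] of (F_X(w) - F_Y(w)) dw,
    with F_X, F_Y the empirical CDFs of the overlap counts."""
    omega = np.linspace(0.0, 1.0, grid_size)
    F_X = np.array([(omega_X <= w).mean() for w in omega])  # empirical CDF of omega_X
    F_Y = np.array([(omega_Y <= w).mean() for w in omega])  # empirical CDF of omega_Y
    return 1.0 - gamma * np.trapz(F_X - F_Y, omega)
```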
12.3.3 Tolerance Class Size-Based Nearness Measure (Henry and Peters, 2008)
A tolerance class size-based nearness (tNM) measure is based on the idea that, if one considers the union of two images as the set of perceptual objects, tolerance classes should contain almost equal numbers of subimages from each image. Therefore, the tolerance nearness measure between two digital images is calculated in the following way. Suppose X and Y are the sets of perceptual objects (subimages) in image 1 and image 2. Z = X ∪ Y is the set of all perceptual objects in the union of the images and, for each z ∈ Z, the tolerance class of an element z of Z (Bartol et al., 2004) (denoted z/∼_{B,ε} in this chapter) for our application is defined in (12.15):

z/∼_{B,ε} = {s ∈ Z : ‖φ_B(z) − φ_B(s)‖ ≤ ε},   i.e.,   z/∼_{B,ε} = {s ∈ Z : (z, s) ∈ ≅_{B,ε}}.   (12.15)
The part of the tolerance class z/∼_{B,ε} that is a subset of X is denoted [z/∼_{B,ε}]_{⊆X} and, similarly, the part of the tolerance class z/∼_{B,ε} that is a subset of Y is denoted [z/∼_{B,ε}]_{⊆Y}. Therefore:

[z/∼_{B,ε}]_{⊆X} = {x ∈ z/∼_{B,ε} | x ∈ X} ⊆ z/∼_{B,ε},   (12.16)
[z/∼_{B,ε}]_{⊆Y} = {y ∈ z/∼_{B,ε} | y ∈ Y} ⊆ z/∼_{B,ε},   (12.17)
z/∼_{B,ε} = [z/∼_{B,ε}]_{⊆X} ∪ [z/∼_{B,ε}]_{⊆Y}.   (12.18)
Subsequently, the measure tNM is defined as the weighted average of the closeness between the cardinality (size) of the set [z/∼_{B,ε}]_{⊆X} and the cardinality of [z/∼_{B,ε}]_{⊆Y}, where the cardinality of z/∼_{B,ε} is used as the weighting factor in (12.19):

tNM = ( 1 / Σ_{z/∼_{B,ε}} |z/∼_{B,ε}| ) × Σ_{z/∼_{B,ε}} [ min(|[z/∼_{B,ε}]_{⊆X}|, |[z/∼_{B,ε}]_{⊆Y}|) / max(|[z/∼_{B,ε}]_{⊆X}|, |[z/∼_{B,ε}]_{⊆Y}|) ] × |z/∼_{B,ε}|.   (12.19)
For a more detailed explanation of this measure considered within the general framework of near sets, see (Henry and Peters, 2009b).
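The following is a minimal sketch of the tNM computation in (12.19), assuming the tolerance classes of Z = X ∪ Y are given as lists of subimage identifiers and that X and Y are disjoint sets of those identifiers; the names are illustrative.

```python
def tnm_measure(tolerance_classes, X, Y):
    """Weighted average of min/max ratios of class portions, weighted by class size."""
    total_weight = 0.0
    weighted_ratio = 0.0
    for cls in tolerance_classes:
        in_X = sum(1 for z in cls if z in X)   # |[z/~]_⊆X|
        in_Y = sum(1 for z in cls if z in Y)   # |[z/~]_⊆Y|
        weight = len(cls)                      # |z/~_{B,eps}| is the weighting factor
        if max(in_X, in_Y) > 0:
            weighted_ratio += (min(in_X, in_Y) / max(in_X, in_Y)) * weight
        total_weight += weight
    return weighted_ratio / total_weight if total_weight else 0.0
```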
12.3.4 Hausdorff Distance and Image Correspondence Measure
The Hausdorff distance is used to measure the distance between sets in a metric space (Hausdorff, 1914) (see (Hausdorff, 1962) for an English translation), and is defined as

d_H(X, Y) = max{ sup_{x∈X} inf_{y∈Y} d(x, y),  sup_{y∈Y} inf_{x∈X} d(x, y) },
where sup and inf refer to the supremum and infimum, and d(x, y) is the distance metric (in this case it is the l2 norm). The Hausdorff distance is given graphically in Fig. 12.1a. In this case, the distance from the x ∈ X to every element of set Y is determined, and the shortest distance is selected as the infimum. This process is repeated for every x ∈ X and the largest distance (supremum) is selected as the Hausdorff distance of the set X to the set Y . Finally, this process is repeated for the set Y because the two distances will not necessarily be the same. For example, the Hausdorff distance of the set X to the set Y would be the distance from the lowercase letters in Fig. 12.1a, but the result would be smaller for the distance of the set Y to the set X. The method of applying the Hausdorff distance
FIGURE 12.1: Graphical Interpretations. (12.1a) Hausdorff distance. (12.1b) Image Correspondence Problem (Z = X ∪ Y).
to the image correspondence problem is the same as that described in Section 12.3.3. To reiterate, consider Fig. 12.1b, where each rectangle represents a set of subimages (obtained by partitioning the original images X and Y) and the coloured areas represent some of the obtained tolerance classes. The tolerance relation covers both images, but not all the classes are shown, in the interest of clarity. Note that the tolerance classes are created based on the feature values of the subimages and, consequently, do not need to be situated geographically near each other (as shown in Fig. 12.1b). In the case of the tNM measure, the idea is that similar images should produce tolerance classes with similar cardinalities. Consequently, we compare the cardinality of the portion of a tolerance class belonging to set X with the cardinality of the portion belonging to set Y (represented in Fig. 12.1b as sets with the same colour). In contrast, the Hausdorff distance measures the distance between sets in some metric space. As a result, we measure the distance in the feature space between the portion of a tolerance class belonging to set X and the portion of a tolerance class belonging to set Y (again, represented as sets with the same colour). Here, the idea is that images that are similar should have tolerance classes that are close to each other (in the Hausdorff sense) in the feature space. As a result, low Hausdorff distances are desirable. Finally, it is important to note that one may be inclined to think intuitively that the Hausdorff distance should always be zero. However, this is not the case, because we are dealing with tolerance, where the feature values are similar but not the same.
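A minimal sketch of the Hausdorff distance between two finite point sets in the feature space (e.g., the descriptions of the X-portion and the Y-portion of a tolerance class) is given below, with d(x, y) taken as the L2 norm as in the definition above; the array-based formulation is an implementation choice.

```python
import numpy as np

def hausdorff_distance(A, B):
    """A, B: arrays of shape (m, k) and (n, k) holding object descriptions."""
    # Pairwise L2 distances between the rows of A and the rows of B.
    diff = A[:, None, :] - B[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    # Maximum of the two directed distances: sup_x inf_y d and sup_y inf_x d.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```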
12.4 Illustration: Image Nearness Measures with Microfossil Images
Briefly, we illustrate the application of the image nearness measures for a collection of microfossil images. The covering (with tolerance classes) for the Photographer image in Fig. 12.2a is shown in Fig. 12.2b. For simplicity, we have chosen average greyscale level and entropy (information content) as the features used to define image coverings and measure similarities between images. The subimages in Fig. 12.2a delineate tolerance classes (i.e., sets of subimages with similar grey levels within a particular tolerance) that are subregions of the original images in Fig. 12.2b. The tolerance classes in this image are dominated by light grey, medium grey and dark grey subimages, along with many very dark subimages in Fig. 12.2b.
FIGURE 12.2: Photographer Tolerance Classes. (12.2a) Photographer. (12.2b) Photographer TNS.
FIGURE 12.3: Sample Microfossil Images. (12.3a) Co1. (12.3b) Co2. (12.3c) Co3. (12.3d) Os1. (12.3e) Os2. (12.3f) Os3. (12.3g) Os4.
The Meghdadi toolset for measuring nearness between images has been used to obtain the measurements reported in Table 12.4. This toolset was reported in (Meghdadi et al., 2009) and not described again here. Nearness measurements using the tiNM measure are not included in the results reported in this section because the tiNM measure in (12.9) is mathematically easy to understand but, so far, the implementation of tiNM is computationally very slow. The images in Fig. 12.3 show two different types of microfossils (Armstrong and Brasier, 2005), namely, Conodants in Figs. 12.3a, 12.3b 12.3c and Ostracods in Figs. 12.3d, 12.3e, 12.3f, 12.3g, respectively. Among the comparisons of the pairs of microfossil images recorded in Table 12.4, the most remarkable is the comparison between Fig. 12.3d and Fig. 12.3g (these are microscope images of fossils from different rocks on micrometer
TABLE 12.4 Fossil Image Nearness Measurements

Image 1    (tol. classes) | Image 2    (tol. classes) | TOD   tNM   HdNM  wtAvg
Fig 12.3a  (544)          | Fig 12.3e  (576)          | 0.81  0.30  0.73  0.67
Fig 12.3c  (738)          | Fig 12.3b  (2031)         | 0.93  0.27  0.58  0.59
Fig 12.3e  (576)          | Fig 12.3f  (325)          | 0.83  0.59  0.85  0.80
Fig 12.3e  (576)          | Fig 12.3g  (348)          | 0.74  0.37  0.70  0.65
Fig 12.3g  (348)          | Fig 12.3d  (290)          | 0.85  0.5   0.8   0.76
scale). The fossil image in Fig. 12.3g is broken up and quite indistinct compared with the image in Fig. 12.3d, and, yet, each of the nearness measurements indicates a high degree of correspondence between the fossils in the two images. This suggests a very practical application of the nearness measures in comparing microfossils (knowledge of the fossils at different levels in sedimentary deposits provides indicators of the presence or absence of natural gas and petroleum deposits).
12.5 Image Retrieval Experiments with Meghdadi Image Nearness Toolset
In this section, we illustrate experiments with the tool set in an image retrieval context. Fig. 12.4 shows sample query images that we have used for retrieval from the SIMPLIcity database (Wang, 2001).
FIGURE 12.4: Sample Query Images used in Tab. 12.5. (12.4a) Im1, Sample1. (12.4b) Im2, Sample2. (12.4c) Im3, Sample3.
The similarity measurements using different combinations of the greyscale and edge-orientation (edge-o.) features, as well as different metrics (L1 and L2), are reported in Table 12.5. It should also be mentioned that the toolset allows for 14 different features, such as hue, saturation and contrast, to name a few. The subimage size was set to 20. It can also be observed from the first eight rows of Table 12.5 that the tNM measure values are fairly low and the Hausdorff measure values are much higher than tNM.
12.5.1 tNM-measure Results
In this section, we present a sample of 17 images compared with the query image shown in Figure 12.5a. The subfigures that form a part of Figure 12.5 are arranged in descending order of their tNM measure values with the nearest image Im15 = 0.961 and the furthest
TABLE 12.5 Similarity Measurements

Metric | Feature             | Images    | ε   | tNM-measure | Hausdorff-measure
L2     | greyscale           | Im1, Im2  | 0.5 | 0.49646     | 0.78257
L2     | greyscale           | Im1, Im3  | 0.5 | 0.58015     | 0.87151
L2     | edge-o.             | Im1, Im2  | 0.5 | 0.60141     | 0.88289
L2     | edge-o.             | Im1, Im3  | 0.5 | 0.63080     | 0.88147
L1     | greyscale           | Im1, Im2  | 0.5 | 0.49646     | 0.78257
L1     | greyscale           | Im1, Im3  | 0.5 | 0.58015     | 0.87151
L1     | edge-o.             | Im1, Im2  | 0.5 | 0.60141     | 0.88289
L1     | edge-o.             | Im1, Im3  | 0.5 | 0.63080     | 0.88147
L1     | greyscale           | Im3, Im4  | 0.5 | 0.74525     | 0.92248
L1     | edge-o.             | Im3, Im4  | 0.5 | 0.88960     | 0.92419
L1     | edge-o.             | Im3, Im7  | 0.5 | 0.91027     | 0.92269
L1     | greyscale, edge-o.  | Im1, Im2  | 0.5 | 0.44694     | 0.78119
L1     | greyscale, edge-o.  | Im1, Im3  | 0.5 | 0.52776     | 0.77003
L2     | greyscale, edge-o.  | Im1, Im2  | 0.5 | 0.44644     | 0.76412
L2     | greyscale, edge-o.  | Im1, Im3  | 0.5 | 0.52776     | 0.75469
L1     | greyscale, edge-o.  | Im3, Im4  | 0.5 | 0.67706     | 0.80572
L1     | greyscale, edge-o.  | Im3, Im5  | 0.5 | 0.94620     | 0.76481
L1     | greyscale, edge-o.  | Im3, Im6  | 0.5 | 0.92052     | 0.80602
L1     | greyscale, edge-o.  | Im3, Im7  | 0.5 | 0.89572     | 0.80945
L1     | greyscale, edge-o.  | Im3, Im8  | 0.5 | 0.90605     | 0.81532
L1     | greyscale, edge-o.  | Im3, Im9  | 0.5 | 0.90262     | 0.79911
L1     | greyscale, edge-o.  | Im3, Im10 | 0.5 | 0.86576     | 0.78755
L1     | greyscale, edge-o.  | Im3, Im11 | 0.5 | 0.92289     | 0.79546
L1     | greyscale, edge-o.  | Im3, Im12 | 0.5 | 0.77348     | 0.75115
L1     | greyscale, edge-o.  | Im3, Im13 | 0.5 | 0.83372     | 0.78273
L1     | greyscale, edge-o.  | Im3, Im14 | 0.5 | 0.82593     | 0.78242
L1     | greyscale, edge-o.  | Im3, Im15 | 0.5 | 0.96127     | 0.81383
L1     | greyscale, edge-o.  | Im3, Im16 | 0.5 | 0.79675     | 0.74366
L1     | greyscale, edge-o.  | Im3, Im17 | 0.5 | 0.81891     | 0.76443
L1     | greyscale, edge-o.  | Im3, Im18 | 0.5 | 0.85555     | 0.78585
L1     | greyscale, edge-o.  | Im3, Im19 | 0.5 | 0.92766     | 0.79726
L1     | greyscale, edge-o.  | Im3, Im20 | 0.5 | 0.83858     | 0.80834
image Im4 = 0.677. Figure 12.6 shows the histogram values for the closest and farthest images for the two features. It can be observed that the greyscale feature values fall into two ranges and the edge-orientation feature values fall into three ranges. Their combined influence is what is observed in the overall tNM measure. The thresholded images are shown in Figure 12.6b, Figure 12.6f and Figure 12.6j.
12.5.2 Hausdorff-measure Results
In this section, the experiment with the sample of 17 images compared with the query image in Figure 12.7a is repeated using the Hausdorff measure. The subfigures that form a part of Figure 12.7 are arranged in descending order of their Hausdorff measure values, with the nearest image Im8 = 0.815 and the furthest image Im16 = 0.744. It can be observed that the Hausdorff measure values are fairly consistent and show little variation between the nearest and the farthest image. Figure 12.8 shows the histogram values for the closest and farthest images for the two features. The two sets of experiments also demonstrate completely different image retrieval results for the two measures based on the two features. It can also be observed from Table 12.5 that the two measures based on the edge-orientation feature and the L1 metric have almost similar values for Im7 for query image Im3.
FIGURE 12.5: Images ordered by tNM-measure values. (12.5a) Im3, 1.0 (query); (12.5b) Im15, .961; (12.5c) Im5, .946; (12.5d) Im19, .928; (12.5e) Im11, .923; (12.5f) Im6, .921; (12.5g) Im8, .906; (12.5h) Im9, .903; (12.5i) Im7, .896; (12.5j) Im10, .866; (12.5k) Im18, .856; (12.5l) Im20, .839; (12.5m) Im13, .834; (12.5n) Im14, .826; (12.5o) Im17, .826; (12.5p) Im16, .819; (12.5q) Im12, .773; (12.5r) Im4, .677.
12.6 Conclusion
This article focuses on an approach to solving the image correspondence problem by considering description based affinities between coverings of digital images. It introduces an approach to measuring the resemblance between pairs of images using several L2 norm-based tolerance nearness measures. The proposed solution to the image correspondence problem is based on a comparison of the descriptions of elements of tolerance classes contained in coverings of pairs of digital images. It should be noted here that the proposed approach to measuring similarities between perceptual granules is not limited to digital images. The proposed approach also has promising implications for segmenting videos, especially in applications where grouping images in a video depends on very refined similarity measurements over many separate images contained in a video.
Acknowledgements

The author is deeply grateful to James F. Peters and Amir H. Meghdadi for suggestions and insights concerning topics in this chapter. I especially want to thank Amir H. Meghdadi for the use of his tolerance nearness measures toolset. This research has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) grant 194376.
FIGURE 12.6: Retrieval Results based on tNM-measure (please see color insert for Figures 12.6a and 12.6b). (12.6a) Im3, query image; (12.6b) thresholded image; (12.6c) greyscale histogram; (12.6d) edge-o histogram; (12.6e)-(12.6h) Im15, nearest image, with its thresholded image, greyscale histogram and edge-o histogram; (12.6i)-(12.6l) Im4, furthest image, with its thresholded image, greyscale histogram and edge-o histogram.
FIGURE 12.7: Images ordered by Hausdorff-measure values. (12.7a) Im3, 1.0 (query); (12.7b) Im8, .815; (12.7c) Im15, .814; (12.7d) Im7, .809; (12.7e) Im20, .808; (12.7f) Im6, .806; (12.7g) Im4, .806; (12.7h) Im9, .799; (12.7i) Im19, .797; (12.7j) Im11, .795; (12.7k) Im10, .788; (12.7l) Im18, .786; (12.7m) Im13, .783; (12.7n) Im14, .782; (12.7o) Im5, .765; (12.7p) Im17, .764; (12.7q) Im12, .751; (12.7r) Im16, .744.
FIGURE 12.8: Retrieval Results based on Hausdorff-measure. (12.8a) Im3, query image; (12.8b) thresholded image; (12.8c) greyscale histogram; (12.8d) edge-o histogram; (12.8e)-(12.8h) Im8, nearest image, with its thresholded image, greyscale histogram and edge-o histogram; (12.8i)-(12.8l) Im16, furthest image, with its thresholded image, greyscale histogram and edge-o histogram.
Bibliography

Armstrong, H.A., and M.D. Brasier. 2005. Microfossils, 2nd ed. Oxford, UK: Blackwell Publishing.
Bartol, W., J. Miró, K. Pióro, and F. Rosselló. 2004. On the coverings by tolerance classes. Inf. Sci. Inf. Comput. Sci. 166(1-4):193–211.
Cyran, K.A., and A. Mrózek. 2001. Rough sets in hybrid methods for pattern recognition. Int. J. of Intelligent Systems 16:149–168.
Gerasin, S. N., V. V. Shlyakhov, and S. V. Yakovlev. 2008. Set coverings and tolerance relations. Cybernetics and Sys. Anal. 44(3):333–340.
Hassanien, A.E., A. Abraham, J.F. Peters, G. Schaefer, and C. Henry. 2009. Rough sets and near sets in medical imaging: A review. IEEE Trans. Info. Tech. in Biomedicine. Digital object identifier: 10.1109/TITB.2009.2017017, in press.
Hausdorff, F. 1914. Grundzüge der Mengenlehre. Leipzig: Verlag Von Veit & Comp.
———. 1962. Set theory. New York: Chelsea Publishing Company.
Henry, C., and J. F. Peters. 2008. Near set index in an objective image segmentation evaluation framework. In Geographic object based image analysis: Pixels, objects, intelligence, 1–6. University of Calgary, Alberta.
Henry, C., and J.F. Peters. 2009a. Near set evaluation and recognition (near) system. Tech. Rep., Computational Intelligence Laboratory, University of Manitoba. UM CI Laboratory Technical Report No. TR-2009-015.
———. 2009b. Near sets. Wikipedia.
———. 2009c. Perception-based image analysis. Int. J. of Bio-Inspired Computation 2(2). in press.
Jänich, K. 1984. Topology. Berlin: Springer-Verlag.
Meghdadi, A.H., J.F. Peters, and S. Ramanna. 2009. Tolerance classes in measuring image resemblance. In Intelligent analysis of images & videos, vol. KES 2009 Part II, LNAI 5712, 127–134.
Murray, J.A., H. Bradley, W. Craigie, and C. Onions. 1933. The Oxford English Dictionary. Oxford, UK: Oxford University Press.
Pawlak, Z. 1981. Classification of objects by means of attributes. Polish Academy of Sciences 429.
———. 1982. Rough sets. International Journal of Computer and Information Sciences 11:341–356.
Pawlak, Z., and A. Skowron. 2007a. Rough sets and boolean reasoning. Information Sciences 177:41–73.
———. 2007b. Rough sets: Some extensions. Information Sciences 177:28–40.
———. 2007c. Rudiments of rough sets. Information Sciences 177:3–27.
Pawlak, Zdzislaw, and James Peters. 2002, 2007. Jak blisko (How near). Systemy Wspomagania Decyzji I:57, 109. ISBN 83-920730-4-5.
Peters, J.F. 2007a. Near sets. General theory about nearness of objects. Applied Mathematical Sciences 1(53):2609–2629.
———. 2007b. Near sets. Special theory about nearness of objects. Fundamenta Informaticae 75(1-4):407–433.
———. 2009. Tolerance near sets and image correspondence. Int. J. of Bio-Inspired Computation 1(4):239–445.
———. 2010. Corrigenda and addenda: Tolerance near sets and image correspondence. Int. J. of Bio-Inspired Computation 2(5). in press.
Peters, J.F., and C. Henry. 2009. Description-based image analysis in measuring image resemblance. Computer Vision and Image Understanding. submitted.
Peters, J.F., and L. Puzio. 2009. Image analysis with anisotropic wavelet-based nearness measures. International Journal of Computational Intelligence Systems 3(2):1–17.
Peters, J.F., and S. Ramanna. 2009. Affinities between perceptual granules: Foundations and perspectives. In Human-centric information processing through granular modelling, SCI 182, ed. A. Bargiela and W. Pedrycz, 49–66. Berlin: Springer-Verlag.
Peters, J.F., A. Skowron, and J. Stepaniuk. 2006. Nearness in approximation spaces. In Proc. Concurrency, Specification & Programming. Humboldt Universität.
———. 2007. Nearness of objects: Extension of approximation space model. Fundamenta Informaticae 79(3-4):497–512.
Peters, J.F., and P. Wasilewski. 2009. Foundations of near sets. Information Sciences. An International Journal 179:3091–3109. Digital object identifier: doi:10.1016/j.ins.2009.04.018, in press.
Poincaré, H. 1913. Mathematics and science: Last essays, trans. by J.W. Bolduc. N.Y.: Kessinger Pub.
Schroeder, M., and M. Wright. 1992a. Tolerance and weak tolerance relations. Journal of Combinatorial Mathematics and Combinatorial Computing 11:123–160.
———. 1992b. Tolerance and weak tolerance relations. Journal of Combinatorial Mathematics and Combinatorial Computing 11:123–160.
Shreider, Yu. A. 1970. Tolerance spaces. Cybernetics and Systems Analysis 6(12):153–758.
Skowron, A., and J. Stepaniuk. 1996. Tolerance approximation spaces. Fundamenta Informaticae 27(2/3):245–253.
Sossinsky, A.B. 1986. Tolerance space theory and some applications. Acta Applicandae Mathematicae: An International Survey Journal on Applying Mathematics and Mathematical Applications 5(2):137–167.
Wang, James Z. 2001. SIMPLIcity: content-based image search engine. Content Based Image Retrieval Project.
Zeeman, E.C. 1962. The topology of the brain and visual perception. In K.M. Fort, ed., Topology of 3-Manifolds and Selected Topics, 240–256. New Jersey: Prentice Hall.
Zheng, Z., H. Hu, and Z. Shi. 2005. Tolerance relation based granular space. Lecture Notes in Computer Science 3641:682.