Robert Jeansoulin, Odile Papini, Henri Prade, and Steven Schockaert (Eds.) Methods for Handling Imperfect Spatial Information
Studies in Fuzziness and Soft Computing, Volume 256

Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 241. Jacek Kluska: Analytical Methods in Fuzzy Modeling and Control, 2009. ISBN 978-3-540-89926-6
Vol. 242. Yaochu Jin, Lipo Wang: Fuzzy Systems in Bioinformatics and Computational Biology, 2009. ISBN 978-3-540-89967-9
Vol. 243. Rudolf Seising (Ed.): Views on Fuzzy Sets and Systems from Different Perspectives, 2009. ISBN 978-3-540-93801-9
Vol. 244. Xiaodong Liu and Witold Pedrycz: Axiomatic Fuzzy Set Theory and Its Applications, 2009. ISBN 978-3-642-00401-8
Vol. 245. Xuzhu Wang, Da Ruan, Etienne E. Kerre: Mathematics of Fuzziness - Basic Issues, 2009. ISBN 978-3-540-78310-7
Vol. 246. Piedad Brox, Iluminada Castillo, Santiago Sánchez Solano: Fuzzy Logic-Based Algorithms for Video De-Interlacing, 2010. ISBN 978-3-642-10694-1
Vol. 247. Michael Glykas: Fuzzy Cognitive Maps, 2010. ISBN 978-3-642-03219-6
Vol. 248. Bing-Yuan Cao: Optimal Models and Methods with Fuzzy Quantities, 2010. ISBN 978-3-642-10710-8
Vol. 249. Bernadette Bouchon-Meunier, Luis Magdalena, Manuel Ojeda-Aciego, José-Luis Verdegay, Ronald R. Yager (Eds.): Foundations of Reasoning under Uncertainty, 2010. ISBN 978-3-642-10726-9
Vol. 250. Xiaoxia Huang: Portfolio Analysis, 2010. ISBN 978-3-642-11213-3
Vol. 251. George A. Anastassiou: Fuzzy Mathematics: Approximation Theory, 2010. ISBN 978-3-642-11219-5
Vol. 252. Cengiz Kahraman, Mesut Yavuz (Eds.): Production Engineering and Management under Fuzziness, 2010. ISBN 978-3-642-12051-0
Vol. 253. Badredine Arfi: Linguistic Fuzzy Logic Methods in Social Sciences, 2010. ISBN 978-3-642-13342-8
Vol. 254. Weldon A. Lodwick, Janusz Kacprzyk (Eds.): Fuzzy Optimization, 2010. ISBN 978-3-642-13934-5
Vol. 255. Zongmin Ma, Li Yan (Eds.): Soft Computing in XML Data Management, 2010. ISBN 978-3-642-14009-9
Vol. 256. Robert Jeansoulin, Odile Papini, Henri Prade, and Steven Schockaert (Eds.): Methods for Handling Imperfect Spatial Information, 2010. ISBN 978-3-642-14754-8
Editors

Dr. Robert Jeansoulin
CNRS UMR 8049 LabInfo IGM, Université Paris-Est Marne-la-Vallée
77454 Marne-la-Vallée Cedex 2, France
E-mail: [email protected]

Dr. Henri Prade
CNRS UMR 5505 IRIT, Université Paul Sabatier
118 route de Narbonne, 31062 Toulouse Cedex 9, France
E-mail: [email protected]

Prof. Odile Papini
CNRS UMR 6168 LSIS, ESIL, Université de la Méditerranée
163, Avenue de Luminy, 13288 Marseille Cedex 9, France
E-mail: [email protected]

Dr. Steven Schockaert
Department of Applied Mathematics and Computer Science, Universiteit Gent
Krijgslaan 281, 9000 Gent, Belgium
E-mail: [email protected]
ISBN 978-3-642-14754-8
e-ISBN 978-3-642-14755-5
DOI 10.1007/978-3-642-14755-5
Studies in Fuzziness and Soft Computing
ISSN 1434-9922
Library of Congress Control Number: 2010934931

© 2010 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com
Acknowledgements
This book originated from a small workshop on Soft Methods for Statistical and Fuzzy Spatial Information Processing, which was co-located with the fourth International Symposium on Soft Methods in Probability and Statistics and took place on September 11th, 2008 in Toulouse, France. The contributions to this book include revised and extended versions of six of the eight papers that were presented at the workshop, together with seven invited contributions that improve its overall coverage.

We are grateful to all the authors for their willingness to participate in this book, and for the time and effort they have spent preparing their contributions. The chapters were mainly reviewed by ourselves; we also acknowledge the help of Tom Mélange. We thank John Grant for his careful proofreading of the introductory chapter. We have furthermore been helped in the preparation of this book by Florence Boué, who in particular maintained a webpage accessible to the contributors during the preparation of the book. We are grateful to Janusz Kacprzyk for his invitation to submit this volume to the Studies in Fuzziness and Soft Computing series.

Lastly, this book may be viewed as a result of the inter-regional action project no. 05013992 "GEOFUSE: Fusion d'informations géographiques incertaines", jointly supported by the Conseils Régionaux of Midi-Pyrénées and of Provence-Alpes-Côte d'Azur, in which three of the editors were involved. The last editor has benefited from two travel grants from the Research Foundation - Flanders (FWO), which gave him the opportunity to stay in Toulouse and work on the preparation of the book.

Toulouse, June 2010
The editors
Contents

Introduction: Uncertainty Issues in Spatial Information
    Robert Jeansoulin, Odile Papini, Henri Prade, Steven Schockaert ... 1

Part 1: Describing Spatial Configurations

Spatial Vagueness
    Brandon Bennett ... 15

A General Approach to the Fuzzy Modeling of Spatial Relationships
    Pascal Matsakis, Laurent Wendling, Jing Bo Ni ... 49

Bipolar Fuzzy Spatial Information: Geometry, Morphology, Spatial Reasoning
    Isabelle Bloch ... 75

Fuzzy and Rough Set Approaches for Uncertainty in Spatial Data
    Theresa Beaubouef, Frederick E. Petry ... 103

Part 2: Symbolic Reasoning and Information Merging

An Exploratory Survey of Logic-Based Formalisms for Spatial Information
    Florence Dupin de Saint-Cyr, Odile Papini, Henri Prade ... 133

Revising Geographical Knowledge: A Model for Local Belief Change
    Omar Doukari, Robert Jeansoulin, Eric Würbel ... 165

Merging Expressive Spatial Ontologies Using Formal Concept Analysis with Uncertainty Considerations
    Olivier Curé ... 189

Generating Fuzzy Regions from Conflicting Spatial Information
    Steven Schockaert, Philip D. Smart ... 211

Part 3: Prediction and Interpolation

Fuzzy Methods in Image Mining
    Alfred Stein ... 243

Kriging and Epistemic Uncertainty: A Critical Discussion
    Kevin Loquin, Didier Dubois ... 269

Scaling Cautious Selection in Spatial Probabilistic Temporal Databases
    Francesco Parisi, Austin Parker, John Grant, V.S. Subrahmanian ... 307

Imperfect Spatiotemporal Information Analysis in a GIS: Application to Archæological Information Completion Hypothesis
    Cyril de Runz, Eric Desjardin ... 341

Uncertainty in Interaction Modelling: Prospecting the Evolution of Urban Networks in South-Eastern France
    Giovanni Fusco ... 357

Author Index ... 379
List of Authors
Theresa Beaubouef
Department of Computer Science and Industrial Technology, Southeastern Louisiana University, Hammond, LA 70402
[email protected]

Brandon Bennett
Division of Artificial Intelligence, School of Computing, University of Leeds, Leeds, LS2 9JT, UK
[email protected]

Isabelle Bloch
Département Traitement du Signal et des Images, Télécom ParisTech, CNRS LTCI, 46 rue Barrault, F-75634 Paris Cedex 13
[email protected]

Olivier Curé
Equipe Terre Digitale, Université Paris-Est, 5, bd Descartes, Marne la Vallée 77454, France
[email protected]

Cyril de Runz
CReSTIC-SIC, IUT de Reims-Châlons-Charleville, Rue des Crayères, BP 1035, 51687 Reims Cedex 2, France
[email protected]

Eric Desjardin
CReSTIC-SIC, IUT de Reims-Châlons-Charleville, Rue des Crayères, BP 1035, 51687 Reims Cedex 2, France
[email protected]

Omar Doukari
CNRS UMR 6168 LSIS, Campus de Saint-Jérôme, Avenue Escadrille Normandie-Niemen, 13397 Marseille Cedex, France
[email protected]

Didier Dubois
IRIT, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex 9
[email protected]

Florence Dupin de Saint-Cyr
IRIT, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex 9
[email protected]
Giovanni Fusco
UMR 6012 ESPACE, Université de Nice-Sophia Antipolis, 98 Bd Herriot BP 3209, 06204 Nice cedex 3, France
[email protected]

John Grant
Towson University, Towson, Maryland, USA and University of Maryland, College Park, MD 20742, USA
[email protected]

Robert Jeansoulin
CNRS UMR 8049 LabInfo IGM, Université Paris-Est Marne-la-Vallée, 77454 Marne-la-Vallée Cedex, France
[email protected]

Kevin Loquin
Département Traitement du Signal et des Images, Télécom ParisTech, CNRS LTCI, 46 rue Barrault, F-75634 Paris Cedex 13
[email protected]

Pascal Matsakis
Department of Computing and Information Science, University of Guelph, Guelph, ON N1G 2W, Canada
[email protected]

JingBo Ni
Department of Computing and Information Science, University of Guelph, Guelph, ON N1G 2W, Canada
[email protected]

Odile Papini
LSIS UMR-CNRS 6168, ESIL, Université de la Méditerranée, Avenue de l'université BP 132, 83957 La Garde cedex, France
[email protected]

Francesco Parisi
Università della Calabria, Via P. Bucci, cubo 41/C, 87036 Rende (CS), Italy
[email protected]

Austin Parker
University of Maryland, College Park, MD 20742, USA
[email protected]

Frederick E. Petry
Mapping Charting and Geodesy Branch, Naval Research Laboratory, Stennis Space Center, MS 39529
[email protected]

Henri Prade
IRIT, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex 9
[email protected]

Steven Schockaert
Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281, 9000 Gent, Belgium
[email protected]
Philip Smart
School of Computer Science, Cardiff University, 5 The Parade, Roath, Cardiff, UK
[email protected]
Alfred Stein
ITC, University of Twente, PO Box 6, 7500 AA Enschede, The Netherlands
[email protected]

V.S. Subrahmanian
University of Maryland, College Park, MD 20742, USA
[email protected]

Laurent Wendling
Université Paris Descartes, 45, rue des Saints-Pères, 75270 Paris Cedex 06, France
[email protected]

Eric Würbel
CNRS UMR 6168 LSIS, Université du Sud Toulon-Var, 83957 La Garde Cedex, France
[email protected]
Introduction: Uncertainty Issues in Spatial Information Robert Jeansoulin, Odile Papini, Henri Prade, and Steven Schockaert
Abstract. This introductory chapter serves two purposes. First, it provides a brief overview of research trends in different areas of information processing for the handling of uncertain spatial information. The discussion focuses on the diversity of spatial information, and the different challenges that may arise. Second, an overview of the contents of this edited volume is presented. We also point out the novelty of the book, which goes beyond geographical information systems and considers different forms of quantitative and qualitative uncertainty.
R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 1-11. © Springer-Verlag Berlin Heidelberg 2010, springerlink.com

1 The Nature of Spatial Information

Variety of Spatial Information

The term spatial information refers to pieces of information that are associated with locations, which typically refer to points or regions in some two- or three-dimensional space. Many applications deal with geographic information [8, 35], in which case the space under consideration is the surface of the Earth. Other
applications, however, deal with spatial information of a quite different kind, ranging from medical images (e.g. MRI scans [11]) to industrial product specifications (e.g. computer aided design and manufacturing [56]), or the layout of buildings or campuses [31, 32, 3]. In addition to describing aspects of the real world, spatial information may also describe virtual environments [26]. Beyond virtual environments, we may even consider space in a metaphorical way for describing or reasoning about the meaning of concepts, viewed as regions in a multi-dimensional space (e.g. the conceptual spaces of Gärdenfors [25]).

Spatial information may have various origins. Geographic information may be derived from satellite images or other types of remote sensing, it may be collected using various types of sensors, it may result from the collection of census data, or from textual descriptions, among others. Other types of spatial information may be explicitly created by a human user (e.g. in the case of computer aided design), although it may also be derived e.g. from scanning devices, from a robot mapping its environment, or again from textual descriptions. Given this diversity, it should not come as a surprise that spatial information has been studied from different angles in different research communities, as described in more detail in Section 2.

Dealing with Spatial Uncertainty

Appropriate abstractions are needed to deal with the complexity of spatial configurations. A first observation is that some information is naturally associated with points, e.g. the altitude of a location, while other types of information pertain only to regions, e.g. the population of a city. A common approach to deal with the former type is to discretize a bounded fragment of space into a finite number of cells. This allows us to quantify the value of the parameters associated with each cell, leading to the so-called field-based models, which are also referred to as grids or matrices.
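The field-based view described above can be sketched in a few lines of code: a bounded region of space is discretized into a grid of cells, and a point-valued quantity (here, altitude) is stored per cell. This is a minimal illustration, not code from any chapter of this volume; the class and parameter names are our own.

```python
# A minimal sketch of a field-based (raster) model: space is discretized
# into a grid of cells, and every point within a cell shares that cell's
# value. Names and the altitude example are illustrative only.

class RasterField:
    """Field-based model: a grid of cells covering a bounding box."""

    def __init__(self, x_min, y_min, cell_size, n_cols, n_rows, fill=0.0):
        self.x_min, self.y_min = x_min, y_min
        self.cell_size = cell_size
        self.n_cols, self.n_rows = n_cols, n_rows
        self.cells = [[fill] * n_cols for _ in range(n_rows)]

    def cell_of(self, x, y):
        """Map a point to the (row, col) of the cell containing it."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((y - self.y_min) // self.cell_size)
        if not (0 <= col < self.n_cols and 0 <= row < self.n_rows):
            raise ValueError("point outside the modelled region")
        return row, col

    def value_at(self, x, y):
        row, col = self.cell_of(x, y)
        return self.cells[row][col]

    def set_value(self, x, y, value):
        row, col = self.cell_of(x, y)
        self.cells[row][col] = value

# Usage: a 4x4 grid of 10x10 m cells; an altitude recorded at one point
# becomes the value of the whole cell containing that point.
field = RasterField(x_min=0.0, y_min=0.0, cell_size=10.0, n_cols=4, n_rows=4)
field.set_value(5.0, 5.0, 142.0)
assert field.value_at(9.9, 0.1) == 142.0  # same cell, same value
```

The discretization step is exactly where quantification enters: the finer the cells, the closer the model approximates a point-valued field, at the cost of storage.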
To deal with spatial information that is not tied to particular points, we may introduce a mechanism to refer to spatial entities of interest, e.g. using names or coordinates, and to describe relations between them. Both the qualitative and quantitative spatial information available to applications usually contains uncertainty. As in other applications that rely on measurements of observed phenomena, sensor data may be known only imprecisely (e.g. in the form of an interval); see e.g. [28]. Similarly, the classification of land coverage based on remote sensing images may be given in the form of a probability distribution [47]. In addition to these problems related to data acquisition, the way applications use spatial information may, by itself, introduce uncertainty. For instance, as it is impossible to make measurements at every point on Earth, interpolation or extrapolation techniques are required to estimate parameters for points where measured values are missing [9]. Information fusion is another process which may introduce uncertainty [16]. Indeed, in the face of conflicting sources, inconsistency may cast doubts on previously held beliefs, thus introducing uncertainty about them. A final cause of uncertainty is related to the use of labels to refer to places, their properties, and the relations between them. For instance, in fusion problems,
apart from the uncertainty affecting the pieces of information, the fact that different sources may use different partitions of space (e.g. electoral wards vs. parishes) is another reason for imprecision or uncertainty. Also, the labels used to describe properties of regions may belong to different ontologies (e.g. agricultural vs. botanical terminology). While the use of labels is paramount for interactions between a system and human users, they give rise to many practical problems. For instance, how do you say Genève in Italian (Genova, Geneva)? Ghent in French? Lille in Dutch? Even worse, the same label is often used to refer to different places (e.g. Paris, France vs. Paris, Texas [33]), and even when a label unambiguously refers to one place, the spatial extent of this place may be ill-defined (e.g. downtown Toulouse [40]). Similar considerations apply to spatial relations (e.g. near, North-West of, etc.) and properties of places (e.g. densely populated). Note that labels may be used both for stating information and for expressing queries. An example of the former case is spatial representations derived from textual information (e.g. the web). In the case of queries, the uncertainty in the meaning of the labels may suggest some flexibility regarding what solutions are acceptable.

Spatial Information Processing

In practice, information processing is often divided into several subproblems. At a high level, we may consider three main steps [15], although they may not all be present in every application. The first one aims at clarifying raw information, e.g. by cleaning sensor information, or by synthesizing and structuring it. In the spatial domain, this may involve removing noise from remote sensing images, as well as analyzing and interpreting them. When collecting census information, for instance, this step also involves assessing the confidence levels of the different sources.
The second step obtains the information needed to address the problem at hand. This involves retrieving and combining information from one or more data repositories, and reasoning about it. Apart from querying geographic information systems (GIS) in general, this step may refer to the aforementioned information fusion and interpolation/extrapolation problems, or to a diagnosis based on the interpretation of medical images. The third and last step uses the retrieved information for decision making, or for solving design or optimization problems: finding the best location for building a nuclear power plant, managing air traffic control, or deriving a strategy for avoiding the future flooding of a river. The three steps outlined above are usually studied in different research communities, as briefly described in the next section, where we emphasize only the issue of handling uncertainty. As the overview of the book in the last section will show, most of the chapters address problems related to the second step.
2 Research Areas in Spatial Information

The causes and remedies of spatial uncertainty have been studied for a long time. Spatial uncertainty is related mostly to spatial reasoning, to decision making
involving space, and to the difficulty, sometimes the impossibility, of building a deterministic approach. Research on spatial uncertainty in the last decade has focused on the following domains:

• uncertainty in spatial cognition, artificial intelligence, and vision, which is coupled to spatial reasoning;
• uncertainty in geographic information systems, which is coupled to spatial data quality and spatial statistics.

Uncertainty in Spatial Cognition, Artificial Intelligence, and Vision

Spatial uncertainty has been extensively studied in the domain of cognitive science, from psychology to artificial intelligence. What does it mean to say "I've got no sense of direction"? What kind of information must a robot keep in memory to find its path? Choosing the right representation of space for a particular application is always an issue: every time you have to explain a route by phone, you face a new problem. Is it relevant to use East-West indications? Or rather to use "towards downtown", "towards the river", etc.? It always depends on what is the easiest to grasp: the most secure information, which you will not mishandle or mismatch with a similar but wrong reference. Names are not the only source of spatial uncertainty; visual perception is one too. Optical illusions have been noticed and investigated for centuries. They are sometimes used in architecture, as in the Greek Parthenon, and in art works, notably in the Italian and French Renaissance, and more recently by the artist M.C. Escher. Interestingly enough, artificial intelligence constraint satisfaction approaches may be successfully applied for deciding whether a line drawing of a three-dimensional object is actually realizable in physical space [12]. Optical illusions have been related to unconscious inferences, an idea first suggested in the mid 19th century by H. von Helmholtz, and to inhibition-influenced vision, by E. Mach.
Experimental psychology has also addressed this problem, for instance focusing on the question of the possible existence of a specific spatial working memory. Experiments with 3D rotated figures seem to demonstrate that the difficulty of recognizing shapes depends on the rotation angle [38]. Based on such experiments, D. Marr developed his computational theory in the late 1970s, collected in his posthumous book "Vision" [37]. At about the same time, the book "Mental Models" by P. Johnson-Laird [36] was published. It makes intensive use of relational reasoning, including spatial and temporal reasoning, as well as defeasible reasoning, which allows for a change in one's beliefs in the face of new observations. When communicating such a relational description, one of the arguments is used as a reference object. The proper choice of reference objects is important when communicating a series of relational statements, as it may make the overall message easier to understand [20]. In artificial intelligence, leaving aside the early work on computer vision [4], the issues of reasoning about time, space, and uncertainty were considered independently at first. Only once temporal reasoning had been sufficiently mastered, after the introduction of Allen's temporal interval relations [1], did research on spatial reasoning blossom [19, 45, 6, 46, 10]. In parallel, studies in uncertainty representation
have led to the development of various settings beyond probability theory, namely Zadeh's fuzzy sets [57, 30] and possibility theory [14], Shafer's theory of belief functions [51], and imprecise probability [54]. The most prominent artificial intelligence domain where uncertainty and space have been jointly addressed, already since the late 1980s, is robotics, for automated planning purposes [31]. In particular, simultaneous localization and mapping (SLAM) [53, 41, 17] is a technique used in (mobile) robotics to build up a map within an unknown environment. Estimating uncertain spatial relationships is then one of the key issues. In spite of preliminary early attempts [27, 22], there have been few applications of fuzzy set based methods in robot navigation, with [58, 43, 49] being some of the exceptions. However, outside robotics, some works have focused on representing and reasoning about fuzzy spatial relations [18, 21, 34, 7, 50].

Uncertainty in Geographic Information Systems

From artificial intelligence and robotics, we move to another domain where spatial uncertainty is definitely a big issue: cartographic mapping. Cartographic representations, land surveys, remote sensing, geographic information systems, and global positioning provide data for the two main subdomains of mapping: the mapping of administrative and man-made spatial features on the one hand, and the mapping of natural resources on the other. It took a long time before people were able to use maps in the way we know them today. Cadastres are as old as tax collection, invented by the first Babylonian monarchs or Egyptian pharaohs. Areas and relative positions were all they needed to compute taxes, but errors and quarrels fed the courts, as they continue to feed them today. Geometers gained importance as land surveyors, as "arpenteurs", and often, by working around spatial uncertainty, their statements became the reality.
But this does not erase uncertainty, it merely adds other constraints to the system, which sometimes helps, and sometimes hampers the decision making. Spatial uncertainty analysis, in this context, mostly relates to geographic data quality, in the sense that the role of data quality management is to reduce the uncertainty in making decisions. Spatial uncertainty analysis has been, since the 1990s, an acknowledged discipline that integrates expertise from geographical sciences, remote sensing, spatial statistics and many others. Several international organizations have working groups on these issues: ISPRS (quality of spatio-temporal data and models), the European AGILE (spatial data usability), the International Cartographic Association (spatial data uncertainty and map quality). Standardisation bodies are concerned too: ISO 9000, which addresses the production and distribution of goods and services, and those in the field of geographic information, e.g. FGDC, OGC, CEN. There is a community of engineers in spatial uncertainty in the various national mapping agencies and in large private companies such as Microsoft and Google, as indeed the quality of geographic information is clearly an issue for them. In addition, several books have been published on this topic. In particular, [29] presents reflections by members of the International Cartographic Association, while [28, 52, 13] present research
breakthroughs on issues related to the quality and the uncertainty of geographic information, and [59, 2] focus on uncertainty in geographic data. Cartographic mapping is not the only area related to GIS in which uncertainty appears. In the early years of image processing [48], following the arrival of the first digital cameras on Earth and above (with the release of Landsat imagery in the early 1970s [55]), there was a big boost in the automated acquisition of geographical information. Indeed, the automated processing of remote sensing images has proven to be an invaluable tool for feeding field-based models. Clearly, handling uncertainty is a central issue when interpreting remote sensing images in this way. Spatial uncertainty is also prevalent in natural resources assessment [42], and must be estimated as it propagates in ecological models [8]. Geostatistics is a branch of statistics developed originally to predict distributions for mining operations [39]. It is currently applied in diverse branches of hydrology, meteorology, landscape ecology, soil science and agriculture (especially in precision farming), and branches of geography, such as epidemiology and planning (logistics).
3 Overview

Generations of children, reading "Hop o' My Thumb" by Charles Perrault, also known under its original French title "Le Petit Poucet", have been scared by spatial uncertainty. They were afraid of losing their way back when forced to decide which way to go: turn right or left? The intent of this book is not to scare children, nor to scare scholars. Still, we may give the reader some small white pebbles to keep him or her from getting lost in the forest of the literature on spatial uncertainty handling, or in the bush of the chapters that follow. To the best of our knowledge, this is the first book devoted to spatial uncertainty handling outside the traditional GIS setting. Uncertainty issues are especially addressed from a representation and reasoning point of view. In this sense, we only consider the second step of the information processing chain described at the end of the first section. As such, the book does not discuss uncertainty issues in the interpretation of remote sensing images at the acquisition stage, and does not exclusively focus on geographical applications. Similarly, the book does not encompass planning and reasoning about actions in spatial environments, which is why uncertainty issues in robotics are not considered. The concept of uncertainty itself is understood in a broad sense, including both quantitative and more qualitative approaches, dealing with variability and epistemic uncertainty, as well as with the vagueness of terms. The contributions of this book are clustered around three general issues: i) description of spatial configurations, ii) symbolic reasoning and information merging, and iii) prediction and interpolation. Part 1 especially focuses on modeling uncertainty in the meaning (vagueness) of linguistic constructs that are used to describe spatial relations and predicates. The first chapter in this part, by B.
Bennett, presents a tutorial overview of diverse methods to deal with the problem of spatial vagueness, before focusing in more detail on a new method called standpoint semantics. This
latter approach refines the supervaluation semantics of Fine [23], by adding structure on the set of possible precisifications. In particular, in the spatial domain, precisifications of a vague predicate or relation may depend on the standpoint one takes regarding the most appropriate thresholding of some parameter. The second chapter, by P. Matsakis, L. Wendling and J. Ni, is concerned with describing relative positions between objects, and spatial relations to reference objects, where an object is a crisp or fuzzy region, in raster or vector form, of 2D or 3D space. The models presented may be used, e.g. to identify the most salient spatial relation between two given objects, or to identify the object that best satisfies a given relation to a reference object. Subsequently, the chapter by I. Bloch explores the idea of bipolarity in the modeling of spatial information. The idea is to distinguish between locations considered as being really possible for a given object, and locations which are only not impossible. This bipolar view is then embedded in the framework of fuzzy mathematical morphology, and finally illustrated on a medical application. The last chapter of the first part, by T. Beaubouef and F. E. Petry, provides a broad overview of the possible uses of fuzzy and rough sets [44] in geographical information systems. Rough sets naturally allow for a granular view of space and of the description of land coverage, while the use of fuzzy sets and relations applies to the modeling of linguistic terms. Special attention is paid to the modeling of rough spatial relations, to the use of spatial indexing techniques, such as R-trees, for fuzzy regions, and to rough object-oriented spatial databases. Part 2 of the book deals with applications in artificial intelligence, and in particular with the problem of reasoning about spatial relations, and dealing with inconsistency in information merging. The first chapter by F. Dupin de Saint-Cyr, O. Papini and H. 
Prade provides an extensive survey of propositional and modal logics for describing mereo-topological or geometrical relations between regions, and for handling properties associated with regions. The handling of uncertainty in such frameworks is discussed. In particular, one may be uncertain whether a property holds in a region, and if it does, whether it holds everywhere in the region, or only in some part(s) of it. The next chapter, by O. Doukari, R. Jeansoulin and E. Würbel, presents a particular approach to the revision of propositional knowledge bases when receiving new information, which is well-suited for geographical information. This approach is centered around an assumption of locality, where conflicts related to one region of space do not affect what is known about regions that are far away. The third chapter of the second part, by O. Curé, addresses the problem of merging spatial ontologies that are used to describe properties of regions. To deal with the problem of heterogeneous vocabulary usage, an approach based on formal concept analysis [24, 5] is proposed which enables the creation of concepts that are not encountered in any of the given ontologies, and manages the resulting uncertainty. The last chapter, by S. Schockaert and P. D. Smart, deals with generating spatial scenarios which are compatible with available, and possibly conflicting, spatial constraints, using a genetic algorithm. To handle potential conflicts in a more flexible way, the approach may result in fuzzy regions, which are represented as a finite collection of nested polygons.
R. Jeansoulin et al.
Part 3 gathers chapters about interpolation and prediction of spatial phenomena. The first three chapters are methodologically oriented, while the two others are directly motivated by real-world case studies. The chapter by A. Stein presents a decision making approach based on the use of remote sensing images. Taking an image mining approach, it discusses how such images are obtained and interpreted, how the resulting information may be used to identify objects, how these objects are tracked over time, and how this may lead to meaningful predictions for the future. Special consideration is given to issues of data quality, and solutions are provided based on fuzzy set theory and spatial statistics. The approach is illustrated with a case study on the flooding of Tonle Sap Lake in Cambodia. Next, the chapter by K. Loquin and D. Dubois deals with the kriging methodology used in geostatistics for the interpolation and extrapolation of parameters which are known only at a finite number of points in space. The kriging methodology is supposed to account for the uncertainty induced by the variability of the considered parameter over space. However, the chapter emphasizes the fact that the epistemic uncertainty appearing both in data specification and random function estimation is not properly taken into account by the standard approach. Subsequently, the merits and limitations of fuzzy and interval-based extensions of kriging for handling epistemic uncertainty are discussed. The third chapter, by F. Parisi, A. Parker, J. Grant and V. S. Subrahmanian, is concerned with spatial probabilistic temporal databases. The goal is to provide efficient support for queries asking for all pairs of objects and time points such that the object is in a specified region at that time, with a probability that is within a given interval.
Solutions to such queries may be represented as convex polytopes in some high-dimensional space, and methods are provided for approximating such polytopes in an efficient way. The approach is evaluated on both synthetic data and real-world data about ship locations. The next chapter, by C. de Runz and E. Desjardin, presents a way to deal with scarce pieces of evidence obtained from archaeological data. The goal is to reconstruct plausible spatial configurations (e.g. the layout of the streets in an ancient city), and to visualize them. The proposed method takes advantage of a fuzzy extension of the Hough transform from image processing, which may be applied to fuzzy pieces of information. Finally, the chapter by G. Fusco uses a Bayesian network methodology for predicting the evolution of spatial networks derived from data about flows through a dominant flows approach. Such networks may represent phenomena as diverse as people commuting between home and work, money transfers between different areas, or the migration of people between different suburbs. The presented case study is based on commuter trips in a region of southeastern France. As a whole, the book intends to illustrate the different circumstances where spatial uncertainty may be encountered, and the different approaches that may be considered to cope with it.
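The ordinary-kriging setting summarised above for Loquin and Dubois's chapter can be made concrete in a few lines of NumPy. This is a minimal illustrative sketch only: the Gaussian covariance model, its sill and range parameters, and the sample data are assumptions introduced here, not taken from the book.

```python
import numpy as np

def gaussian_cov(h, sill=1.0, rng=2.0):
    # Illustrative Gaussian covariance model; 'sill' and 'rng' (range)
    # are hypothetical parameters chosen for this example.
    return sill * np.exp(-(h / rng) ** 2)

def ordinary_kriging(xs, zs, x0):
    """Ordinary kriging of 1-D samples (xs, zs) at location x0."""
    n = len(xs)
    d = np.abs(xs[:, None] - xs[None, :])   # pairwise sample distances
    K = np.ones((n + 1, n + 1))             # augmented kriging system
    K[:n, :n] = gaussian_cov(d)
    K[n, n] = 0.0                           # Lagrange-multiplier entry
    rhs = np.ones(n + 1)
    rhs[:n] = gaussian_cov(np.abs(xs - x0))
    w = np.linalg.solve(K, rhs)             # n weights plus the multiplier
    return float(w[:n] @ zs)                # the weights sum to 1 by construction
```

Since kriging is an exact interpolator, predicting at a sampled location returns the observed value; the epistemic uncertainty discussed in the chapter concerns, among other things, the choice of the covariance model itself, which this sketch simply fixes by assumption.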
References

1. Allen, J.: Maintaining knowledge about temporal intervals. Communications of the ACM 26(11), 832–843 (1983)
2. Atkinson, P., Foody, G.: Uncertainty in Remote Sensing and GIS: Fundamentals. Wiley & Sons, Chichester (2002)
3. Bahl, P., Padmanabhan, V.N.: Radar: An in-building RF-based user location and tracking system. In: Proceedings of IEEE Infocom, vol. 2, pp. 775–784 (2000)
4. Ballard, D., Brown, C.: Computer Vision. Prentice Hall, Englewood Cliffs (1982)
5. Barbut, M., Monjardet, B.: Ordre et Classification, Algèbre et Combinatoire, vol. 2. Hachette (1970)
6. Bennett, B.: Determining consistency of topological relations. Constraints 3(2-3), 213–225 (1998)
7. Bloch, I.: Spatial reasoning under imprecision using fuzzy set theory, formal logics and mathematical morphology. International Journal of Approximate Reasoning 41(2), 77–95 (2006)
8. Burrough, P.A., McDonnell, R.A.: Principles of Geographical Information Systems. Oxford University Press, Oxford (1998)
9. Chilès, J.-P., Delfiner, P.: Geostatistics: Modeling Spatial Uncertainty. Wiley, Chichester (1999)
10. Cohn, A., Renz, J.: Qualitative spatial representation and reasoning. In: van Harmelen, F., Lifschitz, V., Porter, B. (eds.) Handbook of Knowledge Representation, pp. 551–596. Elsevier, Amsterdam (2008)
11. Colliot, O., Camara, O., Bloch, I.: Integration of fuzzy spatial relations in deformable models – Application to brain MRI segmentation. Pattern Recognition 39(8), 1401–1414 (2006)
12. Cooper, M.: Line Drawing Interpretation. Springer, Heidelberg (2008)
13. Devillers, R., Jeansoulin, R.: Fundamentals of Spatial Data Quality. Geographical Information Systems. ISTE (2006)
14. Dubois, D., Prade, H.: Possibility Theory: An Approach to Computerized Processing of Uncertainty. Plenum Press, New York (1988)
15. Dubois, D., Prade, H., Yager, R.R.: Information engineering and fuzzy logic. In: Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, pp. 1525–1531 (1996)
16. Dupin de Saint-Cyr, F., Jeansoulin, R., Prade, H.: Fusing uncertain structured spatial information. In: Greco, S., Lukasiewicz, T. (eds.) SUM 2008. LNCS (LNAI), vol. 5291, pp. 174–188. Springer, Heidelberg (2008)
17. Durrant-Whyte, H., Bailey, T.: Simultaneous localisation and mapping (SLAM): Part I the essential algorithms. Robotics and Automation Magazine 13(2), 99–110 (2006)
18. Dutta, S.: Approximate spatial reasoning: integrating qualitative and quantitative constraints. International Journal of Approximate Reasoning 5(3), 307–330 (1991)
19. Egenhofer, M., Franzosa, R.: Point-set topological spatial relations. International Journal of Geographical Information Systems 5(2), 161–174 (1991)
20. Ehrlich, K., Johnson-Laird, P.: Spatial descriptions and referential continuity. Journal of Verbal Learning and Verbal Behavior 21(3), 296–306 (1982)
21. Esterline, A., Dozier, G., Homaifar, A.: Fuzzy spatial reasoning. In: Proceedings of the 1997 International Fuzzy Systems Association Conference, pp. 162–167 (1997)
22. Farreny, H., Prade, H.: Tackling uncertainty and imprecision in robotics. In: Proceedings of the 3rd International Symposium on Robotics Research, pp. 85–91 (1986)
23. Fine, K.: Vagueness, truth and logic. Synthese 30, 265–300 (1975)
24. Ganter, B., Wille, R.: Formal Concept Analysis. Springer, Heidelberg (1999)
25. Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge (2000)
26. Gillner, S., Mallot, H.: Navigation and acquisition of spatial knowledge in a virtual maze. Journal of Cognitive Neuroscience 10(4), 445–463 (1998)
27. Goguen, J.: On fuzzy robot planning. In: Zadeh, L.A., Fu, K.-S., Tanaka, K., Shimura, M. (eds.) Fuzzy Sets and their Applications to Cognitive and Decision Processes, pp. 429–447. Academic Press, London (1975)
28. Goodchild, M., Jeansoulin, R.: Data Quality in Geographic Information: From Error to Uncertainty. Hermes (1998)
29. Guptill, S., Morrison, J.: Elements of Spatial Data Quality. Pergamon, Oxford (1995)
30. Klir, G., Yuan, B. (eds.): Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers by L.A. Zadeh. World Scientific, Singapore (1996)
31. Latombe, J.-C.: Robot Motion Planning. Kluwer Academic Publishers, Norwell (1991)
32. Laumond, J.-P.: Robot Motion Planning and Control. Springer, New York (1998)
33. Leidner, J.L.: Toponym Resolution in Text. CRC Press, Boca Raton (2008)
34. Li, Y., Li, S.: A fuzzy sets theoretic approach to approximate spatial reasoning. IEEE Transactions on Fuzzy Systems 12(6), 745–754 (2004)
35. Longley, P., Goodchild, M., Maguire, D., Rhind, D.: Geographical Information Systems and Science. John Wiley & Sons, Chichester (2005)
36. Mani, K., Johnson-Laird, P.: The mental representation of spatial descriptions. Memory and Cognition 10(2), 181–187 (1982)
37. Marr, D.: Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Henry Holt and Co. (1982)
38. Marr, D., Nishihara, H.: Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London. Series B, Biological Sciences 200(1140), 269–294 (1978)
39. Matheron, G.: Traité de Géostatistique Appliquée. Editions Technip (1962)
40. Montello, D., Goodchild, M., Gottsegen, J., Fohl, P.: Where’s downtown?: Behavioral methods for determining referents of vague spatial queries. Spatial Cognition & Computation 3(2), 185–204 (2003)
41. Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B.: FastSLAM: A factored solution to the simultaneous localization and mapping problem. In: Proceedings of the 18th National Conference on Artificial Intelligence, pp. 593–598 (2002)
42. Mowrer, H., Congalton, R.: Quantifying Spatial Uncertainty in Natural Resources: Theory and Applications for GIS and Remote Sensing. CRC Press, Boca Raton (2000)
43. Oriolo, G., Ulivi, G., Vendittelli, M.: Fuzzy maps: a new tool for mobile robot perception and planning. Journal of Robotic Systems 14(3), 179–197 (1997)
44. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Springer, Heidelberg (1991)
45. Randell, D., Cui, Z., Cohn, A.: A spatial logic based on regions and connection. In: Proceedings of the 3rd International Conference on Knowledge Representation and Reasoning, pp. 165–176 (1992)
46. Renz, J., Nebel, B.: On the complexity of qualitative spatial reasoning: A maximal tractable fragment of the Region Connection Calculus. Artificial Intelligence 108(1-2), 69–123 (1999)
47. Richards, J., Jia, X.: Remote Sensing Digital Image Analysis: An Introduction. Springer, Heidelberg (2006)
48. Rosenfeld, A., Kak, A.: Digital Picture Processing. Academic Press, London (1982)
49. Saffiotti, A.: The uses of fuzzy logic in autonomous robot navigation. Soft Computing 1(4), 180–197 (1997)
50. Schockaert, S., De Cock, M., Kerre, E.E.: Spatial reasoning in a fuzzy region connection calculus. Artificial Intelligence 173, 258–298 (2009)
51. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
52. Shi, W., Fisher, P., Goodchild, M.: Spatial Data Quality. Taylor & Francis, Abingdon (2002)
53. Smith, R.C., Cheeseman, P.: On the representation and estimation of spatial uncertainty. The International Journal of Robotics Research 5(4), 56–68 (1986)
54. Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman & Hall, Boca Raton (1991)
55. Williams, R., Carter, W.: ERTS-1, a new window on our planet. U.S. Geological Survey (1976)
56. Wilson, R., Latombe, J.: Geometric reasoning about mechanical assembly. Artificial Intelligence 71(2), 371–396 (1994)
57. Yager, R., Ovchinnikov, S., Tong, R., Nguyen, H. (eds.): Fuzzy Sets and Applications: Selected Papers by L.A. Zadeh. Wiley, Chichester (1987)
58. Yen, J., Pfluger, N.: A fuzzy logic based extension to Payton and Rosenblatt’s command-fusion method for mobile robot navigation. IEEE Transactions on Systems, Man and Cybernetics 25(6), 971–978 (1995)
59. Zhang, J., Goodchild, M.: Uncertainty in Geographical Information. Taylor & Francis, Abingdon (2002)
Part 1: Describing Spatial Configurations
Spatial Vagueness

Brandon Bennett
Abstract. This chapter explores the phenomenon of vagueness as it relates to spatial information. It will be seen that many semantic subtleties and representational difficulties arise when spatial information is affected by vagueness. Moreover, since vagueness is particularly pervasive in spatial terminology, these problems have a significant bearing on the development of computational systems to provide functionality involving high-level manipulation of spatial data. The chapter begins by considering various foundational issues regarding the nature and semantics of vagueness. Overviews are then given of several approaches to spatial vagueness that have been proposed in the literature. Following this, a more detailed presentation is given of the relatively recently developed standpoint theory of vagueness and how it can be applied to spatial concepts and relations. This theory is based on the identification of parameters of variability in the meaning of vague concepts. A standpoint is a choice of threshold values determining the range of variation over which a vague predicate is judged to be applicable. The chapter concludes with an examination of a number of particularly significant vague spatial properties and relations and how they can be represented.
1 Introduction

Suppose I am asked whether the Leeds City Art Gallery is close to Leeds University. Perhaps I am not sure what to say. Suppose I then take an accurate map and determine that the distance between the university main entrance and the entrance to the art gallery is 965m. Now, I have a pretty accurate measure of the distance but still I may be undecided as to whether to describe the gallery as ‘close to’ the university. This is because ‘close to’ is a vague relation. The example illustrates a key property of vagueness, which is that it is distinct from uncertainty. Vagueness is not a result of

Brandon Bennett
School of Computing, University of Leeds
e-mail:
[email protected]

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 15–47.
© Springer-Verlag Berlin Heidelberg 2010, springerlink.com
lack or imprecision of our knowledge about the world; rather, it arises from a lack of definite criteria for the applicability of certain linguistic terms. Even with complete and accurate knowledge of all relevant objective facts, one may be unsure as to whether a vague description is appropriate to a given situation. In this chapter, I shall examine the phenomenon of vagueness as it relates to spatial information. We shall see that vagueness is particularly pervasive in spatial terminology, and also that, where a description involves both vagueness and spatiality, semantic subtleties are encountered that do not arise when vagueness operates in a non-spatial context. I shall begin by examining some fundamental issues relating to our understanding of vagueness, how it should be represented, and how it interacts with spatial information and representations. Following this, I shall give overviews of some approaches to spatial vagueness that have been proposed in the literature. I shall then present a particular analysis of spatial vagueness, which is based on the identification of parameters of variability in the meaning of vague predicates and on the notion of a standpoint. A standpoint is a choice of threshold values determining the range of variation over which a vague predicate is judged to be applicable. Finally, I shall look in more detail at certain vague spatial predicates that I consider to be of particular significance and examine how these might be modelled in the context of a formal representation system.
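The standpoint idea just outlined can be sketched in a few lines of code, using the 965m ‘close to’ example from the introduction. This is a toy model under stated assumptions: each standpoint is reduced to a single distance threshold in metres, and the candidate thresholds are invented for illustration.

```python
# Each standpoint is reduced to one distance threshold (metres);
# these candidate thresholds are purely illustrative.
STANDPOINTS = [500, 800, 1000, 1500]

def close_to(distance_m, threshold_m):
    # 'close to' as judged from a single standpoint (one precisification).
    return distance_m <= threshold_m

def supervaluate(distance_m, standpoints=STANDPOINTS):
    # Supertrue if true under every standpoint, superfalse if true under
    # none, otherwise a borderline case.
    verdicts = [close_to(distance_m, t) for t in standpoints]
    if all(verdicts):
        return "supertrue"
    if not any(verdicts):
        return "superfalse"
    return "borderline"

print(supervaluate(965))   # borderline: close under some standpoints only
```

Under these assumed thresholds, 965m comes out as a borderline case, matching the intuition that the gallery is neither clearly close to the university nor clearly not.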
2 The Nature of Vagueness

2.1 Distinguishing Vagueness from Generality and Uncertainty

‘Vagueness’ as I use the term is distinct from both generality and uncertainty. A predicate or proposition may be more or less general according to the range of individuals to which it applies or the range of circumstances under which it holds. However, as long as the boundary between correct and incorrect applications is definite, I shall not consider generality to be a form of vagueness. For example, the sentence ‘Tom’s house is within 5 miles of the Tower of London’ is general but not vague, since its truth can be determined by measuring the distance between the two buildings. By contrast ‘Tom lives near to the Tower of London’ is vague, since even when I know the exact distance, the truth of this sentence is indeterminate. A proposition is uncertain if we do not know whether it is true or not. In most circumstances we describe a proposition as uncertain when the reason we do not know whether it is true is that we do not possess complete and accurate knowledge about the state of the world. In so far as the term ‘uncertainty’ is taken to apply only in such cases, vagueness is distinct from uncertainty, since it may occur even where we do have complete and accurate knowledge of the world. However, some have argued (notably Williamson (1992)) that vagueness does indeed arise from lack of knowledge: lack of knowledge about the meanings of terms, and is thus a kind of uncertainty. This position is known as the epistemic view of vagueness. Although it has a devoted following, the epistemic view is not widely accepted. The
main objection is that, in the case of a vague linguistic term, there is no fact of the matter as to what its true precise meaning is. Hence, vagueness does not consist in ignorance of any objective fact and is thus not epistemic in nature. Proponents of the epistemic view may counter this objection by claiming that there is a fact of the matter but it is unknowable.[1] My opinion is that vagueness is distinct from uncertainty and that any strong form of the epistemic view is misleading. Nevertheless, vagueness and uncertainty do share significant logical properties and one would expect similar kinds of formal representation to apply to the two phenomena. So, if vagueness is not a kind of generality or uncertainty, what is it? As one might expect, many different theories have been proposed. In fact, even once we have distinguished vagueness from generality and uncertainty, there may well be more than one semantic phenomenon operating under the name of ‘vagueness’.

2.2 Vagueness in Different Linguistic Categories

Whether or not there are multiple kinds of vagueness, it is certainly true that vagueness affects different types of linguistic expression (different syntactic categories), and these must be treated in ways that are at least superficially, and perhaps fundamentally, distinct. I need only briefly consider the category of propositions. Obviously, propositions can be and often are vague. However, it is, I believe, uncontroversial that propositions are only vague in so far as they contain constituent parts that are vague. Hence, I shall proceed to examine sub-propositional expressions. But in considering these I shall of course be concerned with the central semantic issue of how they contribute to the truth conditions of propositions in which they occur.

Perhaps the most obvious and most studied type of vague expression is the adjective, prime examples being ‘bald’, ‘rich’ and ‘red’. This category also includes many spatially oriented adjectives, such as ‘tall’, ‘short’, ‘large’, ‘small’, ‘elongated’, ‘steep’, ‘undulating’, etc. For such concepts, their vagueness seems to lie in the lack of clearly defined criteria for their applicability. Thus, when a vague adjective is predicated of an individual, the resulting proposition will be true in some cases and false in others, but for some individuals it will not be clear whether the predication should be considered true or false — such individuals are borderline cases.

For count noun predicates such as ‘table’, ‘mountain’, ‘lake’, ‘village’, the manifestation of vagueness is similar but not exactly the same as in adjectives. As Kamp (1975) observed, the vagueness of count nouns differs from that of adjectives as to the number of parameters of variation that are usually involved. The applicability of an adjective typically depends on a relatively small number of factors (tallness depends on height, redness on colour).[2] By contrast, the conditions for applicability of a count noun usually involve a large number of factors. For instance, a table should be made of suitably solid material; it should have a flat surface, which is supported by legs; its height and other dimensions should be within certain appropriate ranges; and various other constraints should apply to its shape. The reason these diverse characteristics are gathered together within the meaning of ‘table’ is that objects with such a combination of properties are useful to people, and hence are frequently encountered and referred to by language users.[3] This correlation of relevant attributes also means that for many count nouns it is difficult to decide which of them are essential and which are merely typical features (cf. Waismann (1965)).

Relational expressions may also be vague, and in the spatial domain the prototypical examples are the relations ‘... is near to ...’ and ‘... is far from ...’. The vagueness of such relations appears to be similar to that of adjectives. That is, their applicability is dependent on one or two parameters (for instance distance) but the values of the parameter(s) for which the relation is applicable are not precisely determined.

As well as affecting predicative expressions, vagueness may also be associated with referential terms, i.e. names and definite descriptions. In the first category we have nominal expressions such as ‘Mount Everest’ and ‘The Sahara Desert’. In the second we have more complex constructs such as ‘the foothills of Mount Everest’ and ‘the area around the church’. There has been considerable debate about whether the vagueness of nominals is parasitic upon that of predicative expressions, or whether it is a separate form of vagueness. One way that vagueness of proper names might be analysed in terms of vagueness of predicates is via the following logical re-writing of a nominal into a definite description, which is then represented using Russell’s analysis of definite descriptions:[4]

Φ(‘Mount Everest’) ≡ ∃!x[Mountain(x) ∧ Named(x, ‘Mount Everest’) ∧ Φ(x)]

Under this analysis, the vagueness of ‘Mount Everest’ is completely explained by the vagueness of ‘mountain’. A similar approach could be applied to definite descriptions, although these expressions can take many different forms and may require a variety of different analyses involving constituent vague count nouns, adjectives or relations, or some combination of these. In opposition to this analysis is the view that the vagueness of nominal expressions cannot or should not be reduced to vagueness of predicative expressions, but has its own mode of operation. Here again two contrasting explanations can be given: one is that vague nominals indeterminately refer to one of many possible referents; the other is that the referents of vague nominals are intrinsically vague entities. According to the first view, the semantics for vague nominals would involve an indeterminate denotation function, perhaps modelled by a one-to-many function or by a collection of possible functions (such an account is usually called supervaluationist and will be considered in detail in section 6). According to the second view, the semantics must provide an explicit model of vague entities and incorporate a domain of such entities as the range of the denotation function for nominals. I shall further consider these alternatives in the next section, which deals with the issue of whether vagueness is intrinsic or linguistic.

Our consideration of vagueness in different categories of linguistic expression may be summarised as follows. Vagueness is present in its most obvious form in adjectives and relations, where conditions of applicability are indeterminate but are typically dependent on a small number of relevant properties. The vagueness of count nouns is similar to that of adjectives, but is typically dependent on more factors and it is often unclear which factors are relevant. The nature of the vagueness of proper nouns and definite descriptions is controversial. In relation to spatial information, the difference between vagueness located in a relation and vagueness located in a nominal expression is illustrated by the following example sentences:

1. The treasure is 20 km from the summit of Mount Everest.
2. The treasure is 20 km from Mount Everest.
3. The treasure is near to the summit of Mount Everest.
4. The treasure is near to Mount Everest.

In sentence 1 both the relation and the reference object are precise. In 2 the relation is precise but the reference object is vague, whereas in 3 the relation is vague but the reference object is precise. Finally, in 4 both relation and reference object are vague.

[1] Lawry (2008) has proposed a weakened version of the epistemic view, which he calls the epistemic stance. Lawry suggests that while there may be no objective fact of the matter underlying the meaning of vague terms, people tend to use vague language as if there were.
[2] There are some adjectives that might be considered exceptions to this. For example, ‘intelligent’ depends on intelligence, but this quality is manifest in many different attributes. Nevertheless, we often do treat intelligence as if it were a single measurable quantity.
[3] The phenomenon of clustering of exemplars, where logically distinct properties tend in the real world to occur in combination, is discussed in (Bennett, 2005).
[4] Following Russell, the syntax ∃!x[Φ(x)] is used to abbreviate ∃x[Φ(x) ∧ ∀y[Φ(y) → y = x]].
2.3 Is Vagueness Intrinsic or Linguistic?

A fundamental issue in the ontology of vagueness concerns whether vagueness is an intrinsic property of certain kinds of real-world object (often called de re or ontic vagueness), or whether it is an entirely linguistic phenomenon (de dicto vagueness). For instance Tye (1990) supports the view that there are actual objects in the world that are vague (as well as vague linguistic expressions), whereas Varzi (2001a) argues that all vagueness is essentially linguistic.[5] It is not clear whether the handling of vagueness within a computational information system requires one to take a philosophically defensible position on the ontological status of vagueness. However, the debate does have some bearing on the choice of a suitable representational and semantic framework. If one takes the view that vagueness is purely linguistic, then the phenomenon will be modelled in terms of indeterminate interpretation of predicates and/or naming terms, but the objects to which the predicates and names are applied will be modelled as precisely determined entities. For instance, the denotation of “Mount Everest” would be considered to be indeterminate, but each possible denotation would correspond to a precisely bounded volume of matter. By contrast, if one adopts an ontic view of vagueness, then the term “Mount Everest” would denote an entity that is in itself vague. For instance it could be identified with a fuzzy set of points, or with a cluster of possible extensions.

Tye (1990) suggests that vagueness can be present both in predicates and also in objects. He argues that the vagueness of objects cannot simply be explained by saying that they are instances of vague predicates. In the case of material objects, Tye proposes that a vague material object is one for which the set of parts of the object is not fully determinate.[6] This condition is of particular relevance to the investigation of spatial vagueness because it specifically identifies ontic vagueness with indeterminacy of spatio-temporal parts.

I shall not take a rigid position on whether vagueness is primarily linguistic or ontic. In fact, it seems to me that it may not be possible to make a sharp distinction between ontic and linguistic properties. This is because the entities to which we refer are not given prior to our linguistic conventions; rather the ways that we use language play a significant role in determining the domain of objects. So the entities to which we might ascribe ‘ontic’ properties are themselves, at least partially, determined by linguistic convention and stipulation. This is especially evident in the case of predicates that are both vague and spatial. As we shall see in the next section, when considering the truth conditions for a predicate that vaguely characterises a spatial entity, one cannot assume that there is a pre-determined and well-defined object to which the predicate is applied.

[5] A much discussed paper by Evans (1978) contains a formal proof that appears to show that some fundamental logical problems may arise from taking an ontic view of vagueness. However, the significance and implications of Evans’ proof are unclear and controversial.
Moreover, from the point of view of establishing formal representations and semantics, the issue of whether vagueness should be modelled in terms of vague entities or in terms of indeterminate linguistic reference may be seen as more a matter of technical convenience than philosophical significance.
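One way to make the ‘cluster of possible extensions’ reading concrete is to represent a vague region by a finite set of crisp candidate extensions and compute which locations are definitely, possibly, or indeterminately part of it. The grid cells below are invented for illustration; this is a sketch of the general idea, not Tye’s or Varzi’s formal apparatus.

```python
# A vague region modelled as a set of precise candidate extensions,
# each a crisp set of grid cells (all values here are hypothetical).
candidates = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},
    {(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)},
    {(0, 1), (1, 0), (1, 1), (2, 0)},
]

core = set.intersection(*candidates)   # cells in every precisification
reach = set.union(*candidates)         # cells in at least one precisification
borderline = reach - core              # cells whose membership is indeterminate
```

This mirrors the supervaluationist reading of vague reference: the core is what every precisification agrees on, while the borderline cells are exactly those on which the precisifications differ.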
3 Vagueness and Spatiality

Circumstances where vagueness interacts with spatiality present issues that do not arise where vagueness operates in a non-spatial context. In this section I describe some of the main issues that arise and present some illustrative examples.
3.1 The ‘Sorites’ Paradox The issue that has dominated much of the debate about vagueness over the last couple of millenia is the logical anomaly known as the sorites paradox. I will not be looking into this in detail here since, although it affects many spatial predicates, the 6
In fact Tye also suggests the further criterion that “there is no determinate fact of the matter about whether there are objects that are neither parts, borderline parts, nor non-parts of o”. This condition relates to the issue of second-order vagueness (i.e. the vagueness of the borderline between clear and indeterminate cases), which will not be considered in this chapter.
Spatial Vagueness
21
paradox is not essentially spatial in nature. Hence, considering it from a specifically spatial angle would not add much to the a great deal has already been written about this topic elsewhere (see e.g. Williamson (1994); Keefe and Smith (1996); Beall (2003)). The classic form of the sorites paradox involves the predicate ‘... is a heap’,7 and can be stated as follows: 1. This pile of 1,000,000 grains of sand is a heap. 2. If one removes one grain of sand from a heap, the remainder will still be a heap. ∴ 3. Even if we removed all the grains from this heap, it would still be a heap. Here, the conclusion, 3, appears to follow from the premisses by a kind of induction, with premiss 1 being the base case and premiss 2 being used as a principle of induction. Similar ‘slippery slope’ arguments can be formulated using pretty much any vague predicate — indeed, susceptibility to sorites arguments is often considered to be a necessary requirement for a predicate to be vague. Many such predicates are spatial in nature, obvious examples being ‘tall’, ‘large’, ‘near’ etc.. For example: 1. A man whose height is 2m is tall. 2. If a man of a given height is tall, then a man whose hight is 1mm less is also tall. ∴ 3. A man whose height is 1m is tall. A characteristic feature of sorites susceptible predicates is that they are associated with some mode of variation, often a measurable property, that is either continuous or fine grained. In attempting to explain the sorites, it is usually the inductive premiss that comes under most scrutiny. A typical diagnosis is that this premiss must be false, although we are for some reason compelled to believe that it ought to be true. The problem then is explaining why we have a tendency to think it should be true. Accounts of this often refer to the supposition that vague predicates cannot be used to make precise distinctions because their limits of applicability are not well-defined. 
Hence, if two samples are very similar in all attributes that are relevant to the application of a vague predicate V(x), then it must be that either both of them or neither of them are instances of V. Such reasoning can be used to justify the inductive premiss of a sorites argument.

In practice it seems that the sorites paradox has not had a direct impact on existing computational spatial information systems. The types of computer system most in danger of being affected by sorites paradoxes are those concerned with representing commonsense knowledge or implementing commonsense reasoning — for instance, systems such as CYC (Guha and Lenat, 1990). It would be inadvisable for such a system to directly encode any proposition or rule similar to a sorites induction step, since it is clear that this could quickly lead to problems.

7 ‘Sorites’ is the Greek word for heap. More accurately, it is an adjective meaning ‘heaped up’.
B. Bennett
Current implemented systems generally avoid the paradox simply by not taking vagueness into account. However, as we shall see later in the chapter, a number of computationally-oriented approaches to representing vagueness have been proposed. In fact, these tend to tackle the phenomenon using modelling techniques that are rather different from the axiomatic analysis that leads to the paradox. Hence, they might be regarded as skirting round the problem rather than solving it. It is not yet clear whether such circumvention will be ultimately satisfactory. As more sophisticated applications and representations are developed, it may turn out that the sorites paradox will need to be confronted more directly.
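The danger of directly encoding a sorites induction step in a knowledge base can be made concrete with a small sketch (mine, not from the chapter): premiss 1 (a pile of 1,000,000 grains is a heap) is taken as a fact, premiss 2 (removing a grain from a heap leaves a heap) as a rule, and the rule is chained forward.

```python
# A minimal sketch of forward-chaining a sorites induction step.
# The function name and representation are invented for illustration.

def derive_heap_facts(start_grains: int) -> set[int]:
    """Close the fact heap(start_grains) under the rule heap(n) -> heap(n - 1)."""
    facts = set()
    n = start_grains
    while n >= 0:
        facts.add(n)  # one more application of the induction step (premiss 2)
        n -= 1
    return facts

facts = derive_heap_facts(1_000_000)
# The paradoxical conclusion (premiss 3) is now derivable: a pile of
# zero grains is classified as a heap.
assert 0 in facts
```

Any system that accepts both premisses as stated is committed to this derivation, which is why implemented systems tend to avoid encoding the inductive premiss at all.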
3.2 The Problem of Individuation

That vagueness results in a lack of well-defined criteria for the application of predicates is widely recognised. However, there is a further consequence of vagueness that becomes especially important when considering cases where the vagueness of a predicate affects the determination of the spatial extension of entities satisfying that predicate. The problem is that of individuation — i.e. the determination of the class of entities to which the predicate might be applied.

The issue of individuation is illustrated in Fig. 1, which shows a section of an extended water body. Suppose we now wish to find instances of the count noun predicates ‘river’ and ‘lake’ in relation to this water region. One interpretation is that the region is simply a river section that is rather irregular in width and includes a number of bulges. But another interpretation is that the water body consists of three lakes connected by short river channels.8 The possibility of these different interpretations arises from the vagueness of the terms ‘river’ and ‘lake’. However, what makes this case especially problematic is that there is no pre-existing division of the water region into segments that we choose to describe as ‘river’, ‘lake’ or whatever. Rather, the acceptable ways in which water can be segmented into features is dependent upon the meanings of these feature terms. And since these meanings are affected by vagueness, it is indeterminate what is the most appropriate segmentation.
Fig. 1 Is this just an irregular river, or is it three lakes joined by a river?
Fig. 2 illustrates a similar case concerning the demarcation of forest regions based on the density of trees. If we vary the minimum tree density threshold

8 One might comment that the distinction between lake and river also significantly depends on water flow. This is certainly true but much the same segmentation issue would still arise, and it is much easier to illustrate in terms of shape rather than flow.
Fig. 2 Possible forest demarcations for a given tree distribution. Inner contours are based on a high threshold on the tree density. Outer contours are based on lower thresholds.
required for a region to be classified as a forest, the extension of forest regions will clearly vary. Furthermore, the number of forest regions may alter according to the threshold.
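The dependence of both the extension and the number of forest regions on the density threshold can be sketched computationally. In this hedged illustration (the grid of per-cell tree densities is invented), a cell counts as forest if its density meets the threshold, and forest regions are the 4-connected components of forest cells, counted with a simple flood fill:

```python
# Count connected 'forest' regions for a given tree-density threshold.
# The density grid below is an invented example.

def count_forest_regions(grid, threshold):
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] >= threshold and (r, c) not in seen:
                regions += 1
                stack = [(r, c)]  # flood-fill one connected component
                while stack:
                    i, j = stack.pop()
                    if (i, j) in seen:
                        continue
                    seen.add((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and grid[ni][nj] >= threshold):
                            stack.append((ni, nj))
    return regions

density = [
    [0.9, 0.9, 0.2, 0.9, 0.9],
    [0.9, 0.9, 0.2, 0.9, 0.9],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]
# A high threshold yields two separate forests; a lower one merges them
# into a single forest via the moderately dense bottom row.
assert count_forest_regions(density, 0.8) == 2
assert count_forest_regions(density, 0.4) == 1
```

This mirrors the behaviour described in the text: not only the boundary but the very count of individuals varies with the chosen precisification.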
3.3 Consequences of Indeterminate Spatial Extension

As well as illustrating the problem of individuation, the examples given in the previous section also show how, in association with spatial concepts, vagueness not only affects classification but also results in indeterminacy of spatial extension. Consequently vagueness in spatially related terms such as geographic feature types often leads to an indeterminacy in spatial predicates and relations, even where the predicates and relations themselves are apparently precise.

Suppose we want to compare the size of a particular expanse of desert on the basis of precipitation data collected at two different time points. (Of course the classification of desert may depend on features other than precipitation, but whatever measure of aridity is employed the issue I am about to describe will occur.) In order to demarcate the extension of desert at each time point, we need to choose some threshold for the amount of precipitation below which we will classify a region as ‘desert’. But if we do this we may find that whether the demarcated desert region expands or contracts between the two time points depends on the particular threshold value that we choose.

This problem is illustrated in Fig. 3. The sub-figures on the left of the diagram show precipitation contour maps for the year 1965. In the upper map a threshold of 30mm per month is taken to bound the shaded desert region, whereas in the lower one a threshold of 20mm per month is used to demarcate the desert. On the right of the diagram we have precipitation contours for the year 2009, and again precipitation thresholds of 30 and 20 mm per month respectively are used to bound the desert region in the upper and lower maps. We see that if we choose the higher threshold
(30mm) the demarcated desert region is seen to expand from 1965 to 2009, whereas if we choose the lower threshold (20mm), over the same period the demarcated desert appears to contract. What has happened is that, although the driest region has shrunk, the region that is dry but somewhat less so has expanded.

We could of course avoid this issue by asking about the change in overall precipitation, calculated by integrating the precipitation values over the whole map. But this is a different question from that of whether the desert expands or contracts. Moreover, describing the world in terms of average or cumulative measures over a large area can be misleading. For example, if a land region consists of one part that is bitterly cold and another that is intolerably hot, it would be uninformative to classify the whole region as temperate.
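The threshold-dependence of the expand-or-contract verdict can be sketched numerically. In this hedged toy example (the monthly precipitation values for a row of map cells are invented), a cell is classified as desert if its precipitation falls below the chosen threshold:

```python
# Toy illustration: whether the desert expands or contracts depends on
# the precipitation threshold chosen. All values (mm/month) are invented.

def desert_area(precip_mm, threshold_mm):
    """Number of map cells classified as desert under the given threshold."""
    return sum(1 for p in precip_mm if p < threshold_mm)

precip_1965 = [18, 18, 25, 40]
precip_2009 = [22, 22, 25, 25]

# With the 30mm threshold the demarcated desert expands ...
assert desert_area(precip_1965, 30) < desert_area(precip_2009, 30)
# ... but with the 20mm threshold the same desert contracts.
assert desert_area(precip_1965, 20) > desert_area(precip_2009, 20)
```

The driest cells have become wetter while moderately dry cells have multiplied, so the two thresholds deliver opposite verdicts, exactly as in Fig. 3.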
Fig. 3 Desert demarcation according to different precipitation thresholds
4 A Theory of Crisp and Blurred Regions

In this section I give an overview of the theory of vague regions presented by Cohn and Gotts (1996b). This paper is best known for the so-called ‘egg-yolk’
model of vague spatial regions, which had originally been proposed by Lehmann and Cohn (1994).9 However, the Cohn and Gotts paper starts by developing a purely axiomatic theory of vague regions, for which the egg-yolk model is given as only one possible interpretation. The style of presentation of this theory suggests a de re ontology of vagueness — i.e. vagueness lies in the regions themselves, rather than in the linguistic expressions that refer to these regions. Moreover, the egg-yolk semantics can be regarded as providing a simple model of a spatially vague object.

Although presented primarily as a theory of ‘vague’ spatial regions, the theory is more or less neutral about the type of indeterminacy that it models. Thus, it can equally well be used to represent spatial indeterminacy arising from uncertainty, and indeed the terms ‘vague’ and ‘uncertain’ are used more or less interchangeably in the original papers that proposed the theory.

The theory is described by means of the following terminology. The term ‘crisp’ is used to mean that a region is precisely determined, and ‘blurred’ to mean that it is indeterminate (either vague or uncertain). More generally, a region x may be described as being a crisper version of another region y, or more succinctly, one may say that ‘x is a crisping of y’. These propositions both assert that the region x is a more precise version of y. This relation can be understood as meaning that every precise boundary that could be considered as a possible boundary of x could also be considered as a possible boundary of y.
4.1 An Axiomatic Theory of the ‘Crisping’ Relation

A formal theory of crisp and blurred regions is developed in similar fashion to axiomatic theories of mereology (Simons, 1987) or the topological Region Connection Calculus (Randell et al., 1992). The theory is based on the primitive relation x ≺ y, read as ‘x is a crisping of y’. This relation is defined to be a strict partial order:

A1) ∀xy[x ≺ y → ¬(y ≺ x)]
A2) ∀xyz[(x ≺ y ∧ y ≺ z) → x ≺ z]

Before stating the further axioms satisfied by ≺, it will be helpful to introduce some definitions:

D1) x ≼ y ≡def (x ≺ y ∨ x = y)
D2) MA(x, y) ≡def ∃z[z ≼ x ∧ z ≼ y]
D3) Crisp(x) ≡def ¬∃y[y ≺ x]

D1 is just a convenient abbreviation, giving the reflexive counterpart of the crispness ordering. D2 defines the key relation of ‘mutual approximation’. MA(x, y) holds just in case there is some third region z that is a crisping of both x and y. Thus x and y could potentially approximate the same region. D3 defines a Crisp region to be one that has no crispings.
9 Cohn and Gotts (1994, 1996a) are also precursors of Cohn and Gotts (1996b).
Further axioms of the theory can now be stated as follows:

A3) ∀xy[x ≺ y → ∃z[z ≺ y ∧ ¬MA(x, z)]]
A4) ∀xy[MA(x, y) → ∃z∀w[w ≼ z ↔ (w ≼ x ∧ w ≼ y)]]
A5) ∀xy[∃z[x ≼ z ∧ y ≼ z ∧ ∀w[(x ≼ w ∧ y ≼ w) → z ≼ w]]]
A6) ∀xy[MA(x, y) ↔ ∃z[x ≼ z ∧ y ≼ z ∧ ∀w[(x ≼ w ∧ y ≼ w) ↔ z ≼ w]]]
A7) ∃x∀y[y ≼ x]
A8) ∀xy[∀z[Crisp(z) → ((z ≼ x) ↔ (z ≼ y))] → (x = y)]
A9) ∀x∃y[y ≼ x ∧ Crisp(y)]
A10) ∀xy[x ≺ y → ∃z[x ≺ z ∧ z ≺ y]]
Axiom A3 says that, if a region can be made more precise in one way, it can also be made more precise in another way that is incompatible with the first. Axiom A4 says that, if two regions are mutually approximate, then there is not only a region that is a crisping of both but, more specifically, there is a region that is the least crisp crisping of both. Axiom A5 says that any two regions have a ‘crispest common blurring’. (In terms of the egg-yolk model the crispest common blurring of two vague regions is the region whose white is the sum of the two whites and whose yolk is the intersection of the two yolks. This assumes that we may have regions with an empty yolk.) Axiom A6 states that if two regions have a mutual approximation, they must have a ‘maximally blurred’ mutual approximation. Axiom A7 states that there is a maximally blurred region (‘the complete blur’). Cohn and Gotts remark that one may wish to omit this axiom or even assert its negation, which says that for any region one can always find a more blurred region. Axiom A8 specifies a condition for identity. It says that if two regions have the same complete crispings they must be equal. (Cohn and Gotts (1996b) give a slightly more complex but equivalent formula, and they also consider some alternative identity axioms.) Axiom A9 states that every region can be crispened to a completely crisp region. Axiom A10 states that whenever one region is crisper than another, there is a third intermediate region, more blurred than the first but crisper than the second. (This makes the crisping ordering dense.)

Exactly what kinds of relational structure satisfy this axiom set is not known. However, one possible model is considered in the next section.
4.2 The ‘Egg-Yolk’ Model

The axiomatic theory of ‘crisping’ can of course be given a semantics based on the general purpose model theory of first-order logic. However, the theory is normally understood in terms of the more specific semantics originally proposed in Lehmann and Cohn (1994). The so-called ‘Egg-Yolk’ model interprets a vague region in terms of a pair of nested crisp regions representing its maximal and minimal possible extensions. The maximal extension is called the ‘egg’ and the minimal is the ‘yolk’, which is required to be a part of the egg (see Fig. 4). (The case where the yolk is equal to the egg is allowed, such cases
Fig. 4 A typical ‘egg-yolk’ interpretation of a vague region
corresponding to ‘crisp’ regions.) When an egg-yolk pair is given as the spatial extension of a vague region this means that the region is definitely included in the egg and definitely includes the yolk.

In terms of the Egg-Yolk model, the crisping relation x ≺ y can be understood as holding whenever the egg associated with x is contained within the egg associated with y and the yolk of x contains the yolk of y, and furthermore x and y do not have identical eggs and yolks (if this last condition does not hold, we have x ≼ y but not x ≺ y).

In order to tie the theory of the crisping relation explicitly to the Egg-Yolk model, Cohn and Gotts (1996b) introduce two functions to the vocabulary: egg-of(x) and yolk-of(x), denoting respectively the egg and yolk regions associated with x. The mereological relation P(x, y), meaning ‘x is part of y’, is also introduced. The following additional axioms are then specified:

A11) ∀x[P(yolk-of(x), egg-of(x))]
A12) ∀xy[x ≺ y → (P(egg-of(x), egg-of(y)) ∧ P(yolk-of(y), yolk-of(x)) ∧ ¬(P(egg-of(y), egg-of(x)) ∧ P(yolk-of(x), yolk-of(y))))]

A11 ensures that the yolk of every region is contained within its egg. A12 states that if x is crisper than y then x’s egg must be part of y’s egg and y’s yolk must be part of x’s yolk, and at least one of these relations must be a proper part relation (i.e. the eggs and yolks can’t both be equal). Note that axiom A12 is stated as an implication rather than an equivalence. The explanation given for this is that the egg-yolk condition is proposed as a necessary but not sufficient condition for one vague region being a legitimate crisping of another. The rationale behind this is that there might be a vague region that satisfies the relatively weak spatial constraints required to be a crisping of another region and yet would not be considered as a reasonable crisping for other reasons, such as the shape of its egg and yolk. Fig. 5 illustrates two potential candidates for the crisping of a given region. The egg and yolk of the initially given region are shown as bounded by solid lines, and the egg and yolk of the candidate crispings are outlined with dashed lines. In case
Fig. 5 a) Reasonable, and b) anomalous crispings in the ‘egg-yolk’ interpretation
a) the egg and yolk of the candidate crisping are similar in shape to those of the initial region, so the vague region that they define may be regarded as a legitimate crisping of the original. But in case b) the jagged outline of the egg and yolk of the candidate mean that it is implausible that the original could be crispened in this way.

Although the egg-yolk model has become well known, the papers of Cohn and Gotts (1994, 1996a,b) suggest that it is just one of a range of possible interpretations that could be given to the axiomatic theory. The egg-yolk model cannot, of itself, account for any kind of constraint on a region’s plausible extensions between its maximal and minimal extensions. However, when dealing with real phenomena, such as vague geographic features, we would expect the range of possible extensions to be structured in accordance with the underlying conditions relative to which a vague region is individuated. For instance, one might expect these extensions to exhibit a contour-like structure, such as we saw in Fig. 2 and Fig. 3 above.

Another strand of research on spatial vagueness, somewhat related to the work of Cohn and Gotts and the egg-yolk theory, is work based on rough sets and granular partitions. See for example Bittner and Stell (2002). Consideration of this approach is beyond the scope of this chapter.
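As a concrete (and deliberately simplified) illustration of the egg-yolk reading of the crisping relation, a vague region can be modelled as a pair of point sets, with A11 and A12 checked set-theoretically. The regions below are invented examples, and points stand in for crisp subregions:

```python
# Simplified egg-yolk model: a region is a pair of point sets, with the
# yolk required to be a subset of the egg (axiom A11). is_crisping encodes
# the spatial condition of axiom A12 (a necessary, not sufficient, condition).
from typing import NamedTuple

class Region(NamedTuple):
    egg: frozenset
    yolk: frozenset

def well_formed(r: Region) -> bool:
    return r.yolk <= r.egg                       # A11: yolk is part of egg

def is_crisping(x: Region, y: Region) -> bool:
    """x ≺ y: x's egg inside y's egg, y's yolk inside x's yolk, not both equal."""
    return (x.egg <= y.egg and y.yolk <= x.yolk
            and not (x.egg == y.egg and x.yolk == y.yolk))

def is_crisp(r: Region) -> bool:
    return r.egg == r.yolk                       # no room left to crispen

blurred = Region(egg=frozenset(range(10)), yolk=frozenset(range(3, 6)))
crisper = Region(egg=frozenset(range(2, 8)), yolk=frozenset(range(3, 7)))
crisp   = Region(egg=frozenset(range(3, 7)), yolk=frozenset(range(3, 7)))

assert all(map(well_formed, (blurred, crisper, crisp)))
assert is_crisping(crisper, blurred) and is_crisping(crisp, crisper)
assert is_crisping(crisp, blurred)               # transitivity (A2) holds here
assert not is_crisping(blurred, crisper)         # asymmetry (A1)
assert is_crisp(crisp) and not is_crisp(blurred)
```

As the chapter notes, such a purely set-theoretic check cannot rule out the ‘anomalous’ crispings of Fig. 5b; shape plausibility lies outside what A11 and A12 capture.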
5 Fuzzy Logic Approaches

The most popular approach to modelling vagueness used in AI is that of fuzzy logic and the theory of fuzzy sets. These theories originated with the works of Goguen (1969) and Zadeh (1975) and have given rise to a huge field of research (see (Dubois and Prade, 1988; Zimmermann, 1996) for surveys of fuzzy logics and their applications). Within the context of the current collected work, I presume that a general introduction to fuzzy logic will not be required. The coverage of fuzzy approaches to vagueness here will be very limited, since these are considered in detail in other chapters of this collection.
5.1 Spatial Interpretation of Fuzzy Sets

Fuzzy sets can be given an intuitive spatial interpretation. In the same way that a precise spatial region can be identified by a set of spatial points, a vague spatial region may be associated with a fuzzy set of points. Here, the degree of membership of the point in the fuzzy set corresponds to the degree to which that point is considered as belonging to the vague region.

Fuzzy representations have been popular with many geographers as they provide a natural way of representing features with ill-defined boundaries (Wang and Brent Hall, 1996). The particular way that fuzzy sets have been employed varies greatly according to the type of feature being analysed, the kinds of data available, and the particular aims of each analysis. Kronenfeld (2003) proposed a fuzzy approach to classification and partitioning of continuously varying land cover types and applied this to the classification of forest types. Arrell et al. (2007) use a fuzzy classification of elevation derivatives to identify natural landforms such as peaks and ridges. Evans and Waters (2007) use fuzzy sets to model the regions referred to by vernacular place names.
5.2 Fuzzy Region Connection Calculus

A fuzzy version of the well-known Region Connection Calculus (Randell et al., 1992) of topological relations has been developed by Schockaert et al. (2008, 2009), by treating the primitive connection relation, C, of the RCC theory as a fuzzy relation, and replacing the definitions of other spatial relations by analogous definitions formulated using fuzzy logic operators. Let T(φ, ψ) be the value of a T-norm (fuzzy conjunction) function on the truth values of propositions φ and ψ; and let I_T(φ, ψ) be the residual implicator function relative to T, given by

I_T(φ, ψ) =def sup{λ | λ ∈ [0, 1] and T(φ, λ) ≤ ψ}.

Then the classical definition of parthood in terms of the connection relation,

P(a, b) ≡def ∀x[C(x, a) → C(x, b)],

is replaced by the fuzzy parthood relation

P(a, b) = inf_{x∈U} {I_T(C(x, a), C(x, b))}.

So, following a standard approach to translating first-order logic into fuzzy logic, the universal quantification in the classical definition is replaced by the infimum of the truth values over all regions in the domain U, and implication is replaced by the residual implicator. Similar mappings from first-order logic to fuzzy relations can be given for all the topological relations of the original RCC theory.

This fuzzification method is certainly well-principled, and various strong results can be proved regarding the structure and reasoning capabilities of the resulting fuzzy region connection calculus. However, some properties of the relations defined in this way appear to be somewhat unexpected. In particular, the use of the infimum (and supremum) as corresponding to universal (and existential) quantification leads to results that may seem counter-intuitive. For example, consider the case illustrated in Fig. 6. Here we have one region b which seems to be very much part of region a, apart from a thin protruding spike, whereas c lies on the edge of a but has no part that is strongly within a. In fuzzy RCC one may find that in such a situation c
Fig. 6 a, b and c are three fuzzy regions. Under Schockaert’s definition of fuzzy parthood, c will be evaluated as being part of a to a higher degree than is b
will be evaluated as being part of a to a higher degree than is b. This is because a region near the tip of b may have a high degree of connection with b but a very low degree of connection with a. Hence, even though regions connected to most parts of b will be highly connected to a, the infimum in the fuzzy P definition means that the evaluation of the overall degree of truth of the P relation is determined by that part of b that is least connected to a. By contrast, the whole of c is on the periphery of a but no part of c is very far from a. Since the furthest part of c is nearer to a than is the furthest part of b, P(c, a) is likely to be evaluated as having a higher degree of truth than P(b, a). Of course, this depends on the particular details of the situation and the distribution of the fuzzy C relation. To avoid cases such as that just described, one could define C in such a way that contact with a protruding spike of a region always counts as a low degree of connection. However, it is evident that very natural measures of the degree of connection can give rise to measures of parthood that may seem unnatural. The reason for this is that, when we hold the view that a topological relation between spatial regions is almost true, we often mean that it would hold if we disregard some comparatively insignificant part of one or both of the regions. But the use of the infimum in the fuzzy part definition means that we cannot disregard any part of a region (except by reducing the degree to which it is considered a part of the region).
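The spike effect can be sketched numerically. In this hedged illustration the Gödel t-norm is assumed (the theory is parameterised by the choice of t-norm), and the degrees of connection are invented to mimic Fig. 6: b is highly connected to a everywhere except at a sample point near its spike, while c is uniformly on the periphery of a.

```python
# Numerical sketch of fuzzy parthood under the Gödel t-norm. The sample
# points and degrees of connection are invented for illustration.

def goedel_implicator(u: float, v: float) -> float:
    """Residual implicator of the Gödel t-norm T(u, v) = min(u, v)."""
    return 1.0 if u <= v else v

def fuzzy_part(C, a, b, universe):
    """P(a, b) = inf over x in U of I_T(C(x, a), C(x, b))."""
    return min(goedel_implicator(C[(x, a)], C[(x, b)]) for x in universe)

universe = ["x1", "x2", "x_spike"]
C = {
    ("x1", "a"): 0.9, ("x1", "b"): 0.9, ("x1", "c"): 0.4,
    ("x2", "a"): 0.8, ("x2", "b"): 0.8, ("x2", "c"): 0.9,
    # x_spike touches only the thin spike of b, and barely touches a or c
    ("x_spike", "a"): 0.1, ("x_spike", "b"): 0.9, ("x_spike", "c"): 0.1,
}
# The single spike point drags P(b, a) down to 0.1, while P(c, a) stays
# higher, even though intuitively b is 'more part of' a than c is.
assert fuzzy_part(C, "b", "a", universe) == 0.1
assert fuzzy_part(C, "c", "a", universe) > fuzzy_part(C, "b", "a", universe)
```

The infimum means that the worst-connected sample point alone determines the overall degree of parthood, which is precisely the behaviour criticised in the text.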
6 Supervaluationist Approaches

6.1 Origins and Motivations of Supervaluation Semantics

The fundamental idea of the supervaluationist account of vagueness is that a language containing vague predicates can be interpreted in many different ways, each of which can be modelled in terms of a more precise version of the language, which is known as a precisification. In specifying a formal supervaluation semantics each
precisification is associated with a valuation of the symbols (i.e. predicates and constants) of the language. In some accounts each precisification is simply a classical assignment, corresponding to a completely precise interpretation; but it is common to allow precisifications that are not completely precise, and are associated with partial assignments. The interpretation of the vague language itself is determined by a supervaluation, which is the collection of assignments at all precisifications. The view that vagueness can be analysed in terms of multiple senses was proposed by Mehlberg (1958), and a formal semantics based on a multiplicity of classical interpretations was used by van Fraassen (1969) to explain ‘the logic of presupposition’. It was subsequently applied to the analysis of vagueness by Fine (1975) and independently by Kamp (1975). Thereafter, it has been one of the more popular approaches to the semantics of vagueness adopted by philosophers and linguists, and to a lesser extent by logicians. Only a brief overview of supervaluation can be given here. A more thorough account and extensive discussion can be found in works such as Williamson (1994) and Smith (2008). A major strength of the supervaluation approach is that it enables the expressive and inferential power of classical logic to be retained (albeit within the context of a somewhat more elaborate semantics) despite the presence of vagueness. In particular, necessary logical relationships among vague concepts can be specified using classical axioms and definitions. These analytic interdependencies will be preserved, even though the criteria of correspondence between concepts and the world are ill-defined and fluid. For example, in the sentence ‘Tom is tall and Simon is short’, both ‘tall’ and ‘short’ are vague. 
However, their meaning is coordinated in that, when applied in the same context relative to a given comparison class, it must be that in any admissible precisification the minimal height at which an object is ascribed the property ‘tall’ is higher than the maximal height at which an object is ascribed the property ‘short’. Thus if it were to turn out that Tom’s height is less than that of Simon, the claim would be false in any admissible precisification.

Investigation of supervaluation semantics in the philosophical literature tends, as one might expect, to be drawn towards subtle foundational questions (such as those concerning the sorites paradox and second-order vagueness). Consequently, there has been relatively little development towards practical use of supervaluation semantics in formal representations designed for computational information processing applications.

Supervaluation semantics is often regarded as fundamentally opposed to multivalued and fuzzy approaches to vagueness. Whereas the latter approaches modify the notion of truth and the logical operators, supervaluation semantics locates vagueness in the interpretation of terms and accounts for this within an extended model theory, whilst retaining essentially classical notions of truth and deduction.
6.2 Admissible Precisifications and ‘Supertruth’

In standard supervaluation semantics the concept of an admissible precisification plays a key role. It is used to elaborate the simple classical concept of truth by
defining the concept of supertruth, which is applicable to propositions that include vague terminology. In a semantics where all precisifications are associated with complete classical models, the following definition may be given:
φ is supertrue just in case φ is true according to every admissible precisification.

In many accounts (e.g. Fine (1975)) the notion of admissible precisification is taken as primitive. It is normally assumed that, in addition to satisfying conditions of logical consistency (in virtue of being associated with classical models), an admissible precisification also satisfies an appropriate theory specifying analytic properties and relationships between vocabulary terms (e.g. nothing can be an instance of both the predicates ‘tall’ and ‘short’ — here we are assuming the predicates are applied in the same context). Stipulating such semantic conditions may be complex in general, but poses no particular theoretical problems. But if the only restrictions on admissible precisifications are that they must satisfy logical and analytical axioms, then the only propositions that will count as supertrue will be those that are analytically true. This is usually regarded as too strong a requirement. What we would like is a notion of supertruth such that a proposition is true if it comes out as true on every reasonable interpretation of its terms, not on every possible interpretation. Consequently, the set of admissible precisifications is normally taken to be those that are in some sense reasonable. For instance, if a man is 6’6” in height, one would expect him to be tall according to all admissible precisifications. However, this begs the question of which precisifications should be counted as admissible.

The problem of determining the set of admissible precisifications is often sidestepped in presentations of supervaluation semantics. However, it causes difficulties both from a theoretical and practical point of view. From a theoretical perspective, any attempt to stipulate a set of admissible precisifications brings to the fore the problem of second-order vagueness.
To explain briefly: the admissible precisifications are intended to correspond to possible ways of deciding the truth of predications applied to borderline cases. That is, every complete precisification corresponds to a set of decisions as to whether each borderline case is an instance of a given vague predicate. By contrast, a non-admissible precisification would be one that assigns a predicate as true of an object to which it is clearly not applicable (or as not true of something to which it is clearly applicable). The problem is that any sharp distinction between admissible and non-admissible precisifications assumes a correspondingly sharp distinction between borderline instances and clear instances (or clear non-instances). But given that we are considering vague predicates, it seems untenable that there should be a precise boundary between borderline and clear-cut cases.

The problem with admissibility from a practical point of view is that, if we actually wanted to reason with or implement a formal system in which the set of admissible precisifications plays a key role, we would have to explicitly specify all the
conditions required of an admissible precisification. This would not only be complex but would also seem to require one to make stipulations without any obvious means of justification.
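To see what such an explicit stipulation would amount to in the very simplest case, here is a hedged sketch in which the admissible precisifications of ‘tall’ are modelled as a finite set of height thresholds (the particular thresholds are invented for illustration):

```python
# Minimal sketch of supertruth over a stipulated finite set of admissible
# precisifications of 'tall', each modelled as a height threshold (metres).
# The threshold values are invented for illustration.

admissible_thresholds_m = [1.75, 1.80, 1.85, 1.90]

def supertrue_tall(height_m: float) -> bool:
    """Tall on every admissible precisification."""
    return all(height_m >= t for t in admissible_thresholds_m)

def superfalse_tall(height_m: float) -> bool:
    """Not tall on every admissible precisification."""
    return all(height_m < t for t in admissible_thresholds_m)

# A man of 6'6" (about 1.98m) is tall on all admissible precisifications:
assert supertrue_tall(1.98)
# A borderline case is neither supertrue nor superfalse:
assert not supertrue_tall(1.82) and not superfalse_tall(1.82)
```

The brittleness discussed above is visible even here: the list of thresholds sharply separates borderline from clear cases, and nothing in the formalism justifies where that list begins and ends.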
6.3 Computational Applications of Supervaluationism

The uptake of supervaluation-style approaches in computational applications has been relatively limited. One reason for this is probably the difficulty of specifying the set of admissible precisifications, as mentioned above. Nevertheless it is worth mentioning some works which have taken preliminary steps in the application of supervaluation semantics.

Bennett (1998) proposed a two-dimensional model theory and a corresponding modal logic, in which the interpretations of propositions are indexed both by precisifications and possible worlds. Relative to this semantics, a spectrum of entailment relations was defined corresponding to more or less strict requirements on how the vague senses of premisses and conclusion are allowed to vary. In Bennett (2006) the semantics of vague adjectives was characterised in terms of their dependence on relevant objective observables (e.g. ‘tall’ is dependent on ‘height’). This may be seen as a precursor of the standpoint semantics, which will be presented in the next section. An example of the use of a supervaluationist approach in an implemented computer system for processing geographic information can be found in Bennett et al. (2008). Applications of supervaluation or similar theories to geographic information have been proposed by Smith and Mark (2003), Varzi (2001a) and Kulik (2003). Halpern (2004) analyses vagueness in terms of the subjective reports of multiple agents, but these play a similar role in his semantics to precisifications in the semantics proposed in this chapter.
7 Standpoint Semantics

Standpoint semantics is both a refinement and an extension of supervaluation semantics, whose purpose is to make more explicit the modes of variability of vague concepts and to support a definition of truth that is relative to a particular attitude to the meanings of terms in a vague language. Whereas supervaluation semantics provides a very general framework within which vagueness can be analysed formally, standpoint semantics is more geared towards detailed modelling of specific vague concepts within some particular application domain.
7.1 What Is a Standpoint?

In making an assertion or a coherent series of assertions, one is taking a standpoint regarding the applicability of linguistic expressions to describing the world. Such a standpoint depends partly on one’s beliefs about the world and partly on one’s linguistic judgements about the criteria of applicability of words to a particular situation. This is especially so when some of the words involved are vague. For instance,
one might take the standpoint that a certain body of water should be described as a ‘lake’, whereas another smaller water-body should be described as a ‘pond’.

It is not suggested that each person/agent has a fixed standpoint which they stick to in all situations. Rather, an agent adopts a given standpoint at a particular time as a basis for describing certain features of the world. In a different situation the agent might find that adopting a different standpoint is more convenient for describing salient features of the world. Hence a person may change their standpoint, and this is not necessarily because they think they were mistaken. It can just be that they come to the view that a different standpoint might be more useful for communication purposes; indeed, even a person thinking privately may be aware that an attribution is not clear cut. Different standpoints may be appropriate in different circumstances. The core of standpoint semantics does not explain why a person may hold a particular standpoint or the reasons for differences or changes of standpoint, although a more elaborate theory dealing with these issues could be built upon the basic formalism.

In taking a standpoint, one is making somewhat arbitrary choices relating to the limits of applicability of natural language terminology. But a key feature of the theory is that all assertions made in the context of a given standpoint must be mutually consistent in their use of terminology. Hence, if I take a standpoint in which I consider Tom to be tall, then if Jim is greater in height than Tom then (under the assumption that height is the only attribute relevant to tallness) I must also agree with the claim that Jim is tall.
7.2 Parameterised Precisification Spaces By itself, supervaluation semantics simply models vagueness in terms of an abstract set of possible interpretations, but gives no analysis of the particular modes of semantic variability that occur in the meanings of natural language vocabulary. A key idea of standpoint semantics is that the range of possible precisifications of a vague language can be described by a (finite) number of relevant parameters relating to objectively observable properties; and the limitations on the applicability of vocabulary according to a particular standpoint can be modelled by a set of threshold values that are assigned to these parameters. To take a simple example, if the language contains a predicate Tall (as applicable to humans), then a relevant observable is ‘height’; and to determine a precisification of Tall we would have to assign a particular threshold value to a parameter, which could be called tall_human_min_height. One issue that complicates this analysis is that vague adjectives tend to be context sensitive, in that an appropriate threshold value depends on the category of things to which the adjective is applied. This is an important aspect of the semantics of vague terminology but is a side issue in relation to our main concerns in this chapter. Here we shall assume that vague properties are applied uniformly over the set of things to which they can be applied. To make this explicit we could always use separate properties such as Tall-Human and Tall-Giraffe, although we won’t actually need
Spatial Vagueness
to do this for present purposes. A formal treatment of category dependent vague adjectives is given in Bennett (2006). In general a predicate can depend on threshold valuations of several different parameters (e.g. Lake might depend on both area and some parameter constraining shape). Thus, rather than trying to identify a single measure by which the applicability of a predicate may be judged, we allow multiple vague criteria to be considered independently. In the initial development of the standpoint approach (Santos et al., 2005a; Mallenby and Bennett, 2007; Third et al., 2007; Bennett et al., 2008), it was assumed that standpoints can be given a model theoretic semantics by associating each standpoint with a unique threshold valuation characterising a complete precisification. In so far as standpoints may be identified with an aspect of a cognitive state, this idea is perhaps simplistic. It is implausible that an agent would ever be committed to a completely precise value for a threshold determining the range of applicability of a vague predicate. Cognitive standpoints are more plausibly associated with constraints on a range of possible threshold values rather than exact valuations of thresholds. For instance, if I call someone tall, then my claim implies an upper bound on what I consider to be a suitable threshold for tallness — the threshold cannot be higher than the height of that person. This elaboration of the status of standpoints in relation to thresholds is being developed in ongoing research. But, in the context of implementing cartographic displays showing the spatial extensions of instances of vague terms, modelling a standpoint as a fully determinate parameterised precisification has been found to be useful and informative. It has the advantage that the regions displayed in accordance with a standpoint always correspond to some precise definition. This is desirable if one wants to compare different instances of vague predicates.
Moreover, it is relatively easy to design an interface such that a user can easily change their standpoint by altering the thresholds assigned to one or more of the parameters that define the standpoint. To summarise, the key ideas of standpoint semantics are: a) to identify precisifications with threshold valuations — i.e. assignments of threshold values to a set of parameters that model the variability in meaning of the vague concepts of a language; and b) to always evaluate information relative to a standpoint, which in the simplest case corresponds to a single precisification, but could correspond to a set of precisifications compatible with an agent’s current attitudes to language use. A threshold valuation appropriate for specifying a standpoint in relation to the domain of hydrographic geography might be represented by something like the following:

V = [ pond_vs_lake_area_threshold = 200 (m²),
      desert_max_precipitation = 20 (mm per month),
      elongated_region_min_elongation_ratio = 2.2,
      ... ]

Here, the first parameter determines a cut-off between ponds and lakes in terms of their surface area and the second sets the maximum precipitation at which a region
could be considered to be a desert. The last parameter might be used to specify conditions under which a region is considered to be ‘elongated’ (how this property might be defined will be discussed in section 9.2 below).
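A threshold valuation of this kind maps directly onto a simple data structure. The following is a minimal sketch, assuming a plain dictionary representation; the underscore parameter names follow the valuation above, while the classification helper is a hypothetical illustration.

```python
# A threshold valuation as a plain mapping from parameter names to values.
# Parameter names and values follow the hydrographic example in the text.
V = {
    "pond_vs_lake_area_threshold": 200.0,   # m^2
    "desert_max_precipitation": 20.0,       # mm per month
    "elongated_region_min_elongation_ratio": 2.2,
}

def classify_water_body(area_m2, valuation):
    """Classify a water body as 'pond' or 'lake' under a given standpoint."""
    if area_m2 < valuation["pond_vs_lake_area_threshold"]:
        return "pond"
    return "lake"

print(classify_water_body(150.0, V))   # pond
print(classify_water_body(5000.0, V))  # lake
```

Changing a standpoint then amounts to no more than updating one entry of the mapping, which is what makes the interactive threshold-adjustment interfaces mentioned above straightforward to build.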
7.3 Defining and Interpreting Vague Concepts Using Parameters As well as providing a formal structure that defines the semantic choices associated with a particular precisification, the parameters of semantic variation and the threshold assignments to these parameters play further key roles in standpoint semantics. Beyond their semantic role, the parameters may also be referred to explicitly in the formal object language in which we axiomatise or define vague concepts and in which we represent information expressed in terms of these concepts. We here assume that the object language is first-order logic — or rather first-order logic with a small syntactic innovation (a similar extension is likely to be possible for other formal languages, but we will not examine this possibility here). The innovation is that, for each vague predicate, we allow additional arguments to be attached to it corresponding to semantic variation parameters, relating to the variability in the meaning of that concept. Specifically, where a vague n-ary predicate V depends on m parameters we write it in the form:

V[p1, . . . , pm](x1, . . . , xn)

The following examples illustrate the use of this language augmented with precisification parameters to define some vague spatial concepts:

1. Tall[tall_thresh](x) ≡def height(x) > tall_thresh
2. Forested[forest_max_tree_dist](r) ≡def ∀p[In(p, r) → ∃t[Tree(t) ∧ dist(p, t) < forest_max_tree_dist]]
3. Forest[forest_max_tree_dist, forest_min_area](r) ≡def Forested[forest_max_tree_dist](r) ∧ (area(r) > forest_min_area) ∧ ¬∃r′[Forested[forest_max_tree_dist](r′) ∧ PP(r, r′)]

Example 1 is a simple definition of ‘tall’ as a predicate that applies to anything whose height is greater than a particular threshold. Definition 2 specifies that a region is forested just in case every point in that region is less than a certain threshold distance from a tree.
Finally, example 3 defines a forest as a forested region whose area is greater than a given minimum and that is not contained within some larger forested region (here PP(x, y) means that x is a proper part of y). Actually, the additional parameter syntax ‘[p1, . . . , pm]’ is not really essential, since we could either treat the variability parameters as ordinary additional arguments or simply omit them from the predicate arguments altogether and just have them as constants embedded within the definitions. However, the extended syntax seems to be both convenient and informative, as it ensures that the parameters of variability of each vague predicate are clearly indicated and highlights the
conceptual difference between the objects to which a predicate is applied and the parameters used to precisify the predicate’s meaning. We can now understand how a threshold valuation (associated with a standpoint) is used to interpret each vague predicate in a precise way. All we need do is substitute the values given by the threshold valuation in place of the corresponding threshold parameters given in the definition. If we then remove the ‘[p1, . . . , pm]’ argument lists we end up with ordinary first-order formulae, defining precise versions of the vague predicates, in accordance with the given threshold valuation.
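This substitution procedure can be sketched computationally. The sketch below is an illustrative assumption: definition 2 (Forested) is modelled with regions as finite sets of 2-D sample points and trees as coordinates, which is a drastic simplification of the formal semantics but shows how a threshold valuation yields a precise verdict.

```python
import math

def forested(region_points, trees, forest_max_tree_dist):
    """Forested[d](r): every sample point of r is within d of some tree."""
    return all(
        min(math.dist(p, t) for t in trees) < forest_max_tree_dist
        for p in region_points
    )

# A standpoint's threshold valuation (value in metres; illustrative only).
valuation = {"forest_max_tree_dist": 30.0}

region = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
trees = [(5.0, 5.0), (50.0, 50.0)]

# Substituting the valuation's value for the parameter gives a precise
# verdict relative to this standpoint:
print(forested(region, trees, valuation["forest_max_tree_dist"]))  # True
```

Under a stricter standpoint (say, a 5 m threshold) the same region would not count as forested, which is exactly the sense in which the predicate's extension varies with the standpoint rather than with the world.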
8 Comparison between Approaches It may be useful to compare how the variable extension of a vague spatial region is modelled according to the different approaches we have considered. We assume that we wish to model the spatial extension of an instance r of some vague spatial predicate V(x). Fig. 7 illustrates the different models that arise. In Fig. 7a we see the egg-yolk model, with its inner yolk corresponding to the region that is definitely part of the extension of r, and its outer egg boundary within which the extension of r must lie. The dotted lines indicate the boundaries of crisp regions that are possible crispings of r. We note that, by itself, the egg-yolk model does not place any further restrictions on the shape of these possible crispings. Fig. 7b depicts region r represented as a fuzzy set of spatial points. The shading represents the degree to which each point is considered to lie within r — the darker the shade the higher the degree of membership. Fig. 7c is supposed to indicate possible extensions of a feature under a general supervaluation semantics. This is rather misleading, as the picture would depend very much on what semantic conditions were specified in the theory determining the set of admissible precisifications. Most likely the set of extensions according to different precisifications would be much more structured. What this diagram is intended to indicate is just that supervaluation semantics by itself does not impose particular conditions on the range of possible extensions of a region characterised by a vague predicate. Fig. 7d shows possible extensions according to a standpoint semantics. Here we assume that a single parameter has been determined that is relevant to the application of a predicate that is used to individuate the region. According to different choices of a threshold of applicability, we get different extensions. These will typically be structured like a contour map.
If a strict threshold is set (which may be high or low depending on whether the parameter is a positive or negative indicator of the feature under consideration) then a small region is identified, and with less strict thresholds monotonically larger regions are demarcated. We note that models a, b and d have a similar structure. The egg-yolk boundaries of a can be regarded as representing two distinguished contours chosen from the continuum of contours resulting from threshold choices made in relation to the standpoint model d. Similarly, the fuzzy membership function depicted in b could be chosen so that its α-cuts also correspond to contours in d. Moreover, although
Fig. 7 Comparison of extensions of a region instantiating a vague predicate, as modelled by: a) the Egg-Yolk model, b) a fuzzy set of spatial points, c) the precisifications of a supervaluation semantics, and d) the standpoints of a standpoint semantics
the supervaluation model c shows extensions corresponding to arbitrary precisifications, suitable conditions on the set of admissible precisifications could make this too correspond with the contours of d. A further connection is that the egg and yolk of a (or indeed the contours of d) could be associated with limitations on the extensions of r that are possible in admissible precisifications of a supervaluation semantics; c would then look more like a (or d). Nevertheless, despite the similarity between the resulting models, there are deep conceptual differences. In the fuzzy model, b, the meaning of the vague predicate V is considered to be static and the points are associated with V to a more or less strong degree. But with the standpoint semantics it is the meaning of V that varies, and a point may be part of the extension of an instance of V under some of these meanings but not others. The differences between these approaches would become more evident if we considered examples involving a combination of inter-related classifications — for
instance the extensions of two or more regions that are instances of different but related vague predicates. In fuzzy logic the conjunction of multiple vague predications is modelled by a function of the truth degrees associated with each extension. This mode of combination makes it difficult to capture the significance of dependencies between vague concepts. But in the standpoint approach, interdependent concepts will share parameters of variability, so that a choice of threshold value may affect the meaning of several different vague concepts. Consequently, for any given standpoint the extensions of related features are coordinated. For example, suppose that the terms ‘forest’ and ‘heath-land’ are defined so that areas with a tree density above a chosen threshold are classified as ‘forest’ but those with tree density less than this threshold (but above some lower threshold) are classified as ‘heath-land’. According to the standpoint semantics the border between forest and heath-land would move according to the adopted standpoint; and for any particular standpoint no region would be both forest and heath-land. By contrast, under the fuzzy approach the borderline between forest and heath-land would be an area that is forest to some degree and also heath-land to some degree. The ways in which separate items of vague information can be combined and collectively interpreted within each of the various formalisms is a very significant and interesting topic. However, further consideration of this aspect of vagueness is beyond the scope of this chapter.
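The forest/heath-land example of shared parameters of variability can be sketched as follows. All names and numeric thresholds are illustrative assumptions; the point is only that, because both classifications are derived from the same tree-density parameter, no region is ever both forest and heath-land under any one standpoint.

```python
# Interdependent vague concepts sharing a parameter of variability:
# 'forest' and 'heath-land' are both precisified by the tree-density
# threshold, so their extensions are coordinated under each standpoint.

def classify(tree_density, forest_min_density, heath_min_density):
    if tree_density >= forest_min_density:
        return "forest"
    if tree_density >= heath_min_density:
        return "heath-land"
    return "neither"

regions = {"A": 0.9, "B": 0.5, "C": 0.1}   # tree density per region

# Two standpoints differing in where the forest boundary is drawn:
strict = {r: classify(d, 0.8, 0.2) for r, d in regions.items()}
tolerant = {r: classify(d, 0.4, 0.2) for r, d in regions.items()}

print(strict)    # {'A': 'forest', 'B': 'heath-land', 'C': 'neither'}
print(tolerant)  # {'A': 'forest', 'B': 'forest', 'C': 'neither'}
```

Region B switches from heath-land to forest as the standpoint becomes more tolerant, but under neither standpoint does it carry both labels, in contrast with a degree-based fuzzy combination.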
9 Some Significant Vague Spatial Predicates In this section we shall examine certain vague spatial predicates in more detail. Several significant kinds of spatial concept and relation will be covered. But, since there are a large number of ways in which vagueness affects spatial predication, this is not an exhaustive analysis.
9.1 Vague Distance Relations: Near and Far Arguably the most fundamental spatial relations are those relating to distance. Indeed Tarski (1959) showed that all precise geometric predicates can be defined starting from the basic relation of the equidistance of two pairs of points. In natural language we describe distances in a variety of ways. Sometimes we use numerical distance measurements, typically with a very high degree of approximation. For instance, if I say that Leeds is 200 miles from London, I do not imply an exact measurement — if the distance were 10 or even 20 miles more or less, this would still be regarded as a reasonable claim (the semantics of numerical approximations has been studied by a number of authors — e.g. (Corver et al., 2007; Krifka, 2007)). Although these measurement approximations are closely related to vagueness they are essentially numerical rather than spatial in character and will not be considered further here. Another equally, if not more, common way of describing distances is by means of terms referring to vague distance relationships — i.e. words such as ‘near’, ‘far’,
‘close’, ‘distant’. Examination of the informal concept of nearness in geography goes back at least to the work of Lundberg and Eckman (1973). More recent studies of how people use the words ‘near’ and ‘far’ include (Fisher and Orf, 1991), (Frank, 1992), (Gahegan, 1995) and (Worboys, 2001). Application of fuzzy logic techniques to representing spatial relations such as near is investigated by Robinson (1990, 2000), who describes a system that uses question answering by human subjects to learn a fuzzy representation of the concept ‘near’ by constructing a fuzzy set of the places near the reference place. Fisher and Orf (1991), in their survey of subjects’ ascriptions of ‘near’ (in a university campus setting), found that, rather than judgements of nearness being clustered within a single distance range, several clusters were found (three in fact), which seemed to indicate that different semantic interpretations of ‘near’ were being employed by different subjects. What is clear (as indicated by the experimental results of Gahegan (1995)) is that many contextual factors, apart from the actual distances involved, have a strong influence on how subjects apply the description ‘near’. These include the relative sizes of objects or regions involved, connection paths between places, scale, and the perceived significance of objects.
9.2 Elongation vs. Expansiveness A qualitative distinction that appears to be of general importance in our description of the world, and has particular significance in relation to many geographic features, is that between elongated and expansive regions. A typical example is the distinction between a river, which is elongated, and a lake, which is expansive. The term ‘elongated’ is used here to refer to a region that is long and thin, but not necessarily straight. Thus, a river may curve and wind but is still, in this sense, elongated. Given the infinite possible shapes that a region may take, it is not obvious how to measure the degree to which an arbitrary region is elongated. One relatively simple idea is to consider the ratio of the radius of a region’s minimal bounding circle to the radius of its maximal inscribed circle. For the case of a 2-dimensional region, this measure is illustrated in Fig. 8. A similar measure can be defined for 3-dimensional regions, using spheres instead of circles. Such a measure can easily be utilised within a standpoint semantics formalism. For example, one might define the vague property Elongated, using the standpoint semantics notation, as follows:

• Elongated[elong_ratio](x) ≡def (min_bound_rad(x) / max_insc_rad(x)) ≥ elong_ratio

This formulation is fine as long as we have already demarcated the boundaries of all regions that we wish to classify. In this case we can directly evaluate the truth of the Elongated predicate, relative to a given value τ(elong_ratio) of the threshold, for any given region. However, in many domains — certainly in geography and biology — we often encounter cases where we are looking for an elongated part of a larger system. For example we may want to individuate a river that is part of a complex hydrological system.
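For the special case of an axis-aligned w × h rectangle, both circles are known in closed form, which gives a small self-contained sketch of the measure (the restriction to rectangles is a simplifying assumption; for arbitrary regions the two circles would need to be computed geometrically):

```python
import math

# Elongation ratio L = R / r for a w x h rectangle:
#   R = radius of minimal bounding circle = sqrt(w^2 + h^2) / 2
#   r = radius of maximal inscribed circle = min(w, h) / 2

def elongation_ratio(w, h):
    R = math.hypot(w, h) / 2.0   # minimal bounding circle radius
    r = min(w, h) / 2.0          # maximal inscribed circle radius
    return R / r

def elongated(w, h, elong_ratio=2.2):
    """Elongated[elong_ratio] evaluated relative to a standpoint's threshold."""
    return elongation_ratio(w, h) >= elong_ratio

assert elongated(10.0, 1.0)      # a long thin strip: ratio ~ 10.05
assert not elongated(1.0, 1.0)   # a square: ratio ~ 1.41
```

Note that a square already has a ratio of √2 rather than 1 (its bounding circle passes through the corners), so threshold values such as the 2.2 used here must be chosen with that baseline in mind.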
Fig. 8 Calculation of elongation ratio. The elongation ratio is obtained by dividing the radius of the minimum bounding circle by the radius of the maximal inscribed circle — i.e. L = R/r.
Fig. 9 Medial axis skeleton of the Humber estuary and its tributaries
A method for identifying elongated parts of a larger region (and hence partitioning it into elongated and expansive segments) was proposed by Santos et al. (2005b) and further developed in (Mallenby and Bennett, 2007). Initial geometrical processing is carried out to find the medial axis skeleton of the region under analysis (this was carried out using the software VRONI (Held, 2001)). The medial axis is the locus of all points that are equidistant from two or more boundary points of the region. Fig. 9 shows the medial axis skeleton computed for the Humber estuary in the UK. It can be seen that the skeleton includes line segments of two somewhat different kinds. Some of the segments run along what we might naturally think of as the
Fig. 10 Determination of elongated segments of a larger region
middle of the channels of the water system, whereas others run from the middle to the edge of the water region. The latter type of segment arises from relatively small indentations in the river boundary and is not particularly relevant to the shape of the water system as a whole. In order to remove these unwanted segments, the skeleton is then pruned by removing all segments whose rate of approach towards the boundary is greater than a certain threshold. Once this pruning has been carried out, the remaining, more globally significant, part of the medial axis skeleton is used to identify elongated parts of the region. The idea is that these are associated with segments of the medial axis along which the width remains approximately constant. As shown in Fig. 10, the width variance is calculated for each point along a (pruned) medial axis section. The calculation is based on the highest and lowest widths evaluated along a sample segment of the medial axis, extending a certain distance either side of the point under consideration. In order to make the measure of width variation scale invariant, the length of medial axis over which it is calculated is taken to be equal to the width of the region at that point. The width variation is then computed as the ratio of the highest to lowest width (i.e. distance from a point on the medial segment to the nearest boundary point) along the sample segment. (A number of variants of this simple calculation also give reasonable results — it is as yet unclear which is the most appropriate.) Fig. 10 illustrates the width variation measure at two points p and q along the medial axis of a region. At p the width variation is low, since all widths at medial points within the maximal inscribed circle centred at p are similar, whereas at q the width variation is high. Once this variation measure has been computed for each point on the (pruned) medial axis, the elongated parts of the region are determined as those parts of the region such that every point on the segment of medial axis running through that part has a width variation below a given threshold value. Points with width variation above the threshold are considered to lie in expansive sections of the region (as are points that lie on branching points of the medial axis, for which the measure is not well defined). The application of this method to the Humber estuary region is shown below in Fig. 11. Here we see two possible demarcations of the elongated sections of the estuary and the river Hull from which it opens. The upper map shows the
demarcation obtained by using a strict threshold of 1.09 on the maximum width variance, whereas the lower map was produced using the much more tolerant threshold of 1.32. This provides a good example of how a standpoint semantics approach can be used to visualise a geographic feature according to different interpretations of a vague concept.
Fig. 11 River stretches identified according to different standpoints: upper map, max width variance = 1.09; lower map, max width variance = 1.32
9.3 Geographic Feature Types and Terminology Geographers, and more especially surveyors and cartographers, have long been aware of the difficulties of giving precise definitions of spatial features (see for example (Maling, 1989, chapters 5 and 12)). The prevalence of cartographic maps as the prime medium for geographic information may have hidden the true extent of indeterminacy in geographic features and their boundaries. Constructing a map involves the use of complex procedures and conventions for converting observations and measurements into cartographic regions and entities. Moreover, many stages of these procedures require a certain amount of subjective judgement in order to transform the multifarious characteristics of the world into precise cartographic objects. Thus the resulting map representation gives an impression that the world is far more neatly organised and compartmentalised than is really the case.
As we have seen from many of the illustrative examples given above, many of the representations proposed by researchers in AI, formal logic and ontology have been developed with geographic applications in mind. Relevant works include that of Varzi (2001a) and the collection of papers in Varzi (2001b). Certain geographic feature types have received particular attention. I now briefly summarise some contributions in this area. The nature of forests and how they should be defined and identified has been examined by Bennett (2001) and Lund (2002). The case of mountains has been considered by Smith and Mark (2003). As mentioned above, a number of papers have tackled the definition and individuation of hydrographic features. As well as being interested in specific types of geographic feature, geographers are also concerned with general aspects of the way humans describe the world. Such descriptions are of course greatly affected by the vagueness of our natural language terminology. The question of how vernacular terms relate to geographic regions has been examined by Evans and Waters (2007), and the semantics of the notion of ‘place’ has been explored by Bennett and Agarwal (2007).
10 Conclusion This chapter has examined the issue of spatial vagueness from a variety of perspectives and has explored a number of different approaches that have been taken to representing and processing vague spatial information. The reader will no doubt have become aware that this is a subtle and complex area, which is very much open to further exploration. Although the field is still very much in its infancy, it is clear that the representation and processing of vague spatial information can potentially play a crucial role in many computational applications, ranging from geographic and biological information systems to natural language understanding and robotics. However, it is also evident that, in order to support such applications in a general and flexible way, many theoretical and practical obstacles remain to be overcome. Acknowledgements. I would like to thank David Mallenby and Paulo Santos for their collaboration with me in developing and implementing a standpoint based approach to geographic feature identification. Their work has provided a significant contribution to the material reported in this chapter.
References Arrell, K., Fisher, P., Tate, N.: A fuzzy c-means classification of elevation derivatives to extract the natural landforms in Snowdonia, Wales. Computers and Geosciences 33(10), 1366–1381 (2007) Beall, J.: Liars and Heaps. Clarendon Press, Oxford (2003) Bennett, B.: Modal semantics for knowledge bases dealing with vague concepts. In: Cohn, A.G., Schubert, L., Shapiro, S. (eds.) Principles of Knowledge Representation and Reasoning: Proceedings of the 6th International Conference (KR 1998), pp. 234–244. Morgan Kaufmann, San Francisco (1998)
Bennett, B.: What is a forest? On the vagueness of certain geographic concepts. Topoi 20(2), 189–201 (2001) Bennett, B.: Modes of concept definition and varieties of vagueness. Applied Ontology 1(1), 17–26 (2005) Bennett, B.: A theory of vague adjectives grounded in relevant observables. In: Doherty, P., Mylopoulos, J., Welty, C.A. (eds.) Proceedings of the Tenth International Conference on Principles of Knowledge Representation and Reasoning, pp. 36–45. AAAI Press, Menlo Park (2006) Bennett, B., Agarwal, P.: Semantic categories underlying the meaning of ‘place’. In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS, vol. 4736, pp. 78–95. Springer, Heidelberg (2007) Bennett, B., Mallenby, D., Third, A.: An ontology for grounding vague geographic terms. In: Eschenbach, C., Gruninger, M. (eds.) Proceedings of the 5th International Conference on Formal Ontology in Information Systems (FOIS 2008), IOS Press, Amsterdam (2008) Bittner, T., Stell, J.G.: Vagueness and rough location. Geoinformatica 6(2), 99–121 (2002) Cohn, A.G., Gotts, N.M.: A theory of spatial regions with indeterminate boundaries. In: Eschenbach, C., Habel, C., Smith, B. (eds.) Topological Foundations of Cognitive Science (1994) Cohn, A.G., Gotts, N.M.: The ‘egg-yolk’ representation of regions with indeterminate boundaries. In: Burrough, P., Frank, A.M. (eds.) Proceedings, GISDATA Specialist Meeting on Geographical Objects with Undetermined Boundaries, pp. 171–187. Taylor & Francis (1996a) Cohn, A.G., Gotts, N.M.: Representing spatial vagueness: a mereological approach. In: Aiello, L.C., Doyle, J., Shapiro, S. (eds.) Proceedings of the 5th Conference on Principles of Knowledge Representation and Reasoning (KR 1996), pp. 230–241. Morgan Kaufmann, San Francisco (1996b) Corver, N., Doetjes, J., Zwarts, J.: Linguistic perspectives on numerical expressions: Introduction.
Lingua, special issue on linguistic perspectives on numerical expressions 117(5), 751–775 (2007) Dubois, D., Prade, H.: An introduction to possibilistic and fuzzy logics. In: Smets, P., Mamdani, E.H., Dubois, D., Prade, H. (eds.) Non-Standard Logics for Automated Reasoning, Academic Press, London (1988) Evans, A.J., Waters, T.: Mapping vernacular geography: web-based GIS tools for capturing ‘fuzzy’ or ‘vague’ entities. International Journal of Technology, Policy and Management 7(2), 134–150 (2007) Evans, M.: Can there be vague objects? Analysis 38, 208 (1978); reprinted in his Collected Papers, Oxford, Clarendon Press (1985) Fine, K.: Vagueness, truth and logic. Synthese 30, 263–300 (1975) Fisher, P., Orf, T.: An investigation of the meaning of near and close on a university campus. Computers, Environment and Urban Systems (1991) Frank, A.: Qualitative spatial reasoning about distances and directions in geographic space. Journal of Visual Languages and Computing 3, 343–371 (1992) Gahegan, M.: Proximity operators for qualitative spatial reasoning. In: Kuhn, W., Frank, A.U. (eds.) COSIT 1995. LNCS, vol. 988, pp. 31–44. Springer, Heidelberg (1995) Goguen, J.: The logic of inexact concepts. Synthese 19, 325–373 (1969) Guha, R., Lenat, D.: CYC: a mid-term report. AI Magazine 11(3), 32–59 (1990) Halpern, J.Y.: Intransitivity and vagueness. In: Principles of Knowledge Representation: Proceedings of the Ninth International Conference (KR 2004), pp. 121–129 (2004)
Held, M.: VRONI: An engineering approach to the reliable and efficient computation of Voronoi diagrams of points and line segments. Computational Geometry: Theory and Applications 18(2), 95–123 (2001) Kamp, H.: Two theories about adjectives. In: Keenan, E. (ed.) Formal Semantics of Natural Language, Cambridge University Press, Cambridge (1975) Keefe, R., Smith, P.: Vagueness: a Reader. MIT Press, Cambridge (1996) Krifka, M.: Approximate interpretation of number words: a case for strategic communication. In: Bouma, G., Krämer, I., Zwarts, J. (eds.) Cognitive Foundations of Interpretation, Koninklijke Nederlandse Akademie van Wetenschappen, pp. 111–126 (2007) Kronenfeld, B.J.: Implications of a data reduction framework to assignment of fuzzy membership values in continuous class maps. In: Bennett, B., Cristani, M. (eds.) Spatial Cognition and Computation, special issue on Spatial Vagueness, Uncertainty and Granularity, vol. 3(2/3), pp. 221–238 (2003) Kulik, L.: Spatial vagueness and second-order vagueness. Spatial Cognition and Computation 3(2&3), 157–183 (2003) Lawry, J.: Appropriateness measures: an uncertainty model for vague concepts. Synthese 161, 255–269 (2008) Lehmann, F., Cohn, A.G.: The EGG/YOLK reliability hierarchy: Semantic data integration using sorts with prototypes. In: Proc. Conf. on Information Knowledge Management, pp. 272–279. ACM Press, New York (1994) Lund, H.G.: When is a forest not a forest? Journal of Forestry 100(8), 21–28 (2002) Lundberg, U., Eckman, G.: Subjective geographic distance: A multidimensional comparison. Psychometrika 38, 113–122 (1973) Maling, D.: Measurements from Maps: principles and methods of cartometry. Pergamon Press, Oxford (1989) Mallenby, D., Bennett, B.: Applying spatial reasoning to topographical data with a grounded ontology. In: Fonseca, F., Rodríguez, M.A., Levashkin, S. (eds.) GeoS 2007. LNCS, vol. 4853, pp. 210–227. Springer, Heidelberg (2007) Mehlberg, H.: The Reach of Science. Extract on Truth and Vagueness, pp.
427–455. University of Toronto Press (1958); reprinted in Keefe, Smith (1996) Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connection. In: Proc. 3rd Int. Conf. on Knowledge Representation and Reasoning, pp. 165–176. Morgan Kaufmann, San Mateo (1992) Robinson, V.: Interactive machine acquisition of a fuzzy spatial relation. Computers and Geosciences 16, 857–872 (1990) Robinson, V.: Individual and multipersonal fuzzy spatial relations acquired using human-machine interaction. Fuzzy Sets and Systems 113, 133–145 (2000) Santos, P., Bennett, B., Sakellariou, G.: Supervaluation semantics for an inland water feature ontology. In: Kaelbling, L., Saffiotti, A. (eds.) Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Edinburgh, pp. 564–569 (2005a) Santos, P., Bennett, B., Sakellariou, G.: Supervaluation semantics for an inland water feature ontology. In: Kaelbling, L.P., Saffiotti, A. (eds.) Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), pp. 564–569. Professional Book Center, Edinburgh (2005b) Schockaert, S., De Cock, M., Kerre, E.E.: Spatial reasoning in a fuzzy region connection calculus. Artificial Intelligence 173(2), 258–298 (2009) Schockaert, S., De Cock, M., Cornelis, C., Kerre, E.E.: Fuzzy region connection calculus: An interpretation based on closeness. International Journal of Approximate Reasoning 48(1), 332–347 (2008)
Spatial Vagueness
47
Simons, P.: Parts: A Study In Ontology. Clarendon Press, Oxford (1987) Smith, B., Mark, D.M.: Do mountains exist? towards an ontology of landforms. Environment and Planning B: Planning and Design 30(3), 411–427 (2003) Smith, N.J.: Vagueness and Degrees of Truth. Oxford University Press, Oxford (2008) Spatial Cognition and Computation: special issue on spatial vagueness, uncertainty and granularity (2003) Tarski, A.: What is elementary geometry? In: Henkin, L., Suppes, P., Tarski, A. (eds.) The Axiomatic Method (with special reference to geometry and physics), North-Holland, Amsterdam (1959) Third, A., Bennett, B., Mallenby, D.: Architecture for a grounded ontology of geographic information. In: Fonseca, F., Rodr´ıguez, M.A., Levashkin, S. (eds.) GeoS 2007. LNCS, vol. 4853, pp. 36–50. Springer, Heidelberg (2007) Tye, M.: Vague objects. Mind 99, 535–557 (1990) van Fraassen, B.C.: Presupposition, supervaluations and free logic. In: Lambert, K. (ed.) The Logical Way of Doing Things, ch. 4, pp. 67–91. Yale University Press, New Haven (1969) Varzi, A.C.: Vagueness in geography. Philosophy and Geography 4, 49–65 (2001a) Varzi, A. (ed.): Topoi: special issue on the philosophy of geography, vol. 20(2). Kluwer, Dordrecht (2001b) Waismann, F.: Verifiability. In: Flew, A. (ed.) Logic and Language, Doubleday, New York, pp. 122–151 (1965), Wang, F., Brent Hall, G.: Fuzzy representation of geographical boundaries in GIS. International Journal of GIS 10(5), 573–590 (1996) Williamson, T.: Vagueness and ignorance. In: Proceedings of the Aristotelian Society, vol. 66, pp. 145–162 (1992) Williamson, T.: Vagueness. The problems of philosophy. Routledge, London (1994) Worboys, M.F.: Nearness relations in environmental space. International Journal of Geographical Information Science 15(7), 633–651 (2001) Zadeh, L.A.: Fuzzy logic and approximate reasoning. Synthese 30, 407–428 (1975) Zimmermann, H.-J.: Fuzzy set theory—and its applications, 3rd edn. 
Kluwer Academic Publishers, Norwell (1996)
A General Approach to the Fuzzy Modeling of Spatial Relationships* Pascal Matsakis, Laurent Wendling, and JingBo Ni
Abstract. How to satisfactorily model spatial relationships between 2D or 3D objects? If the objects are far enough from each other, they can be approximated by their centers. If they are not too far, not too close, they can be approximated by their minimum bounding rectangles or boxes. If they are close, no such simplifying approximation should be made. Two concepts are at the core of the approach described in this paper: the concept of the F-histogram and that of the F-template. The basis of the former was laid a decade ago; since then, it has naturally evolved and matured. The latter is much newer, and has dual characteristics. Our aim here is to present a snapshot of these concepts and of their applications. It is to highlight (and reflect on) their duality, a duality that calls for a clear distinction between the terms spatial relationship, relationship to a reference object, and relative position. Finally, it is to identify directions for future research.
Pascal Matsakis · JingBo Ni
University of Guelph, Ontario, Canada
e-mail: {pmatsaki,jni}@uoguelph.ca

Laurent Wendling
Université Paris Descartes, France
e-mail: [email protected]

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 49–74. springerlink.com © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

Philosophers, physicists and mathematicians have been debating about space for centuries. Here, space is considered Euclidean and independent of time (our apologies to Einstein). It is not, however, a mere abstract void (and Leibniz would rejoice): talking about space implies talking about (spatial) objects and relationships. Indeed, space is viewed as "the structure defined by the set of spatial relationships between objects" [58]. In the present paper, as in the related literature, space is usually two- or three-dimensional, with a Cartesian coordinate system. A physical object is of no interest in itself; the focus is on the part of space it occupies. Objects, therefore, are seen as subsets of the Euclidean space. A point, a line segment, a disk, a toroid, the union of these, are examples of objects. An object may be bounded or unbounded, convex or concave, open or closed, connected or disconnected, etc. Practically, it is either in raster or vector form. A raster object in 2D space, for example, is sometimes seen as the union of unit squares (pixels), and at other times as a cloud of points (pixel centers). Finally, note that fuzzy subsets of the Euclidean space may also be considered. Fuzzy sets make it possible to encapsulate information regarding the imprecision or the uncertainty in the spatial extent of some physical objects.

There is, in the end, a variety of spatial objects. So there is a variety of spatial relationships. Some are language-based, in the sense that they are naturally referred to using everyday terms, e.g., the relationships "right" (is to the right of), "far" (is far from), "touch" (touches). Others are math-based, and may or may not be named (e.g., the 512 relationships defined by the 9-intersection model). Some are binary; they involve two objects only (e.g., object A is to the right of object B). Others are not (e.g., object A is between objects B and C). In this paper, we limit our discussion to binary relationships, which are by far the most common subject of studies. They are usually categorized into directional (e.g., "right"), distance (e.g., "far"), and topological (e.g., "touch") relationships. This is not surprising, since angles and distances are at the core of Euclidean geometry, and Euclidean spaces are, above all, topological spaces. Other categories, however, are sometimes considered (e.g., "intersect" is set-theoretical before being topological).

Spatial relationships are often modeled by fuzzy relations on the set of all objects. Consider, for example, the statement "A is far from B". In many everyday situations, one would find it neither completely true nor completely false (even if A and B are very simple crisp objects). A fuzzy model of "far" attaches a numerical value to the pair (A, B), and this value is seen as the degree of truth of the statement above.
Not only does the use of fuzzy relations seem more natural than the use of standard, all-or-nothing relations, but it also allows two fundamental questions to be answered. How to identify the most salient relationship between two given objects in a scene? How to identify the object that best satisfies a given relationship to a reference object? Answering these questions comes down to calculating and comparing the degrees of truth of several statements. See Figs. 1 and 2. These statements, however, are not independent from each other. Part of the calculation of each degree of truth might therefore be common to all degrees of truth and yield an intermediate result, interesting if only for efficiency purposes. This result can be seen as a quantitative representation of either the relative position between the two objects (first question, Fig. 1) or the relationship to the reference object (second question, Fig. 2). What we argue here is that a clear distinction should be made between the terms spatial relationship (a binary relationship), relationship to a reference object (a unary relationship), and relative position. True, the position of an object with respect to another may be described in terms of spatial relationships. However, it may also have a representation of its own, as mentioned above. Ideally, such a representation should allow any relationship between the two objects to be assessed. Practically, this is never the case. The information relative to a given relationship might not have been captured by the representation, or might have been encapsulated in an unfathomable way. One representation may be better suited than another to the assessment of some relationships, and vice versa. At any rate, we may be interested in relative positions for what they are, and not in any particular relationship (e.g., when carrying out a scene matching task).
A General Approach to the Fuzzy Modeling of Spatial Relationships
51
Section 2 illustrates the discussion above. Its aim is to clarify, through examples, the differences between the terms spatial relationship, relationship to a reference object, and relative position. Sections 3 and 4 introduce the two concepts at the core of the general approach described in this paper, while pointing out dual characteristics. Sections 5 and 6 show how these concepts may rely on two others, also with dual characteristics. Section 7 deals with algorithmic issues. Many applications have been studied; Section 8 presents a review of the related literature. Finally, directions for future research are given in Section 9. Note that spatial relationships have been studied for many years, in many disciplines, including cognitive science, linguistics, geography and artificial intelligence. See, e.g., [7] [8] [13] [14] [18] [19] [39] [40]. The approach described here focuses on fuzzy models of spatial relationships (as opposed to, e.g., qualitative models) and is general only in the sense that: a variety of spatial objects can be handled (e.g., crisp or fuzzy, connected or disconnected, in raster form or in vector form); a variety of spatial information can be captured and exploited (i.e., directional, distance, topological); there is a variety of current and potential applications.
Fig. 1 How to identify the most salient relationship between two given objects A and B? Here, A and B are represented by vector data, the position of A relative to B is represented by an F-histogram (Section 3), and the 3 statements by fuzzy logic values. The answer to the question is R1.
Fig. 2 How to identify the object that best satisfies a given relationship R to a reference object B? Here, R is represented by a fuzzy binary relation, B by vector data, the relationship R to B by an F-template (Section 4), and the 3 statements by fuzzy logic values. The answer to the question is A3.
P. Matsakis, L. Wendling, and J. Ni
52
2 An Important Distinction

2.1 Relative Position vs. Relationship: Example

Consider two points p and q in the 2D space. Possible representations of the position of p relative to q are the tuple (xp, yp, xq, yq), whose elements are the Cartesian coordinates of p and q; the pair (xqp, yqp), whose elements are the Cartesian coordinates of the vector qp; the pair (ρqp, θqp), whose elements are the polar coordinates of qp; the angle θqp; etc. The first representation is trivial. The second one, (xqp, yqp), is much more interesting. Although some information about p and q is lost, there is no loss of information about the position of p relative to q (assuming that relative position is invariant to translation). The third representation has the same characteristic. However, it is better suited for the assessment of distance relationships. These relationships cannot be assessed from the fourth representation, θqp. Too much information is lost. Nonetheless, θqp is a very compact representation, well suited for the assessment of directional relationships. For example, assuming that angular coordinates belong to (−π, π] and that the polar axis is horizontal and pointing to the right, we may consider that the degree of truth of the statement "p is to the right of q" is min{1, max{0, 1 − 2|θqp|/π}}. In other words, the fuzzy relation R defined by

R(p, q) = \min\left\{1,\ \max\left\{0,\ 1 - \frac{2}{\pi}\,|\theta_{qp}|\right\}\right\}    (1)
is a fuzzy model of the binary directional relationship “right”. If θqp= 0 then R(p, q)=1, i.e., p is definitely to the right of q. If θqp= π/2 then R(p, q)=0, i.e., p is definitely not to the right of q. In the end, once the relative position θqp has been calculated given the Cartesian coordinates of p and q (a painful task if you are using only pen and paper), the statements “p is to the right of q”, “p is above q”, “p is in direction 45° of q”, etc., can be evaluated with comparatively much less effort. The link with Fig. 1 should now be clear to the reader. Note that this example can be easily adapted to the 3D case.
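The two-step strategy described above (compute the relative position θqp once, then evaluate directional statements cheaply) can be sketched in a few lines. The sketch below is ours, not taken from the paper: the function name is illustrative, points are given by Cartesian coordinates, and p ≠ q is assumed, as in (1).

```python
import math

def degree_right(xp, yp, xq, yq):
    """Degree of truth of "p is to the right of q", per Eq. (1).

    The relative position is reduced to the polar angle theta_qp of the
    vector qp, with angular coordinates in (-pi, pi] and the polar axis
    horizontal, pointing to the right. Assumes p != q.
    """
    theta_qp = math.atan2(yp - yq, xp - xq)  # relative position of p w.r.t. q
    return min(1.0, max(0.0, 1.0 - 2.0 * abs(theta_qp) / math.pi))
```

Once θqp has been computed, "p is above q", "p is in direction 45° of q", etc., reuse the same angle, which is exactly the point made in this section.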
2.2 Relationship vs. Relationship to a Reference Object: Example

Here, an object is a "friendly" set of points in a rectangular region R of the 2D space, i.e., it is a nonempty, bounded, connected, regular closed set of points, included in R. Consider two objects A and B. Let |B| be the area of B. For any two points p and q, let |qp| be the distance between p and q. We call

d(A, B) = \inf_{p \in A,\, q \in B} |qp|    (2)

the distance between A and B, and we call

s(B) = 2\sqrt{\frac{|B|}{\pi}}    (3)

the size of B (it is the diameter of a disk whose area is |B|). The fuzzy relation R defined by

R(A, B) = \max\left\{0,\ 1 - \frac{d(A, B)}{t\, s(B)}\right\}    (4)
where t denotes a positive real number, is a fuzzy model of the binary distance relationship "close". If the distance between A and B is 0, then R(A, B) = 1, i.e., A is definitely close to B. If the distance between A and B is t times the size of B, then R(A, B) = 0, i.e., A is not close at all to B. Now, given B, assume we are asked to evaluate the statement "A is close to B" for a large number of objects A. Going through (2) and (4) every time would be inefficient. A better strategy is to compute the function d_B defined on R by

d_B(p) = \inf_{q \in B} |qp|    (5)

and then use the fact that

d(A, B) = \inf_{p \in A} d_B(p).    (6)

Or, compute the function

\bar{d}_B(p) = \max\left\{0,\ 1 - \frac{d_B(p)}{t\, s(B)}\right\}    (7)

and use the fact that

R(A, B) = \sup_{p \in A} \bar{d}_B(p).    (8)

Once d_B (or \bar{d}_B) has been computed, the statements "A1 is close to B", "A2 is close to B", etc., can be readily evaluated. d_B and \bar{d}_B are two possible representations of the unary distance relationship "close to B". See the link with Fig. 2. Note that d_B is usually known as a distance map. Again, this example can be easily adapted to the 3D case.
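The distance-map strategy of Eqs. (5)–(8) can be sketched as follows, under the simplifying assumption that objects are finite point clouds (e.g., pixel centers) rather than regular closed sets; the size s(B) is passed in, all names are illustrative, and the default t = 3 is an arbitrary choice.

```python
import math

def distance_map(B, grid):
    """d_B of Eq. (5): distance from each grid point to the object B.

    B and grid are lists of (x, y) tuples; the inf over B becomes a min.
    """
    return {p: min(math.dist(p, q) for q in B) for p in grid}

def degree_close(A, d_map, size_B, t=3.0):
    """R(A, B) via Eqs. (6) and (4): max{0, 1 - d(A, B) / (t * s(B))}."""
    d_AB = min(d_map[p] for p in A)  # Eq. (6); the points of A lie on the grid
    return max(0.0, 1.0 - d_AB / (t * size_B))
```

Once `distance_map` has been computed for B, evaluating "A1 is close to B", "A2 is close to B", etc., costs only a min over each Ai, which is the point of the example.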
3 F-Histograms

One of the two concepts at the core of the general approach described in this paper is the concept of the F-histogram. Its basis was laid a decade ago [23]. Since then, of course, the concept has evolved and matured. The idea and assumption behind
it are that acceptable representations of relative positions can be obtained by reducing the handling of all 3D and 2D objects to the handling of 1D entities. Notation and terminology are as follows. S is the Euclidean space. A direction θ is a unit vector. θ⊥ is the subspace orthogonal to θ that passes through the origin ω (an arbitrary point of S). The expression Sθ(p) denotes the line in direction θ that passes through the point p. Now, consider a fuzzy subset A of S. The membership degree of p in A is μA(p). For any α ∈ [0,1], the α-cut of A is A^α = {p ∈ S | μA(p) ≥ α}. The (fuzzy) intersection of A with Sθ(p) is denoted by Aθ(p) and called a section of A. An object is a fuzzy subset A of S such that any μA(p) belongs to the set {α1, α2, …, αk+1}, with 1 = α1 > α2 > … > αk+1 = 0, and any (Aθ(p))^{αi} has a finite number of connected components. Consider two objects A and B. Consider a function F that accepts argument values of the form (θ, Aθ(p), Bθ(p)). The F-histogram associated with the pair (A, B) is a function F^{AB} of θ. Its intended purpose is to represent, in some way, the position of A with respect to B. The histogram value F^{AB}(θ) is defined as a combination of the F(θ, Aθ(p), Bθ(p)) values, for all p in θ⊥. See (9) and Fig. 3, where ope stands for the combination operator. Figure 4 is related, but will be commented on in Section 4.

\mathcal{F}^{AB}(\theta) = \mathrm{ope}_{\,p \in \theta^{\perp}}\ \mathcal{F}(\theta, A_\theta(p), B_\theta(p))    (9)
Typically, F and F^{AB} are real functions, the combination operator ope is the addition, and

\mathcal{F}^{AB}(\theta) = \int_{p \in \theta^{\perp}} \mathcal{F}(\theta, A_\theta(p), B_\theta(p))\, dp.    (10)
The key point, then, is how to choose F. First, we might want to reduce the handling of fuzzy sections I and J to that of crisp sections, through some other function F:

\mathcal{F}(\theta, I, J) = \sum_{i=1}^{k} \sum_{j=1}^{k} (\alpha_i - \alpha_{i+1})(\alpha_j - \alpha_{j+1})\, F(\theta, I^{\alpha_i}, J^{\alpha_j}).    (11)
Second, we might want to reduce the handling of crisp sections I and J to that of their connected components I1, I2, …, Im and J1, J2, …, Jn:

F(\theta, I, J) = \sum_{i=1}^{m} \sum_{j=1}^{n} f(\theta, I_i, J_j).    (12)
Fig. 3 Principle of the calculation of the F-histogram F^{AB}

Fig. 4 Principle of the calculation of the F-template F^{RB}
Further reduction can be expressed as

f(\theta, I, J) = \int_{p \in I} \int_{q \in J} \varphi(\theta, p, q)\, dp\, dq,    (13)

where I and J are (crisp) singletons, segments, lines or half-lines.
Note that for any fuzzy sections I and J, we then have

\mathcal{F}(\theta, I, J) = \int_{p \in S} \int_{q \in S} \mu_I(p)\, \mu_J(q)\, \varphi(\theta, p, q)\, dp\, dq.    (14)
\mathcal{F}^{AB} can also be referred to as the F-histogram F^{AB}, the f-histogram f^{AB}, or the φ-histogram φ^{AB}, depending on whether (11), (11) and (12), or (11), (12) and (13) hold. This categorization is illustrated by Fig. 5. Two properties are worth noticing at this point:

\mathcal{F}^{AB} = \sum_{i=1}^{k} \sum_{j=1}^{k} (\alpha_i - \alpha_{i+1})(\alpha_j - \alpha_{j+1})\, F^{A^{\alpha_i} B^{\alpha_j}}    (15)

and

f^{(\bigcup_{i=1}^{m} A_i)(\bigcup_{j=1}^{n} B_j)} = \sum_{i=1}^{m} \sum_{j=1}^{n} f^{A_i B_j},    (16)
where A1, A2, …, Am are pairwise disjoint objects, and B1, B2, …, Bn too. Now, for any real number r, consider the function φr defined by: φr(θ, p, q) = 1/|qp|^r if p ≠ q and if θ is the direction of the vector qp; φr(θ, p, q) = 0 otherwise. The φr-histogram φ_r^{AB} is called a force histogram. The reason for the term force (and for the symbols F, F, f and φ, which all refer to the first letter of the words function and force) is the following. For any direction θ, the value φ_r^{AB}(θ) can be seen as the scalar resultant of elementary physical forces. These forces (which are additive vector quantities) are exerted by the points of A on those of B, and each tends to move B in direction θ. Assume r = 2. The forces then correspond to gravitational forces. This is according to Newton's law of gravity, which states that every particle attracts every other particle with a force inversely proportional to the square of the distance between them. Under the above assumption, it is as if the objects A and B had mass and density: the area (2D case) or volumetric (3D case) mass density of A at point p is μA(p); the density of B at q is μB(q). Note that in the 2D case A and B can be seen as flat metal plates of constant and negligible thickness.
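For crisp raster data, a brute-force discretization of the force histogram is easy to write down. The O(|A|·|B|) double loop below is for illustration only (efficient algorithms are the subject of Section 7); objects are point clouds, directions are binned, and the function name is ours.

```python
import math

def force_histogram(A, B, r=0.0, n_dirs=16):
    """Discrete phi_r-histogram for crisp objects given as point clouds.

    For each pair (p in A, q in B), the elementary force 1/|qp|^r is
    accumulated in the bin of the direction of the vector qp.
    """
    hist = [0.0] * n_dirs
    for (px, py) in A:
        for (qx, qy) in B:
            dx, dy = px - qx, py - qy
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue  # coincident points carry no direction
            theta = math.atan2(dy, dx) % (2 * math.pi)
            k = int(theta / (2 * math.pi) * n_dirs) % n_dirs
            hist[k] += 1.0 / d ** r
    return hist
```

With r = 0 every pair contributes 1 to its direction bin; with r = 2 the contributions are gravitational-like, as described above (where the integral then diverges for intersecting objects, the discrete sum simply skips coincident points).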
Fig. 5 Categorization of F-histograms: \mathcal{F}-histograms include F-histograms, which include f-histograms, etc.
4 F-Templates

How to identify, in a scene, the object that best satisfies a given relationship to a reference object? This question, which is one of the two fundamental questions that arise when dealing with spatial relationships (Section 1), defines an object localization task. One theory supported by cognitive experiments is that people accomplish this task by parsing space around the reference object into good regions (where the object being sought is more likely to be), acceptable regions, and unacceptable regions (where the object being sought cannot be) [15] [22]. These regions blend into one another and form a so-called spatial template [22], which assigns each point in space a value between 0 (unacceptable region) and 1 (good region). In other words, a spatial template is a fuzzy subset of the Euclidean space that represents a relationship to a reference object. The concept of the F-template was introduced in a series of three conference papers [29] [52] [10]. The idea and assumption behind it are that acceptable representations of relationships to reference objects can be obtained by reducing the handling of all 3D and 2D objects to the handling of 1D entities. A formal definition of the F-template is given in Section 4.1, and it is followed by an important example in Section 4.2.
4.1 Definition

Here, the line in direction θ that passes through the point p is denoted by Sp(θ) (instead of Sθ(p), as in Section 3). The (fuzzy) intersection of Sp(θ) with a fuzzy subset A of S is denoted by Ap(θ) (instead of Aθ(p)). Consider a spatial relationship R and an object B. Consider a function F that accepts argument values of the form (p, R, Bp(θ)). The F-template associated with the pair (R, B) is a function F^{RB} of p. Its intended purpose is to represent, in some way, the relationship R to the reference object B. The template value F^{RB}(p) is defined as a combination of the F(p, R, Bp(θ)) values, for all θ. See (17) and Fig. 4, where ope stands for the combination operator.

\mathcal{F}^{RB}(p) = \mathrm{ope}_{\,\theta}\ \mathcal{F}(p, R, B_p(\theta))    (17)
There is obviously a duality between the F-template and the F-histogram, and it echoes the duality between the two fundamental questions mentioned in Section 1. Compare Fig. 4 with Fig. 3, and Fig. 2 with Fig. 1. Compare (17) with (9). In (17), θ varies and p does not. In (9), p varies and θ does not. Replace any subset of A with R, replace p with θ and θ with p, and (9) transforms into (17). Typically, F and F RB are real functions with output values in the range [0,1], the template F RB is a fuzzy subset of the Euclidean space, and F RB (p) aims to represent the degree to which p satisfies the relationship R to the reference object B.
4.2 An Important Example: Basic Directional Templates

A spatial template that represents a directional relationship to a reference object may be called a directional (spatial) template. To emphasize the analogy with the well-known distance maps (mentioned in Section 2.2), we may also call it a directional map. In [2], Bloch introduces the concept of the fuzzy landscape. A fuzzy landscape is a specific example of directional template, which does not sacrifice the geometry of the reference object (the object is not approximated through, e.g., its centroid, or its minimum bounding rectangle or box). Moreover, the defining equation (whose roots can be traced to earlier works [34] [20]) is very simple and intuitive. Because of this and the fact that the term template was coined earlier, and also to increase precision in language, we prefer to talk of basic directional templates, or basic directional maps, instead of fuzzy landscapes. The basic directional template induced by the object B in direction δ associates the value

\sup_{q \in S - \{p\}} \mu_B(q)\ \min\left\{1,\ \max\left\{0,\ 1 - \frac{2}{\pi} \angle(qp, \delta)\right\}\right\}    (18)
with each point p, where ∠(qp, δ) denotes the angle between the two vectors qp and δ. Compare (18) with (1). In the case of raster data, the exact algorithm that naturally results from (18) is straightforward but computationally expensive. Reference [2] describes a much faster approximation algorithm, inspired by chamfer methods. Consider, e.g., a 2D image. The pixels are examined sequentially, from top to bottom and left to right, and then from bottom to top and right to left. Each time a pixel is examined, it is assigned a value whose calculation also involves the pixel's neighbors. As shown in [2], a basic directional template can be seen as the morphological dilation of the reference object by a fuzzy structuring element. The idea behind the algorithm is to perform the fuzzy dilation with a limited support for the structuring element. According to Bloch, "most approaches [e.g., the F-histogram / template approach] reduce the representation of objects to points, segments or projections" [3] while hers "takes morphological information about the shapes… into account" [2], "considers the objects as a whole and therefore better accounts for their shape" [3]. The argument does not hold water, since basic directional templates can be proved to be F-templates [29]. As a result, they can be calculated by reducing objects to segments, using an F-template approach [29] [52]. An extensive experiment [30] has shown that this approach should be preferred to the morphological one in the case of 2D raster data, but that the opposite holds in the case of 3D raster data. 2D vector data can only be handled using the F-template approach, and there is as yet no algorithm for 3D vector data.
Once the basic directional template induced by B in direction δ has been computed, the degree of truth of the statement "A is in direction δ of B" (i.e., "A is in relationship R with B" where R denotes the relationship "in direction δ") can be calculated for any object A in negligible time, using a fuzzy pattern-matching approach [12] [2].
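A direct (and, as noted above, computationally expensive) transcription of Eq. (18) might look as follows; this is a sketch under the assumption that the reference object is a weighted point cloud, with B given as a dict mapping points to membership degrees, and all names are illustrative.

```python
import math

def directional_template(B, grid, delta):
    """Basic directional template of Eq. (18), on a point cloud.

    B maps points to membership degrees; delta is a unit vector.
    For each grid point p, returns sup over q in B (q != p) of
    mu_B(q) * min{1, max{0, 1 - (2/pi) * angle(qp, delta)}}.
    """
    tmpl = {}
    for p in grid:
        best = 0.0
        for q, mu in B.items():
            vx, vy = p[0] - q[0], p[1] - q[1]
            n = math.hypot(vx, vy)
            if n == 0.0:
                continue  # q = p is excluded, as in Eq. (18)
            cosang = max(-1.0, min(1.0, (vx * delta[0] + vy * delta[1]) / n))
            ang = math.acos(cosang)  # angle between qp and delta, in [0, pi]
            best = max(best, mu * min(1.0, max(0.0, 1.0 - 2.0 * ang / math.pi)))
        tmpl[p] = best
    return tmpl
```

The fast chamfer-style algorithm of [2] approximates this map in two raster scans; the sketch above implements only the exact definition.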
5 F-Histograms from Spatial Correlation

Most work on F-histograms has focused on force histograms. The reasons are multiple, as explained in Section 5.1. Force histograms can actually be generated from spatial correlations; this is an important concept, covered in Section 5.2.
5.1 Interest in Force Histograms

Force histograms are relative position descriptors with high discriminative power [25]. Moreover, the way they change when the objects are affine-transformed is known [25] [36]. This is an important issue in computer vision and pattern recognition, especially because it is intrinsically linked to the design of widely used affine invariant descriptors. Remember that affine transformations include, e.g., translations, rotations, scalings, reflections and stretches. Let aff be any invertible affine transformation. It can be written as the composition of a translation with a linear transformation lin (an affine transformation such that lin(ω) = ω). It is a common convention to see lin as a matrix (a 2×2 matrix if S is of dimension 2; a 3×3 matrix if S is of dimension 3). Likewise, vectors can be seen as column matrices and vice versa. As shown in [36], for any real number r, any objects A and B, and any direction θ,

\varphi_r^{\mathrm{aff}(A)\,\mathrm{aff}(B)}(\theta) = |\det(\mathrm{lin})|\ \|\mathrm{lin}^{-1} \cdot \theta\|^{\,r-1}\ \varphi_r^{AB}(\theta').    (19)
In this equation, det(lin) is the determinant of the matrix lin and |det(lin)| its absolute value; the symbol · denotes matrix multiplication; ||lin^{-1} · θ|| is the norm of the vector lin^{-1} · θ; the direction θ' is the unit vector (lin^{-1} · θ) / ||lin^{-1} · θ||. The importance of having a property such as (19) is discussed in [25] and illustrated through experiments with synthetic and real data. Another reason for the special interest in force histograms is that they lend themselves, with great flexibility, to the modeling of directional relationships by fuzzy binary relations [26]. The main methods that can be used to achieve this are the aggregation method [20], the compatibility method [34], and the method based on force categorization [27]. The fuzzy relations then satisfy four fundamental properties, which express the following intuitive ideas: if the objects in hand are sufficiently far apart, each one can be seen as a single point in space; the directional relationships are not sensitive to scale changes; all directions have the same importance; the semantic inverse principle [16] is respected (e.g., object A is to the left of object B as much as B is to the right of A). As a corollary of these properties, it is possible to determine how the fuzzy relations react when the objects are similarity-transformed. There is, of course, a link with (19), since similarity transformations are particular affine transformations. Note that the four abovementioned properties form the axiomatic basis upon which the concept of the histogram of forces was actually developed [23].
Directional relationships are not, however, the only spatial relationships that can be assessed from force histograms. Reference [42] describes a fuzzy model of inner-adjacency. The position of A relative to B is then represented by φ_r^{A(B−A)} instead of φ_r^{AB}. Reference [45] describes a fuzzy model of surroundedness. The underlying assumption is that A is connected and does not intersect the convex hull of B. Reference [24] describes a fuzzy model of betweenness. Although the preposition "between" usually denotes a ternary relationship, its model in [24] is a fuzzy binary relation. A sentence such as "A is between B and C" is read "A is between B∪C". The position of A relative to B and C is represented by φ_r^{A(B∪C)}. Most work on F-histograms has focused on force histograms, but not all. Consider φ_2^{AB}. It has interesting characteristics [33]. Usually, however, it is not defined anywhere if A and B intersect, because the integral in (10) then diverges. As shown in [23], φ-histograms that are not force histograms make it possible to overcome this limitation while preserving the abovementioned characteristics. Reference [23] also suggests f-(non-φ-)histograms for the handling of convex objects. The fuzzy model of surroundedness mentioned in the previous paragraph suits the application considered in [45], but only because the objects there satisfy certain conditions. Another model would otherwise be necessary. Its design could be based on F-(non-f-)histograms, instead of force histograms. This is a promising avenue, as pointed out in [24]. Finally, [31] describes F-(non-F-)histograms for the combined extraction of directional and topological relationship information. The particularity of these histograms is that they are coupled with Allen relations [1] using fuzzy set theory. Various systems rely on them to capture the essence of the relative positions of objects with natural language descriptions [32] [55] [56].
5.2 Spatial Correlation

In [27], a natural language description of the relative position between two objects A and B is generated from the force histograms φ_0^{AB} and φ_2^{AB}. The fact is that φ_0^{AB} and φ_2^{AB} have very different and interesting characteristics, which complement one another. As this example shows, it may be useful to calculate two or more force histograms associated with the same pair of objects. These histograms are obviously not totally independent from each other. Part of the calculation of one might therefore be common to all and yield an intermediate result, interesting if only for efficiency purposes. The same idea was expressed in Section 1; the intermediate result was seen as a quantitative representation of the relative position between two objects. Here, the intermediate result is a spatial correlation. Compare Fig. 6 with Fig. 1. Figure 7 is related, but will be commented on in Section 6. The spatial correlation between A and B provides raw information about the position of A relative to B. It is the function ψ^{AB} defined by

\psi^{AB}(v) = \int_{q \in S} \mu_A(q + v)\, \mu_B(q)\, dq,    (20)

where v denotes any vector and + denotes point-vector addition. All force histograms φ_r^{AB} associated with A and B can be derived from ψ^{AB} as follows:

\varphi_r^{AB}(\theta) = \int_{0}^{+\infty} \frac{\psi^{AB}(u\theta)}{u^{r}}\, du.    (21)
Reference [36] shows that (20) and (21) lead to different algorithms than (10) and (14) and are better adapted to the solving of some theoretical issues.
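In the discrete (raster) case, Eq. (20) reduces to accumulating products of membership degrees over all translation vectors between points of A and points of B, i.e., a plain cross-correlation. A minimal sketch, with illustrative names and objects given as dicts mapping points to membership degrees:

```python
def spatial_correlation(A, B):
    """Discrete psi^{AB} of Eq. (20).

    psi(v) accumulates mu_A(q + v) * mu_B(q): one term per pair
    (a, b) in A x B, with v = a - b.
    """
    psi = {}
    for (ax, ay), mu_a in A.items():
        for (bx, by), mu_b in B.items():
            v = (ax - bx, ay - by)
            psi[v] = psi.get(v, 0.0) + mu_a * mu_b
    return psi
```

Any force histogram then follows from (21) by summing psi(uθ)/u^r over samples u along each discretized direction; in practice the correlation itself could be computed via the FFT rather than this double loop.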
Fig. 6 F-histograms from spatial correlation
Fig. 7 F-templates from force field
6 F-Templates from Force Field

Basic directional templates have been used for spatial reasoning, object localization and identification, structural and model-based pattern recognition [4] [9] [21] [48]. They have, however, important flaws. They are overly sensitive to outliers. Elongated reference objects pose problems, and concave objects as well [28]. The main reason is that basic directional templates make use of angle information but ignore distance information. According to cognitive experiments [22] [17] [15], the former is indeed of primary importance, but the latter also contributes in shaping a directional template. For a given angular deviation, the membership degrees are not constant. They fluctuate slightly, depending on the distance to the reference object. Moreover, the fluctuation varies from one angular deviation to another. Finally, when sufficiently far from the object, all the membership degrees drop. For example, if you were told that the soccer ball was
P. Matsakis, L. Wendling, and J. Ni
62
to the right of the bench, you would not look for it hundreds of feet from the bench. One may wonder whether angle information and distance information could be processed in separate steps. In [52], the authors argue that the answer is negative, and they show how directional F-templates can embed distance information to elegantly overcome the abovementioned flaws. Their work is based on the following results: (i) basic directional templates are F-templates [29]; (ii) distance _ maps like dB and dB (Section 2.2) are F-templates too [52]; a binary operation ⊗ and two F-templates p
opeθ F1(p,R, Bp(θ)) and p
opeθ F2(p,R, Bp(θ))
define a new F-template p opeθ F1(p,R, Bp(θ)) ⊗ F2(p,R, Bp(θ)). Now, assume different directional relationships to the same reference object need to be considered. Assume they are represented by directional templates. These templates are obviously not totally independent from each other. Part of the calculation of one might therefore be common to all and yield an intermediate result, interesting if only for efficiency purposes. The same idea was expressed in Section 1; the intermediate result was seen as a quantitative representation of a relationship to a reference object. Here, the idea is coupled with the desire to exploit the duality between F-templates and F-histograms; the intermediate result is a force field. Compare Fig. 7 with Fig. 2, and Fig. 7 with Fig. 6. Once the force field has been computed, F-templates that represent directional relationships to the reference object can be derived from the field in negligible time. Basic directional templates, and the templates described in [52], cannot be calculated using such a two-step procedure. The force field induced by B is the function ψ Br defined as follows: ψ rB ( p) = ∫
q ∈S
μ B (q)
qp qp
r +1
dq
(22)
The reason for the term force is the same as in Section 3. The object B is seen as an object with mass and density: the density of B at point q is μ_B(q). The vector ψ_r^B(p) is the force exerted on B by a particle of mass 1 located at p. The force field-based template induced by B in direction δ makes use of both angle and distance information. It is a function ϕ_r^RB that may be defined as

ϕ_r^RB(p) = max( 0 , ψ_r^B(p)·δ / sup_{q∈S} ψ_r^B(q)·δ ) ,    (23)
where · denotes the dot product and R is the relationship “in direction δ”. Once ϕ_r^RB has been computed, the degree of truth of the statement “A is in relationship R with B” (i.e., “A is in direction δ of B”) can be calculated for any object A, in the same way as in Section 4.2. Preliminary experiments [28] [38], where the characteristics of force field-based templates are examined and compared with those of basic directional templates, show the interest of the approach. Note that the connection between the two pairs (ψ^AB, ϕ_r^AB) and (ψ_r^B, ϕ_r^RB) in Figs. 6 and 7 can be elegantly expressed by the equation below:

∫_θ ϕ_{r−1}^AB(θ) θ dθ = ∫_p μ_A(p) ψ_r^B(p) dp .    (24)
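The definitions (22) and (23) can be prototyped directly on a raster grid. The following sketch is ours (brute force; the maximum over the grid stands in for the sup over S in (23)):

```python
import numpy as np

def force_field(mu_b, r=2):
    """Discrete version of (22): at each grid point p, accumulate
    mu_B(q) * (vector q->p) / |q->p|^(r+1) over all object pixels q.
    Returns an (H, W, 2) array of (y, x) force components."""
    h, w = mu_b.shape
    field = np.zeros((h, w, 2))
    qs = np.argwhere(mu_b > 0)                      # object pixels (row, col)
    for py in range(h):
        for px in range(w):
            for qy, qx in qs:
                vy, vx = py - qy, px - qx           # vector from q to p
                d = np.hypot(vx, vy)
                if d > 0:                           # skip p = q
                    field[py, px] += mu_b[qy, qx] * np.array([vy, vx]) / d ** (r + 1)
    return field

def directional_template(field, delta):
    """Discrete version of (23): project the field onto the unit vector
    delta = (dy, dx), normalize by the grid maximum (standing in for the
    sup over S), and clip negative values to 0."""
    proj = field[..., 0] * delta[0] + field[..., 1] * delta[1]
    top = proj.max()
    return np.maximum(0.0, proj / top) if top > 0 else np.zeros_like(proj)

B = np.zeros((15, 15)); B[6:9, 6:9] = 1.0           # small square reference object
tpl = directional_template(force_field(B), delta=(0.0, 1.0))  # "to the right"
print(tpl[7, 12] > tpl[7, 2])  # True: points right of B get higher degrees
```

Note the two-step structure mentioned in the text: the field is computed once, and a template for any direction δ is then obtained from it by a cheap projection.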
7 On the Design of Efficient Algorithms

F-histograms and F-templates lend themselves to the design of efficient algorithms, whether the Euclidean space is of dimension two or three, whether the objects are crisp or fuzzy, and whether they are in raster or vector form. Section 7.1 illustrates some of the typical steps in the design process. These steps are briefly described in Section 7.2, from a higher perspective. An important issue (the selection of a set of reference directions) is covered more extensively in Section 7.3.
7.1 Illustrative Example

How can (10) and (14) be adapted to the case of 2D raster data? Consider two objects A and B, a direction θ, and a point p. As illustrated in Fig. 8a, the line S_θ(p) might pass through some pixels i and j of A and B, with nonzero membership degrees μ_A(i) and μ_B(j). These pixels can be determined by rasterizing S_θ(p) using a line-drawing algorithm. They project onto S_θ(p) as segments I_i and J_j. Let I = A_θ(p) and J = B_θ(p). The value of F(θ, I, J) may be calculated as follows:

F(θ, I, J) = Σ_i Σ_j μ_A(i) μ_B(j) ∫_{p∈I_i} ∫_{q∈J_j} ϕ(θ, p, q) dp dq .    (25)

(25) then replaces (14). Moreover, the integral in (10) can be approximated by a finite sum; (10) may be replaced with

F^AB(θ) = ε_θ Σ_{k∈Z} F(θ, A_θ(p_k), B_θ(p_k)) ,    (26)
where ε_θ and the points p_k are as suggested in Fig. 9a. Symbolic computation of the double integral in (25) yields closed-form expressions that depend on neither A nor B (see Fig. 10). In the end, the numerical computation of F(θ, I, J), and hence of F^AB(θ), translates into multiple instantiations of these expressions.
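By way of contrast with this closed-form approach, a naive baseline for crisp raster objects and constant forces (r = 0) simply bins all pixel pairs by direction. This O(|A|·|B|) sketch is ours (the bin count is illustrative) and approximates the histogram only up to discretization:

```python
import numpy as np

def phi0_histogram(a, b, k=16):
    """Brute-force approximation of the histogram of constant forces (r = 0)
    between two crisp raster objects: every pixel pair (p in A, q in B) is
    binned by the direction of the vector from q to p. This is a baseline,
    not the line-based algorithm of Section 7.1."""
    pa = np.argwhere(a > 0).astype(float)           # A pixels (row, col)
    pb = np.argwhere(b > 0).astype(float)           # B pixels (row, col)
    hist = np.zeros(k)
    width = 2 * np.pi / k
    for py, px in pa:
        dy = py - pb[:, 0]
        dx = px - pb[:, 1]
        ang = np.arctan2(-dy, dx) % (2 * np.pi)     # image rows grow downward
        hist += np.bincount((ang // width).astype(int) % k, minlength=k)
    return hist

A = np.zeros((12, 12)); A[5:7, 8:11] = 1            # A to the right of B
B = np.zeros((12, 12)); B[5:7, 1:4] = 1
h = phi0_histogram(A, B)
print(h.argmax())  # 0: the dominant direction bin is "to the right"
```

The line-based algorithm reaches the same information far more efficiently by processing whole segments of aligned pixels at once instead of individual pairs.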
Fig. 8 The sections A_θ(p) and B_θ(p) are decomposed into segments I_i and J_j. In the case of fuzzy objects (left), all segments are of the same length. In the case of crisp objects (right), the segments I_i are mutually disjoint, and so are the segments J_j.
Fig. 9 In the case of raster data (left), the lines Sθ(pk) partition the objects into sets of adjacent pixels; the distance between two consecutive lines is constant. In the case of vector data (right), the lines pass through the vertices of the objects and partition the objects into trapezoids; the distance between two consecutive lines varies.
For crisp objects, (25) can be rewritten as follows:

F(θ, I, J) = Σ_i Σ_j ∫_{p∈I_i} ∫_{q∈J_j} ϕ(θ, p, q) dp dq ,    (27)
where the segments I_i and J_j are now as illustrated in Fig. 8b. In this case, symbolic computation of the double integral yields more expressions than in Fig. 10. F^AB(θ), however, computes much faster, since each instantiation corresponds to the processing of a batch of pairs of object pixels instead of a single pair. Actually, (27) can be used in place of (25) whether the objects are crisp or fuzzy: the idea is to exploit (15), i.e., to handle the fuzzy objects through their level cuts (which are crisp objects). Equations (15) and (27) lead to shorter processing times than (25) if one object is crisp and the other fuzzy with few membership degrees. Note that all of the above holds in the case of 3D raster data: replace the word ‘pixel’ with ‘voxel’, and the sum in (26) with a
Fig. 10 Symbolic integration. Several cases must be considered, depending on r and on the position of I_i relative to J_j. These segments are of length ε.
double sum. Vector data require more substantial changes: these data can be handled very efficiently, because the objects can be partitioned into bigger blocks (Fig. 9b); however, the symbolic integration step is more complex and generates a larger number of closed-form expressions.
7.2 Typical Steps

Typical steps in the design of efficient algorithms include the following. First, a set of reference directions is chosen (more on this in the next section). Then, for every reference direction θ, a partitioning of the space is undertaken. When dealing with vector data, each block of the partition is a region of space delimited by two lines in direction θ (2D case), or by planes that include such lines (3D case). When dealing with raster data, each block corresponds to a raster line in direction θ, or to a region of the image defined by the union of such lines. The blocks, in turn, cut the considered object(s) into pieces, and vice versa. The whole partitioning procedure can rely on efficient computer graphics software and hardware tools. For example, line-drawing algorithms such as Bresenham’s [6] are often implemented in the firmware or hardware of graphics cards, and graphics accelerators provide operations such as polygon clipping implemented in high-speed hardware. Once the partitioning procedure is completed, the different blocks can be processed independently from each other. The same, of course, applies to
the reference directions. The algorithms for the computation of F-histograms and
F-templates are, therefore, highly parallelizable. When equations like (13) or (22) are involved, an additional and more common way to increase efficiency is to harness the power of integral calculus. A definite integral can be approximated by a finite sum. One may use, e.g., a Riemann sum, the trapezoidal rule, Simpson’s rule, or any other Newton–Cotes rule. Different algorithms may actually result from this procedure, depending, e.g., on whether the integral is written in Cartesian or polar coordinates. In some cases, however, symbolic integration can be performed and closed formulas obtained. The domain of integration and the integrand usually involve various parameters. A single integral may therefore correspond to one, a few, tens, or even hundreds of formulas, which can be hardcoded and organized in a tree structure. The appropriate formula can then be found and instantiated at run time, when the values of the parameters are known. This procedure is particularly efficient. Note that crisp and vector data tend to lend themselves to symbolic integration more easily than fuzzy or raster data.
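For reference, the line-drawing step mentioned above can be sketched with an integer-only, all-octant Bresenham routine (a textbook version, not tied to any particular graphics hardware):

```python
def bresenham(x0, y0, x1, y1):
    """Integer-only rasterization of the line from (x0, y0) to (x1, y1),
    all octants (Bresenham [6]). Returns the list of crossed pixels."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # step along x
            err += dy
            x0 += sx
        if e2 <= dx:            # step along y
            err += dx
            y0 += sy
    return pixels

print(bresenham(0, 0, 4, 2))
# [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
```

Rasterizing S_θ(p) with such a routine yields exactly the pixel runs that the partitioning step groups into blocks.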
7.3 Set of Reference Directions

An important issue is how to choose the set of reference directions. In practice, of course, only a finite number K of directions can be considered. The higher K, the more complete the collected F-histogram data, or the more accurate the computed F-template values, but the longer the processing time. Usually, the reference directions are chosen so as to satisfy the following properties: they are evenly distributed in space; they include the primitive directions (right and left, above and below, front and behind); and if θ is a reference direction, its opposite −θ (which can be processed at the same time) is also a reference direction. In the 2D case, the standard procedure is therefore to pick the directions defined by the angles 2πi/K, for all i in 0..K−1, with K a multiple of 4. The value of K can be as low as 4, and arbitrarily high, but a few tens to a few hundreds of reference directions, with a maximum of 360, seem to be necessary and sufficient for most applications. Note that when the objects are in raster form, K is naturally limited by the size of the image. In an n×n image, for example, the largest set of directions worth considering is a set of 8n−8 unevenly distributed directions [35] [36]. In the 3D case, finding an arbitrarily large set of evenly distributed directions is not obvious. One might want to pick the directions ωp/|ωp|, for all vertices p of a regular convex polyhedron centered at ω. However, there exist only 5 regular convex polyhedra (the Platonic solids), and none has more than 20 vertices. A solution is to calculate (e.g., using a random-start hill-climbing heuristic) the equilibrium positions, on a sphere centered at ω, of K points that repulse each other like equally charged particles [37]. Obviously, to get the same density of information, many more reference directions are needed in 3D than in 2D.
For example, 30 directions in 2D correspond roughly to 300 directions in 3D (neighbor particles on the unit sphere will then be about 2π/30 apart). 300
directions in 2D correspond to 30,000 in 3D. If the running time of the algorithm is linear in K, which is typical, one should definitely consider parallel computing. Some algorithms, however, focus on the calculation of intermediate data, and their running time is practically independent of K [35] [36]. Also note that K can be dynamically increased: since directions are handled independently from each other, additional ones can be considered in subsequent stages, depending on results and time constraints. A peculiar situation is worth mentioning: it may happen that F^AB(θ) = 0 for all reference directions θ (Fig. 11a). One may then force F^AB(θ_0) to F^AB(qp/|qp|), where p is an arbitrary point of A, q is an arbitrary point of B, and θ_0 is the reference direction closest to qp/|qp| (i.e., the one whose dot product with qp/|qp| is maximum). In this situation, however, the objects might be too far apart, and a simpler model than the F-histogram might be sufficient (e.g., a model based on minimum bounding rectangles or boxes); the objects might not be appropriate for the model (e.g., clouds of scattered, small connected components); or the number of reference directions might be too low (note that this number can easily be adapted to the objects, as illustrated in Fig. 11b). Similar comments apply to F-templates (Fig. 11c).
Fig. 11 Each dark gray rectangle represents (the minimum bounding rectangle of) some 2D object. The reference directions are the horizontal, vertical and diagonal directions. (a) All computed F-histogram values are 0. (b) The number of reference directions may be chosen depending on the angle between the two lines. (c) The F-template values in the black areas must be calculated independently from the others. These areas are outside the region of interest (light gray rectangle) defined by the reference directions. The higher the number of directions, the larger the region of interest.
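The repulsion heuristic for 3D reference directions can be sketched as follows (a fixed-step gradient variant of our own, not the exact random-start hill-climbing procedure of [37]; K, the step size, and the iteration count are illustrative):

```python
import numpy as np

def sphere_directions(k, iters=400, step=0.05, seed=0):
    """Spread k unit vectors over the sphere by letting points repel each
    other like equally charged particles, reprojecting onto the sphere
    after every step."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(k, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    for _ in range(iters):
        diff = p[:, None, :] - p[None, :, :]          # pairwise offset vectors
        d2 = (diff ** 2).sum(-1) + np.eye(k)          # +I avoids division by 0
        force = (diff / d2[..., None] ** 1.5).sum(1)  # Coulomb-like repulsion
        p += step * force
        p /= np.linalg.norm(p, axis=1, keepdims=True) # back onto the sphere
    return p

dirs = sphere_directions(6)
# Nearest-neighbor angles; for k = 6 they should approach 90 degrees
# (the equilibrium configuration is an octahedron).
cos_nn = np.clip((dirs @ dirs.T - 2 * np.eye(6)).max(axis=1), -1, 1)
print(np.degrees(np.arccos(cos_nn)))
```

Since the directions obtained this way are handled independently of each other, more of them can be generated and processed in later stages, as the text notes.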
8 Applications and Related Literature

Numerous applications of the general approach described in this paper have been studied, and new applications continue to be explored. Here is a review of the related literature. Relative position descriptors like force histograms are orthogonal to color, texture, and shape descriptors, and therefore constitute a natural complement to them. Reference [25] illustrates the point and explores the behavior of force
histograms under affine transformations. The findings, supported by experiments on synthetic and real data, suggest that these histograms would yield powerful edge attributes in attributed relational graphs and could be of great use in scene matching. The latter idea is examined and validated in [43]. The authors present a system able to determine whether two images acquired under different viewing conditions capture the same scene. If the answer is positive, the system produces a mapping of objects from one view to the other and recovers the viewing transformation parameters. The approach is based, of course, on the computation and geometric properties of force histograms. The F-histogram F^AA is called the F-signature of the object A [23]. In [57], F-signatures are used to classify orbits and sinuses represented by drawings of craniums from the 3rd century A.D. The results are consistent with human responses, and the approach compares favorably with standard ones based on geometric criteria and Fourier descriptors. Reference [49] focuses on F-signatures of fuzzy objects. A region of a grayscale image is seen as a fuzzy object whose membership grades correspond to intensity values. An F-signature of this object can, of course, be calculated. In [49], however, F-signatures of its α-cuts are calculated instead, and then arranged in layers to form a 3D signature. The approach makes it easier to discriminate regions with similar shapes but different gray levels. It is extended to color images and validated using real data. Note that F-signatures of 2D objects can be easily expressed as periodic functions. For storage efficiency and noise reduction purposes, they can therefore be approximated based on the calculation of their Fourier descriptors. This is the approach adopted in [50], which deals with the recognition of graphical symbols in technical line drawings. The methodology is demonstrated using architectural drawings.
Reference [41] takes advantage of another characteristic of F-signatures: they do not require objects to be connected. The paper presents a geospatial information retrieval and indexing system. It brings a diverse set of technologies together with the aim of allowing image analysts to more rapidly identify relevant imagery. In particular, the system is able to retrieve database satellite images with man-made objects in specific spatial configurations. This ability comes from the fact that several objects in a given configuration form a single disconnected object, an F-signature of which can be calculated. In [27], degrees of truth calculated from force histograms provide inputs to a fuzzy rule base that produces intuitive, human-like linguistic descriptions of relative positions. The system is tested on regions from laser radar range images of a complex power plant scene. The same system is used in [45], where a mobile robot describes its environment based on readings from a ring of sonar sensors. Experiments are carried out with the Nomad simulator. Note that, in [27], the force histograms are computed from raster data, the spatial reference frame is implicitly determined by the reader’s location (world view), and the linguistic descriptions involve directional relationships only. In [45], however, the histograms are computed from vector data, and the reference frame is determined by
the intrinsic orientation of the robot (egocentric view), and surroundedness is also considered. As a further step, the system is coupled with a multimodal robot interface [46]. This time, spatial information is extracted from an evidence grid map, which is built from range sensor data accumulated over time. Real examples of natural dialogs are presented. They include both robot-to-human feedback and human-to-robot commands. Another system for generating linguistic descriptions is worth mentioning. In [32], the descriptions are generated from Allen F-histograms [31] [54] instead of force F-histograms, and they are built around topological relationships. The approach is validated using several sets of real and synthetic data. References [27] and [32] show the specificity and limits of each type of histogram, and they show how each one can contribute to the generation of natural language expressions that capture the essence of relative positions. Finally, let us mention [44], which deals with hand-sketched route maps. Such a map does not generally contain complete map information and is not necessarily drawn to scale, yet it contains the correct qualitative information for route navigation. The system presented in [44] is able to generate, from the sketch, a natural language description of the route to follow. The methodology is based on the use of force histograms as relative position descriptors. It is demonstrated using example sketches drawn on a handheld PDA. The concept of the F-template is too recent to be the subject of application papers. At the time of writing (August 2009), [30], [28] and [38] have not even been published yet. We believe, however, that there is a real potential for the concept: first, because of the duality between the F-template and the
F-histogram, and the fact that the latter has aroused significant interest, as illustrated above; second, because spatial templates have already been the subject of application papers [48] [4] [21] [9] [47]; third, because most of these papers make use of basic directional templates, and the F-template approach provides new, efficient algorithms for their computation [30]; last, because force field-based templates might in many cases be preferable to basic directional templates [28] [38]. Let us briefly describe, for example, the work in [9] and [21]. As pointed out by Logan and Sadler in [22], spatial templates can be combined to represent compound relationships to reference objects. In [9], (normalized) distance maps and basic directional templates are combined using fuzzy conjunctions and disjunctions. The template resulting from such a fusion is used to construct a new external force for a deformable model, a force that expresses constraints about spatial relationships. The approach is shown to improve the segmentation of brain subcortical structures in 3D magnetic resonance images. Reference [21] describes an image retrieval system that can handle queries involving spatial relationships. The images are represented by fuzzy attributed relational graphs: each node in the graph represents an image region; each edge represents the relationships between two regions and has an attribute whose value is a tuple of degrees of truth calculated from basic directional templates. The system is tested using synthetic and natural image databases.
9 Directions for Future Work

Several algorithms for the computation of force histograms in the case of 2D raster data have been implemented and are worth considering. The traditional algorithm [23] [33] runs in O(Kk²N√N) time, where K is the number of directions in which forces are considered, k is the number of possible membership degrees, and N is the number of pixels in the image. It is based on (15) and (27). A variant of the algorithm runs in O(KN√N) time and is based on (25). A second variant runs in O(KkN√N) time. A third variant, dedicated to the computation of constant force histograms (r = 0), runs in O(KN) time [53]. The traditional algorithm and its variants rely on (10). A completely different algorithm, the correlation-based algorithm [35] [36], runs in O(N log N) time and relies on (21). Which algorithm or variant performs better under which conditions is discussed in [36]. From a theoretical point of view, extension to 3D raster data is straightforward. An implementation of the extended traditional algorithm is presented in [37]. The extended correlation-based algorithm, however, has not been implemented yet. This is the major missing piece for the handling of raster data. Vector data have received much less attention. Only one algorithm has been developed so far for the computation of force histograms in the 2D case [23] [33]. The algorithm runs in O(Kk²η log η) time, where η is the total number of object vertices, and relies on (10) and (15). A variant runs in O(Kkη log η) time. The current implementation, however, can only handle disjoint crisp objects. Extension to 3D vector data can be easily achieved. For example, partition the Euclidean space using equidistant parallel planes. Each plane P_i intersects the 3D objects A and B in the 2D objects A_i and B_i. Compute the histograms ϕ_r^{A_iB_i} using the algorithm for 2D vector data.
For any direction θ in the planes, ϕ_r^AB(θ) can be approximated by the Riemann sum Σ_i ϕ_r^{A_iB_i}(θ)·d, where d is the distance between two consecutive planes. This algorithm, however, has yet to be implemented. There are also directions for future research of a more theoretical nature. For example, given the F-histogram F^AB, is it possible to find all the pairs (C, D) of objects such that F^AB = F^CD? Only the beginnings of an answer are given in [25]. Given the F-histograms F^AB and F^BC, is it possible to find F^AC? Another question concerns the design of affine-invariant relative position descriptors. We know that the histogram of forces reacts “well” to affine transformations, in a mathematically predictable way (Section 5.1). However, the normalization procedure described in [25] leads to a similarity-invariant relative position descriptor, not to an affine-invariant descriptor. Other normalization procedures should be developed. On a different note, it would be worth investigating new fuzzy models of “surround” and “between”, based on dedicated F-histograms. Finally, let us come back to the systems for generating linguistic descriptions (Section 8). Further mechanisms to adapt these systems to individual users should be researched. Very preliminary work is reported in [5] and [51].
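The slicing scheme just outlined can be sketched on raster data (the text discusses vector data; transposing it to voxels, with a brute-force 2D routine standing in for the vector algorithm, is our simplification):

```python
import numpy as np

def phi0_2d(a, b, theta, tol=np.pi / 16):
    """Toy 2D stand-in: count the pixel pairs of A x B whose direction
    (from the B pixel to the A pixel) is within tol of theta (r = 0)."""
    pa, pb = np.argwhere(a > 0), np.argwhere(b > 0)
    n = 0
    for py, px in pa:
        for qy, qx in pb:
            ang = np.arctan2(float(py - qy), float(px - qx))
            if abs((ang - theta + np.pi) % (2 * np.pi) - np.pi) < tol:
                n += 1
    return float(n)

def phi0_3d_by_slicing(a3, b3, theta, d=1.0):
    """Riemann sum over parallel slices: sum_i phi0(A_i, B_i)(theta) * d,
    where A_i and B_i are the intersections of the 3D objects with plane i
    and d is the distance between two consecutive planes."""
    return sum(phi0_2d(a3[i], b3[i], theta) * d for i in range(a3.shape[0]))

A = np.zeros((3, 8, 8)); A[:, 3:5, 5:7] = 1   # box on the right
B = np.zeros((3, 8, 8)); B[:, 3:5, 1:3] = 1   # box on the left
print(phi0_3d_by_slicing(A, B, theta=0.0) > 0)   # True for "to the right"
```

Only directions θ lying in the slicing planes can be handled this way, which is why the choice of the plane orientation matters in the scheme above.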
We have not talked about F-templates in this section. Writing another long list of directions for future research would be useless. There is, indeed, only one question that seems to really matter at this point, and it summarizes them all: investigate the transfer of models, properties, algorithms, etc., from F-histograms to F-templates, using the duality between the two concepts.

Acknowledgements. The authors thank the anonymous reviewer for his/her constructive comments. Pascal Matsakis wants to express his gratitude for support from the Natural Sciences and Engineering Research Council of Canada (NSERC), Grant 262117.
References

[1] Allen, J.F.: Maintaining knowledge about temporal intervals. Communications of the ACM 26(11), 832–843 (1983)
[2] Bloch, I.: Fuzzy relative position between objects in image processing: A morphological approach. IEEE Trans. on Pattern Analysis and Machine Intelligence 21(7), 657–664 (1999)
[3] Bloch, I.: Fuzzy spatial relationships for image processing and interpretation: A review. Image and Vision Computing 23, 89–110 (2005)
[4] Bloch, I., Saffiotti, A.: On the representation of fuzzy spatial relations in robot maps. In: Bouchon-Meunier, B., Foulloy, L., Yager, R.R. (eds.) Intelligent Systems for Information Processing, pp. 47–57. Elsevier, NL (2003)
[5] Bondugula, R., Matsakis, P., Keller, J.: Force histograms and neural networks for human-based spatial relationship generalization. In: Proc. of the 2004 IASTED Int. Conf. on Neural Networks and Computational Intelligence, pp. 185–190. ACTA Press (2004)
[6] Bresenham, J.E.: Algorithm for computer control of a digital plotter. IBM Systems J. 4(1), 25–30 (1965)
[7] Chang, S.K., Shi, Q.Y., Yan, C.W.: Iconic indexing by 2-D strings. IEEE Trans. on Pattern Analysis and Machine Intelligence 9(3), 413–428 (1987)
[8] Clementini, E., Di Felice, P., Hernández, D.: Qualitative representation of positional information. Artificial Intelligence 95, 317–356 (1997)
[9] Colliot, O., Camara, O., Bloch, I.: Integration of fuzzy spatial relations in deformable models: Application to brain MRI segmentation. Pattern Recognition 39(8), 1401–1414 (2006)
[10] Coros, S., Ni, J., Matsakis, P.: Object localization based on directional information: Case of 2D vector data. In: Proc. of the 14th Int. Symp. on Advances in Geographic Information Systems, pp. 163–170 (2006)
[11] Dubois, D., Jaulent, M.C.: A general approach to parameter evaluation in fuzzy digital pictures. Pattern Recognition Letters 6(4), 251–259 (1987)
[12] Dubois, D., Prade, H.: Weighted fuzzy pattern matching. Fuzzy Sets and Systems 28, 313–331 (1988)
[13] Dutta, S.: Approximate spatial reasoning: Integrating qualitative and quantitative constraints. Int. J. of Approximate Reasoning 5, 307–331 (1991)
[14] Egenhofer, M.J., Franzosa, R.: Point-set topological spatial relations. Int. J. of Geographical Information Systems 5, 161–174 (1991)
[15] Franklin, N., Henkel, L.A., Zengas, T.: Parsing surrounding space into regions. Memory and Cognition 23, 397–407 (1995)
[16] Freeman, J.: The modelling of spatial relations. Computer Graphics and Image Processing 4, 156–171 (1975)
[17] Gapp, K.P.: Angle, distance, shape, and their relationship to projective relations. In: Proc. of the 17th Annual Conf. of the Cognitive Science Society, Mahwah, NJ, pp. 112–117 (1995)
[18] Herskovits, A.: Language and Spatial Cognition: An Interdisciplinary Study of the Prepositions in English. Cambridge University Press, Cambridge (1986)
[19] Kuipers, B.: Modeling spatial knowledge. Cognitive Science 2, 129–153 (1978)
[20] Krishnapuram, R., Keller, J.M., Ma, Y.: Quantitative analysis of properties and spatial relations of fuzzy image regions. IEEE Trans. on Fuzzy Systems 1(3), 222–233 (1993)
[21] Krishnapuram, R., Medasani, S., Jung, S.H., Choi, Y.S., Balasubramaniam, R.: Content-based image retrieval based on a fuzzy approach. IEEE Trans. on Knowledge and Data Engineering 16(10), 1185–1199 (2004)
[22] Logan, G.D., Sadler, D.D.: A computational analysis of the apprehension of spatial relations. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M.F. (eds.) Language and Space. MIT Press, Cambridge (1996)
[23] Matsakis, P.: Relations spatiales structurelles et interprétation d’images. PhD Thesis, Paul Sabatier University, Toulouse, France (1998)
[24] Matsakis, P., Andréfouët, S.: The fuzzy line between among and surround. In: Proc. of the 2002 IEEE Int. Conf. on Fuzzy Systems, vol. 2, pp. 1596–1601 (2002)
[25] Matsakis, P., Keller, J., Sjahputera, O., Marjamaa, J.: The use of force histograms for affine-invariant relative position description. IEEE Trans. on Pattern Analysis and Machine Intelligence 26(1), 1–18 (2004)
[26] Matsakis, P., Keller, J., Wendling, L.: F-histogrammes et relations spatiales directionnelles floues. In: Proc. of the 1999 Conf. on Fuzzy Logic and Its Applications (LFA), vol. 1, pp. 207–213 (1999)
[27] Matsakis, P., Keller, J., Wendling, L., Marjamaa, J., Sjahputera, O.: Linguistic description of relative positions in images. IEEE Trans. on Systems, Man and Cybernetics Part B 31(4), 573–588 (2001)
[28] Matsakis, P., Ni, J., Veltman, M.: Directional relationships to a reference object: A quantitative approach based on force fields. In: Proc. of the 16th Int. Conf. on Image Processing, pp. 321–324 (2009)
[29] Matsakis, P., Ni, J., Wang, X.: Object localization based on directional information: Case of 2D raster data. In: Proc. of the 18th IAPR Int. Conf. on Pattern Recognition, pp. 142–146 (2006)
[30] Matsakis, P., Ni, J., Wang, X., Coros, S.: Basic directional spatial templates. Int. J. of Computer Vision (submitted)
[31] Matsakis, P., Nikitenko, D.: Combined extraction of directional and topological relationship information from 2D concave objects. In: Cobb, M., Petry, F., Robinson, V. (eds.) Fuzzy Modeling with Spatial Information for Geographic Problems, pp. 15–40. Springer, Heidelberg (2005)
[32] Matsakis, P., Wawrzyniak, L., Ni, J.: Relative positions in words: A system that builds descriptions around Allen relations. Int. J. of Geographical Information Science 99999(1), 1–23 (2008)
[33] Matsakis, P., Wendling, L.: A new way to represent the relative position of areal objects. IEEE Trans. on Pattern Analysis and Machine Intelligence 21(7), 634–643 (1999)
[34] Miyajima, K., Ralescu, A.: Spatial organization in 2-D segmented images: Representation and recognition of primitive spatial relations. Fuzzy Sets and Systems 65(2/3), 225–236 (1994)
[35] Ni, J., Matsakis, P.: Force histograms computed in O(NlogN). In: Proc. of the 19th IAPR Int. Conf. on Pattern Recognition, pp. 1–4 (2008)
[36] Ni, J., Matsakis, P.: An equivalent definition of the histogram of forces: Theoretical and algorithmic implications. Pattern Recognition 43(4), 1607–1617 (2010)
[37] Ni, J., Matsakis, P., Wawrzyniak, L.: Quantitative representation of the relative position between 3D objects. In: Proc. of the 4th IASTED Int. Conf. on Visualization, Imaging, and Image Processing, pp. 402–407. ACTA Press (2004)
[38] Ni, J., Veltman, M., Matsakis, P.: Directional force field-based maps: Implementation and application. In: Jiang, X., Petkov, N. (eds.) Computer Analysis of Images and Patterns. LNCS, vol. 5702, pp. 309–317. Springer, Heidelberg (2009)
[39] Papadias, D., Theodoridis, Y.: Spatial relations, minimum bounding rectangles, and spatial data structures. Int. J. of Geographical Information Science 11, 111–138 (1997)
[40] Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connection. In: Proc. of the 3rd Int. Conf. on Principles of Knowledge Representation and Reasoning, pp. 165–176 (1992)
[41] Shyu, C.R., Klaric, M., Scott, G.J., Barb, A.S., Davis, C.H., Palaniappan, K.: GeoIRIS: Geospatial information retrieval and indexing system – Content mining, semantics modeling, and complex queries. IEEE Trans. on Geoscience and Remote Sensing 45(4), 839–852 (2007)
[42] Shyu, C.R., Matsakis, P.: Spatial lesion indexing for medical image databases using force histograms. In: Proc. of the 2001 IEEE Int. Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 603–608 (2001)
[43] Sjahputera, O., Keller, J.M.: Scene matching using F-histogram-based features with possibilistic C-means optimization. Fuzzy Sets and Systems 158(3), 253–269 (2007)
[44] Skubic, M., Blisard, S., Bailey, C., Adams, J., Matsakis, P.: Qualitative analysis of sketched route maps: Translating a sketch into linguistic descriptions. IEEE Trans. on Systems, Man and Cybernetics Part B 34(2), 1275–1282 (2004)
[45] Skubic, M., Matsakis, P., Chronis, G., Keller, J.: Generating multi-level linguistic spatial descriptions from range sensor readings using the histogram of forces. Autonomous Robots 14(1), 51–69 (2003)
[46] Skubic, M., Perzanowski, D., Blisard, S., Schultz, A., Adams, W., Bugajska, M., Brock, D.: Spatial language for human-robot dialogs. IEEE Trans. on Systems, Man, and Cybernetics Part C 34(2), 154–167 (2004)
[47] Sledge, I., Keller, J.: Mapping natural language to imagery: Placing objects intelligently. In: Proc. of the 2009 IEEE Int. Conf. on Fuzzy Systems, pp. 1–6 (2009)
[48] Smith, G.B., Bridges, S.M.: Fuzzy spatial data mining. In: Proc. of the 2002 North American Fuzzy Information Processing Society Annual Conf., pp. 184–189 (2002)
[49] Tabbone, S., Wendling, L.: Color and grey level object retrieval using a 3D representation of force histogram. Image and Vision Computing 21(6), 483–495 (2003)
74
P. Matsakis, L. Wendling, and J. Ni
[50] Tabbone, S., Wendling, L., Tombre, K.: Matching of graphical symbols in linedrawing images using angular signature information. Int. J. on Document Analysis and Recognition 6(2), 115–125 (2003) [51] Wang, X., Matsakis, P., Trick, L., Nonnecke, B., Veltman, M.: A study on how humans describe relative positions of image objects. In: Ruas, A., Gold, C. (eds.) Headway in spatial data handling Proc. of the 13th Int Symp on Spatial Data Handling, pp. 1–18. Springer, Heidelberg (2008) [52] Wang, X., Ni, J., Matsakis, P.: Fuzzy object localization based on directional (and distance) information. In: Proc. of the 2006 IEEE Int. Conf. on Fuzzy Systems, pp. 256–263 (2006) [53] Wang, Y., Makedon, F., Drysdale, R.L.: Fast algorithms to compute the force histogram (2004) (unpublished) [54] Wawrzyniak, L., Matsakis, P., Nikitenko, D.: Representing topological relationships between complex regions by F-histograms. In: Fisher, P.F. (ed.) Developments in spatial data handling Proc. of the 11th Int. Symp. on Spatial Data Handling, pp. 245–258. Springer, Heidelberg (2004) [55] Wawrzyniak, L., Nikitenko, D., Matsakis, P.: Describing topological relationships in words: Refinements. In: Proc. of the 2005 IEEE Int. Conf. on Fuzzy Systems, pp. 743–748 (2005) [56] Wawrzyniak, L., Nikitenko, D., Matsakis, P.: Speaking with spatial relations. Int. J. of Intelligent Systems Technologies and Applications (special issue on Intelligent Image and Video Processing and Applications: The Role of Uncertainty) 1(3/4), 280–300 (2006) [57] Wendling, L., Tabbone, S., Matsakis, P.: Fast and robust recognition of orbit and sinus drawings using histograms of forces. Pattern Recognition Letters 23(14), 1687–1693 (2002) [58] Wikipedia contributors Space. In: Wikipedia, the free encyclopedia (July 20, 2005), http://en.wikipedia.org/w/index.php?title=Space&oldid=191 94480 (accessed July 20, 2005)
Bipolar Fuzzy Spatial Information: Geometry, Morphology, Spatial Reasoning Isabelle Bloch
Abstract. Spatial information may be endowed with a bipolarity component. Typical examples concern possible vs forbidden places for an object in space, or "opposite" spatial relations such as "possibly to the right of an object and certainly not to its left". However, bipolarity has not yet been much exploited in the spatial domain. Moreover, imprecision often has to be taken into account as well, for instance to model vague statements such as "to the right of an object". In this paper we propose to handle both features in the framework of bipolar fuzzy sets. We introduce some geometrical measures and mathematical morphology operations on bipolar fuzzy sets and illustrate their potential for spatial reasoning on a simple scenario in brain imaging.
1 Introduction

In many domains, it is important to be able to deal with bipolar information [34, 36, 37]. Positive information represents what is possible, for instance because it has already been observed or experienced, while negative information represents what is impossible or forbidden, or surely false. The intersection of the positive information and the negative information has to be empty in order to achieve consistency of the representation, and their union does not necessarily cover the whole underlying space, i.e. there is no direct duality between both types of information. This domain has recently motivated work in several directions, for instance for applications in knowledge representation, preference modeling, argumentation, multi-criteria decision analysis, cooperative games, among others [1, 5, 21, 23, 37, 38, 42, 43, 49, 50, 51]. In particular, fuzzy and possibilistic formalisms for

Isabelle Bloch
Télécom ParisTech, CNRS LTCI, Paris, France
e-mail: [email protected]

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 75–102.
© Springer-Verlag Berlin Heidelberg 2010, springerlink.com
bipolar information have been proposed [4, 5, 34, 36]. Interestingly enough, they are formally linked to intuitionistic fuzzy sets [2], interval-valued fuzzy sets [60] and vague sets, as shown by several authors [22, 33]. However, their respective semantics differ. When dealing with spatial information, in image processing or for spatial reasoning applications, this bipolarity also occurs. For instance, when assessing the position of an object in space, we may have positive information expressed as a set of possible places, and negative information expressed as a set of impossible or forbidden places (for instance because they are occupied by other objects). As another example, let us consider spatial relations. Human beings consider “left” and “right” as opposite relations. But this does not mean that one of them is the negation of the other one. The semantics of “opposite” captures a notion of symmetry (with respect to some axis or plane) rather than a strict complementation. In particular, there may be positions which are considered neither to the right nor to the left of some reference object, thus leaving room for some indetermination [6]. This corresponds to the idea that the union of positive and negative information does not cover all the space. Similar considerations can be provided for other pairs of “opposite” relations, such as “close to” and “far from” for instance. An example is illustrated in Figure 1. It shows an object at some position in the space (the rectangle in this figure). Let us assume that some information about the position of another object is provided: it is to the left of the rectangle and not to the right. The region “to the left of the rectangle” is computed using a fuzzy dilation with a directional fuzzy structuring element providing the semantics of “to the left” [6], thus defining the positive information. The region “to the right of the rectangle” defines the negative information and is computed in a similar way. 
The membership functions μL and μR represent respectively the positive and negative parts of the bipolar fuzzy set. They are not the complement of each other, and we have: ∀x, μL(x) + μR(x) ≤ 1.
Fig. 1 Region to the left of the rectangle (positive information, μL ) and region to the right of the rectangle (negative information, μR ). The membership degrees vary from 0 (black) to 1 (white).
Fig. 2 Region close to the square (μC) and region far from the square (μF)

Another example, for the pair of relations close/far, is illustrated in Figure 2. The reference object is the square in the center of the image. The two fuzzy regions are computed using fuzzy dilations, using structuring elements that provide the semantics of "close" and "far" [7]. Again, the two membership functions μC and μF are
not the complement of each other and actually define a bipolar fuzzy set, with its positive and negative parts. To our knowledge, bipolarity has not been much exploited in the spatial domain. A few works deal with image thresholding or edge detection, based on intuitionistic fuzzy sets derived from image intensity and entropy or divergence criteria [24, 29, 57]. Spatial representations of interval-valued fuzzy sets have also been proposed in [25], as a kind of fuzzy egg-yolk, for evaluating classification errors based on ground-truth, or in [44, 45] with preliminary extensions of RCC to these representations. But there are still very few tools for manipulating spatial information using both its bipolarity (and not simply some kind of imprecision on the membership values) and imprecision components. The above considerations are the motivation for the present work, which aims at filling this gap by proposing formal models to manage spatial bipolar information. We consider here both objects and spatial relations between objects, as motivated by the previous examples. Additionally, imprecision has to be included, since it is an important feature of spatial information, related either to the objects themselves or to the spatial relations between them. For spatial relations, we consider their spatial representations, as proposed in [8], defining the regions of space where a relation to a reference object is satisfied (to some degree). More specifically, we consider bipolar fuzzy sets in the spatial domain, representing either objects or spatial relations to some reference objects, and propose definitions of some geometrical measures and of mathematical morphology operators (dilation and erosion) on these representations, extending our preliminary work [12, 14]. 
The choice of mathematical morphology for a first insight into the manipulation of spatial bipolar fuzzy sets is related to its wide use in image and spatial information processing [52, 54], its interest for modeling spatial relations in various formal settings (quantitative, qualitative, or fuzzy) [11], and its strong algebraic basis [39]. In Section 2, we recall some definitions on bipolar fuzzy sets. Then we introduce definitions of some simple geometrical measures on spatial bipolar fuzzy sets, in Section 3. In Section 4, we extend our work on mathematical morphology and detail definitions of erosion and dilation using a bipolar fuzzy structuring element, their properties, and some derived operations. Finally, in Section 5, we suggest some ways to define bipolar fuzzy representations of spatial relations, and we present some examples for spatial reasoning in Section 6.
2 Background

Let S be the underlying space (the spatial domain for spatial information processing), that is supposed to be bounded and finite here. A bipolar fuzzy set on S is defined by a pair of functions (μ, ν) such that ∀x ∈ S, μ(x) + ν(x) ≤ 1. Note that a bipolar fuzzy set is formally (although not semantically) equivalent to an intuitionistic fuzzy set [2]. It is also equivalent to an interval-valued fuzzy set [60], where the interval at each point x is [μ(x), 1 − ν(x)] [33]. Although there has been a lot of discussion about terminology in this domain recently [3, 33], we use the bipolarity terminology in this paper, for its appropriate semantics, as explained in our motivation. For each point x, μ(x) defines the degree to which x belongs to the bipolar fuzzy set (positive information) and ν(x) the non-membership degree (negative information). This formalism allows representing both bipolarity and fuzziness. Concerning semantics, it should be noted that a bipolar fuzzy set does not necessarily represent one physical object or spatial entity, but rather more complex information, potentially issued from different sources. Let us consider the set L of pairs of numbers (a, b) in [0, 1] such that a + b ≤ 1. This set is a complete lattice, for the partial order ⪯ defined as [28]:

(a1, b1) ⪯ (a2, b2) iff a1 ≤ a2 and b1 ≥ b2.  (1)
The greatest element is (1, 0) and the smallest element is (0, 1). The supremum and infimum are respectively defined as:

(a1, b1) ∨ (a2, b2) = (max(a1, a2), min(b1, b2)),  (2)
(a1, b1) ∧ (a2, b2) = (min(a1, a2), max(b1, b2)).  (3)
The partial order ⪯ on L induces a partial order on the set of bipolar fuzzy sets:

(μ1, ν1) ⪯ (μ2, ν2) iff ∀x ∈ S, μ1(x) ≤ μ2(x) and ν1(x) ≥ ν2(x).  (4)
Note that this corresponds to the inclusion on intuitionistic fuzzy sets [2]. Similarly the supremum and the infimum are equivalent to the intuitionistic union and intersection. It follows that, if B denotes the set of bipolar fuzzy sets on S, (B, ⪯) is a complete lattice.
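To make the lattice structure concrete, the definitions above can be sketched in a few lines of Python (our own illustrative code, not part of the chapter; S is discretized as the index set of a list, and all function names are ours):

```python
# Illustrative sketch: the lattice L of pairs (a, b) with a + b <= 1, and
# bipolar fuzzy sets represented as pairs of lists (mu, nu) over a finite S.

def is_bipolar(mu, nu, eps=1e-9):
    """Consistency constraint: mu(x) + nu(x) <= 1 at every point of S."""
    return all(m + n <= 1.0 + eps for m, n in zip(mu, nu))

def leq(p, q):
    """Partial order on L (Eq. 1): (a1, b1) <= (a2, b2) iff a1 <= a2 and b1 >= b2."""
    return p[0] <= q[0] and p[1] >= q[1]

def sup(p, q):
    """Supremum on L (Eq. 2)."""
    return (max(p[0], q[0]), min(p[1], q[1]))

def inf(p, q):
    """Infimum on L (Eq. 3)."""
    return (min(p[0], q[0]), max(p[1], q[1]))
```

Applying `sup` and `inf` pointwise yields the lattice (B, ⪯) of bipolar fuzzy sets; the greatest element of L is (1, 0) and the smallest is (0, 1).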
3 Some Basic Geometrical Measures 3.1 Cardinality Let (μ , ν ) ∈ B be a bipolar fuzzy set defined in the spatial domain S . The cardinality of intuitionistic or interval valued fuzzy sets has been introduced e.g. in [56] as an interval: [∑x∈S μ (x), ∑x∈S (1 − ν (x))], with the lower bound representing the classical cardinality of the fuzzy set defining the positive part (the least certain
cardinality), and the upper bound the cardinality of the complement of the negative part (i.e. the whole not impossible region is considered, leading to the largest possible cardinality). The length of the interval reflects the indetermination encoded by the bipolar representation. Several authors have used a similar approach, based on interval representations of the cardinality. When dealing with fuzzy sets, it may be more interesting to consider the cardinality as a fuzzy number instead of as a crisp number, for instance using the extension principle [35]: |μ|(n) = sup{α ∈ [0, 1] | |μα| = n}, where μα denotes the α-cuts of μ, defining the degree to which the cardinality of μ is equal to n. Here we propose a similar approach for defining the cardinality of a bipolar fuzzy set as a bipolar fuzzy number, which contrasts with the previous interval-based approaches.

Definition 1. Let (μ, ν) ∈ B. Its cardinality is defined as:

∀n, |(μ, ν)|(n) = (|μ|(n), 1 − |1 − ν|(n)).  (5)
Proposition 1. The cardinality introduced in Definition 1 is a bipolar fuzzy number, i.e. a bipolar fuzzy set defined on N, with ∀n, |μ|(n) + (1 − |1 − ν|(n)) ≤ 1.

In the spatial domain, the cardinality can be interpreted as the surface (in 2D) or the volume (in 3D) of the considered bipolar fuzzy set. Let us consider the example of possible/forbidden places for an object, represented by (μ, ν). Then the positive part of the cardinality represents how large the possible set of places is, while the negative part is linked to the size of the forbidden regions. An example is shown in Figure 3. For this example, the cardinality computed as an interval would provide [11000, 40000], which approximately corresponds to the 0.5-level of the bipolar fuzzy number.

Fig. 3 Bipolar fuzzy set (positive part and negative part) and its cardinality represented as a bipolar fuzzy number (the negative part, in green, is inverted)
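For a finite space, Definition 1 can be computed directly by taking the supremum over the distinct membership levels. The following sketch (our own illustrative code; function names are ours) returns the cardinality as a mapping from n to a pair (positive degree, negative degree):

```python
def fuzzy_cardinality(mu):
    """Fuzzy cardinality by the extension principle:
    |mu|(n) = sup{alpha in [0, 1] : |{x : mu(x) >= alpha}| = n}."""
    card = {}
    for alpha in sorted(set(mu) | {0.0, 1.0}):
        n = sum(1 for m in mu if m >= alpha)   # size of the alpha-cut
        card[n] = max(card.get(n, 0.0), alpha)
    return card

def bipolar_cardinality(mu, nu):
    """Definition 1: |(mu, nu)|(n) = (|mu|(n), 1 - |1 - nu|(n))."""
    pos = fuzzy_cardinality(mu)
    neg = fuzzy_cardinality([1.0 - v for v in nu])   # complement of negative part
    return {n: (pos.get(n, 0.0), 1.0 - neg.get(n, 0.0))
            for n in range(len(mu) + 1)}
```

By Proposition 1, each returned pair (p, q) satisfies p + q ≤ 1, so the result is itself a bipolar fuzzy set on N.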
3.2 Center of Gravity Here we propose a simple approach to define the center of gravity of a bipolar fuzzy set, that accounts for the indetermination. The underlying idea is that a point should contribute a lot if it belongs strongly to μ (positive part) and weakly to ν
(negative part). Just translating this idea as a weight for each point x ∈ S defined as min(μ(x), 1 − ν(x)) is not interesting since this always provides μ(x), and the indetermination is then not taken into account. Therefore, we define the center of gravity by weighting each point by its membership to the positive part plus a portion of the indetermination.

Definition 2. The center of gravity of a bipolar fuzzy set (μ, ν) ∈ B is defined as:

CoG(μ, ν) = ∑_{x∈S} x (μ(x) + λπ(x)) / ∑_{x∈S} (μ(x) + λπ(x))  (6)

where π(x) = 1 − μ(x) − ν(x) denotes the indetermination and λ ∈ [0, 1] is a weighting factor. The parameter λ allows tuning the influence of the indetermination. For λ = 0, Definition 2 leads to the classical center of gravity of a fuzzy set, by considering only the positive part μ. For λ = 1, it leads to the center of gravity of 1 − ν (i.e. everything that is not impossible is included in the computation). Intermediate values of λ realize a gradual compromise between these two extreme solutions. This is illustrated in Figure 4.
Fig. 4 A spatial bipolar set (crisp in this example) and its center of gravity for λ = 0.5 (corresponding to the center of the dashed circle in this case)
Other moments could be defined in a similar way. It should be noted that this approach is mainly relevant if the bipolar fuzzy set is considered as one spatial entity, which is not always the case, as mentioned in Section 2. In cases where the bipolar fuzzy set represents more complex information, pertaining to the same situation but potentially representing different pieces of information or knowledge coming from different sources, the very meaning of a center of gravity has to be reconsidered.
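On a discrete 1-D domain, Definition 2 can be sketched as follows (illustrative code of our own; `points` holds the coordinates of S and `lam` plays the role of λ):

```python
def center_of_gravity(points, mu, nu, lam=0.5):
    """Definition 2: weight each point by mu(x) + lam * pi(x), where
    pi(x) = 1 - mu(x) - nu(x) is the indetermination and lam is in [0, 1]."""
    w = [m + lam * (1.0 - m - n) for m, n in zip(mu, nu)]
    return sum(x * wx for x, wx in zip(points, w)) / sum(w)
```

As stated above, `lam=0.0` recovers the classical center of gravity of μ, and `lam=1.0` the center of gravity of 1 − ν.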
4 Mathematical Morphology

Mathematical morphology on bipolar fuzzy sets was first introduced in [12]. Once we have a complete lattice, as described in Section 2, it is easy to define algebraic dilations and erosions on this lattice, as operators that commute with the supremum and the infimum, respectively:
δ((μ, ν) ∨ (μ′, ν′)) = δ((μ, ν)) ∨ δ((μ′, ν′)),  (7)

ε((μ, ν) ∧ (μ′, ν′)) = ε((μ, ν)) ∧ ε((μ′, ν′)),  (8)
and similar expressions for sup and inf taken over any family of bipolar fuzzy sets. Their properties are derived from general properties of lattice operators. If we assume that S is an affine space (or at least a space on which translations can be defined), it is interesting, for dealing with spatial information, to consider morphological operations based on a structuring element, which are hence invariant under translation. A structuring element is a subset of S with fixed shape and size, directly influencing the spatial extent of the morphological transformations. It is generally assumed to be compact, so as to guarantee good properties. In the discrete case considered here, we assume that it is connected, in the sense of a discrete connectivity defined on S . The general principle underlying morphological operators consists in translating the structuring element at every position in space and checking if this translated structuring element satisfies some relation with the original set (inclusion for erosion, intersection for dilation) [52]. This principle has also been used in the main extensions of mathematical morphology to fuzzy sets [20, 30, 31, 46, 47, 53]. We detail the construction of such morphological operators, extending our preliminary work [12, 13], along with some derived operators.
4.1 Erosion

As for fuzzy sets [20], defining morphological erosions of bipolar fuzzy sets, using bipolar fuzzy structuring elements, requires defining a degree of inclusion between bipolar fuzzy sets. Such inclusion degrees have been proposed in the context of intuitionistic fuzzy sets [32]. With our notations, a degree of inclusion of a bipolar fuzzy set (μ′, ν′) in another bipolar fuzzy set (μ, ν) is defined as:

inf_{x∈S} I((μ′(x), ν′(x)), (μ(x), ν(x)))  (9)
where I is an implication operator. Among the different classes of implications, two types are considered [27, 32]: one derived from a bipolar t-conorm ⊥¹:

¹ A bipolar disjunction is an operator D from L × L into L such that D((1, 0), (1, 0)) = D((0, 1), (1, 0)) = D((1, 0), (0, 1)) = (1, 0), D((0, 1), (0, 1)) = (0, 1) and that is increasing in both arguments. A bipolar t-conorm is a commutative and associative bipolar disjunction such that the smallest element of L is the unit element.
IN((a1, b1), (a2, b2)) = ⊥((b1, a1), (a2, b2)),  (10)
and one derived from a residuation principle from a bipolar t-norm ⊤²:

IR((a1, b1), (a2, b2)) = sup{(a3, b3) ∈ L | ⊤((a1, b1), (a3, b3)) ⪯ (a2, b2)}  (11)

where (ai, bi) ∈ L and (bi, ai) is the standard negation of (ai, bi). Two types of t-norms and t-conorms are considered in [32] and will be considered here as well:

1. operators called t-representable t-norms and t-conorms, which can be expressed using usual t-norms t and t-conorms T from fuzzy set theory [35]:

⊤((a1, b1), (a2, b2)) = (t(a1, a2), T(b1, b2)),  (12)
⊥((a1, b1), (a2, b2)) = (T(a1, a2), t(b1, b2)).  (13)
2. Lukasiewicz operators, which are not t-representable:

⊤W((a1, b1), (a2, b2)) = (max(0, a1 + a2 − 1), min(1, b1 + 1 − a2, b2 + 1 − a1)),  (14)
⊥W((a1, b1), (a2, b2)) = (min(1, a1 + 1 − b2, a2 + 1 − b1), max(0, b1 + b2 − 1)).  (15)

In these equations, the positive part of ⊤W is the usual Lukasiewicz t-norm of a1 and a2 (i.e. the positive parts of the input bipolar values). The negative part of ⊥W is the usual Lukasiewicz t-norm of the negative parts (b1 and b2) of the input values. The two types of implication coincide for the Lukasiewicz operators [28]. Based on these concepts, we can now propose a definition for morphological erosion.

Definition 3. Let (μB, νB) be a bipolar fuzzy structuring element (in B). The erosion of any (μ, ν) in B by (μB, νB) is defined from an implication I as:

∀x ∈ S, ε(μB,νB)((μ, ν))(x) = inf_{y∈S} I((μB(y − x), νB(y − x)), (μ(y), ν(y))),  (16)
where μB(y − x) denotes the value at point y of μB translated at x. A similar approach has been used for intuitionistic fuzzy sets in [48], but with weaker properties (in particular an important property such as the commutativity of erosion with the conjunction may be lost).

² A bipolar conjunction is an operator C from L × L into L such that C((0, 1), (0, 1)) = C((0, 1), (1, 0)) = C((1, 0), (0, 1)) = (0, 1), C((1, 0), (1, 0)) = (1, 0) and that is increasing in both arguments. A bipolar t-norm is a commutative and associative bipolar conjunction such that the largest element of L is the unit element.
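As a concrete illustration of Definition 3 (our own 1-D sketch, not the authors' implementation), the code below uses the implication derived from the t-representable t-conorm built from T = max and t = min; outside its support the structuring element is taken as (0, 1), so out-of-range offsets can simply be skipped:

```python
def bipolar_erosion(mu, nu, muB, nuB, origin):
    """Erosion of (mu, nu) by the structuring element (muB, nuB) on a 1-D
    discrete domain (Definition 3, with I derived from the (max, min)
    t-representable t-conorm). The result at x is
    (inf_y max(nuB(y - x), mu(y)), sup_y min(muB(y - x), nu(y)));
    `origin` is the index of the structuring element's origin."""
    N = len(mu)
    eros_mu, eros_nu = [], []
    for x in range(N):
        pos, neg = 1.0, 0.0
        for y in range(N):
            k = y - x + origin          # index into the structuring element
            if 0 <= k < len(muB):       # outside its support the SE is (0, 1)
                pos = min(pos, max(nuB[k], mu[y]))
                neg = max(neg, min(muB[k], nu[y]))
        eros_mu.append(pos)
        eros_nu.append(neg)
    return eros_mu, eros_nu
```

On crisp inputs this reduces to a classical erosion of the positive part and a classical dilation of the negative part, which is exactly the decomposition made explicit in Section 4.4.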
4.2 Morphological Dilation of Bipolar Fuzzy Sets

Dilation can be defined based on a duality principle or based on the adjunction property. Both approaches have been developed in the case of fuzzy sets, and the links between them and the conditions for their equivalence have been proved in [9, 16]. Similarly we consider both approaches to define morphological dilation on B.

4.2.1 Dilation by Duality
The duality principle states that the dilation is equal to the complementation of the erosion, by the same structuring element (if it is symmetrical with respect to the origin of S , otherwise its symmetrical is used), applied to the complementation of the original set. Applying this principle to bipolar fuzzy sets using a complementation c (typically the standard negation c((a, b)) = (b, a)) leads to the following definition of morphological bipolar dilation. Definition 4. Let (μB , νB ) be a bipolar fuzzy structuring element. The dilation of any (μ , ν ) in B by (μB , νB ) is defined from erosion by duality as:
δ(μB,νB)((μ, ν)) = c[ε(μB,νB)(c((μ, ν)))].  (17)

4.2.2 Dilation by Adjunction
Let us now consider the adjunction principle, as in the general algebraic case. An adjunction property can also be expressed between a bipolar t-norm and the corresponding residual implication as follows:

⊤((a1, b1), (a3, b3)) ⪯ (a2, b2) ⇔ (a3, b3) ⪯ IR((a1, b1), (a2, b2)).  (18)
Definition 5. Using the residual implication IR of a bipolar t-norm ⊤ for the erosion, the bipolar fuzzy dilation, adjoint of this erosion, is defined as:

δ(μB,νB)((μ, ν))(x) = inf{(μ′, ν′)(x) | (μ, ν)(x) ⪯ ε(μB,νB)((μ′, ν′))(x)} = sup_{y∈S} ⊤((μB(x − y), νB(x − y)), (μ(y), ν(y))).  (19)
4.2.3 Links between Both Approaches
It is easy to show that the bipolar Lukasiewicz operators are adjoint, according to Equation 18. It has been shown that the adjoint operators are all derived from the Lukasiewicz operators, using a continuous bijective permutation on [0, 1] [32]. Hence equivalence between both approaches can be achieved only for this class of operators. This result is similar to the one obtained for fuzzy mathematical morphology [9, 16].
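The Lukasiewicz pair of Equations 14 and 15 is straightforward to implement; the sketch below (our own code, with names of our choosing) can be used to check, for instance, the De Morgan duality of ⊤W and ⊥W with respect to the standard negation c((a, b)) = (b, a), which underlies the duality construction of Section 4.2.1:

```python
def t_norm_W(p, q):
    """Bipolar Lukasiewicz t-norm (Eq. 14)."""
    (a1, b1), (a2, b2) = p, q
    return (max(0.0, a1 + a2 - 1.0),
            min(1.0, b1 + 1.0 - a2, b2 + 1.0 - a1))

def t_conorm_W(p, q):
    """Bipolar Lukasiewicz t-conorm (Eq. 15)."""
    (a1, b1), (a2, b2) = p, q
    return (min(1.0, a1 + 1.0 - b2, a2 + 1.0 - b1),
            max(0.0, b1 + b2 - 1.0))

def neg(p):
    """Standard negation on L: c((a, b)) = (b, a)."""
    return (p[1], p[0])
```

Enumerating L on a grid of quarter values confirms that ⊥W(p, q) = c(⊤W(c(p), c(q))) and that (1, 0) is the unit element of ⊤W.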
4.3 Properties

Proposition 2. All definitions are consistent: they actually provide bipolar fuzzy sets of B.

Proposition 3. In case the bipolar fuzzy sets are usual fuzzy sets (i.e. ν = 1 − μ and νB = 1 − μB), the definitions lead to the usual definitions of fuzzy dilations and erosions (using the classical Lukasiewicz t-norm and t-conorm for the definitions based on the Lukasiewicz operators). Hence they are also compatible with classical morphology in case μ and μB are crisp.

Proposition 4. The proposed definitions of bipolar fuzzy dilations and erosions commute respectively with the supremum and the infimum of the lattice (B, ⪯).

Proposition 5. The bipolar fuzzy dilation is extensive (i.e. (μ, ν) ⪯ δ(μB,νB)((μ, ν))) and the bipolar fuzzy erosion is anti-extensive (i.e. ε(μB,νB)((μ, ν)) ⪯ (μ, ν)) if and only if (μB, νB)(0) = (1, 0), where 0 is the origin of the space S (i.e. the origin completely belongs to the structuring element, without any indetermination). Note that this condition is equivalent to the conditions on the structuring element found in classical and fuzzy morphology to have extensive dilations and anti-extensive erosions [20, 52].

Proposition 6. The dilation satisfies the following iterativity property:
δ(μB′,νB′)(δ(μB,νB)((μ, ν))) = δ_{δ(μB′,νB′)((μB,νB))}((μ, ν)).  (20)
Proposition 7. Conversely, if we want all classical properties of mathematical morphology to hold true, the bipolar conjunctions and disjunctions used to define intersection and inclusion in B have to be bipolar t-norms and t-conorms. If both duality and adjunction are required, then the only choice is bipolar Lukasiewicz operators (up to a continuous permutation on [0, 1]). This result [15] is very important, since it shows that the proposed definitions are the most general ones to have a satisfactory interpretation in terms of mathematical morphology.
4.4 Interpretations

Let us first consider the implication defined from a t-representable bipolar t-conorm. Then the erosion is written as:

ε(μB,νB)((μ, ν))(x) = inf_{y∈S} ⊥((νB(y − x), μB(y − x)), (μ(y), ν(y)))
  = (inf_{y∈S} T(νB(y − x), μ(y)), sup_{y∈S} t(μB(y − x), ν(y))).  (21)
This resulting bipolar fuzzy set has a membership function which is exactly the fuzzy erosion of μ by the fuzzy structuring element 1 − νB , according to the
original definitions in the fuzzy case [20]. The non-membership function is exactly the dilation of the fuzzy set ν by the fuzzy structuring element μB . Let us now consider the derived dilation, based on the duality principle. Using the standard negation, it is written as:
δ(μB,νB)((μ, ν))(x) = (sup_{y∈S} t(μB(x − y), μ(y)), inf_{y∈S} T(νB(x − y), ν(y))).  (22)
The first term (membership function) is exactly the fuzzy dilation of μ by μB, while the second one (non-membership function) is the fuzzy erosion of ν by 1 − νB, according to the original definitions in the fuzzy case [20]. This observation has a nice interpretation, which fits well with intuition. Let (μ, ν) represent a spatial bipolar fuzzy set, where μ is positive information for the location of an object, for instance, and ν negative information for this location. A bipolar structuring element can represent additional imprecision on the location, or additional possible locations. Dilating (μ, ν) by this bipolar structuring element amounts to dilating μ by μB, i.e. the positive region is extended by an amount represented by the positive information encoded in the structuring element. On the contrary, the negative information is eroded by the complement of the negative information encoded in the structuring element. This corresponds well to what would intuitively be expected in such situations. A similar interpretation can be provided for the bipolar fuzzy erosion. Let us now consider the implication derived from the Lukasiewicz bipolar operators (Equations 14 and 15). The erosion and the dilation are then expressed as:

∀x ∈ S, ε(μB,νB)((μ, ν))(x) = inf_{y∈S} (min(1, μ(y) + 1 − μB(y − x), νB(y − x) + 1 − ν(y)), max(0, ν(y) + μB(y − x) − 1))
  = (inf_{y∈S} min(1, μ(y) + 1 − μB(y − x), νB(y − x) + 1 − ν(y)), sup_{y∈S} max(0, ν(y) + μB(y − x) − 1)),  (23)

∀x ∈ S, δ(μB,νB)((μ, ν))(x) = (sup_{y∈S} max(0, μ(y) + μB(x − y) − 1), inf_{y∈S} min(1, ν(y) + 1 − μB(x − y), νB(x − y) + 1 − μ(y))).  (24)
The negative part of the erosion is exactly the fuzzy dilation of ν (the negative part of the input bipolar fuzzy set) with the structuring element μB (the positive part of the bipolar fuzzy structuring element), using the Lukasiewicz t-norm. Similarly, the positive part of the dilation is the fuzzy dilation of μ (positive part of the input) by μB (positive part of the bipolar fuzzy structuring element), using the Lukasiewicz t-norm. Hence for both operators, the "dilation" part (i.e. the negative part for the erosion and the positive part for the dilation) always has a direct interpretation and is the same as the one obtained using t-representable operators, for t being the Lukasiewicz t-norm. In the case where the structuring element is not bipolar (i.e. ∀x ∈ S, νB(x) = 1 − μB(x)), the "erosion" part also has a direct interpretation: the positive part of the erosion
is the fuzzy erosion of μ by μB for the Lukasiewicz t-conorm; the negative part of the dilation is the erosion of ν by μB for the Lukasiewicz t-conorm. This case is then equivalent to the one where t-representable operators are used with the Lukasiewicz t-norm and t-conorm.
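These decompositions are easy to verify numerically. The following 1-D sketch (our own illustrative code, not the authors' implementation) computes the dilation of Equation 22, i.e. with t-representable operators built from t = min and T = max, taking the structuring element as (0, 1) outside its support:

```python
def bipolar_dilation(mu, nu, muB, nuB, origin):
    """Dilation by duality with the standard negation (Eq. 22) on a 1-D
    discrete domain: positive part = fuzzy dilation of mu by muB (sup-min),
    negative part = fuzzy erosion of nu by 1 - nuB (inf-max)."""
    N = len(mu)
    dil_mu, dil_nu = [], []
    for x in range(N):
        pos, neg = 0.0, 1.0
        for y in range(N):
            k = x - y + origin          # index for muB(x - y), nuB(x - y)
            if 0 <= k < len(muB):       # outside its support the SE is (0, 1)
                pos = max(pos, min(muB[k], mu[y]))
                neg = min(neg, max(nuB[k], nu[y]))
        dil_mu.append(pos)
        dil_nu.append(neg)
    return dil_mu, dil_nu
```

With a structuring element whose value at the origin is (1, 0), the result can also be checked to be extensive, as stated in Proposition 5.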
4.5 Illustrative Example

Let us now illustrate these morphological operations on the simple example shown in Figure 1. Let us assume that additional information, given as a bipolar structuring element, allows us to reduce the positive part and to extend the negative part of the bipolar fuzzy region. This can be formally expressed as a bipolar fuzzy erosion, applied to the bipolar fuzzy set (μL, μR), using this structuring element. It corresponds to situations where the initial bipolar fuzzy set was too "permissive" and provided too large possible regions. Figure 5 illustrates the result in the case of a classical structuring element and in the case of a bipolar one. It can be observed that the region corresponding to the positive information has actually been reduced (via a fuzzy erosion), while the region corresponding to the negative part has been extended (via a fuzzy dilation).
Fig. 5 Illustration of a bipolar fuzzy erosion on the example of Figure 1, using Definition 3 with t-representable operators derived from min and max. A first non-bipolar structuring element (μB, νB) with νB = 1 − μB is used. The results show the reduction of the positive part via an erosion of μL with 1 − νB = μB and an extension of the negative part via a dilation of μR by μB. Next, another structuring element (μ′B, ν′B), which is truly bipolar (μ′B + ν′B ≤ 1), is used. The negative part is the same as in the first case (since μ′B is the same). The positive part undergoes a stronger erosion since 1 − ν′B ≥ 1 − νB.
An example of bipolar fuzzy dilation is illustrated in Figure 6 for the bipolar fuzzy set close/far of Figure 2. The dilation corresponds to a situation where the structuring element represents by how much the positive part of the information can be expanded (positive part of the structuring element), for instance because new positions become possible, and by how much the negative part of the information should be reduced (negative part of the structuring element), for instance because it was too severe. These operations allow modifying the semantics attached to the concepts “close” and “far”: in this example, a larger space around the object is considered to be close to the object, and the regions considered to be far from the object are pushed further away.
Fig. 6 Illustration of a bipolar fuzzy dilation on the example of Figure 2, using Definition 4 with t-representable operators derived from min and max. Results with a non-bipolar fuzzy structuring element (μB, νB) with νB = 1 − μB show the extension of the positive part via a dilation of μC by μB and a reduction of the negative part via an erosion of μF by 1 − νB = μB. Another structuring element (μ′B, ν′B) is used next, which is truly bipolar: μ′B + ν′B ≤ 1. The positive part is the same as in the first case (since μ′B = μB). The negative part is more strongly eroded, since 1 − ν′B ≥ 1 − νB.
Fig. 7 Fusion of bipolar information on direction (μL , μR ) and on distance δ(μB ,νB ) ((μC , μF )) of Figure 6
When several pieces of information are available, such as information on direction and information on distance, they can be combined using fusion tools, in order to get a spatial region accounting for all available information. This type of approach has been used to guide the recognition of anatomical structures in images, based on medical knowledge expressed as a set of spatial relations between pairs or triplets of structures (e.g. in an ontology), in the fuzzy case [18, 26, 41]. This idea can be extended to the bipolar case. As an example, a result of fusion of directional and distance information is illustrated in Figure 7. The positive information “to the left” of the reference object (and the negative part “to the right”) is combined with the
dilated distance information shown in Figure 6. The positive parts are combined in a conjunctive way (using a min here) and the negative parts in a disjunctive way (using a max here), according to the semantics of the fusion of bipolar information [34]. The meaning of the positive part is then “to the left and close to” and that of the negative part is “to the right and far from”. This example shows how the search space can be reduced by combining spatial relations to reference objects, expressed as bipolar fuzzy sets. This can be considered as an extension to the bipolar case of attention focusing approaches. Further examples will be given in Section 6.
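The fusion rule described above is a one-liner on discretized memberships. A minimal sketch, with hypothetical degree values at three sample points (min on the positive parts, max on the negative parts, as in [34]):

```python
import numpy as np

def fuse_bipolar(sets):
    # Conjunctive fusion of the positive parts (min) and
    # disjunctive fusion of the negative parts (max).
    mu = np.minimum.reduce([m for m, _ in sets])
    nu = np.maximum.reduce([n for _, n in sets])
    return mu, nu

# Hypothetical degrees at three sample points for "to the left" and "close to".
left  = (np.array([1.0, 0.8, 0.0]), np.array([0.0, 0.0, 1.0]))
close = (np.array([0.9, 0.2, 0.6]), np.array([0.0, 0.7, 0.2]))

mu_f, nu_f = fuse_bipolar([left, close])
# mu_f: degree of "to the left AND close"; nu_f: degree of "to the right OR far".
```

Note that the result is automatically bipolar when the inputs are: the fused positive part is bounded by each μi, and each μi + νi ≤ 1.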
4.6 Derived Operators

Once the two basic morphological operators, erosion and dilation, have been defined on bipolar fuzzy sets, many other operators can be derived in a quite straightforward way. We provide a few examples in this section.

4.6.1 Morphological Gradient
A direct application of erosion and dilation is the morphological gradient, which extracts the boundaries of objects by computing the difference between dilation and erosion.

Definition 6. Let (μ, ν) be a bipolar fuzzy set. We denote its dilation by a bipolar fuzzy structuring element by (δ⁺, δ⁻) and its erosion by (ε⁺, ε⁻). We define the bipolar fuzzy gradient as:

∇(μ, ν) = (min(δ⁺, ε⁻), max(δ⁻, ε⁺)), (25)

which is the set difference, expressed as the conjunction between (δ⁺, δ⁻) and the negation (ε⁻, ε⁺) of (ε⁺, ε⁻).

Proposition 8. The bipolar fuzzy gradient has the following properties:
1. Definition 6 defines a bipolar fuzzy set.
2. If the dilation and erosion are defined using t-representable bipolar t-norms and t-conorms, we have:

∇(μ, ν) = (min(δμB(μ), δμB(ν)), max(ε1−νB(ν), ε1−νB(μ))). (26)

Moreover, if (μ, ν) is not bipolar (i.e. ν = 1 − μ), then the positive part of the gradient is equal to min(δμB(μ), 1 − εμB(μ)), which is exactly the morphological gradient in the fuzzy case.

An illustration is displayed in Figure 8. It illustrates both the imprecision (through the fuzziness of the gradient) and the indetermination (through the indetermination between the positive and the negative parts).
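As a sketch, the gradient of Definition 6 can be computed on a 1D signal. The example below assumes a crisp, centered 3-sample structuring element, for which the fuzzy dilation and erosion reduce to moving maximum and minimum filters; in the non-bipolar case ν = 1 − μ, the positive part then coincides with the classical fuzzy gradient min(δ(μ), 1 − ε(μ)).

```python
import numpy as np

def maxfilt(a):
    # Flat fuzzy dilation by a crisp 3-sample structuring element (moving max).
    p = np.pad(a, 1, mode="edge")
    return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])

def minfilt(a):
    # Flat fuzzy erosion by the same structuring element (moving min).
    p = np.pad(a, 1, mode="edge")
    return np.minimum(np.minimum(p[:-2], p[1:-1]), p[2:])

def bipolar_gradient(mu, nu):
    # Definition 6: conjunction of the dilation (dil_pos, dil_neg) with the
    # negation (ero_neg, ero_pos) of the erosion.
    dil_pos, dil_neg = maxfilt(mu), minfilt(nu)
    ero_pos, ero_neg = minfilt(mu), maxfilt(nu)
    return np.minimum(dil_pos, ero_neg), np.maximum(dil_neg, ero_pos)

# Non-bipolar fuzzy object: the gradient highlights its (fuzzy) boundaries.
mu = np.array([0, 0, 0.5, 1, 1, 0.5, 0, 0])
grad_pos, grad_neg = bipolar_gradient(mu, 1 - mu)
```

The positive part of the gradient vanishes away from the object's boundaries and peaks on them, while the pair (grad_pos, grad_neg) remains bipolar.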
[Figure 8 panels: the original set, the bipolar fuzzy structuring element, the dilation, the erosion, and the gradient, with positive and negative parts shown for each.]
Fig. 8 Gradient using a fuzzy bipolar structuring element and t-representable operators derived from min and max
Another example is shown in Figure 9. The object is here somewhat more complex, and exhibits two different parts that can be considered as two connected components to some degree. The positive part of the gradient provides a good account of the boundaries of the union of the two components, which amounts to considering that the region between the two components, which has lower membership degrees, actually belongs to the object. The positive part has the expected interpretation as a granted position and extension of the contours. The negative part shows the level of indetermination in the gradient: the gradient could be larger as well, and it could also include the region between the two components.

Fig. 9 Gradient using a fuzzy (non-bipolar) structuring element (νB = 1 − μB as in Figure 6) on a more complex object (panels: original set, dilation, erosion, and gradient, with positive and negative parts for each)

4.6.2 Conditional Dilation
Another direct application of the basic operators concerns the notion of conditional dilation (respectively, conditional erosion) [52]. These operations are very useful in mathematical morphology in order to constrain an operation to provide a result restricted to some region of space. In the digital case, a conditional dilation can be expressed as the intersection of the usual dilation with an elementary structuring element and the conditioning set. This operation is iterated in order to obtain the conditional dilation with a larger structuring element. Iterating it until convergence leads to the notion of reconstruction. This operation is very useful in cases where we have a marker of some objects and we want to recover the whole objects marked by this marker, and only these objects. The extension of these operations to the bipolar fuzzy case is straightforward: given a bipolar fuzzy marker (μM, μN), the dilation of (μM, μN), conditionally to a bipolar fuzzy set (μ, ν), is simply defined as the conjunction of the dilation of (μM, μN) and (μ, ν). It is easy to show that this defines a bipolar fuzzy set. An example is shown in Figure 10: the conditional dilation of the marker is restricted to only one component of the original object (the one containing the marker); only the positive parts are shown. Iterating this dilation further would provide the whole marked component.
Fig. 10 Conditioning set, marker and conditional dilation (only the positive parts are shown)
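The iteration described above is easy to sketch in 1D. This sketch assumes a crisp elementary structuring element (moving max/min) and the bipolar conjunction (min of positive parts, max of negative parts); iterating the conditional dilation until stability reconstructs only the marked component. All values are hypothetical.

```python
import numpy as np

def maxfilt(a):
    # Elementary flat dilation (moving max over a 3-sample window).
    p = np.pad(a, 1, mode="edge")
    return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])

def minfilt(a):
    # Elementary flat erosion (moving min over a 3-sample window).
    p = np.pad(a, 1, mode="edge")
    return np.minimum(np.minimum(p[:-2], p[1:-1]), p[2:])

def bipolar_reconstruction(marker, cond):
    # Iterate: elementary bipolar dilation of the marker, then conjunction
    # (min of positive parts, max of negative parts) with the conditioning set.
    mu_m, nu_m = marker
    mu_c, nu_c = cond
    while True:
        mu_next = np.minimum(maxfilt(mu_m), mu_c)
        nu_next = np.maximum(minfilt(nu_m), nu_c)
        if np.array_equal(mu_next, mu_m) and np.array_equal(nu_next, nu_m):
            return mu_m, nu_m
        mu_m, nu_m = mu_next, nu_next

# Conditioning set with two components; the marker touches only the first one.
cond   = (np.array([1, 1, 0, 0, 1, 1.0]), np.array([0, 0, 1, 1, 0, 0.0]))
marker = (np.array([1, 0, 0, 0, 0, 0.0]), np.array([0, 1, 1, 1, 1, 1.0]))

mu_r, nu_r = bipolar_reconstruction(marker, cond)
```

Only the component containing the marker survives in the positive part; the second component remains excluded.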
4.6.3 Opening, Closing, and Derived Operators
Applying a dilation and then an erosion by the same structuring element defines a closing, while applying first an erosion and then a dilation defines an opening. Thanks to the strong underlying algebraic framework (see [12] for details), opening and closing have all the required properties: they are idempotent and increasing (hence they define morphological filters), and opening is anti-extensive while closing is extensive (whatever the choice of the structuring element), provided Lukasiewicz operators are used (up to a permutation on [0, 1]), since the adjunction property is required for these properties to hold. The closing of the bipolar fuzzy object shown in Figure 9 is displayed in Figure 11. The small region between the two components in the positive part has been included in this positive part (to some degree) by the closing, which is the expected result.
Fig. 11 Bipolar fuzzy closing using Lukasiewicz operators (positive and negative parts). The fuzzy bipolar structuring element (μB, νB) of Figure 6 was used here.
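The filling behavior of the closing is easy to check on a toy signal. The sketch below uses a crisp 3-sample structuring element, for which the bipolar dilation and erosion reduce to flat max/min filters (not the Lukasiewicz construction used for the figures, but the adjunction also holds for flat operators on a crisp structuring element, so idempotence and extensivity hold as well).

```python
import numpy as np

def maxfilt(a):
    # Flat dilation (moving max over a 3-sample window).
    p = np.pad(a, 1, mode="edge")
    return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])

def minfilt(a):
    # Flat erosion (moving min over a 3-sample window).
    p = np.pad(a, 1, mode="edge")
    return np.minimum(np.minimum(p[:-2], p[1:-1]), p[2:])

def bipolar_dilation(mu, nu):
    return maxfilt(mu), minfilt(nu)

def bipolar_erosion(mu, nu):
    return minfilt(mu), maxfilt(nu)

def bipolar_closing(mu, nu):
    # Dilation followed by erosion.
    return bipolar_erosion(*bipolar_dilation(mu, nu))

def bipolar_opening(mu, nu):
    # Erosion followed by dilation.
    return bipolar_dilation(*bipolar_erosion(mu, nu))

# A positive part with a small gap, and a negative part with a small spike.
mu = np.array([1, 1, 0, 1, 1.0])
nu = np.array([0, 0, 0.6, 0, 0.0])

mu_c, nu_c = bipolar_closing(mu, nu)
```

The closing fills the gap in the positive part and removes the spike in the negative part, and is extensive in the bipolar order (positive part grows, negative part shrinks).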
Another example is shown in Figure 12, where some small parts have been introduced in the bipolar fuzzy set. The opening successfully removes these small parts (i.e. small regions with high μ values are removed from the positive part, and small regions with low ν values are removed from the negative part). A typical use of this operation is for situations where the initial bipolar fuzzy set represents possible/forbidden regions for an object. If we have some additional information on the size of the object, so that it is sure that the object cannot fit into small parts, then the opening can be used to remove such small places from the possible region and to add them to the negative part.

Fig. 12 Bipolar fuzzy opening using Lukasiewicz operators (top: positive and negative parts of a bipolar fuzzy set with small regions added; bottom: positive and negative parts of the result of the opening). Circles indicate small regions that are removed by the opening (see text). The bipolar fuzzy structuring element (μB, νB) of Figure 6 was used in this example.
From these new operators, many others can be derived, extending the classical ones to the bipolar case. For instance, several filters can be deduced from opening and closing, such as alternate sequential filters [52], obtained by alternately applying openings and closings with structuring elements of increasing size. Another example is the top-hat transform [52], which allows extracting bright structures having a given approximate shape, using the difference between the original image and the result of an opening with this shape as a structuring element. Such operators can be directly extended to the bipolar case using the proposed framework.
4.7 Distance from a Point to a Bipolar Fuzzy Set

While there is a lot of work on distances and similarity between interval-valued fuzzy sets or between intuitionistic fuzzy sets (see e.g. [55, 57]), none of the existing definitions addresses the question of the distance from a point to a bipolar fuzzy set, nor includes the spatial distance in the proposed definitions. As in the fuzzy case [7], we propose to define the distance from a point to a bipolar fuzzy set using a morphological approach [17]. In the crisp case, the distance from a point x to a set X is equal to n iff x belongs to the dilation of size n of X (the dilation of size 0 being the identity), but not to dilations of smaller size (it is sufficient to test this condition for n − 1 in the discrete case). The transposition of this property to the bipolar fuzzy case leads to the following definition, using bipolar fuzzy dilations [17].

Definition 7. The distance from a point x of S to a bipolar fuzzy set (μ, ν) (∈ B) is defined as:

d(x, (μ, ν))(0) = (μ(x), ν(x)),
∀n ∈ N∗, d(x, (μ, ν))(n) = δⁿ(μB,νB)(μ, ν)(x) ∧ c(δⁿ⁻¹(μB,νB)(μ, ν)(x)),

where c is a complementation (typically the standard negation c(a, b) = (b, a) is used) and δⁿ(μB,νB) denotes n iterations of the dilation, using the bipolar fuzzy set (μB, νB) as structuring element.

In order to clarify the meaning of this definition, let us consider the case where the structuring element is not bipolar, i.e. νB = 1 − μB. Then the dilation is expressed as: δ(μB,1−μB)(μ, ν) = (δμB(μ), εμB(ν)), where δμB(μ) is the fuzzy dilation of μ by μB and εμB(ν) is the fuzzy erosion of ν by μB. The bipolar degree to which the distance from x to (μ, ν) is equal to n is then written as:

d(x, (μ, ν))(n) = (δⁿμB(μ) ∧ εⁿ⁻¹μB(ν), εⁿμB(ν) ∨ δⁿ⁻¹μB(μ)),

i.e. the positive part is the conjunction of the positive part of the dilation of size n (i.e. a dilation of the positive part of the bipolar fuzzy object) and the negative part of the dilation of size n − 1 (i.e. an erosion of the negative part of the bipolar fuzzy object), and the negative part is the disjunction of the negative part of the dilation of size n (erosion of ν) and the positive part of the dilation of size n − 1 (dilation of μ).

Proposition 9. The distance introduced in Definition 7 has the following properties: (i) it is a bipolar fuzzy set on N; (ii) it reduces to the distance from a point to a fuzzy set, as defined in [7], if (μ, ν) and (μB, νB) are not bipolar (hence consistency with the classical definition of the distance from a point to a set is achieved as well); (iii) the distance is strictly equal to 0 (i.e. d(x, (μ, ν))(0) = (1, 0) and
∀n ≠ 0, d(x, (μ, ν))(n) = (0, 1)) iff μ(x) = 1 and ν(x) = 0, i.e. x completely belongs to the bipolar fuzzy set. An example is shown in Figure 13. The results are in agreement with what would be intuitively expected. The positive part of the bipolar fuzzy number is shifted towards higher distance values when the point is moved to the right of the object. After a number n of dilations, the point completely belongs to the dilated object, and the value to which the distance is equal to n′, with n′ > n, becomes (0, 1). Note that the indetermination in the membership or non-membership to the object (which is truly bipolar in this example) is also reflected in the distances.
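Definition 7 can be sketched directly: iterate the bipolar dilation and, at each step, take the conjunction of the current dilation with the complement c(a, b) = (b, a) of the previous one, evaluated at x. The 1D example below assumes a crisp elementary structuring element (moving max/min) and a crisp object, so the distance comes out as a crisp bipolar number.

```python
import numpy as np

def maxfilt(a):
    # Elementary flat dilation (moving max over a 3-sample window).
    p = np.pad(a, 1, mode="edge")
    return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])

def minfilt(a):
    # Elementary flat erosion (moving min over a 3-sample window).
    p = np.pad(a, 1, mode="edge")
    return np.minimum(np.minimum(p[:-2], p[1:-1]), p[2:])

def point_distance(x, mu, nu, n_max):
    # d(x, (mu, nu))(0) = (mu(x), nu(x));
    # d(x, (mu, nu))(n) = delta^n(mu, nu)(x) AND c(delta^{n-1}(mu, nu)(x)).
    dist = [(mu[x], nu[x])]
    prev = (mu, nu)
    for _ in range(n_max):
        cur = (maxfilt(prev[0]), minfilt(prev[1]))  # one more bipolar dilation
        dist.append((min(cur[0][x], prev[1][x]),    # conjunction with the
                     max(cur[1][x], prev[0][x])))   # complement (swapped parts)
        prev = cur
    return dist

# Crisp object occupying positions 2-3; the point x = 5 is at distance 2 from it.
mu = np.array([0, 0, 1, 1, 0, 0, 0.0])
dist = point_distance(5, mu, 1 - mu, n_max=4)
```

The resulting bipolar fuzzy number is (0, 1) everywhere except at n = 2, where it is (1, 0), matching the crisp distance.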
[Figure 13 panels: the bipolar fuzzy object (positive and negative parts) with five test points in red, numbered 1 to 5 from left to right, and five plots of the distance from each point to the object, with distance values 0–60 on the horizontal axis and membership degrees on the vertical axis.]

Fig. 13 A bipolar fuzzy set and the distances from 5 different points to it, represented as bipolar fuzzy numbers (positive part in red and negative part in green)
These distances can easily be compared using the extension principle [40, 58], providing a bipolar degree d≤ to which a distance is less than another one. For the examples in Figure 13, we obtain for instance: d≤[d(x1, (μ, ν)) ≤ d(x2, (μ, ν))] = [0.69, 0.20], where xi denotes the ith point from left to right in the figure. In this case, since x1 completely belongs to (μ, ν), the degree to which its distance is less than the distance from x2 to (μ, ν) is equal to [supₐ d⁺(a), infₐ d⁻(a)], where d⁺ and
d⁻ denote the positive and negative parts of d(x2, (μ, ν)). As another example, we have d≤[d(x5, (μ, ν)) ≤ d(x2, (μ, ν))] = [0.03, 0.85], reflecting that x5 is clearly not closer to the bipolar fuzzy set (μ, ν) than x2.
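The comparison itself can be sketched as follows, under our assumption (not spelled out in the text) that the extension principle takes a sup-min form on the positive parts and an inf-max form on the negative parts, ranging over pairs of distance values a ≤ b. This reproduces the special case [supₐ d⁺(a), infₐ d⁻(a)] mentioned above when the first distance is strictly 0.

```python
def degree_leq(d1, d2):
    # Bipolar degree to which distance d1 is <= distance d2.
    # d1, d2: lists of (positive, negative) degree pairs, indexed by distance value.
    pairs = [(a, b) for a in range(len(d1)) for b in range(a, len(d2))]
    pos = max(min(d1[a][0], d2[b][0]) for a, b in pairs)  # sup-min, positive parts
    neg = min(max(d1[a][1], d2[b][1]) for a, b in pairs)  # inf-max, negative parts
    return pos, neg

# d1: a point that completely belongs to the set (distance strictly 0).
d1 = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
# d2: a hypothetical bipolar fuzzy distance.
d2 = [(0.2, 0.6), (0.7, 0.2), (0.1, 0.8)]
```

On this input the result equals (sup of d2's positive part, inf of d2's negative part), as stated in the text for the case of complete membership.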
5 Definition of Bipolar Fuzzy Spatial Relations

As mentioned in the introduction, several spatial relations go by pairs of “opposite” relations, such as left/right, above/below, close/far, etc. This is one of the motivations for handling them as bipolar information. Now the question that remains open from the previous sections is: how can the bipolar fuzzy sets representing these relations in the spatial domain be defined with respect to a reference object (i.e. how can the representations shown in Figure 1, for instance, be constructed)? Here we assume that the reference object can be crisp or fuzzy, but not bipolar. Our proposal is to rely on our previous work for defining fuzzy spatial representations of spatial relations, using dilations with fuzzy structuring elements providing the semantics of the relations [6, 7, 10]. For instance, the region to the right of a reference object is defined as the dilation of the reference object with a specific structuring element (see Figure 1, where μR and μL have been defined using this approach). Let (μ, ν) be a pair of fuzzy sets in S representing a pair of “opposite” relations with respect to a reference object. The main problem to be solved is to guarantee that ∀x ∈ S, μ(x) + ν(x) ≤ 1. This property may not hold, depending on the shape of the reference object: for instance, a concavity of an object can be both to the right and to the left of the object, leading to conflicting areas. In cases where the property does not hold directly, we propose three approaches, inspired from [34]:

1. Indulgent approach: the positive part is kept unchanged and the negative part is reduced so as to achieve the bipolar constraint. For instance, ν can be modified as ν′(x) = min(ν(x), 1 − μ(x)), and (μ, ν′) is then bipolar. Note that only the points for which the property is not satisfied are modified. This approach corresponds for instance to cases where the negative part can be interpreted as rules that can be modified in order to achieve consistency with observations.

2. Severe approach: the negative part is kept unchanged and the positive part is modified, e.g. μ′(x) = min(μ(x), 1 − ν(x)), so as to make (μ′, ν) bipolar. This means that the negative part is privileged in the conflicting areas.

3. Tunable compromise: both parts are modified in the conflicting areas, e.g. as:

μ′(x) = μ(x) − λ(μ(x) + ν(x) − 1),
ν′(x) = ν(x) − (1 − λ)(μ(x) + ν(x) − 1),

with λ ∈ [0, 1]. Points x for which μ(x) + ν(x) ≤ 1 are not modified. This leads to μ′(x) + ν′(x) = 1 for the modified points, i.e. the conflict has been replaced by a duality constraint. The first approach is included in this one by taking λ = 0 and the second one by taking λ = 1.
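The three repair strategies above can be written as a single parameterized function, with λ = 0 recovering the indulgent approach and λ = 1 the severe one. A minimal sketch, with hypothetical membership values:

```python
import numpy as np

def resolve_conflict(mu, nu, lam):
    # Remove the excess mu + nu - 1 at conflicting points only:
    # lam = 0 keeps the positive part (indulgent), lam = 1 keeps the
    # negative part (severe); intermediate values share the reduction.
    excess = np.maximum(mu + nu - 1.0, 0.0)
    return mu - lam * excess, nu - (1.0 - lam) * excess

mu = np.array([0.9, 0.3])   # conflicting at the first point (0.9 + 0.4 > 1)
nu = np.array([0.4, 0.5])

mu_i, nu_i = resolve_conflict(mu, nu, lam=0.0)   # indulgent
mu_s, nu_s = resolve_conflict(mu, nu, lam=1.0)   # severe
mu_c, nu_c = resolve_conflict(mu, nu, lam=0.5)   # tunable compromise
```

In every case the non-conflicting point is left untouched, and at the conflicting point the result satisfies μ′(x) + ν′(x) = 1, i.e. the conflict becomes a duality constraint.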
6 Application to Spatial Reasoning

Mathematical morphology provides tools for spatial reasoning at several levels [19]. Its features allow representing objects or object properties, which we do not address here in order to concentrate on tools for representing spatial relations. The notion of structuring element captures the local spatial context, in a fuzzy and bipolar way here, which endows dilation and erosion with a low-level spatial reasoning feature, as shown in the interpretation part (Section 4.4). This is then reinforced by the derived operators (opening, closing, gradient, conditional operations...), as introduced for bipolar fuzzy sets in Section 4.6. At a more global level, several spatial relations between spatial entities can be expressed as morphological operations, in particular using dilations [10, 19], leading to large-scale spatial reasoning, based for instance on distances [17]. In this section, we illustrate a typical scenario showing the interest of bipolar representations of spatial relations, and of morphological operations on these representations, for spatial reasoning [15]. Note that this is not a complete application yet; it should only be considered as an illustrative example. An example of a brain image is shown in Figure 14, with a few labeled structures of interest.
Fig. 14 A slice of a 3D MRI brain image, with a few structures: left and right lateral ventricles (LLV and RLV), caudate nuclei (LCN and RCN), putamen (LPU and RPU) and thalamus (LTH and RTH). A ring-shaped tumor is present in the left hemisphere (the usual “left is right” convention is adopted for the visualization).
Let us first consider the right hemisphere (i.e. the non-pathological one). We consider the problem of defining a region of interest for the RPU, based on a known segmentation of RLV and RTH. An anatomical knowledge base or ontology provides some information about the relative position of these structures [41, 59]: • directional information: the RPU is exterior (left on the image) of the union of RLV and RTH (positive information) and cannot be interior (negative information);
• distance information: the RPU is quite close to the union of RLV and RTH (positive information) and cannot be very far (negative information).
Fig. 15 Fuzzy structuring elements νL , νR , νC and νF , defining the semantics of left, right, close and far, respectively
These pieces of information are represented in the image space using morphological dilations with appropriate structuring elements [10] (representing the semantics of the relations, as displayed in Figure 15); they are illustrated in Figure 16. A bipolar fuzzy set modeling the direction information is defined as: (μdir, νdir) = (δνL(RLV ∪ RTH), δνR(RLV ∪ RTH)), where νL and νR define the semantics of left and right, respectively. Similarly, a bipolar fuzzy set modeling the distance information is defined as: (μdist, νdist) = (δνC(RLV ∪ RTH), 1 − δ1−νF(RLV ∪ RTH)),
Fig. 16 Bipolar fuzzy representations of spatial relations with respect to RLV and RTH. Top: positive information, bottom: negative information. From left to right: directional relation, distance relation, conjunctive fusion. The contours of the RPU are displayed to show the position of this structure with respect to the region of interest.
where νC and νF define the semantics of close and far, respectively. The neutral area between positive and negative information allows accounting for potential anatomical variability. The conjunctive fusion of the two types of relations is computed as a conjunction of the positive parts and a disjunction of the negative parts: (μFusion, νFusion) = (min(μdir, μdist), max(νdir, νdist)). As shown in the illustrated example, the RPU is well included in the bipolar fuzzy region of interest obtained using this procedure. This region can then be efficiently used to drive a segmentation and recognition technique for the RPU.

Let us now consider the left hemisphere, where a ring-shaped tumor is present. The tumor induces a deformation effect which strongly changes the shape of the normal structures, but also, to a lesser extent, their spatial relations. In particular, the LPU is pushed away from the inter-hemispheric plane, and the LTH is pushed towards the posterior part of the brain and compressed. Applying the same procedure as for the right hemisphere does not lead to very satisfactory results in this case (see Figure 18). The default relations are here too strict, and the resulting region of interest is not adequate: the LPU satisfies the positive part of the information only with low degrees, while it also slightly overlaps the negative part. In such cases, some relations (in particular metric ones) should be considered with care. This means that they should be more permissive, so as to include a larger area in the possible region, accounting for the deformation induced by the tumor. This can easily be modeled by a bipolar fuzzy dilation of the region of interest with a structuring element (μvar, νvar) (Figure 17), as shown in the last column of Figure 18: (μ′dist, ν′dist) = δ(μvar,νvar)((μdist, νdist)),
where (μdist, νdist) is defined as for the other hemisphere. The obtained region is now larger but includes the correct area. This bipolar dilation amounts to dilating the positive part and eroding the negative part, as explained in Section 4.4.
Fig. 17 Bipolar fuzzy structuring element (μvar , νvar )
Let us finally consider another example, where we want to use symmetry information to derive a search region for a structure in one hemisphere, based on the segmentation obtained in the other hemisphere. As an illustrative example, we consider the thalamus, and assume that it has been segmented in the non-pathological (right) hemisphere. Its symmetrical position with respect to the inter-hemispheric plane should provide an adequate search region for the LTH in normal cases. Here this is not the case, because of the deformation induced by the tumor (see Figure 19).
Fig. 18 Bipolar fuzzy representations of spatial relations with respect to LLV and LTH. From left to right: directional relation, distance relation, conjunctive fusion, bipolar fuzzy dilation. First line: positive parts; second line: negative parts. The contours of the LPU are displayed to show the position of this structure.
Since the brain symmetry is approximate, a small deviation could be expected, but not as large as the one observed here. Here again a bipolar dilation allows defining a proper region, by taking into account both the deformation induced by the tumor and the imprecision in the symmetry.
Fig. 19 RTH and its symmetrical counterpart; the bipolar dilation defines an appropriate search region for the LTH (left: positive part, right: negative part)
7 Conclusion

New concepts on bipolar fuzzy sets are introduced in this paper, in particular geometrical measures, morphological dilations and erosions, and derived operators, for which good properties are exhibited and nice interpretations in terms of bipolarity in spatial reasoning can be derived. Future work aims at exploiting these new operations in concrete spatial reasoning problems, as illustrated in the last part of this paper, in particular for handling the bipolar nature of some spatial relations. This will require designing a method for evaluating the degree of satisfaction of a bipolar fuzzy relation. Considering relations with respect to a bipolar fuzzy set would also be an interesting extension of this work.
References

1. Amgoud, L., Cayrol, C., Lagasquie-Schiex, M.C., Livet, P.: On bipolarity in argumentation frameworks. International Journal of Intelligent Systems 23(10), 1062–1093 (2008)
2. Atanassov, K.T.: Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 20, 87–96 (1986)
3. Atanassov, K.T.: Answer to D. Dubois, S. Gottwald, P. Hajek, J. Kacprzyk and H. Prade's Paper “Terminology Difficulties in Fuzzy Set Theory – The Case of ‘Intuitionistic Fuzzy Sets’”. Fuzzy Sets and Systems 156, 496–499 (2005)
4. Benferhat, S., Dubois, D., Kaci, S., Prade, H.: Bipolar Possibility Theory in Preference Modeling: Representation, Fusion and Optimal Solutions. Information Fusion 7, 135–150 (2006)
5. Benferhat, S., Dubois, D., Prade, H.: Modeling positive and negative information in possibility theory. International Journal of Intelligent Systems 23(10), 1094–1118 (2008)
6. Bloch, I.: Fuzzy Relative Position between Objects in Image Processing: a Morphological Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(7), 657–664 (1999)
7. Bloch, I.: On Fuzzy Distances and their Use in Image Processing under Imprecision. Pattern Recognition 32(11), 1873–1895 (1999)
8. Bloch, I.: Spatial Representation of Spatial Relationships Knowledge. In: Cohn, A.G., Giunchiglia, F., Selman, B. (eds.) 7th International Conference on Principles of Knowledge Representation and Reasoning KR 2000, Breckenridge, CO, pp. 247–258. Morgan Kaufmann, San Francisco (2000)
9. Bloch, I.: Duality vs Adjunction and General Form for Fuzzy Mathematical Morphology. In: Bloch, I., Petrosino, A., Tettamanzi, A.G.B. (eds.) WILF 2005. LNCS (LNAI), vol. 3849, pp. 354–361. Springer, Heidelberg (2006)
10. Bloch, I.: Fuzzy Spatial Relationships for Image Processing and Interpretation: A Review. Image and Vision Computing 23(2), 89–110 (2005)
11. Bloch, I.: Spatial Reasoning under Imprecision using Fuzzy Set Theory, Formal Logics and Mathematical Morphology. International Journal of Approximate Reasoning 41, 77–95 (2006)
12. Bloch, I.: Dilation and Erosion of Spatial Bipolar Fuzzy Sets. In: Masulli, F., Mitra, S., Pasi, G. (eds.) WILF 2007. LNCS (LNAI), vol. 4578, pp. 385–393. Springer, Heidelberg (2007)
13. Bloch, I.: An Extension of Skeleton by Influence Zones and Morphological Interpolation to Fuzzy Sets. In: International Symposium on Mathematical Morphology (ISMM 2007), Rio de Janeiro, Brazil, October 2007, pp. 3–14 (2007)
14. Bloch, I.: A Contribution to the Representation and Manipulation of Fuzzy Bipolar Spatial Information: Geometry and Morphology. In: Workshop on Soft Methods in Statistical and Fuzzy Spatial Information, Toulouse, France, September 2008, pp. 7–25 (2008)
15. Bloch, I.: Bipolar Fuzzy Mathematical Morphology for Spatial Reasoning. In: International Symposium on Mathematical Morphology ISMM 2009, Groningen, The Netherlands, vol. 5720, pp. 24–34 (August 2009)
16. Bloch, I.: Duality vs. Adjunction for Fuzzy Mathematical Morphology and General Form of Fuzzy Erosions and Dilations. Fuzzy Sets and Systems 160, 1858–1867 (2009)
17. Bloch, I.: Geometry of Spatial Bipolar Fuzzy Sets based on Bipolar Fuzzy Numbers and Mathematical Morphology. In: Di Gesù, V., Pal, S.K., Petrosino, A. (eds.) Fuzzy Logic and Applications. LNCS (LNAI), vol. 5571, pp. 237–245. Springer, Heidelberg (2009)
18. Bloch, I., Géraud, T., Maître, H.: Representation and Fusion of Heterogeneous Fuzzy Information in the 3D Space for Model-Based Structural Recognition – Application to 3D Brain Imaging. Artificial Intelligence 148, 141–175 (2003)
19. Bloch, I., Heijmans, H., Ronse, C.: Mathematical Morphology. In: Aiello, M., Pratt-Hartmann, I., van Benthem, J. (eds.) Handbook of Spatial Logics, ch. 13, pp. 857–947. Springer, Heidelberg (2007)
20. Bloch, I., Maître, H.: Fuzzy Mathematical Morphologies: A Comparative Study. Pattern Recognition 28(9), 1341–1387 (1995)
21. Bonnefon, J.F.: Two routes for bipolar information processing, and a blind spot in between. International Journal of Intelligent Systems 23(9), 923–929 (2008)
22. Bustince, H., Burillo, P.: Vague Sets are Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 79, 403–405 (1996)
23. Caferra, R., Peltier, N.: Accepting/rejecting propositions from accepted/rejected propositions: A unifying overview. International Journal of Intelligent Systems 23(10), 999–1020 (2008)
24. Chaira, T., Ray, A.K.: A New Measure using Intuitionistic Fuzzy Set Theory and its Application to Edge Detection. Applied Soft Computing Journal 8(2), 919–927 (2008)
25. Charlier, N., De Tré, G., Gautama, S., Bellens, R.: A Twofold Fuzzy Region Model for Imprecise Quality Control of Geographic Information. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds.) ICCSA 2008, Part I. LNCS, vol. 5072, pp. 647–662. Springer, Heidelberg (2008)
26. Colliot, O., Camara, O., Bloch, I.: Integration of Fuzzy Spatial Relations in Deformable Models – Application to Brain MRI Segmentation. Pattern Recognition 39, 1401–1414 (2006)
27. Cornelis, C., Deschrijver, G., Kerre, E.: Implication in Intuitionistic Fuzzy and Interval-Valued Fuzzy Set Theory: Construction, Classification, Application. International Journal of Approximate Reasoning 35, 55–95 (2004)
28. Cornelis, C., Kerre, E.: Inclusion Measures in Intuitionistic Fuzzy Sets. In: Nielsen, T.D., Zhang, N.L. (eds.) ECSQARU 2003. LNCS (LNAI), vol. 2711, pp. 345–356. Springer, Heidelberg (2003)
29. Couto, P., Bustince, H., Melo-Pinto, P., Pagola, M., Barrenechea, E.: Image Segmentation using A-IFSs. In: IPMU 2008, Malaga, Spain, pp. 1620–1627 (2008)
30. De Baets, B.: Generalized Idempotence in Fuzzy Mathematical Morphology. In: Kerre, E., Nachtegael, M. (eds.) Fuzzy Techniques in Image Processing. Studies in Fuzziness and Soft Computing, vol. 52, pp. 58–75. Physica Verlag, Springer (2000)
31. Deng, T.-Q., Heijmans, H.: Grey-Scale Morphology Based on Fuzzy Logic. Journal of Mathematical Imaging and Vision 16, 155–171 (2002)
32. Deschrijver, G., Cornelis, C., Kerre, E.: On the Representation of Intuitionistic Fuzzy t-Norms and t-Conorms. IEEE Transactions on Fuzzy Systems 12(1), 45–61 (2004)
33. Dubois, D., Gottwald, S., Hajek, P., Kacprzyk, J., Prade, H.: Terminology Difficulties in Fuzzy Set Theory – The Case of “Intuitionistic Fuzzy Sets”. Fuzzy Sets and Systems 156, 485–491 (2005)
34. Dubois, D., Kaci, S., Prade, H.: Bipolarity in Reasoning and Decision, an Introduction. In: International Conference on Information Processing and Management of Uncertainty, IPMU 2004, Perugia, Italy, pp. 959–966 (2004)
Bipolar Fuzzy Spatial Information: Geometry, Morphology, Spatial Reasoning
101
35. Dubois, D., Prade, H.: Fuzzy Sets and Systems: Theory and Applications. Academic Press, New-York (1980) 36. Dubois, D., Prade, H.: A Bipolar Possibilistic Representation of Knowledge and Preferences and Its Applications. In: Bloch, I., Petrosino, A., Tettamanzi, A.G.B. (eds.) WILF 2005. LNCS (LNAI), vol. 3849, pp. 1–10. Springer, Heidelberg (2006) 37. Dubois, D., Prade, H.: An introduction to bipolar representations of information and preference. International Journal of Intelligent Systems 23(8), 865–866 (2008) 38. Grabisch, M., Greco, S., Pirlot, M.: Bipolar and bivariate models in multicriteria decision analysis: Descriptive and constructive approaches. International Journal of Intelligent Systems 23(9), 930–969 (2008) 39. Heijmans, H.J.A.M., Ronse, C.: The Algebraic Basis of Mathematical Morphology – Part I: Dilations and Erosions. Computer Vision, Graphics and Image Processing 50, 245–295 (1990) 40. Hong, D.H., Lee, S.: Some Algebraic Properties and a Distance Measure for IntervalValued Fuzzy Numbers. Information Sciences 148(1-4), 1–10 (2002) 41. Hudelot, C., Atif, J., Bloch, I.: Fuzzy Spatial Relation Ontology for Image Interpretation. Fuzzy Sets and Systems 159, 1929–1951 (2008) 42. Kaci, S.: Logical formalisms for representing bipolar preferences. International Journal of Intelligent Systems 23(8), 985–997 (2008) 43. Konieczny, S., Marquis, P., Besnard, P.: Bipolarity in bilattice logics. International Journal of Intelligent Systems 23(10), 1046–1061 (2008) 44. Malek, M.R.: Spatial Object Modeling in Intuitionistic Fuzzy Topological Spaces. In: Tsumoto, S., Słowi´nski, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 427–434. Springer, Heidelberg (2004) 45. Malek, M.R.: Intuitionistic Fuzzy Spatial Relationships in Mobile GIS Environment. In: Masulli, F., Mitra, S., Pasi, G. (eds.) WILF 2007. LNCS (LNAI), vol. 4578, pp. 313–320. Springer, Heidelberg (2007) 46. 
Maragos, P.: Lattice Image Processing: A Unification of Morphological and Fuzzy Algebraic Systems. Journal of Mathematical Imaging and Vision 22, 333–353 (2005) 47. Nachtegael, M., Kerre, E.E.: Classical and Fuzzy Approaches towards Mathematical Morphology. In: Kerre, E.E., Nachtegael, M. (eds.) Fuzzy Techniques in Image Processing, Studies in Fuzziness and Soft Computing, ch. 1, pp. 3–57. Physica-Verlag, Springer (2000) 48. Nachtegael, M., Sussner, P., M´elange, T., Kerre, E.: Some Aspects of Interval-Valued and Intuitionistic Fuzzy Mathematical Morphology. In: IPCV 2008 (2008) 49. Da Silva Neves, R., Livet, P.: Bipolarity in human reasoning and affective decision making. International Journal of Intelligent Systems 23(8), 898–922 (2008) ¨ urk, M., Tsoukias, A.: Bipolar preference modeling and aggregation in decision sup50. Ozt¨ port. International Journal of Intelligent Systems 23(9), 970–984 (2008) 51. Raufaste, E., Vautier, S.: An evolutionist approach to information bipolarity: Representations and affects in human cognition. International Journal of Intelligent Systems 23(8), 878–897 (2008) 52. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, London (1982) 53. Sinha, D., Dougherty, E.R.: Fuzzification of Set Inclusion: Theory and Applications. Fuzzy Sets and Systems 55, 15–42 (1993) 54. Soille, P.: Morphological Image Analysis. Springer, Berlin (1999) 55. Szmidt, E., Kacprzyk, J.: Distances between Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 114(3), 505–518 (2000)
102
I. Bloch
56. Szmidt, E., Kacprzyk, J.: Entropy for Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 118(3), 467–477 (2001) 57. Vlachos, I.K., Sergiadis, G.D.: Intuitionistic Fuzzy Information – Applications to Pattern Recognition. Pattern Recognition Letters 28(2), 197–206 (2007) 58. Wang, G., Li, X.: The Applications of Interval-Valued Fuzzy Numbers and IntervalDistribution Numbers. Fuzzy Sets and Systems 98(3), 331–335 (1998) 59. Waxman, S.G.: Correlative Neuroanatomy, 24th edn. McGraw-Hill, New York (2000) 60. Zadeh, L.A.: The Concept of a Linguistic Variable and its Application to Approximate Reasoning. Information Sciences 8, 199–249 (1975)
Fuzzy and Rough Set Approaches for Uncertainty in Spatial Data

Theresa Beaubouef and Frederick E. Petry

Theresa Beaubouef: Department of Computer Science and Industrial Technology, Southeastern Louisiana University, Hammond, LA 70402
Frederick E. Petry: Mapping Charting and Geodesy Branch, Naval Research Laboratory, Stennis Space Center, MS 39529

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 103–129. springerlink.com © Springer-Verlag Berlin Heidelberg 2010

Abstract. The management of uncertainty in databases is necessary for real world applications, especially for systems involving spatial data such as geographic information systems. Rough and fuzzy sets are important techniques that can be used in various ways for modeling uncertainty in data and in spatial relationships between data entities. This chapter discusses various approaches involving rough and fuzzy sets for spatial database applications such as GIS.

Keywords: Spatial Interpolation, Triangulated Irregular Networks, Spatial Data Mining, Minimum Bounding Rectangles, Rough Sets, Topological Spatial Relations.

1 Introduction

A spatial database is a collection of data concerning objects located in some reference space, which attempts to model some enterprise in the real world. The real world abounds in uncertainty, and any attempt to model aspects of the world should include some mechanism for incorporating uncertainty. There may be uncertainty in the understanding of the enterprise or in the quality or meaning of the data. There may be uncertainty in the model, which leads to uncertainty in entities or the attributes describing them. And at a higher level, there may be uncertainty about the level of uncertainty prevalent in the various aspects of the database.

There has been a strong demand to provide approaches that deal with inaccuracy and uncertainty in geographical information systems (GIS) and their underlying
spatial databases. The issue of spatial database accuracy has been viewed as critical to the successful implementation and long-term viability of GIS technology [63]. There are a variety of aspects of potential errors in GIS encompassed by the general term "accuracy." However, here we are only interested in those aspects that lend themselves to modeling by fuzzy and rough set techniques. Many operations are applied to spatial data under the assumption that features, attributes and their relationships have been specified a priori in a precise and exact manner. However, inexactness often exists in the positions of features and the assignment of attribute values and may be introduced at various stages of data compilation and database development. Models of uncertainty have been proposed for spatial information that incorporate ideas from natural language processing, the value of information concept, non-monotonic logic and fuzzy sets, and evidential and probability theory [51]. In modern GIS there is a need to more precisely model and represent the underlying uncertain spatial data. Models have recently been proposed that enrich database models to manage uncertain spatial data. A major motivation for this is that there exist geographic objects with uncertain boundaries, and fuzzy sets are a natural way to represent this uncertainty [11]. An ontology for spatial data has been developed in which the terms imperfection, error, imprecision and vagueness are organized into a hierarchy to assist in the management of these issues [19]. At the most basic level, vagueness modeling approaches for spatial data are considered, including fuzzy set and rough set theory. The following section discusses uncertainty and how rough set uncertainty can be managed in databases, as well as the rough set modeling of spatial data. Section 3 provides an overview of various types of representations of spatial phenomena using fuzzy and rough set techniques.
The representation of spatial relationships is discussed in Section 4, along with the management of uncertainty in these relationships. In Section 5 data mining for uncertain data is discussed. Lastly, conclusions and directions for future research are presented.
2 Background In this section we discuss some of the approaches to modeling uncertainty in spatial data using fuzzy and rough set theory. Then we provide a brief introduction to the basic concepts and terminology of fuzzy set and rough set theory.
2.1 Overview In general, the idea of implementing fuzzy set theory as a way to model uncertainty in spatial databases has a long history. Some early work by geographical scientists in the 1970s utilized fuzzy sets [61] in topics such as behavioral geography and geographical decision making [23]. However, the first consistent approach to the use of fuzzy set theory as it could be applied in GIS was developed by Robinson [39]. He has considered several models as appropriate for this situation—two early fuzzy database approaches using simple membership values in relations, and a similarity-based approach. In modeling a situation in which both the
data and relationships are imprecise, he assesses that this situation entails imprecision intrinsic to natural language, which is possibilistic in nature. For example, if we are classifying various slopes in a particular region and wish to use a fuzzy set representation of steep slopes, then we might take a = 15 degrees as the start of steepness and b = 30 degrees as the point from which slopes are certainly classified as steep, i.e., have membership value 1. Another application is in soil classification: a certain soil sample may have 0.49 membership in the set of Loamy Soil, 0.33 membership in Sandy Soil, and 0.18 membership in Rocky Soil. Another spatial modeling approach considers some objects as comprising a core (full membership of 1.0 in the set in question) and a boundary (the area beyond which they have no or negligible membership in the set). A classic spatial example of the core and boundary problem is determining where a forest begins. Is it determined based on a hard threshold of trees per hectare? This may be the boundary set by management policy, but it is likely not the natural definition. There are several ways to manage these uncertain boundaries [22]. If a spatial database can represent the outlying trees as being partial members of the forest, then a decision maker will see these features as partial members if the database is queried or the data is presented on a graphical user interface. More recently, there have been a number of efforts utilizing fuzzy sets for spatial databases, including: capturing spatial relationships [12], querying spatial information [55], and object-oriented modeling [14]. Models have been proposed in recent years that allow for enriching database models to manage uncertain spatial data [35]. A major motivation for this is that there exist geographic objects with uncertain boundaries, and fuzzy sets are a natural way to represent this uncertainty.
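The steep-slope example above can be sketched as a simple membership function. A minimal sketch: the text fixes only the two thresholds a and b, so the linear rise between them is an assumption.

```python
def steep_membership(slope_deg, a=15.0, b=30.0):
    """Membership of a slope (in degrees) in the fuzzy set 'steep':
    0 below a, 1 at or above b, rising linearly (assumed) in between."""
    if slope_deg <= a:
        return 0.0
    if slope_deg >= b:
        return 1.0
    return (slope_deg - a) / (b - a)
```

A slope of 10 degrees then has membership 0, a slope of 30 degrees or more has membership 1, and intermediate slopes receive intermediate grades.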
A description of spatial data using rough sets was proposed in the ROSE system [41], which focused on a formal modeling framework for realm-based spatial data types in general. In [58] Worboys models imprecision in spatial data based on the resolution at which the data is represented, and addresses issues related to the integration of such data. This approach relies on indiscernibility – a core concept for rough sets – but does not carry over the entire framework and is just described as “reminiscent of the theory of rough sets” [59]. Ahlqvist and colleagues [2] used a rough set approach to define a rough classification of spatial data and to represent spatial locations. They also proposed a measure for the quality of a rough classification compared to a crisp classification and evaluated their technique on actual data from vegetation map layers. They considered the combination of fuzzy and rough set approaches for reclassification as required by the integration of geographic data. Another research group in a mapping and GIS context [57] has developed an approach using a rough raster space for the field representation of a spatial entity and evaluated it on a classification case study for remote sensing images. In [10] Bittner and Stell consider K-labeled partitions, which can represent maps, and then develop their relationship to rough sets to approximate map objects with vague boundaries. Additionally, they investigate stratified partitions, which can be used to capture levels of detail or granularity, such as in consideration of scale transformations in maps, and extend this approach using the concepts of stratified rough sets.
2.2 Fuzzy Set Basics

Extensions to ordinary set theory, known as fuzzy set theory, provide widely recognized representations of imprecision and vagueness [61]. Here we will overview some basic concepts of fuzzy sets; a more complete introduction can be found in several comprehensive sources [18, 29, 38]. Conventionally we can specify a set C by its characteristic function, CharC(x). If U is the universal set from which values of C are taken, then we can represent C as

C = { x | x ∈ U and CharC(x) = 1 }

This is the representation for a crisp or non-fuzzy set. For an ordinary set C, the characteristic function is of the form

CharC(x): U → {0, 1}

However, for a fuzzy set A we have

CharA(x): U → [0, 1]

That is, for a fuzzy set the characteristic function takes on all values between 0 and 1 and not just the discrete values of 0 or 1 representing the binary choice for membership in a conventional crisp set such as C. For a fuzzy set the characteristic function is often called the membership function and denoted µA(x). As an example of a fuzzy set consider a description of mountainous terrain. We want to use linguistic terminology to represent whether an estimate of elevation is viewed as low, medium, or high. If we assume we have obtained opinions of experts knowledgeable about such terrain, we can define fuzzy sets for these terms. Clearly it is reasonable to represent these as fuzzy sets as they represent judgmental opinions and cannot validly be given precise specification. Here we will provide a typical representation of a fuzzy set for the term "HIGH".

HIGH = { 0.0 / 0.1K, 0.125 / 0.5K, 0.5 / 1K, 0.8 / 2K, 0.9 / 3K, 1.0 / 4K }

This typical representation enumerates selected elements and their respective membership values as µA(x) / x. The elements are shown in kilometers, i.e., K. It is also common to more fully specify the membership function µA(x) in an analytic form or as a graphical depiction.
The membership function for the representation shown as in HIGH could be fully specified by interpolation between the consecutive elements listed. Also extrapolation past the first and last elements completes the specification, i.e., µA(x) = 0.0 for x ≤ 0.1K and µA(x) = 1.0 for x ≥ 4K. All of the basic set operations must have equivalent ones in fuzzy sets, but there are additional operations based on membership values of a fuzzy set that hence have no correspondence in crisp sets. We will use the membership functions µA
and µB to represent the fuzzy sets A and B involved in the operations to be illustrated.

Set Equality: A = B ⇔ µA(x) = µB(x) for all x ∈ U
Set Containment: A ⊆ B ⇔ µA(x) ≤ µB(x) for all x ∈ U
Set Complement: Ā ⇔ { x / (1 − µA(x)) }
For ordinary crisp sets A ∩ Ā = Ø; however, this is not generally true for a fuzzy set and its complement. This may seem to violate the law of the excluded middle, but this is just the essential nature of fuzzy sets. Since fuzzy sets have imprecise boundaries, we cannot place an element exclusively in a set or its complement. This definition of complementation has been justified more formally by Bellman and Giertz [7].

Set Union and Set Intersection:
A ∪ B ⇔ µA∪B(x) = Max(µA(x), µB(x))
A ∩ B ⇔ µA∩B(x) = Min(µA(x), µB(x))

The justification for using the Max and Min functions for these operations is given in [7]. With these definitions, the standard properties for crisp sets of commutativity, associativity, and so forth, hold for fuzzy sets. There have been a number of alternative functions proposed to represent set union and intersection [18, 60]. For example, in the case of intersection, a product definition, µA(x) * µB(x), has been considered.
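The HIGH fuzzy set and the Max/Min operations can be sketched as follows; the piecewise-linear interpolation between listed elements and the flat extrapolation past them follow the specification given above.

```python
# Elements of HIGH as (elevation in km, membership), taken from the text.
HIGH = [(0.1, 0.0), (0.5, 0.125), (1.0, 0.5), (2.0, 0.8), (3.0, 0.9), (4.0, 1.0)]

def membership(points, x):
    """Piecewise-linear membership: interpolate between listed elements,
    extrapolate flat before the first and after the last element."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

# Standard fuzzy operations on membership values (Max/Min and complement):
def f_union(ma, mb): return max(ma, mb)
def f_intersection(ma, mb): return min(ma, mb)
def f_complement(ma): return 1.0 - ma
```

For instance, an elevation of 1.5K falls between the listed elements 1K (0.5) and 2K (0.8) and so receives membership 0.65, while any elevation at or above 4K is fully HIGH.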
2.3 Rough Set Basics

Rough set theory, introduced by Pawlak [37], is a technique for dealing with uncertainty and for identifying cause-effect relationships in databases as a form of database learning. Rough sets have been widely used in data mining applications. Rough sets involve the following:

U is the universe, which cannot be empty;
R is the indiscernibility relation, or equivalence relation;
A = (U, R), an ordered pair, is called an approximation space;
[x]R denotes the equivalence class of R containing x, for any element x of U;
elementary sets in A are the equivalence classes of R;
a definable set in A is any finite union of elementary sets in A.

Therefore, for any given approximation space defined on some universe U and having an equivalence relation R imposed upon it, U is partitioned into equivalence classes called elementary sets which may be used to define other sets in A. Given that X ⊆ U, X can be defined in terms of definable sets in A as follows:
the lower approximation of X in A is the set RX = {x ∈ U | [x]R ⊆ X}, and
the upper approximation of X in A is the set R̄X = {x ∈ U | [x]R ∩ X ≠ ∅}.
Another way to describe the set approximations is as follows. Given the upper and lower approximations R̄X and RX of X, a subset of U, the R-positive region of X is POSR(X) = RX, the R-negative region of X is NEGR(X) = U − R̄X, and the boundary or R-borderline region of X is BNR(X) = R̄X − RX. X is called R-definable if and only if RX = R̄X. Otherwise, RX ≠ R̄X and X is rough with respect to R. In Figure 1 the universe U is partitioned into equivalence classes denoted by the squares. Those elements in the lower approximation of X, POSR(X), are denoted with the letter P and elements in the R-negative region by the letter N. All other classes belong to the boundary region of the upper approximation.
Fig. 1 Example of a Rough Set X
2.4 Rough Set Modeling of Spatial Data

Let U = {tower, stream, creek, river, forest, woodland, pasture, meadow} and let the equivalence relation R be defined as follows:

R* = {[tower], [stream, creek, river], [forest, woodland], [pasture, meadow]}.

Given some set X = {tower, stream, creek, river, forest, pasture}, we would like to define it in terms of its lower and upper approximations:

RX = {tower, stream, creek, river}, and
R̄X = {tower, stream, creek, river, forest, woodland, pasture, meadow}.

The lower approximation contains those equivalence classes that are included entirely in the set X. The upper approximation contains the lower approximation plus those classes that are only partially included in X. In this example all the values in the classes [tower] and [stream, creek, river] are included in X so these
belong to the lower approximation region. The class [forest, woodland] is not entirely included in X since X does not contain ‘woodland.’ However, [forest, woodland] is part of the upper approximation since forest ∈ X. A rough set in A is the group of subsets of U with the same upper and lower approximations. In the example given, the rough set is
{{tower, stream, creek, river, forest, pasture} {tower, stream, creek, river, forest, meadow} {tower, stream, creek, river, woodland, pasture} {tower, stream, creek, river, woodland, meadow}}. Although the rough set theory defines the set in its entirety this way, for our applications we typically will be dealing with only certain parts of this set at any given time. The major rough set concepts of interest are the use of an indiscernibility relation to partition domains into equivalence classes and the concept of lower and upper approximation regions to allow the distinction between certain and possible, or partial, inclusion in a rough set. The indiscernibility relation allows us to group items based on some definition of ‘equivalence’ as it relates to the application domain. We may use this partitioning to increase or decrease the granularity of a domain, to group items together that are considered indiscernible for a given purpose, or to “bin” ordered domains into range groups. In order to allow possible results, in addition to the obvious, certain results encountered in querying an ordinary spatial database system, we may employ the use of the boundary region information in addition to that of the lower approximation region. The results in the lower approximation region are certain. These correspond to exact matches. The boundary region of the upper approximation contains those results that are possible, but not certain.
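The worked example above can be reproduced with a few lines of set arithmetic; a sketch, with the class and set names taken directly from the example.

```python
def approximations(partition, X):
    """Lower/upper approximation of X with respect to a partition of U
    (the equivalence classes of the indiscernibility relation R)."""
    lower, upper = set(), set()
    for cls in partition:
        if cls <= X:      # class entirely contained in X -> certain
            lower |= cls
        if cls & X:       # class intersects X -> possible
            upper |= cls
    return lower, upper

R_star = [{"tower"}, {"stream", "creek", "river"},
          {"forest", "woodland"}, {"pasture", "meadow"}]
X = {"tower", "stream", "creek", "river", "forest", "pasture"}

lower, upper = approximations(R_star, X)
boundary = upper - lower   # the possible-but-not-certain region
```

Here lower recovers {tower, stream, creek, river}, upper recovers all eight values, and the boundary region {forest, woodland, pasture, meadow} holds the possible, but not certain, results described above.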
3 Applications

There have been many applications of both fuzzy and rough set theory to various topics related to spatial data. In the following we discuss a number of these important applications and present details on significant ones.
3.1 Fuzzy Set Terrain Modeling

Several approaches to deriving fuzzy representations of terrain features from digital elevation models (DEM) have been proposed. Skidmore [47] used Euclidean distances of a given location to the nearest streamline and ridgeline to represent the location’s relative position, but a Euclidean distance is often not sufficient to represent local morphological characteristics. Irvin et al. [27] performed a continuous classification of terrain features using the fuzzy k-means method. As a basically unsupervised classification, the fuzzy k-means method sometimes has difficulty in producing results that satisfactorily match domain experts’ (e.g., soil
scientists) views on landscapes. MacMillan et al. [33] developed a sophisticated and comprehensive rule-based method for fuzzy classification of terrain features that requires intensive terrain analysis operations and has a high demand for users’ knowledge of local landform. Another method [45] derives the fuzzy membership of a test location as being a specific terrain feature based on the location’s similarity to the typical locations of that terrain feature. This can be very useful for special terrain features that have very unique meanings to soil-landscape analysts, as unique soil conditions often exist at such locations. A definition-based and a knowledge-based approach are given as ways to specify typical locations. Where there is a clear geomorphology, simple rules based on the definitions can be used to determine the typical locations. For example, there are algorithms for determining ridgelines and streamlines that can be used. However, if a terrain feature only has a local or regional meaning, finding the typical location may require knowledge from local experts. This may be captured through manual delineation using a GIS visualization tool. The similarities of any other location to those specified typical locations can be evaluated based on a set of selected terrain attributes such as elevation, slope gradient, curvatures, etc. The process of assigning a fuzzy membership value to a location then consists of three steps:

1. Evaluation of the similarity of a test location and a typical location at the individual terrain attribute level.
2. Integration of similarities on individual terrain attributes, yielding the overall similarity between the test location and a typical location.
3. Integration of the test location’s similarities to all typical locations, producing a final fuzzy membership of the test location for being the terrain feature under concern.
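The three steps above can be sketched in code. A minimal sketch: the attribute names, the Gaussian per-attribute similarity, and the min/max integration operators are illustrative assumptions, not the exact forms prescribed by [45].

```python
import math

def attr_similarity(v_test, v_typical, spread):
    """Step 1: similarity on a single terrain attribute (Gaussian, assumed)."""
    return math.exp(-((v_test - v_typical) ** 2) / (2 * spread ** 2))

def location_similarity(test, typical, spreads):
    """Step 2: integrate per-attribute similarities (here: minimum)."""
    return min(attr_similarity(test[k], typical[k], spreads[k]) for k in test)

def fuzzy_membership(test, typicals, spreads):
    """Step 3: integrate similarities to all typical locations (here: maximum)."""
    return max(location_similarity(test, t, spreads) for t in typicals)

spreads = {"elev": 100.0, "slope": 5.0}            # assumed tolerances
ridge_typicals = [{"elev": 500.0, "slope": 5.0}]   # assumed typical ridge point
m = fuzzy_membership({"elev": 450.0, "slope": 6.0}, ridge_typicals, spreads)
```

A test location identical to a typical location receives membership 1; locations that differ on any attribute receive correspondingly lower grades.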
3.2 Rough Sets for Gridded Data Often spatial data is associated with a particular grid. The positions are set up in a regular matrix-like structure and data is affiliated with point locations on the grid. This is the case for raster data and for other types of non-vector type data such as topography or sea surface temperature data. There is a tradeoff between the resolution or the scale of the grid and the amount of system resources necessary to store and process the data. Higher resolutions provide more information, but at a cost of memory space and execution time. If we approach the data from a rough set point of view, we can see that there is indiscernibility inherent in the process of gridding or rasterizing data. In Figure 2, for example, there are grid locations that represent the various lake, chemical plant, forest, boatyard, residential, and other classifications. Some grid points are directly on one of these classifications and some are in between one or more of them. A data item at a particular grid point in essence may represent data near the point as well. This is due to the fact that often point data must be mapped to the grid using techniques such as nearest-neighbor, averaging, or statistics. We may
Fuzzy and Rough Set Approaches for Uncertainty in Spatial Data
111
Fig. 2 Gridded data for land classification showing coarse grid lines
set up our rough set indiscernibility relation so that the entire spatial area is partitioned into equivalence classes where each point on the grid belongs to an equivalence class. If we change the resolution of the grid, we are in fact, changing the granularity of the partitioning, resulting in fewer, but larger classes.
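The link between grid resolution and partition granularity can be sketched as follows; the cell sizes and sample points are illustrative.

```python
def equivalence_class(point, cell_size):
    """All points that fall in the same grid cell are indiscernible;
    the cell size sets the granularity of the partition."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))

p1, p2 = (0.4, 0.2), (1.6, 0.9)
fine = (equivalence_class(p1, 1.0), equivalence_class(p2, 1.0))     # distinct cells
coarse = (equivalence_class(p1, 2.0), equivalence_class(p2, 2.0))   # merged cell
```

At cell size 1.0 the two points fall into different equivalence classes; doubling the cell size merges them into one larger class, exactly the coarsening of the partition described above.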
3.3 Fuzzy Triangulated Irregular Networks

Triangulated Irregular Networks (TINs) are one common approach to represent field data, as opposed to object-based spatial data. A TIN is based on a partition of the two-dimensional space into non-overlapping triangles. Extensions of TINs [54] have been developed using fuzzy membership grades, fuzzy numbers and type-2 fuzzy sets. The ETIN structure uses a mapping function that specifies a property F of a geographic area. Consider the description of a specific site under evaluation for purchase as “Close to” New York. A value of 1 for the function indicates the site is near (or in) New York; 0 means the location is actually far (not close) from New York; and intermediate values such as 0.6 imply the site might be considered as being more or less close to the city. Another TIN extension is based on fuzzy numbers with triangular membership functions, as these provide a simple model for a fuzzy number. To use fuzzy numbers in the ETIN, it is necessary to extend the data value associated with a point from a simple (crisp) value to a fuzzy set. This can be accomplished at every point of the region under consideration by associating a triangular membership function. Three characterizing points are then of importance: the two points where the membership grade equals 0, which delimit the membership function, and the intermediate point for which the membership grade equals 1.
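A triangular membership function of the kind used to fuzzify the point values can be sketched from its three characterizing points; the sample values below are arbitrary.

```python
def triangular(left, peak, right):
    """Triangular fuzzy number: membership is 0 at and beyond left/right
    and 1 at peak, rising and falling linearly in between."""
    def mu(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return mu

# An illustrative fuzzy data value at a point, e.g. "about 200":
about_200 = triangular(180.0, 200.0, 230.0)
```

The two delimiting points carry membership 0, the peak carries membership 1, and values in between are graded linearly.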
Finally the ETIN structure can use type-2 fuzzy sets, a generalization of regular fuzzy sets, allowing imprecision as well as uncertainty regarding the membership grades to be modeled. Consider the certainty about the extent to which a site is "close to" New York. When describing for example the location of some individual, there might be doubt as to exactly where they are located. The person could be located close to New York, but also near Newark, New Jersey. A type-2 fuzzy set allows this doubt to be modeled: With this approach, the membership grade on every location is extended to a "fuzzy" membership grade. As a result, every point will now have an associated fuzzy set over [0,1].
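An interval-valued simplification of this idea replaces the single membership grade at a point with a lower and upper bound; a full type-2 set attaches an entire fuzzy set over [0,1] instead. The function and values below are illustrative assumptions, not the ETIN definition itself.

```python
def type2_membership(primary, uncertainty):
    """Return a (lower, upper) membership interval around a primary grade,
    clipped to [0, 1] (an interval-valued simplification of type-2 grades)."""
    return (max(0.0, primary - uncertainty), min(1.0, primary + uncertainty))

# The grade for "close to New York" is known only to lie within bounds:
close_to_new_york = type2_membership(0.6, 0.2)
```

A location whose primary grade is 0.6, with uncertainty 0.2, is then described by the interval [0.4, 0.8] rather than a single number.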
3.4 Fuzzy Spatial Interpolation

Since, as we have seen, geographical data are a combination of fuzzy and crisp data types, there is a need for fuzzy-based interpolation techniques. When interpolation data are not sets of real numbers but ranges of values whose distribution within the range is qualitative, sample data have to be determined with a theory of possibility. For example, geological data may be collected from wells where the exact component percentages of clay, sand, or silt are not obvious from the sample description. A fuzzy interpolation approach [17] is derived from gradual rules that in fact fully capture the interpolation process. The formulations are given on the basis of linear interpolation that uses both fuzzy and precisely known (crisp) data, and have roots in the fuzzy Lagrange interpolation theorem. This approach has been applied to two-dimensional spatial interpolation based on fuzzy Voronoi diagrams, a fuzzy function estimator, three-dimensional spatial interpolation based on fuzzy neural networks, and GIS-based fuzzy spatio-temporal interpolation. For example, a fuzzy Voronoi approach can be applied to thematic maps represented by polygons with categories such as forest types, where each polygon is assigned specific attributes (e.g. wood volume). Polygon boundaries are uncertain because of varying interpretations of imagery data. Distributions of attribute values over surfaces are not reliable because of the sparseness of in situ measurements. Since most geographic attributes are not of a continuous nature, spatial interpolation is needed to create a continuous surface of selected attributes and to represent the transition zones between polygons. These issues can be resolved using fuzzy Voronoi diagrams, first by constructing Voronoi diagrams around known points with well-specified attributes.
The next step positions a “query point” in the Voronoi diagram, and a new diagram is reconstructed as if the query point were one of the original data points. Thus, new polygons are delineated containing the area stolen from the original polygons. The percentage of the stolen area from each polygon constitutes the fuzzy membership value for the thematic category represented by the corresponding original polygon. If a grid of query points is processed over the entire surface at regular intervals, a series of grid points with fuzzy membership values is produced for each geographic category. Linear interpolation can then be used to produce a continuous surface that can be stored in a raster GIS format. The attributes of interest are evaluated at any location on the defined fuzzy map by multiplying the mean estimated volume of the particular attribute for each
geographic category by the corresponding fuzzy membership value over all geographic categories. In a GIS, spatial data is represented by snapshot layers corresponding to time intervals, which limits spatial change detection to the temporal granularity of the layers. Fuzzy temporal interpolation [16] uses fuzzy probable trajectories of gradual progression from one class to another. The degrees of membership in a specific class at a particular intermediate space-time location are calculated using fuzzy set membership functions. In [24] a spatial interpolation technique is described that is based on conservative fuzzy interpolation reasoning for interpolating fuzzy rules in sparse fuzzy rule bases. The technique works best in local spatial interpolation, so a self-organizing map is used to divide the data into subpopulations in order to reduce the complexity of the whole data space to more homogeneous local regions.
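The final evaluation step, combining mean attribute values with the fuzzy memberships produced by the Voronoi procedure, can be sketched as follows; the category names and numbers are illustrative.

```python
def estimate_attribute(memberships, mean_values):
    """Evaluate an attribute at a location: sum over categories of the
    category's mean attribute value times the location's membership in it."""
    return sum(memberships[c] * mean_values[c] for c in memberships)

# Hypothetical memberships from the area-stealing step (they sum to 1):
memberships = {"pine": 0.6, "oak": 0.3, "meadow": 0.1}
mean_volume = {"pine": 200.0, "oak": 150.0, "meadow": 0.0}  # assumed m^3/ha
wood_volume = estimate_attribute(memberships, mean_volume)
```

For these illustrative values the location's estimated wood volume is 0.6·200 + 0.3·150 + 0.1·0 = 165 m³/ha, a smooth blend of the adjacent categories rather than a hard polygon assignment.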
4 Representation of Spatial Relations

Relationships among spatial objects can generally be classified in three types:

1. Topological – Touches, Disjoint, Overlap, … (the French border touches the German border)
2. Directional – East, North-West, … (Prague is East of Frankfurt)
3. Metric – Distance (Wien is about 50 kilometers from Bratislava)
Many topological relations between two objects A and B can be specified using the 9-intersection model which uses the intersections between the interior, boundary and exterior of A and B [21]. This section will describe a variety of approaches introducing uncertainty into these relationships.
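The 9-intersection model can be sketched on a discrete grid: each object is represented by its interior and boundary cell sets within a fixed universe of cells, and the exterior is everything else. The grid encoding and the helper `region` are illustrative assumptions, not the formalism of [21].

```python
# Minimal 9-intersection sketch over grid cells.
def region(x0, y0, x1, y1):
    """Interior and boundary cells of an axis-aligned block of cells."""
    interior = {(x, y) for x in range(x0 + 1, x1) for y in range(y0 + 1, y1)}
    full = {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}
    return interior, full - interior

def nine_intersection(universe, a, b):
    """3x3 matrix of flags: True where the parts of A and B
    (interior, boundary, exterior) have a non-empty intersection."""
    a_int, a_bnd = a
    b_int, b_bnd = b
    parts_a = (a_int, a_bnd, universe - a_int - a_bnd)
    parts_b = (b_int, b_bnd, universe - b_int - b_bnd)
    return [[bool(pa & pb) for pb in parts_b] for pa in parts_a]

universe = {(x, y) for x in range(10) for y in range(10)}
a = region(1, 1, 5, 5)
b = region(3, 1, 7, 5)   # overlaps A: all nine intersections are non-empty
c = region(7, 7, 9, 9)   # disjoint from A: interiors do not meet
```

The resulting matrix of emptiness/non-emptiness flags is what distinguishes relations such as overlap from disjointness.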
4.1 Spatial Relations

In [36], Papadias and his colleagues present an approach for determining configuration similarity for spatial constraints involving topology, direction and distance. The approach utilizes extended objects for direction and topology, and centroids for distance. They handle uncertainty arising from fuzzy relations, e.g., an object that satisfies more than one directional constraint, as well as fuzziness related to linguistic relationship terms. The concept of graded sections allows comparison of alternative conceptualizations of direction [30]. To describe graded sections, section bundles are introduced, providing a formal means to (1) compare alternative candidates related via a direction relation like “north” or “south-east,” (2) distinguish between good and not-so-good candidates, and (3) select a best candidate. Vazirgiannis [53] also considers the problem of representing uncertain topological, directional, and distance relationships under the assumption of crisply bounded objects. All relationship definitions in this approach are centroid-based. A minimal set of topological relations, overlapping and adjacency, is defined based on
T. Beaubouef and F.E. Petry
Egenhofer’s boundary/interior model. This model is enhanced by providing degrees of relationship satisfaction. Direction relations are defined by a sinusoidal function based on the angle between two objects’ centroids. Close and far are the two categorizations of distance relations; membership in one of these categories is determined by the ratio of the distance to a maximum application-dependent distance. The three relationships are combined for query retrieval: a similarity measure is computed for each relationship and then combined into a single, overall similarity measure. Another approach to spatial relations uses the histogram of forces [34] to provide a fuzzy qualitative representation of the relative position between two-dimensional objects. This can also be used in scene description, where relative positions are represented by fuzzy linguistic expressions. Guesgen [25] introduces several approaches for reasoning about fuzzy spatial relations, including an extension of Allen’s algorithm and methods for fuzzy constraint satisfaction. Also relevant is [20], which presents a unified framework for approximate spatial and temporal reasoning, using topological constraints as the representation schema and fuzzy logic for representing imprecision and uncertainty. The application of the resulting fuzzy representation to each of Allen’s interval relationships is developed as the possibility of occurrence of the conditions of the original definition. Yet another approach, by Cobb and Petry [12], is based on minimum bounding rectangles (MBRs) and Allen’s relationships. An MBR is an approximation of the geometry of a spatial object, defined as the smallest rectangle with sides parallel to the X and Y axes that completely encloses the object. The use of MBRs in geographic databases is widely practiced as an efficient way of locating and accessing objects in space.
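The centroid-based direction and distance memberships described for [53] can be sketched as follows; the exact membership functions of that work may differ, so the clipped sinusoid and the linear close/far ratio are illustrative assumptions.

```python
import math

def direction_memberships(ca, cb):
    """Degree to which B lies east/north/west/south of A, from the angle
    between the two centroids, using a clipped sinusoid."""
    theta = math.atan2(cb[1] - ca[1], cb[0] - ca[0])
    return {
        "east": max(0.0, math.cos(theta)),
        "north": max(0.0, math.sin(theta)),
        "west": max(0.0, -math.cos(theta)),
        "south": max(0.0, -math.sin(theta)),
    }

def close_far(distance, d_max):
    """Close/far memberships from the ratio of the centroid distance to an
    application-dependent maximum distance."""
    r = min(1.0, distance / d_max)
    return {"close": 1.0 - r, "far": r}

d = direction_memberships((0.0, 0.0), (1.0, 1.0))  # B to the north-east of A
# d["east"] and d["north"] are both ~0.707: equally east and north
```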
An extension into the spatial domain of Allen’s temporal relationships [1] represents any relationship that can exist between two one-dimensional (temporal) intervals, including before, equal, meets, overlaps, during, starts, and finishes, along with their inverses. Given the minimum bounding rectangles of two objects, the binary relationship between the objects in both the horizontal and vertical directions can be completely defined by a tuple, [rx, ry], where rx is the Allen temporal relation, from those described above, that defines the interaction of the object MBRs in the x direction, and ry represents the same for the y direction. For example, the relationship A [finishes, starts] B is defined as:
{ Bx1 < Ax1 < Bx2, Ax2 = Bx2, By1 < Ay2 < By2, Ay1 = By1 }
where (x1, y1) and (x2, y2) represent the lower left and upper right corners, respectively, of the minimum bounding rectangles. Figure 3 shows an example set of four object MBRs, {A, B, C, D}. A subset of the relationships existing between them consists of: { A [before, overlaps] B; B [before, overlaps-1] C; D [during, meets] C }.
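Deriving the [rx, ry] tuple from two MBRs can be sketched as below; only the basic Allen relations (with "-1" marking inverses) are classified, and this encoding is an assumption about one reasonable implementation, not the original one from [12].

```python
# Classify the Allen relation between two 1-D intervals [a1, a2], [b1, b2].
def allen(a1, a2, b1, b2):
    if a2 < b1: return "before"
    if b2 < a1: return "before-1"
    if a2 == b1: return "meets"
    if b2 == a1: return "meets-1"
    if (a1, a2) == (b1, b2): return "equal"
    if a1 == b1: return "starts" if a2 < b2 else "starts-1"
    if a2 == b2: return "finishes" if a1 > b1 else "finishes-1"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "during-1"
    return "overlaps" if a1 < b1 else "overlaps-1"

def mbr_relation(a, b):
    """a, b are MBRs ((x1, y1), (x2, y2)); returns the tuple [rx, ry]."""
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    return [allen(ax1, ax2, bx1, bx2), allen(ay1, ay2, by1, by2)]

# A [finishes, starts] B: A ends with B in x and starts with B in y.
print(mbr_relation(((2, 0), (4, 2)), ((0, 0), (4, 3))))  # ['finishes', 'starts']
```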
Fig. 3 Objects for MBR Relationship Description
Again, using the notation of representing one of Allen’s relations by its initial letter, we have determined for the relation partially-surrounded-by:
{ [df] [fd] [do] [ds] [od] [sd] [do-1] }
These basic relationship definitions can be used in a similar manner to define the directional relationships N, S, E, W, NE, SE, SW, NW. Given the spatial extent of two-dimensional objects, it is very likely that in any one case more than one of the eight directions listed above will apply, to a greater or lesser degree. So a method is needed for defining directional relationships that allows fuzzy querying of any of the directional relationships existing between two objects. The concept of object sub-groups is then used as a basis for determining the set of directions that defines the directional relationship between two objects. Directions can now be defined in a manner analogous to the way in which the qualitative topological relationships were defined earlier: the definition for any particular direction includes the set of all relationships containing that direction as a member of its direction set. The definition for the direction East is shown below as an example:
E ::= { [dd], [df], [fd], [do], [ds], [ff], [d=], [fo], [fs], [f=], [dd-1], [do-1], … }
These basic relationship definitions, and their use in defining relevant directional and qualitative topological relationships, provide a framework for the abstract spatial graph (ASG), a spatial data structure specifically designed to retain orientation and topological information for two-dimensional objects, and to provide information to support fuzzy querying capabilities on these relationships. The ASG categorizes the original relationships according to the level of interaction of the MBRs into four distinct categories: disjoint, tangent, overlapping and containment.
Fig. 4 Application of thresholding for ASG construction of [fo] relationship
Figure 4 shows the construction of an abstract spatial graph for the [fo] relationship using a thresholding technique. Note that the northeastern and northwestern axes for sub-group B1, as well as the southeastern and southwestern axes for sub-group A1, are discarded, so that the node on the northern axis of the ASG singularly represents B1; the node on the northwestern axis represents B2; the node on the western axis represents B3; and the node on the southern axis represents A1. The center node of the ASG represents the sub-groups A2 and B4, the reference area. In addition to information directly relevant to the representation of the abstract spatial graph, ancillary information must also be represented that can be used for fuzzy query inferences. This information takes the form of node “weights” that can then be used to define both fuzzy topological and directional qualifiers for use within a fuzzy query framework. Calculation of weights uses both the areas of object sub-groups and the lengths of axes that pass through object sub-groups. Three different types of weights are computed: axis weights, area weights and node weights. The area weights and total node weights of ASGs directly support fuzzy queries regarding qualitative topological and directional information in two specific ways. Area weights provide an indication of the degree to which an object participates in a qualitative topological relationship. By mapping ranges of area weights to linguistic qualifiers such as some, most, etc., fuzzy information such as “some of object A overlaps most of object B” can be determined. Total node weights, on the other hand, indicate the extent to which one object can be considered to lie in a certain direction with respect to a second object. Again, ranges of weights can be correlated to linguistic terms, e.g., slightly, mostly, to provide qualifiers for directional orientation.
Then, for example, one could determine that, while object A is slightly southwest of object B, it is at the same time mostly west of object B. So for the example of Figure 3 we can determine that:
1. B is mostly west of C
2. Little of B is northeast of A
3. D is directly south of C
4. C is slightly southeast of B
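Mapping weight ranges to linguistic qualifiers can be sketched as below; the cut-off values and the sample node weights are illustrative assumptions, not those of the original ASG work.

```python
# Map a weight in [0, 1] to a linguistic qualifier (assumed ranges).
def qualifier(weight):
    if weight < 0.05: return "not"
    if weight < 0.25: return "slightly"
    if weight < 0.50: return "somewhat"
    if weight < 0.75: return "mostly"
    return "almost entirely"

# Hypothetical total node weights for object A relative to object B.
node_weights = {"west": 0.68, "southwest": 0.12}
desc = {d: qualifier(w) for d, w in node_weights.items()}
# desc: A is "mostly" west and "slightly" southwest of B
```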
4.2 Topological Spatial Relationships for Vague Regions

The interplay of topological relations and nearness lies at the core of the motivation for the formalism developed in a series of papers by Schockaert et al. [42,43,44]. These papers provide characterizations of fuzzy spatial relations corresponding to the particular case where connection is defined in terms of closeness between fuzzy sets. They also generalize the region connection calculus (RCC) based on fuzzy set theory, and develop how reasoning tasks such as satisfiability and entailment checking can be cast into linear programming problems. Keukelaar [28] develops an approach to rough spatial topological relations using 3-valued logic, allowing “maybe” answers to queries about the spatial relationships between objects. Wang et al. [56] deal with imprecise spatial relationships in a straightforward manner within the 9-intersection model: they replace the interior, exterior and boundary with positive, negative and boundary regions in a rough set sense, based on the lower and upper approximations. A rough matrix representation facilitates computation of rough topological relationships among several spatial objects. Zhan [62] develops a method for approximately analyzing binary topological relations between geographic regions with indeterminate boundaries, and shows that the eight binary topological relations between regions in a two-dimensional space can be easily determined by this method. A computational fuzzy topology can be developed based on the interior operator and closure operator [32]. These operators are further defined as a coherent fuzzy topology (the complement of the open set is the closed set and vice versa), where the open set and closed set are defined by the interior and closure operators via two level cuts. The elementary components of fuzzy topology for spatial objects (interior, boundary and exterior) are thus computed based on the computational fuzzy topology.
Yet another approach proposes basic fuzzy spatial object types based on fuzzy topology [52]. These object types are a natural extension of current non-fuzzy spatial object types. A fuzzy cell complex structure is defined for modeling fuzzy regions, lines and points. Furthermore, fuzzy topological relations between these fuzzy spatial objects are formalized based on the 9-intersection approach. In [9] Bittner and Stell present an approach to spatial relations where uncertainty arises from limited resolution of the spatial data, using approximations that have a close relationship to rough sets. They develop two methods for approximating topological relations, syntactic and semantic. The first makes use of the set of precise regions that could be an interpretation of the approximate regions. The syntactic approach also uses algebraic operations which generalize operations on precise regions, using pairs of greatest minimal and least maximal meet operations to approximate the crisp meet used for defining topological relations. Rough set [4] and egg-yolk [13] approaches can also be used to model spatial relationships. In spatial data, it is often the case that we need information concerning the relative distances of objects. Is object A adjacent to object B? Or, is object A near object B? The first question appears to be fairly straightforward. The
system must simply check all the edges of both objects to see if any parts of them are coincident. This provides what would be certain results in the ideal case. However, in a GIS data is often input either automatically via scanners or digitized by humans, and in both cases positional errors in data objects easily occur. Therefore, it might be desirable to have the system check whether object B is very near object A, to derive a possible result. If so, the user could be informed that “it is not certain, but it is possible, that A is adjacent to B.” One may want to know, for example, whether a cliff is next to the sea. If the system returns the result that it is possible, but not certain, that the cliff is adjacent to the sea, the user may be led to investigate the influence of the tides in the area to determine whether low beaches alongside the cliffs are exposed at low tide. The concepts of connection and overlap can be managed by rough sets in a similar manner. Connection is similar to adjacency, but relates to line-type objects instead of area objects. Overlap can be defined in a manner similar to that of nearness, with the user deciding how much overlap is required for the lower approximation. Coincidence of a single point may constitute possible overlap, as can very close proximity of two objects if there is a high degree of positional error in the data. Inclusion is related to overlap as follows: if an object A is completely surrounded by some other object B, perhaps we can conclude with certainty that A is included in B, lacking any additional information about the two objects. If the two objects overlap, then it may be possible that one of the objects includes the other. Approximation regions can be defined to reflect these concepts as well. Both the rough set and egg-yolk approaches are useful for managing the types of uncertainty and vagueness related to topology, a few of which were just briefly discussed.
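A rough-set style adjacency query of this kind can be sketched as below: a shared edge gives a certain answer, while mere closeness within a positional-error tolerance gives only a possible answer. Rectangles stand in for general objects, and the tolerance scheme is an illustrative assumption.

```python
def adjacency(a, b, eps):
    """a, b: rectangles ((x1, y1), (x2, y2)).
    Returns 'certain', 'possible' or 'no'."""
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    gap_x = max(bx1 - ax2, ax1 - bx2, 0)
    gap_y = max(by1 - ay2, ay1 - by2, 0)
    overlap_x = min(ax2, bx2) - max(ax1, bx1)
    overlap_y = min(ay2, by2) - max(ay1, by1)
    touching = (gap_x == 0 and overlap_y > 0) or (gap_y == 0 and overlap_x > 0)
    if touching and (overlap_x <= 0 or overlap_y <= 0):
        return "certain"     # coincident edges: in the lower approximation
    if max(gap_x, gap_y) <= eps:
        return "possible"    # within positional error: upper approximation only
    return "no"

print(adjacency(((0, 0), (2, 2)), ((2, 0), (4, 2)), 0.1))     # certain
print(adjacency(((0, 0), (2, 2)), ((2.05, 0), (4, 2)), 0.1))  # possible
```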
These include concepts such as nearness, contiguity, connection, orientation, inclusion, and overlap of spatial entities. If we are only concerned with the vagueness of boundaries, we may be inclined to use the egg-yolk approach [13], since this approach does not partition the space into equivalence classes as rough sets do. In this approach concentric subregions make up a vague region, with inner subregions having the property that they are ‘crisper’ than outer subregions. These regions indicate a type of membership in the vague region. The simplest case is that of two
Fig. 5 A sample of the 46 possible relationships between regions X (dashed line) and Y (dotted line). A solid line indicates coincidence of an X and Y region boundary.
subregions. In this most common case, the center region is known as the yolk, the outer region surrounding the yolk as the white, and the entire region as the egg. Figure 5 depicts a sample of these relationships. The yolk and egg regions correspond to the lower and upper approximation regions of rough sets, respectively. Rough set theory has only these two approximation regions, unlike the possibly numerous subregions that may make up a vague region in the egg-yolk method. However, because of the indiscernibility relation in rough sets, one can vary the partitioning in order to increase or decrease the level of uncertainty present, which results in changes to the approximation regions. Consider specifically the results of Cohn and Gotts [13], who delineate forty-six possible egg-yolk pairs showing all of the possible relationships between two vague regions. The forty-six configurations of egg-yolk pairs were clustered into thirteen groups based on RCC-5 [40] relations between complete crispings, or relations that are “mutually crispable.” Each cluster relates to one or more additional clusters via a crisping relationship or a subset relationship between sets of complete crispings. The clustering of egg-yolk pairs can also be viewed by characterizing the relationships of each cluster in terms of mathematical principles from rough sets. Recall that “crisping” in the egg-yolk theory can also be related to forcing a finer partitioning on the domain for rough sets. Some definitions from rough set theory used in categorizing the clusters, writing RX for the lower approximation and R̄X for the upper approximation of a set X, include:
Equality of two rough sets: X = Y if RX = RY and R̄X = R̄Y.
Intersection of two rough sets: R(X∩Y) = RX ∩ RY, and R̄(X∩Y) = R̄X ∩ R̄Y.
Subset relationship: X ⊂ Y implies that RX ⊂ RY and R̄X ⊂ R̄Y.
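These definitions can be sketched over a finite universe with an explicit partition into indiscernibility classes; the universe and sets below are illustrative.

```python
# Lower/upper approximations of a set x with respect to a partition of
# the universe into equivalence (indiscernibility) classes.
def lower(x, partition):
    return {e for c in partition if c <= x for e in c}

def upper(x, partition):
    return {e for c in partition if c & x for e in c}

def rough_equal(x, y, partition):
    return (lower(x, partition) == lower(y, partition)
            and upper(x, partition) == upper(y, partition))

def rough_subset(x, y, partition):
    return (lower(x, partition) <= lower(y, partition)
            and upper(x, partition) <= upper(y, partition))

partition = [{1, 2}, {3, 4}, {5, 6}]   # universe {1..6}
x, y = {1, 2, 3}, {1, 2, 4}
# x and y have the same approximations ({1,2} and {1,2,3,4}), so they are
# roughly equal even though they differ as ordinary sets.
```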
In [4] properties of rough sets are used to define the crispings in the various topological clusters as well as the spatial relationships themselves. Figure 6 shows the relationships between clusters based on the levels of crisping from one cluster to another. Numbers within each cluster represent the 46 egg-yolk pairs of Cohn and Gotts [13], each denoting an uncertain spatial relationship between two vague regions. Within the hierarchy, an arrow from one cluster to another means that some property of rough set theory is added to those of the originating cluster in order to make it more “crisp.”
Fig. 6 Clustering of egg-yolk relationships
Spatial and geographical information systems will continue to play an ever-increasing role in applications based on spatial data. Uncertainty management will be necessary for any of these applications, and both rough set and egg-yolk methods are appropriate for the representation of vague regions in spatial data. Rough sets, however, can also model indiscernibility and allow the granularity of the partitioning to be changed through the indiscernibility relation, which affects the boundaries of the vague regions and also allows the extension of egg-yolk regions from continuous to discrete space. The clustering of egg-yolk pairs by RCC-5 relations can be expressed in terms of operations using rough sets, and rough set techniques can further enhance the egg-yolk approach. The interrelationships between rough set, egg-yolk, and RCC models merit further study.
5 Mining Spatial Information

5.1 Spatial Data Mining

An approach [31] to the discovery of association rules for fuzzy spatial data combined and extended techniques developed in both spatial and fuzzy data mining in order to deal with the uncertainty found in typical spatial data. It attempts to uncover correlations of spatially related data such as soil types, directional or geometric relationships, etc. For example, an association rule that can be discovered by mining appropriate spatial data is: if C is a small city and has good terrain nearby, then there is a road nearby, with 90% confidence.
Such a rule incorporates fuzzy information in the linguistic terms used, such as “small” and “nearby.” In the spatial data mining area there have been only a few efforts using rough sets. In the research described in [3], Beaubouef et al. investigate approaches for attribute-induction knowledge discovery in rough spatial data. Bittner [8] considers rough sets for spatio-temporal data and how to discover characteristic configurations of spatial objects, focusing on the use of topological relationships for characterizations. In a survey of uncertainty-based spatial data mining, Shi et al. [46] provide a brief general comparison of fuzzy and rough set approaches for spatial data mining.
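One common way to score such a fuzzy association rule is a sigma-count confidence with min as the t-norm; the records, linguistic terms and the exact measure below are illustrative assumptions, not necessarily those of [31].

```python
# Fuzzy rule confidence: sum of min(antecedent, consequent) memberships
# over the sum of antecedent memberships across all records.
def fuzzy_confidence(records, antecedent, consequent):
    num = sum(min(min(r[t] for t in antecedent), r[consequent]) for r in records)
    den = sum(min(r[t] for t in antecedent) for r in records)
    return num / den if den else 0.0

records = [  # hypothetical membership degrees per city record
    {"small_city": 0.9, "good_terrain_nearby": 0.8, "road_nearby": 0.9},
    {"small_city": 0.7, "good_terrain_nearby": 0.6, "road_nearby": 0.5},
    {"small_city": 0.2, "good_terrain_nearby": 0.9, "road_nearby": 0.1},
]
conf = fuzzy_confidence(records, ["small_city", "good_terrain_nearby"],
                        "road_nearby")
# conf = (0.8 + 0.5 + 0.1) / (0.8 + 0.6 + 0.2) = 0.875
```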
5.2 Fuzzy Minimum Bounding Rectangles

To utilize minimum bounding rectangles for vague regions, in [50] a fuzzy MBR (FMBR) is defined as consisting of nested rectangles. The inner rectangle is the MBR over the core of the vague region (the certain region, membership = 1). The outer rectangle is an MBR over the outer boundary of the vague region. This approach allows the consideration of common indexing approaches such as grid files or R-trees. A vague region is one whose boundaries are not or cannot be precisely defined, and we can consider it as consisting of two main components: the core and the boundary, each approximated by its minimum bounding rectangle (MBR). A fuzzy representation, called the Fuzzy Minimum Bounding Rectangle (FMBR) [49], can represent the different degrees of membership of points located inside the vague region. Geographic features are a direct representation of geographic entities rather than geometric elements such as a point, line or polygon. A feature is then defined as an entity with common attributes and relationships. The FMBR [48, 49] represents the generalization of the underlying irregular polygon delimiting the fuzzy region, since the FMBR encloses all the points of the map space where the feature of interest is located. The FMBR can also be considered the circumscribed rectangle (CR) of the underlying fuzzy polygon. Iterative generation of inner bounding rectangles is performed until we obtain the inscribed rectangle (IR) of the underlying object. The IR is thus the maximum inner rectangle inside the object, and it corresponds to the core of the fuzzy region. Distances between the IR and the FMBR are used to represent the fuzzy boundary. A spatial membership function based on Euclidean distance is used to determine the degree to which a feature belongs to the fuzzy set. Thus, features inside the IR, or core, have a membership degree of 1.
This degree gradually decreases as we move away from the core, and points located outside the FMBR have a membership degree of 0. An FMBR is a natural representation for many commonly occurring spatial situations. The problem of identifying a spatial boundary has received considerable attention in the GIS area [11]. For example, consider photo-interpreters who are trying to label a forest in an image. There is clearly a region (the core) which
all agree is the heart of the forest and merits the specific labeling. However, as the forest thins out into meadows all around, there is no sharp boundary delimiting the forest area. Rather, the density of the trees decreases gradually until there is just open meadow land. It is just such a situation that we are trying to model by means of an FMBR. A graphical representation of the fuzzy minimum bounding rectangle, as described above, is illustrated in Figure 7. The underlying vague region Ã is approximated by FMBR(Ã). This first approximation is also called the circumscribed rectangle (CR) of the fuzzy region. In other words, the FMBR or CR corresponds to the minimal rectangle with edges parallel to the x and y axes that optimally encloses the vague region Ã.

Fig. 7 FMBR Representation
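The Euclidean-distance spatial membership function described above can be sketched as follows: membership 1 inside the inscribed rectangle (the core), 0 outside the FMBR, and a value decaying with distance from the core in the boundary band. The linear decay and the sample rectangles are illustrative assumptions.

```python
def fmbr_membership(p, ir, fmbr):
    """Membership of point p given the inscribed rectangle ir (core) and
    the outer rectangle fmbr, both as ((x1, y1), (x2, y2))."""
    def inside(q, rect):
        (x1, y1), (x2, y2) = rect
        return x1 <= q[0] <= x2 and y1 <= q[1] <= y2
    def dist_to_rect(q, rect):
        (x1, y1), (x2, y2) = rect
        dx = max(x1 - q[0], 0, q[0] - x2)
        dy = max(y1 - q[1], 0, q[1] - y2)
        return (dx * dx + dy * dy) ** 0.5
    def dist_to_edge(q, rect):   # q assumed inside rect
        (x1, y1), (x2, y2) = rect
        return min(q[0] - x1, x2 - q[0], q[1] - y1, y2 - q[1])
    if inside(p, ir):
        return 1.0               # core: full membership
    if not inside(p, fmbr):
        return 0.0               # outside the circumscribed rectangle
    d_core = dist_to_rect(p, ir)
    d_edge = dist_to_edge(p, fmbr)
    return d_edge / (d_edge + d_core)   # decays from core toward the edge

ir = ((4, 4), (6, 6))            # inscribed rectangle (core)
fmbr = ((0, 0), (10, 10))        # circumscribed rectangle
print(fmbr_membership((5, 5), ir, fmbr))   # 1.0 (in the core)
print(fmbr_membership((5, 2), ir, fmbr))   # 0.5 (halfway through the band)
print(fmbr_membership((12, 5), ir, fmbr))  # 0.0 (outside the FMBR)
```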
αMBR-cuts allow us to make finer distinctions inside the fuzzy region, since αMBR-cuts are individual crisp regions inside the FMBR. Thus, we can think of a fuzzy structured region as an aggregation of crisp α-level regions. αMBRs are defined starting from the edge of FMBR(Ã) toward the core of Ã. The more external the αMBR-cut, the lower the degree of membership in the fuzzy set representing Ã, since locations closer to the core have higher membership degrees. The shaded rectangle labeled Core corresponds to the inscribed rectangle. Since the IR is totally inside Ã, we assume that the points in the core belong to the fuzzy region with a membership of 1.0. Details about the representation and spatial relationships of FMBRs can be found in [49], [50]. We can now discuss an indexing structure that could be used to represent FMBRs. One commonly used index structure in spatial databases is the R-tree [26], which is the basis of all R-tree variants. Each node corresponds to a disk page and an n-dimensional rectangle. Any entry in the tree is a pair (ref, rect), where ref is the address of the child node and rect is the MBR of all entries in that child node. The root has at least 2 children unless it is a leaf node. The number of entries in each node is between m (the fill factor) and M (the number of entries that can fit in a node), where 2 ≤ m ≤ M/2. All leaves are at the same level. Leaves contain entries of the same format, where ref points to a database object and rect is the MBR of that object. An object appears in one, and only one, of the tree leaves. R-trees are dynamic structures, since insertions and deletions can be intermixed with
queries and no periodic global reorganization is required. The external memory structure is multi-way and is indexed by MBRs. R-trees present several weaknesses, mainly due to the overlap between bucket regions at the same tree level. Moreover, the region perimeters should be minimized in order to avoid insertion problems: insertion may require multiple paths through the tree, since the inserted spatial feature may intersect more than one intermediate node, and its clipped parts should be inserted in leaves under all such nodes. R*-trees are variations that avoid some of these problems. Representing FMBRs using an R*-tree structure was found very suitable, since we can take advantage of the MBR representation of the objects in this model. Figure 8 corresponds to our FMBR R*-tree description.
Fig. 8 Spatial Representation of αMBR-cuts
Since we are interested in treating each αMBR-cut independently, we have located each of them as a root node of the tree. This structure allows us to access the features inside the vague region with a specific degree of membership following a unique path from the root. In addition, geographically close features belonging to the same αMBR-cut can be grouped in MBRs to improve the retrieval process.

Fig. 9 FMBR R*-Tree Representation
The R*-tree of Figure 9 contains five nodes at the root, corresponding to the core and the four αMBRs approximating the boundary. The core, αMBR1, has two MBRs: mbr11 and mbr12, and mbr12 contains mbr121 and mbr122. A similar structure is maintained in the remaining nodes.
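The retrieval idea can be sketched with a toy nested structure in the spirit of Figure 9: each αMBR-cut is a root-level node, and nested MBRs beneath it hold feature lists at their leaves. The feature names and the three-level truncation are illustrative assumptions (the real tree has five root nodes).

```python
# Toy nested-dict rendition of the Figure 9 structure (assumed names/data).
fmbr_tree = {
    "core":   {"mbr11": ["tower"], "mbr12": {"mbr121": ["road"], "mbr122": []}},
    "alpha2": {"mbr21": ["trail"], "mbr22": {"mbr221": [], "mbr222": ["creek"]}},
    "alpha3": {"mbr31": [], "mbr32": {"mbr311": ["meadow"], "mbr312": []}},
}

def features_at_level(tree, level):
    """All features reachable under the root node for one αMBR-cut, i.e.
    everything inside the vague region at that membership level."""
    found = []
    def walk(node):
        if isinstance(node, dict):
            for child in node.values():
                walk(child)
        else:
            found.extend(node)
    walk(tree[level])
    return found

print(features_at_level(fmbr_tree, "core"))  # ['tower', 'road']
```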
5.3 Rough Object-Oriented Spatial Database

Object-oriented databases have become quite popular for many reasons. Classes and inheritance allow for code reuse through specialization and generalization, and encapsulation packages the data and the methods that act on the data together in an object. Objects can be defined to represent very complex data structures and to model relationships in the data, as is often the case with spatial data. Object modeling helps in understanding the requirements of an enterprise, and object-oriented techniques lead to high-quality systems that are easy to modify and maintain. Because many newer applications involving CAD/CAM, multimedia and GIS are not well suited to the standard relational database model, object-oriented databases may be developed to meet the needs of these more complex applications. A formal generalized model for object-oriented databases was extended to incorporate rough set techniques in [5], where the rough set concepts of indiscernibility and approximation regions were integrated into a rough object-oriented framework. In this model there is a type system, ts, containing literal types Tliteral, which can be base types, collection literal types, or structured literal types. It also contains Tobject, which specifies object types, and Treference, the set of specifications for reference types. Each domain is a subset of the set of domains, domts ∈ Dts. This domain set, along with a set of operators Ots and a set of axioms Ats, captures the semantics of the type specification. The type system is then defined based on these type specifications, the set of all programs P, and the implementation function mapping each type specification for a domain onto a subset of the powerset of P that contains all the implementations for the type system. Of particular interest are object types, defined as:

Class id(id1:s1; …; idn:sn)

or

Class id: id'1, …, id'm(id1:s1; …; idn:sn)

where id, an identifier, names an object type, { id'i | 1 ≤ i ≤ m } is a finite set of identifiers denoting parent types of t, and { idi:si | 1 ≤ i ≤ n } is the finite set of characteristics specified for object type t within its syntax. This set includes all the attributes, relationships and method signatures for the object type. The identifier for a characteristic is idi and its specification is si. Consider a GIS which stores spatial data concerning water and land forms, structures, and other geographic information. If simple types are previously defined for string, set, geo, integer, etc., then one may specify an object type as

Class ManMadeFeature (
  Location: geo;
  Name: string;
  Height: integer;
  Material: Set(string));
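The ManMadeFeature type can be sketched as a Python dataclass; this is an illustrative rendering, with the 'geo' simple type stood in for by a string code matching the example instance.

```python
from dataclasses import dataclass, field

@dataclass
class ManMadeFeature:
    Location: str                 # stands in for the 'geo' simple type
    Name: str
    Height: int
    Material: set = field(default_factory=set)

tower = ManMadeFeature(Location="0289445", Name="KXYZ radio tower",
                       Height=60, Material={"steel", "plastic", "aluminum"})
```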
An example instance of the object type ManMadeFeature might be

[oid1, Ø, ManMadeFeature, Struct(0289445, “KXYZ radio tower”, 60, Set(steel, plastic, aluminum))]

following the definition of an instance of an object type [15]: the quadruple o = [oid, N, t, v] consists of a unique object identifier, a possibly empty set of object names, the name of the object type, and, for all attributes, the values (vi ∈ domsi) representing the state of the object. The object type t is an instance of the type system ts and is formally defined in terms of the type system and its implementation function, t = [ts, fimpl(ts)].
Rough set uncertainty is modeled through the indiscernibility relations specified for domains and through class methods returning approximation-region results. Each domain class i in the database, domi ∈ Di, has methods for maintaining the current level of granulation, changing the partitioning, adding new domain values to the hierarchy, and determining equivalence based on the current indiscernibility relation imposed on the domain class. Every domain class, then, must be able not only to store the legal values for that domain, but also to maintain the grouping of these values into equivalence classes. This can be achieved through the type implementation function and class methods, and can be specified through the use of generalized constraints as in [15] for a generalized object-oriented database. The semantics of the rough set operations discussed for relational databases in [6] apply similarly in the object database paradigm; however, the implementation of these operations is done via methods associated with the individual object classes. The incorporation of rough set techniques into an object database model allows not only for the management of uncertainty in spatial data, but also for the representation of complex data relationships and the definition of methods for special cases that often exist in GIS.
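A domain class carrying its own indiscernibility relation, with methods for changing the granulation and testing equivalence as described above, can be sketched as follows; the class and method names, and the land-cover values, are assumptions for illustration.

```python
class RoughDomain:
    """A domain class that stores its legal values and maintains their
    grouping into equivalence (indiscernibility) classes."""
    def __init__(self, values, partition):
        self.values = set(values)
        self.partition = [set(c) for c in partition]

    def equivalent(self, a, b):
        """Equivalence under the current indiscernibility relation."""
        return any(a in c and b in c for c in self.partition)

    def coarsen(self, class_a, class_b):
        """Merge two equivalence classes: a coarser granulation."""
        merged = set(class_a) | set(class_b)
        self.partition = [c for c in self.partition
                          if c != set(class_a) and c != set(class_b)]
        self.partition.append(merged)

land = RoughDomain(
    {"swamp", "marsh", "forest", "woods"},
    [{"swamp"}, {"marsh"}, {"forest", "woods"}],
)
land.coarsen({"swamp"}, {"marsh"})
# land.equivalent("swamp", "marsh") is now True
```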
6 Conclusions and Future Directions

Fuzzy and rough set approaches are increasingly being applied to many areas of spatial data. In this chapter we presented ways in which rough and fuzzy set uncertainty management may be integrated into applications involving spatial data. We reviewed rough sets, an important mathematical theory applicable to many diverse fields. Rough sets have predominantly been applied to the area of knowledge discovery in databases, offering a type of uncertainty management different from other methods such as probability and fuzzy sets. Both rough set and fuzzy set theory can also be applied to database models. The chapter also discussed the use of rough and fuzzy set techniques for the representation of spatial data relationships, terrain modeling, gridded data, triangulated irregular networks, and spatial interpolation. Their use in modeling topological spatial relationships for vague regions was presented, and their integration into object-oriented and other spatial databases, along with data mining over such data, was discussed. The main
126
T. Beaubouef and F.E. Petry
emphasis for future work is the incorporation of some of these research topics into mainstream GIS commercial products. Acknowledgments The authors would like to thank the Naval Research Laboratory’s Base Program, Program Element No. 0602435N for sponsoring this research.
References
[1] Allen, J.: Maintaining Knowledge about Temporal Intervals. Comm. of the ACM 26(11), 832–843 (1983)
[2] Ahlqvist, O., Keukelaar, J., Oukbir, K.: Rough classification and accuracy assessment. International Journal of Geographical Information Science 14(5), 475–496 (2000)
[3] Beaubouef, T., Ladner, R., Petry, F.: Rough Set Spatial Data Modeling for Data Mining. Int. Jour. of Intelligent Systems 19, 567–584 (2004)
[4] Beaubouef, T., Petry, F., Ladner, R.: Spatial Data Methods and Vague Regions: A Rough Set Approach. Applied Soft Computing Journal 7, 425–440 (2007)
[5] Beaubouef, T., Petry, F.: Uncertainty in an OODB Modeled by Rough Sets. In: The 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2002), Annecy, France, July 2002, pp. 1697–1703 (2002)
[6] Beaubouef, T., Petry, F., Buckles, B.: Extension of the Relational Database and its Algebra with Rough Set Techniques. Computational Intelligence 11(2), 233–245 (1995)
[7] Bellman, R., Giertz, M.: On the Analytic Formalism of the Theory of Fuzzy Sets. Information Sciences 5, 149–156 (1973)
[8] Bittner, T.: Rough sets in spatio-temporal data mining. In: Roddick, J., Hornsby, K.S. (eds.) TSDM 2000. LNCS (LNAI), vol. 2007, pp. 89–104. Springer, Heidelberg (2001)
[9] Bittner, T., Stell, J.: Approximate Qualitative Spatial Reasoning. Spatial Cognition and Computation 2(4), 435–466 (2002)
[10] Bittner, T., Stell, J.: Stratified Rough Sets and Vagueness. In: Kuhn, W., Worboys, M.F., Timpf, S. (eds.) COSIT 2003. LNCS, vol. 2825, pp. 286–303. Springer, Heidelberg (2003)
[11] Burrough, P.: Natural objects with indeterminate boundaries. In: Burrough, P.A., Frank, A. (eds.) Geographic Objects with Indeterminate Boundaries, pp. 3–28. Taylor & Francis, London (1996)
[12] Cobb, M., Petry, F.: Modeling Spatial Data within a Fuzzy Framework. Jour. of Amer. Soc. Information Science 49(3), 253–266 (1998)
[13] Cohn, A., Gotts, N.: The ‘Egg-Yolk’ Representation of Regions with Indeterminate Boundaries. In: Burrough, P., Frank, A. (eds.) Geographic Objects with Indeterminate Boundaries, GISDATA II. European Science Foundation, ch. 12 (1996)
[14] Cross, V., Firat, A.: Fuzzy objects for geographical information systems. Fuzzy Sets and Systems 113, 19–36 (2000)
[15] De Tré, G., De Caluwe, R.: A Generalized Object-Oriented Database Model with Generalized Constraints. In: Proc. of NAFIPS 1999, New York, pp. 381–386 (1999)
[16] Dragicevic, S., Marceau, D.: A fuzzy set approach for modelling time in GIS. International Journal of Geographical Information Science 14(3), 225–245 (2000)
Fuzzy and Rough Set Approaches for Uncertainty in Spatial Data
127
[17] Dragićević, S.: Multi-Dimensional Interpolations with Fuzzy Sets. In: Petry, F., Robinson, V., Cobb, M. (eds.) Fuzzy Modeling with Spatial Information for Geographic Problems, pp. 143–158. Springer, Heidelberg (2005)
[18] Dubois, D., Prade, H.: Fuzzy Sets and Systems: Theory and Applications. Academic Press, NY (1980)
[19] Duckham, M., Mason, K., Stell, J., Worboys, M.: A formal ontological approach to imperfection in geographic information. Computers, Environment and Urban Systems 25, 89–103 (2001)
[20] Dutta, S.: Approximate spatial reasoning: Integrating qualitative and quantitative constraints. International Journal of Approximate Reasoning 5(3), 307–331 (1991)
[21] Egenhofer, M., Franzosa, R.: Point set Topological Spatial Relations. Int. Journal of Geographical Information Systems 5(2), 161–174 (1991)
[22] Fisher, P.: Boolean and fuzzy regions. In: Burrough, P.A., Frank, A. (eds.) Geographic Objects with Indeterminate Boundaries, pp. 87–94. Taylor and Francis, London (1996)
[23] Gale, S.: Inexactness, fuzzy sets, and the foundations of behavioral geography. Geographical Analysis 4, 337–349 (1972)
[24] Gedeon, T., Wong, K., Wong, P., Huang, Y.: Spatial Interpolation using Fuzzy Reasoning. Transactions in GIS 7(1), 55–66 (2003)
[25] Guesgen, H.: Fuzzifying Spatial Relations. In: Matsakis, P., Sztandra, L. (eds.) Applying Soft Computing in Defining Spatial Relations, pp. 99–122. Physica Verlag, Heidelberg (2002)
[26] Guttman, A.: R-trees: A Dynamic Index Structure for Spatial Searching. In: Proc. ACM SIGMOD International Conference on Management of Data, pp. 47–57 (1984)
[27] Irvin, B., Ventura, S., Slater, B.: Fuzzy and isodata classification of landform elements from digital terrain data in Pleasant Valley, Wisconsin. Geoderma 77, 137–154 (1996)
[28] Keukelaar, J.: Topics in Soft Computing. Unpublished Ph.D. Dissertation, Royal Institute of Technology, Stockholm, Sweden (2002)
[29] Klir, G., Folger, T.: Fuzzy Sets, Uncertainty, and Information. Prentice-Hall, NY (1988)
[30] Kulik, L., Eschenbach, C., Habel, C., Schmidtke, H.: A graded approach to directions between extended objects. In: Egenhofer, M.J., Mark, D.M. (eds.) GIScience 2002. LNCS, vol. 2478, pp. 119–131. Springer, Heidelberg (2002)
[31] Ladner, R., Petry, F., Cobb, M.: Fuzzy Set Approaches to Spatial Data Mining of Association Rules. Transactions in GIS 7(1), 123–138 (2003)
[32] Li, Y., Li, S.: A fuzzy topology for computing the interior, boundary, and exterior of spatial objects quantitatively in GIS. Computers and Geosciences 33(7), 898–915 (2007)
[33] MacMillan, R., Pettapiece, W., Nolan, S., Goddard, T.: A generic procedure for automatically segmenting landforms into landform elements using DEMs, heuristic rules and fuzzy logic. Fuzzy Sets and Systems 113, 81–109 (2000)
[34] Matsakis, P.: Understanding the Spatial Organization of Image Regions by Force Histograms. In: Matsakis, P., Sztandra, L. (eds.) Applying Soft Computing in Defining Spatial Relations, pp. 1–16. Physica Verlag, Heidelberg (2002)
[35] Morris, A.: A framework for modeling uncertainty in spatial databases. Transactions in GIS 7, 83–101 (2003)
[36] Papadias, D., Karacapilidis, N., Arkoumanis, D.: Processing Fuzzy Spatial Queries: A Configuration Similarity Approach. Int. J. Geographical Information Science 13, 93–128 (1999)
[37] Pawlak, Z.: Rough Sets. Int. Journal of Man-Machine Studies 21, 127–134 (1984)
[38] Pedrycz, W., Gomide, F.: An Introduction to Fuzzy Sets – Analysis and Design. MIT Press, Cambridge (1998)
[39] Robinson, V., Frank, A.: About Different Kinds of Uncertainty in Geographic Information Systems. In: Proceedings AUTOCARTO 7 Conference, pp. 440–449. American Society for Photogrammetry and Remote Sensing, Falls Church (1985)
[40] Roy, A., Stell, J.: Spatial Relations Between Indeterminate Regions. International Journal of Approximate Reasoning 27, 205–234 (2001)
[41] Schneider, M.: Spatial Data Types for Database Systems. Doctoral Thesis, FernUniversität Hagen, Germany (1995)
[42] Schockaert, S., De Cock, M., Cornelis, C., Kerre, E.: Fuzzy region connection calculus: Representing vague topological information. International Journal of Approximate Reasoning 48, 314–331 (2008)
[43] Schockaert, S., De Cock, M., Cornelis, C., Kerre, E.: Fuzzy region connection calculus: An interpretation based on closeness. International Journal of Approximate Reasoning 48, 332–347 (2008)
[44] Schockaert, S., De Cock, M., Kerre, E.: Spatial reasoning in a fuzzy region connection calculus. Artificial Intelligence 173(2), 258–298 (2009)
[45] Shi, X., Xing, A., Zhu, A., Wang, R.: Fuzzy Representation of Special Terrain Features Using a Similarity-based Approach. In: Petry, F., Robinson, V., Cobb, M. (eds.) Fuzzy Modeling with Spatial Information for Geographic Problems, pp. 233–252. Springer, Heidelberg (2005)
[46] Shi, W., Wang, S., Li, D., Wang, X.: Uncertainty-based Spatial Data Mining. In: Proceedings of Asia GIS Association, Wuhan, China, pp. 124–135 (2003)
[47] Skidmore, A.: Terrain position as mapped from a gridded digital elevation model. Int. J. Geographical Information Systems 4, 33–49 (1990)
[48] Somodevilla, M.: Fuzzy MBRs Modeling for Reasoning about Vague Regions. PhD thesis, Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA (2003)
[49] Somodevilla, M., Petry, F.: Approximation of Topological Relations on Fuzzy Regions: An Approach using Minimal Bounding Rectangles. In: Proceedings of the NAFIPS 2003 Conference, Chicago, IL, pp. 1–23 (2003)
[50] Somodevilla, M., Petry, F.: Fuzzy Minimum Bounding Rectangles. In: De Caluwe, R., De Tré, G., Bordogna, G. (eds.) Flexible Querying and Reasoning in Spatio-temporal Databases: Theory and Applications. Studies in Fuzziness and Soft Computing, pp. 237–264. Springer, Heidelberg (2004)
[51] Stoms, D.: Reasoning with Uncertainty in Intelligent Geographic Information Systems. In: Proceedings GIS 1987, 2nd Annual International Conference on Geographic Information Systems, pp. 693–699. American Society for Photogrammetry and Remote Sensing, Falls Church (1987)
[52] Tang, X., Fang, Y., Kainz, W.: Fuzzy Topological Relations Between Fuzzy Spatial Objects. In: Wang, L., et al. (eds.) FSKD 2006. LNCS (LNAI), vol. 4223, pp. 324–333. Springer, Heidelberg (2006)
[53] Vazirgiannis, M.: Uncertainty Handling in Spatial Relationships. In: ACM-SAC 2000 Proceedings, Como, Italy, pp. 215–221 (2000)
[54] Verstraete, J., De Tré, G., De Caluwe, R., Hallez, A.: Field Based Methods for the Modeling of Fuzzy Spatial Data. In: Petry, F., Robinson, V., Cobb, M. (eds.) Fuzzy Modeling with Spatial Information for Geographic Problems, pp. 41–70. Springer, Heidelberg (2005)
[55] Wang, F.: A Fuzzy Grammar and Possibility Theory-based Natural Language User Interface for Spatial Queries. Fuzzy Sets and Systems 113, 147–159 (2000)
[56] Wang, S., Li, D., Shi, W., Wang, X.: Rough Spatial Description. In: International Archives of Photogrammetry and Remote Sensing, vol. XXXII, Commission II, pp. 503–510 (2002)
[57] Wang, S., Yuan, H., Chen, G., Li, D., Shi, W.: Rough Spatial Interpretation. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 435–444. Springer, Heidelberg (2004)
[58] Worboys, M.: Imprecision in Finite Resolution Spatial Data. Geoinformatica 2(3), 257–280 (1998)
[59] Worboys, M.: Computation with imprecise geospatial data. Computers, Environment and Urban Systems 22(2), 85–106 (1998)
[60] Yager, R.: On a General Class of Fuzzy Connectives. Fuzzy Sets and Systems 3, 235–242 (1980)
[61] Zadeh, L.: Fuzzy Sets. Information and Control 8, 338–353 (1965)
[62] Zhan, F.B.: Approximate analysis of binary topological relations between geographic regions with indeterminate boundaries. Soft Computing 2(2), 28–34 (1998)
[63] Zhang, J., Goodchild, M.: Uncertainty in Geographical Information. Taylor & Francis, London (2002)
Part 2: Symbolic Reasoning and Information Merging
An Exploratory Survey of Logic-Based Formalisms for Spatial Information Florence Dupin de Saint-Cyr, Odile Papini, and Henri Prade
Abstract. This chapter presents a tentative survey of logic-based formalisms for representing various aspects of spatial information, ranging from the expression of spatial relationships between regions to the attribution of properties to definite regions. The first main part of the paper reviews the logic-based representations of mereotopologies in classical or modal logics and in fuzzy and rough set settings, as well as modal logic representations of geometries. The second main part is devoted to the handling of properties associated with regions. The association either relates properties to a current region of interest, or to explicitly named regions. Properties may be attached to a whole region and hold "everywhere", or hold "somewhere", or "elsewhere". Properties and their localization may also be pervaded with uncertainty. This overview reveals that the many existing formalisms address different issues, and when they deal with the same issue they do it differently. However, it seems that in practice there is a need for a combination of representational capabilities covering both spatial relationships and localized properties, possibly in the presence of uncertainty.
1 Introduction

Qualitative representations of spatial information attempt to model various spatial aspects: i) topology, with the notions of contact, connection and inclusion; ii) orientation, with the notions of alignment, projection, and relative positions of objects; iii) distance between objects; iv) shape, with the notions of convexity and dimensions of objects.

Odile Papini
LSIS UMR-CNRS 6168, ESIL, Université de la Méditerranée
e-mail: [email protected]

Florence Dupin de Saint-Cyr · Henri Prade
IRIT UMR-CNRS
e-mail: {bannay,prade}@irit.fr

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 133–163. springerlink.com
© Springer-Verlag Berlin Heidelberg 2010
134
F.D. de Saint-Cyr, O. Papini, and H. Prade
Most of the logical representations of spatial information, contrary to geometry, do not take points as basic objects but spatial regions, and were initially dedicated to particular reasoning tasks, such as reasoning about spatial relationships between regions. However, reasoning about spatial information may also require other capabilities, for expressing how some property spatially applies. The first formalisms studied in the literature for representing spatial relations between regions used predicate logic. Most of them rely on first order theories based on a family of partial preorders between regions, called mereologies. Although the term "mereology" suggests that parthood is the basic notion, some approaches stem from the primitive notion of connection, following Clarke [14]. Others focus on richer structures [34], like mereotopologies, which in addition include topological concepts, or mereogeometries, which handle regions with geometric properties. Modal logics have also been proposed for representing spatial information: spatial interpretations of modalities have been provided for qualitatively capturing various spatial concepts [34] with a topological or geometric flavor, such as nearness or distance. Besides, all these formalisms do not usually handle uncertainty about spatial information. Still, spatial information, like any kind of information, may be pervaded with uncertainty and gradedness. Indeed, uncertainty may affect spatial information in many ways. One may be uncertain whether some spatial relation holds between two regions. Spatial connection relations may become a matter of degree, and regions may also become fuzzy. One may also be uncertain whether some property is true in a region. Fuzzy properties, which are a matter of degree, may also be used. Association rules pertaining to spatial data have confidence degrees, as do any association rules, and moreover the rules may be mined from data that is itself uncertain.
So the handling of uncertainty in spatial information representation and spatial reasoning has many facets. In the following, we will only mention some works based on fuzzy set or rough set theories dealing with some of these issues, before reviewing how uncertainty is handled in a possibilistic logic manner in the setting of a simple attributive language associating properties with regions. Section 2 surveys the most significant approaches for representing binary relations between regions, while Section 3 reviews approaches dealing with single regions and their properties. Concluding remarks in Section 4 outline directions for further research. This paper is mainly an attempt at providing an organized survey of existing formalisms. More precisely, the classical logic-based formalisms surveyed in Section 2 are based on Muller and Dugat's Chapter 2 of [34], while the modal logic-based formalisms reviewed in Sections 2 and 3 mainly rely on Papini's Chapter 3 in [34]. The last subsection of Section 3, about the handling of spatial information by means of attributive formulas, includes a summary of recent works by the two other authors of this chapter. Still, we hope that it
may be used as a starting point for having a better understanding of the basic representation needs, and maybe of how to handle them in a joint manner.
2 Spatial Relations between Regions

This section surveys the most significant approaches for representing spatial relations. It starts with the logic-based representations of topology, since the first logic-based representations of spatial information stem from predicate logic and are dedicated first to mereologies and then to mereotopologies. In order to obtain decidability for computational purposes, mereotopologies have been encoded in modal logics. More recently, in order to represent uncertainty, topological relations have been recast in the fuzzy framework. However, many situations involve orientation, alignment and metric notions, for which mereotopologies are insufficient. In order to represent these notions, mereogeometries, modal logics for geometry and modal logics for distance have been proposed.
2.1 Classical Logic for Topology

This brief review is based on material that can be found in Chapter 2 of [34]. In the following, we survey the most significant first order theories proposed for representing spatial information. They differ ontologically, since some stem from the basic notion of point while others stem from the basic notion of region. In this section we use first order logic, with the usual connectives ¬, ∧, ∨, →, ↔ and quantifiers ∀, ∃. Upper case Roman letters denote predicates, lower case Roman letters x, y, · · · denote variables, while p, q, · · · denote propositions. The symbols ⊤ and ⊥ denote tautology and contradiction respectively. The symbol ⊢ denotes the classical inference relation.

2.1.1 Mereologies
Mereologies are partial preorders proposed for formalizing the generic notion of the part-of relation between regions, considered as basic objects. The first axiomatization is due to Lesniewski [36], who proposed an alternative to set theory. Within this framework, the order relation is the inclusion relation between entities. The most recent versions used in artificial intelligence have been proposed by Varzi [56], where the primitive relation, denoted by P, is the part-of relation between two entities: P(x, y) means that x is a part of y. The axiomatization consists of the three following basic axioms:
P1: P(x, x)
P2: P(x, y) ∧ P(y, x) → x = y
P3: P(x, y) ∧ P(y, z) → P(x, z)

From these three basic axioms, the notions of proper part of, denoted by PP, overlap, denoted by O, and partial overlap, denoted by PO, are defined as follows:

PP(x, y) =def P(x, y) ∧ ¬P(y, x)
O(x, y) =def ∃z (P(z, x) ∧ P(z, y))
PO(x, y) =def O(x, y) ∧ ¬P(y, x) ∧ ¬P(x, y)

One of the characteristics of these theories is the supplementation principle, which states that if a region is not included in another, there exists a region which allows for making the difference, i.e., a region included in the first region but not intersecting the second one. More formally:

P4: ¬P(x, y) → ∃z (P(z, x) ∧ ¬O(z, y))

This principle implies the extensionality of the O relation (and also of the PP relation), which states that all the entities that cover the same entities are equal. More formally:

P1, P2, P3, P4 ⊢ (∀z (O(x, z) ↔ O(y, z))) ↔ (x = y)

Operations are defined within the theory: the sum, denoted by +, the intersection, denoted by ., and the difference, or complement, denoted by −. The closure principles state the existence of the sum of two entities, of their intersection when they overlap, of their difference, and the existence of one universal individual which contains all the others. More formally, the axioms P5, P6, P7 give the definition of the sum, the intersection and the difference respectively:

P5: (∃z (P(x, z) ∧ P(y, z))) → (∃u ∀v (O(v, u) ↔ (O(v, x) ∨ O(v, y)))) (sum: +)¹
P6: O(x, y) → ∃z ∀w (P(w, z) ↔ (P(w, x) ∧ P(w, y))) (intersection: .)
P7: (∃z (P(z, x) ∧ ¬O(z, y))) → (∃u ∀w (P(w, u) ↔ (P(w, x) ∧ ¬O(w, y)))) (complement: −)
P8: ∃z ∀x P(x, z) (universe)

Most of the mereologies² do not accept the existence of the empty set, which would be a null individual satisfying the following axiom:

P9: ∃z ∀x P(z, x)

¹ In the following, x + y denotes the entity u such that ∀v (O(v, u) ↔ (O(v, x) ∨ O(v, y))).
² Except [11] and [37], cited by [56].
Some theories satisfy the Unrestricted Fusion principle, also called the General Sum Principle. Let φ be a first order formula having one free variable; then the fusion of all objects satisfying φ can be defined as follows:

P10: (∃x φ(x)) → ∃z ∀y (O(y, z) ↔ ∃x (O(x, y) ∧ φ(x)))

Denoting by σx φ(x) the entity z whose existence is stated by P10, the property that the sum of two entities is the fusion of their parts holds. More formally:

P1 − P4, P10 ⊢ u + v = σx (P(x, u) ∨ P(x, v))

The main characteristics of the mereological theories do not have a necessary spatial interpretation; these are general theories, alternatives to set theories. This is the reason why richer structures have been introduced.

2.1.2 Mereotopologies

In order to represent spatial notions, mereological theories are not sufficient by themselves; they are too weak with respect to the structures they characterize, and topological notions are fundamental for spatial representation. The standard mathematical definition of a topology is characterized by a set of open (or closed) sets such that the intersection of two open sets is an open set and the union of open sets is an open set. The notion of open set allows for the definition of the notion of connectedness: a region is connected if there do not exist two disjoint open sets such that the region equals their union. Topology characterizes the notion of a "whole" (a one-piece, self-contained whole, as opposed to scattered entities made up of several disconnected parts) that cannot be formulated in mereological terms. First order theories, called mereotopologies, embodying both mereological concepts, based on the notion of parthood, and topological concepts, based on the notion of connection, have been proposed.

Mereotopologies for regions

There are several ways of introducing mereotopological concepts. The first is to take as primitive relations the part-of relation P and the connection relation between regions, denoted by C. The three basic axioms are the following:

C1: C(x, x)
C2: C(x, y) → C(y, x)
C3: P(x, y) → ∀z (C(z, x) → C(z, y))

An alternative is to consider only the connection relation as primitive, and to replace the C3 axiom by the following one:

C4: ∀x ∀y (∀z (C(x, z) ↔ C(y, z))) → x = y
In this approach the relation P is defined by:

P(x, y) =def ∀z (C(x, z) → C(y, z))

From these basic axioms, in both cases, the external connection, denoted by EC, the tangential part, denoted by TP, and the non-tangential part, denoted by NTP, are defined as follows:

EC(x, y) =def C(x, y) ∧ ¬O(x, y)
TP(x, y) =def P(x, y) ∧ ∃z (EC(z, x) ∧ EC(z, y))
NTP(x, y) =def P(x, y) ∧ ¬TP(x, y)

The set of axioms P1 − P8 and C1 − C3 characterizes the mereotopology defined by [12], while the mereotopology defined by [2, 14, 46] stems from the axioms C1, C2, C4. Within these mereotopologies the notions of non-tangential proper part, denoted by NTPP, tangential proper part, denoted by TPP, their inverses, denoted by NTPPI or NTPP−1 and TPPI or TPP−1 respectively, disconnection, denoted by DC, and equality, denoted by EQ, are defined:

NTPP(x, y) =def PP(x, y) ∧ ¬∃z (EC(z, x) ∧ EC(z, y))
TPP(x, y) =def PP(x, y) ∧ ∃z (EC(z, x) ∧ EC(z, y))
NTPP−1(x, y) =def NTPP(y, x)
TPP−1(x, y) =def TPP(y, x)
DC(x, y) =def ¬C(x, y)
EQ(x, y) =def P(x, y) ∧ P(y, x)

The eight possible basic relations between regions, namely DC, EC, PO, NTPP, TPP, NTPP−1, TPP−1 and the equality EQ, are illustrated in a 2D space in Figure 1. They are known as the RCC relations, where RCC stands for Region Connection Calculus. See [22] for an overview of connection (or contact) relation algebras, and [61] for its extension to Boolean combinations of regions. The mereological operations sum, intersection and complement can be reformulated using the connection relation as primitive in mereologies stemming from the axioms C1, C2, C4 [34]. The axioms P5 − P8 can also be rephrased as follows:

C5: ∀x ∀y ∃z ∀w (C(w, z) ↔ (C(w, x) ∨ C(w, y))) (sum: +)
C6: O(x, y) → ∃z ∀w (C(w, z) ↔ ∃v (P(v, x) ∧ P(v, y) ∧ C(v, w))) (intersection: .)
C7: ∀x ((∃z ¬C(x, z)) → ∃u ∀w (C(w, u) ↔ ∃v (¬C(v, x) ∧ C(v, w)))) (complement: −)
C8: ∃z ∀x C(x, z) (universe)³

³ In the following, a will denote the universe, i.e., the entity such that ∀x, C(x, a).
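For intuition, the eight RCC relations can be evaluated on a toy discrete model in which regions are sets of grid cells and connection C is read as 8-adjacency of cells. This discrete model and all names below (rcc8, touches_outside, the sample regions) are our own assumptions for illustration, not part of the theory.

```python
# A toy discrete model of the eight RCC relations: regions are sets of
# (row, col) grid cells; C holds when some cells of the two regions touch.

def C(x, y):  # connected: some cells coincide or are 8-adjacent
    return any(abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1
               for a in x for b in y)

def O(x, y):  # overlap: the regions share a cell
    return bool(x & y)

def touches_outside(x, y):
    # some cell of x has a neighbour (8-adjacency) lying outside y
    return any((a[0] + dx, a[1] + dy) not in y
               for a in x for dx in (-1, 0, 1) for dy in (-1, 0, 1))

def rcc8(x, y):
    if not C(x, y):
        return "DC"
    if not O(x, y):
        return "EC"
    if x == y:
        return "EQ"
    if x < y:   # proper part: tangential iff x reaches the border of y
        return "TPP" if touches_outside(x, y) else "NTPP"
    if y < x:
        return "TPP-1" if touches_outside(y, x) else "NTPP-1"
    return "PO"

a = {(i, j) for i in range(4) for j in range(4)}   # a 4x4 block
b = {(1, 1), (1, 2), (2, 1), (2, 2)}               # well inside a
c = {(0, 0), (0, 1)}                               # hugs a's border
assert rcc8(b, a) == "NTPP" and rcc8(c, a) == "TPP" and rcc8(a, a) == "EQ"
```

The case analysis mirrors the definitions above: DC is the absence of connection, EC is connection without overlap, and the tangential/non-tangential split of proper parts corresponds to the existence of a common externally connected witness.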
Fig. 1 The eight basic relations between regions in mereotopologies: DC, EC, PO, TPP, TPP−1, NTPP, NTPP−1, EQ
The mereotopologies defined in [2, 14] consist of the axioms C1, C2, C4, C5 − C8; one of the consequences is the theorem

C1, C2, C4, C7 ⊢ ∀x ¬C(x, −x)

while the mereotopology defined in [46] consists of the axioms C1, C2, C4, C5, C6, C8, and ∀x C(x, −x) holds. The characterization of a topological theory can be reformulated within the framework of mereotopologies. More formally, OP(x) denotes the fact that x is an open set and CL(x) denotes the fact that x is a closed set.

C9: (OP(y) ∧ OP(x)) → (z = x.y → OP(z))
C10: ∀x (φ(x) → OP(x)) → OP(σx φ(x))

The axiom C9 expresses that the intersection of two open sets is an open set, and the axiom C10 expresses that a union of open sets is an open set. Alternative axioms could be stated for the closed sets, defining the open sets by OP(x) =def CL(−x): the finite sum of closed sets is a closed set and the intersection of closed sets is a closed set. These notions can be defined from the already introduced relations, with the following definitions, where i(x) denotes the interior of x and c(x) denotes the closure of x:

OP(x) =def x = i(x), with i(x) =def σz (NTP(z, x)) (interior of x)
CL(x) =def x = c(x), with c(x) =def −i(−x)

The notion of boundary can be defined by b(x) =def −(i(x) + i(−x)); however, in the theories [2, 14, 46] the following axiom states the existence of an interior for each element, which avoids the notion of boundary.
C11: ∀x ∃y NTP(y, x)

Vieu and Asher [2] state that any region possesses a unique non-tangential part, which is its interior:

C11′: ∀x ∃y ∀u (C(u, y) → ∃v (NTP(v, x) ∧ C(v, u)))

They add the axiom C9 and a closure condition on the universe:

C12: c(a) = a

This set of axioms allows us to recover the other properties of the topological operators. In Smith and Varzi 1997 [51] the topology and the links between an entity and its boundaries are axiomatized directly, using the notion of boundary, with the predicate B(x, y) expressing that x is a part of the boundary of y. In this approach C11 is avoided, and their theory stems from a mereology consisting of P1 − P4, P8.

b(x) =def σz (B(z, x))
b(x) = b(−x)
b(b(x)) = b(x)
b(x.y) + b(x + y) = b(x) + b(y)

In this approach the closure is defined as the sum of an entity and its boundary, and the connection relation is defined as follows:

c(x) =def x + b(x)
C(x, y) =def O(c(x), y) ∨ O(c(y), x)

Another property concerning the mereotopologies is the atomicity of regions, i.e., whether a region can be split into smaller regions. More formally:

At(x) =def ¬∃y PP(y, x)

Most of the mereotopologies select one of the two following options:

∀x ∃y (P(y, x) ∧ At(y)) (atomicity)
∀x ∃y PP(y, x) (non-atomicity)

For example, RCC [45, 46] chooses non-atomicity in order to preserve the consistency of the theory.

Mereotopologies for points

Spatial regions can be considered as sets of spatial points. However, another point of view for spatial representation is to define points, as entities, from
primitive spatial regions. More precisely, a point is viewed as the set of regions that contain it. The links between a theory stemming from regions and a structure of points defined from these regions are due to Clarke [13]. However, the first axiomatization providing a spatial representation based on points equivalent to the one stemming from regions is due to Asher and Vieu [59]. They distinguish two kinds of points: the ones introduced by an external connection, called border points, and the ones introduced by an overlap relation, called interior points. The formal definitions are the following [34]: α is an interior point⁴, denoted by IP(α), iff it satisfies the following conditions (where x and y are regions):

1) ∀x ∀y ((x ∈ α ∧ y ∈ α) → (O(x, y) ∧ x.y ∈ α))
2) ∀x ∀y ((x ∈ α ∧ P(x, y)) → y ∈ α)
3) α ≠ ∅
4) α is maximal (with respect to set inclusion) under the previous conditions

α is a border point, denoted by BP(α), iff it satisfies the following conditions (where x, y, z and t are regions):

1) ∃x ∃y (x ∈ α ∧ y ∈ α ∧ EC(x, y))
2) ∀x ∀y ((x ∈ α ∧ y ∈ α) → ((O(x, y) ∧ x.y ∈ α) ∨ ∃t ∃z (z ∈ α ∧ t ∈ α ∧ P(z, x) ∧ P(t, y) ∧ EC(z, t))))
3) ∀x ∀y ((x ∈ α ∧ P(x, y)) → y ∈ α)
4) α is maximal under the previous conditions

According to these definitions, it is possible to construct a correspondence between a region and the set of points belonging to it. The properties that hold for a topology based on the notion of regions can also hold for a topology induced from the basic notion of points.
2.2 Modal Logic for Topology

Several approaches have been proposed for representing topological relations in modal logic. We briefly present the topological interpretation of modal logic and the encoding of RCC in modal logic.

2.2.1 Topological Interpretations of Modal Logic
Several semantics have been proposed for modal logic; among them, the topological semantics is one of the oldest modal semantics, for the S4 modal logic. The language of S4 is composed of a countable set of propositional variables, the constants ⊤ and ⊥, the connectives ¬, ∧, ∨, →, and the modal operators □ and ♦. The axiomatic system of S4 consists of the following axioms:

⁴ α is a set of regions.
A1: α → (β → α)
A2: (α → (β → γ)) → ((α → β) → (α → γ))
A3: (¬α → ¬β) → (β → α)
K: □(α → β) → (□α → □β)
T: □α → α
4: □α → □□α (positive introspection)
The topological interpretation of the modal logic S4 proposed by Tarski [29] stems from topological spaces. A topological space is a pair < X, O > where O is a family of subsets of X which contains the empty set and X, and is closed under finite intersections and arbitrary unions. A model over a topological space < X, O > is a pair M = << X, O >, v > where < X, O > is equipped with a valuation function v from the set of propositional variables to 2^X; M is called a topo-model. Informally, the formula ⊤ represents the universe, the formula ⊥ represents the empty region, the formula ¬φ represents the complement of the region φ, the formula φ ∧ ψ represents the intersection of the regions φ and ψ, the formula φ ∨ ψ represents the union of the regions φ and ψ, □φ represents the interior of the region φ, and ♦φ represents the closure of the region φ. The formula □φ is true at a point x iff φ is true throughout some open neighborhood of x. More formally, let p be a proposition:

• M, x |= p iff x ∈ v(p).
• M, x |= ¬φ iff M, x ⊭ φ.
• M, x |= φ → ψ iff M, x ⊭ φ or M, x |= ψ.
• M, x |= φ ∧ ψ iff M, x |= φ and M, x |= ψ.
• M, x |= □φ iff ∃o ∈ O s.t. x ∈ o and ∀y ∈ o: M, y |= φ.
• M, x |= ♦φ iff ∀o ∈ O s.t. x ∈ o, ∃y ∈ o: M, y |= φ.
The topological interpretation of S4 is complete and decidable. Each formula of S4 denotes a region of the topological space being modeled. The topological interpretation of S4 provides a local vision of the world and allows for spatial reasoning on regions. A more global vision of the world is provided by adding the modal operator U, which expresses accessibility to any point, and its dual modal operator E, without losing decidability [1]. More formally:

• M, x |= Eφ iff ∃y ∈ X : M, y |= φ.
• M, x |= U φ iff ∀y ∈ X : M, y |= φ.
The new language is a topological interpretation of S5, which is composed of the axioms of S4 and the negative introspection axiom, denoted by 5: ♦φ → □♦φ. The topological interpretation of S5 is complete and decidable.
An Exploratory Survey of Logic-Based Formalisms
2.2.2 Encoding RCC in Modal Logic
A topological space can also be described [6] by a structure < U, i > where U denotes the universe and i is an interior operator satisfying the following axioms, where X and Y are subsets of U:

1) i(X) ⊆ X
2) i(i(X)) = i(X)
3) i(U) = U
4) i(X ∩ Y) = i(X) ∩ i(Y)
A region is represented by an open set of points. Two regions partially overlap if they share at least one point, and two regions are connected if their closures share at least one point. The closure of a region is defined by c(X) =def −i(−X), where −X denotes the complement of X. In order to translate the RCC relations into the S4 modal logic [6], a new modality, denoted by I and corresponding to the interior operator i, has been defined; it satisfies the following axioms:

1) I X → X
2) I X ↔ I I X
3) I ⊤ ↔ ⊤ (where ⊤ is a tautology)
4) I (X ∧ Y) ↔ I X ∧ I Y
They correspond exactly to the following axioms of S4 modal logic:

T: □X → X
4: □X → □□X
N: □⊤
R: □(X ∧ Y) ↔ □X ∧ □Y

Indeed, the axioms T and 4 correspond to the required properties of the interior operator. The encoding of the RCC relations in S4 modal logic is provided in terms of positive and negative interpretations, called model constraints and entailment constraints respectively, illustrated in Table 1. A model constraint comes from the corresponding set-term denoting the entire universe, and the entailment constraints follow from the corresponding set-terms which do not denote the entire universe. For example, when X is a tangential proper part of Y, the model constraint comes from −X ∪ Y = U, which is translated into X → Y, and the entailment constraints come from −Y ∪ X ≠ U, −X ∪ i(Y) ≠ U, −X ≠ U and −Y ≠ U, which are translated into Y → X, X → I Y, ¬X and ¬Y respectively. Bennett and Cohn [8] then proposed an encoding of the RCC relations using both the modal logic S4 and the modal logic S5. This allows them to avoid the entailment constraints. The usual operators □ and ♦ of S5 are used: □X means that X is valid at each spatial point, and ♦X means that there
Table 1 Encoding of the RCC8 relations in modal logic S4

relations        model constraints    entailment constraints
DC(X, Y)         ¬(X ∧ Y)             ¬X, ¬Y
EC(X, Y)         ¬(I X ∧ I Y)         ¬X ∨ ¬Y, ¬X, ¬Y
PO(X, Y)         (none)               ¬(I X ∧ I Y), X → Y, Y → X, ¬X, ¬Y
TPP(X, Y)        X → Y                X → I Y, Y → X, ¬X, ¬Y
TPP⁻¹(X, Y)      Y → X                Y → I X, X → Y, ¬X, ¬Y
NTPP(X, Y)       X → I Y              Y → X, ¬X, ¬Y
NTPP⁻¹(X, Y)     Y → I X              X → Y, ¬X, ¬Y
EQ(X, Y)         X ↔ Y                ¬X, ¬Y
exists a spatial point for which X is valid. The modal operators of S4 are denoted by I and C for the interior and closure operators respectively. The key relations from which the eight RCC relations can be expressed are given in Table 2.

Table 2 Encoding of the RCC8 relations in modal logics S4 and S5

relations      formulas of S4, S5
C(X, Y)        ♦(X ∧ Y)
DC(X, Y)       □(¬X ∨ ¬Y)
O(X, Y)        ♦(I X ∧ I Y)
P(X, Y)        □(X → Y)
¬P(X, Y)       ♦(X ∧ ¬Y)
TP(X, Y)       □(X → Y) ∧ ♦(X ∧ C ¬Y)
NTP(X, Y)      □(X → I Y)
NE(X)          ♦X
A set of RCC relations is satisfiable in a topological space iff the corresponding modal formulas are satisfiable; the complexity of this satisfiability problem is NP-hard [47]. See also [62].
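The mereotopological definitions used in this encoding (connection as closures sharing a point, overlap as regions sharing a point, parthood as inclusion) can be illustrated concretely on a discrete grid. The sketch below is our own toy classifier, not Bennett's modal decision procedure; the 4-neighbourhood interior/closure and the example regions are invented for illustration:

```python
# Toy illustration of the mereotopological definitions above: regions are
# finite sets of cells in Z^2, interior/closure use the 4-neighbourhood,
# connection = closures share a cell, overlap = the regions share a cell.
def neighbours(c):
    x, y = c
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

def closure(r):
    return r | {n for c in r for n in neighbours(c)}

def interior(r):
    return {c for c in r if neighbours(c) <= r}

def rcc8(x, y):
    if not (closure(x) & closure(y)):
        return "DC"            # not connected
    if not (x & y):
        return "EC"            # connected, but sharing no cell
    if x == y:
        return "EQ"
    if x <= y:                 # proper part: tangential iff x touches y's boundary
        return "TPP" if x - interior(y) else "NTPP"
    if y <= x:
        return "TPP-1" if y - interior(x) else "NTPP-1"
    return "PO"

a = {(i, j) for i in range(4) for j in range(4)}        # 4x4 block
b = {(i, j) for i in range(1, 3) for j in range(1, 3)}  # its inner 2x2 block
c = {(i, j) for i in range(4, 8) for j in range(4)}     # block touching a

print(rcc8(b, a))  # NTPP
print(rcc8(a, c))  # EC
```

The discrete interior/closure is only an approximation of the topological operators, but it reproduces the intended distinctions between the eight relations on these examples.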
2.3 Fuzzy and Rough Sets-Based Mereotopologies

Schockaert [48], [49], in a recent PhD thesis, has extensively studied the representation of fuzzy spatial information and how to handle this information in reasoning. Starting with fuzzy closeness relations between points, he extends fuzzy spatial relations to fuzzy regions; he also handles fuzzy directions. In particular, fuzzy RCC relations are defined, and their composition and transitivity studied. For instance, using a fuzzy connection relation C as a primitive (maybe defined in practice from a distance or a pointwise closeness relation), the fuzzy part-of relation P is then defined between regions as P(x, y) = inf_z IT(C(x, z), C(y, z)), while the overlap relation O is obtained as O(x, y) = sup_z T(P(z, x), P(z, y)), where T and IT respectively denote a multiple-valued conjunction (modeled by means of a triangular norm operation) and the multiple-valued implication associated to T by residuation. Note that such definitions exactly parallel those of the non-fuzzy case. Rough sets have been used as another basis for defining an approximate mereology, called rough mereology [44], [42], [43], [15]. The theory of rough sets [41] starts from the idea that a set of objects can usually be partitioned into classes of indiscernible objects (for instance, because they are described in terms of a finite set of properties, so that several objects have the same description). An object may be, for instance, a region. Thus, indiscernible objects belong to the same equivalence class, in the sense of the equivalence relation associated with the partition. Given an arbitrary set X of objects, it can only be approximated in terms of such equivalence classes: from below by the set of classes included in X, and from above by the set of classes overlapping X. Then a given object x can be associated with a rough membership degree w.r.t.
X, computed as the relative cardinality of the intersection of X with the equivalence class of x with respect to the cardinality of the equivalence class of x. This degree can be interpreted as a uniform distribution-based probability that x belongs to X, or equivalently as a degree of inclusion of the equivalence class of x into X. In the same spirit, rough mereology is then based on a graded part of (in the broad sense) relation, expressing that x is part of y to degree at least r. The degree r, in practice, may be elaborated from the (weighted) number of properties that are common to the descriptions of x and y. Proper choices of inclusion degrees
between object descriptions lead to part-of relations that may be max-T transitive fuzzy relations (for some triangular norm T). Note that the mereological view is then no longer applied to the regions themselves, but to their description in terms of properties. The interest of rough sets has also been advocated in qualitative spatial reasoning for handling approximate views of regions and approximate part-of relations in [23]. Note that fuzzy approaches have been proposed for evaluating spatial relationships of a topological nature (set inclusion, adjacency, etc.) as well as metrical relations [9]; however, these fuzzy relationships have been mainly used for assessing situations for decision making rather than for genuine spatial inference purposes.
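The fuzzy part-of and overlap definitions recalled above can be computed directly once a fuzzy connection relation C is fixed. Below is a minimal sketch with T = min and its residual (Gödel) implication; the reference points and connection degrees are invented for illustration, and the sup in O is taken over the two listed regions only:

```python
# Fuzzy part-of P(x,y) = inf_z IT(C(x,z), C(y,z)) and overlap
# O(x,y) = sup_z T(P(z,x), P(z,y)), with T = min and IT its residuum.
Z = ["z1", "z2", "z3"]
C = {  # C[region][z] = degree to which the region is connected to z
    "x": {"z1": 1.0, "z2": 0.6, "z3": 0.0},
    "y": {"z1": 1.0, "z2": 0.8, "z3": 0.3},
}

T = min

def IT(a, b):  # Goedel implication, the residuum of min
    return 1.0 if a <= b else b

def P(x, y):   # degree to which x is part of y
    return min(IT(C[x][z], C[y][z]) for z in Z)

def O(x, y):   # degree to which x and y overlap (sup over listed regions)
    return max(T(P(z, x), P(z, y)) for z in C)

print(P("x", "y"))  # 1.0: wherever x is connected, y is at least as connected
print(P("y", "x"))  # 0.0: y is connected to z3 (0.3) but x is not at all
```

With crisp (0/1) connection degrees these definitions collapse to the classical set-based ones, which is the "exact parallel" noted in the text.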
2.4 Mereogeometries

Mereogeometric theories enrich the mereotopologies with morphological and/or metric notions. They attempt to provide a representation setting based on primitive relations between cognitively meaningful entities. The first axiomatizations of mereogeometries tried to generalize the notions of alignment and distance between points to regions, since these notions are crucial in classical geometry. However, this generalization raises problems due to the heterogeneity of the shapes of regions, and several theories have been proposed according to different primitives [33], [39], [55], [56], [17]. Among the approaches based on the notion of sphere, a first family [7], [21], [53] uses as primitives the predicates part of and sphere, while a second one defines a sphere from the part of, connection and congruence primitives [10]. Mereotopologies have also been extended with the notion of orientation, as in compass logic [57], [38]. This logic uses two pairs of linear temporal operators over a Cartesian product of linear orders and allows for the representation of cardinal directions.

2.4.1 Describing Shape in Terms of Spheres
Shape is an important characteristic that is difficult to represent in mereotopological theories, but extensions have been proposed. A first proposal characterizes the structure of a region through its discontinuities [56]. However, only a few coarse statements can be made, such as whether a region has holes or interior voids, or whether it is in one piece or not. The other extension directly stems from the characterization of the shape of the region. It relies on the notion of sphere [21] and uses as primitives the predicates part of and sphere. More precisely, the shape of a region is described in terms of a set of spheres contained in it and organized in a graph structure. This makes it possible, for instance, to determine the position of a sphere with respect to a complex shape. Note that here the shape of a region is not viewed as a property associated to the region, but is rather obtained in terms of a set of spheres of different sizes organized through spatial relations.
2.4.2 Modal Logic of Distance
Among metrical relations, only the notion of distance has been cast in a logical setting. Modal logics of distance have been proposed in [52] and [32] for a qualitative representation of distance. In those logics the notions of distance are expressed by means of modalities. The formulas A≤a φ and A>a φ can be read as "everywhere in the circle of radius a, φ holds" and "everywhere outside the circle of radius a, φ holds" respectively, and the formulas E≤a φ and E>a φ can be read as "somewhere in the circle of radius a, φ holds" and "somewhere outside the circle of radius a, φ holds" respectively. These logics extend the propositional calculus with two lists of modalities, {A≤a, a ∈ M} and {A>a, a ∈ M}, and the two lists of dual modalities {E≤a, a ∈ M} and {E>a, a ∈ M} (the following equalities hold: E≤a φ = ¬A≤a ¬φ and E>a φ = ¬A>a ¬φ), where M is a subset of IR+. The semantics is defined in terms of distance spaces, denoted by ⟨W, d⟩, where W is a set of points and d is a function from W × W to IR+ that may satisfy the following properties: 1) d(x, y) = 0 iff x = y, 2) d(x, z) ≤ d(x, y) + d(y, z), 3) d(x, y) = d(y, x). A distance space satisfying the three above properties is a metric space, and the most common such function d is the Euclidean distance. However, within the framework of qualitative spatial representation, the function d does not always verify properties 2) and 3). The semantics of the modal logics of distance is defined in terms of Kripke's "possible worlds": a model is M = (W, R, v) where W is the set of points of the distance space ⟨W, d⟩, R is the accessibility relation, and v is a valuation mapping the set of propositions P to 2^W. The accessibility relation is interpreted in terms of distance (expressed by the function d of the distance space). Informally, the formula A≤a φ is true if φ is true at every position at distance less than or equal to a from the current position, and the formula E≤a φ is true if φ is true at some position at distance less than or equal to a from the current position. M, ω |= φ expresses that φ is true at the point ω for the model M, and the truth values of formulas are formally defined as follows; d(x, y) ≤ a is denoted by xRa y (a ∈ M) and d(x, y) > a is denoted by xRā y. Let M = (W, R, v) be a model:

• M, ω |= p iff ω ∈ v(p), ∀p ∈ P.
• M, ω |= ¬φ iff M, ω ⊭ φ.
• M, ω |= φ → ψ iff M, ω ⊭ φ or M, ω |= ψ.
• M, ω |= φ ∧ ψ iff M, ω |= φ and M, ω |= ψ.
• M, ω |= A≤a φ iff ∀ω′ ∈ W such that ωRa ω′, M, ω′ |= φ.
• M, ω |= A>a φ iff ∀ω′ ∈ W such that ωRā ω′, M, ω′ |= φ.
• M, ω |= E≤a φ iff ∃ω′ ∈ W such that ωRa ω′, M, ω′ |= φ.
• M, ω |= E>a φ iff ∃ω′ ∈ W such that ωRā ω′, M, ω′ |= φ.
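These truth conditions can be evaluated by brute force on a finite distance space. The following sketch uses an illustrative Euclidean point set; the tuple encoding of formulas is our own:

```python
# Evaluator for the distance modalities over a finite distance space (W, d);
# here d is the Euclidean metric on a few illustrative points.
import math

W = [(0, 0), (3, 0), (0, 4), (6, 8)]

def d(p, q):
    return math.dist(p, q)

v = {"p": {(3, 0), (0, 4)}}  # proposition p holds at two points

def sat(w, phi):
    """Formulas: ('var', s), ('not', f), ('A<=', a, f), ('A>', a, f),
    ('E<=', a, f), ('E>', a, f)."""
    op = phi[0]
    if op == "var":
        return w in v[phi[1]]
    if op == "not":
        return not sat(w, phi[1])
    if op == "A<=":
        return all(sat(u, phi[2]) for u in W if d(w, u) <= phi[1])
    if op == "A>":
        return all(sat(u, phi[2]) for u in W if d(w, u) > phi[1])
    if op == "E<=":
        return any(sat(u, phi[2]) for u in W if d(w, u) <= phi[1])
    if op == "E>":
        return any(sat(u, phi[2]) for u in W if d(w, u) > phi[1])
    raise ValueError(op)

# From (0,0): p holds somewhere within radius 5, but not everywhere there,
# since (0,0) itself (at distance 0) falsifies p.
print(sat((0, 0), ("E<=", 5, ("var", "p"))))  # True
print(sat((0, 0), ("A<=", 5, ("var", "p"))))  # False
```

One can also check the duality E≤a φ = ¬A≤a ¬φ stated above directly on this model.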
New modalities can be defined: A≤a φ ∧ A>a φ for the universal modality (denoted □a φ, with dual ♦a φ), E≤a φ ∨ E>a φ for the existential modality, and E>0 for the difference modality. According to whether the function d satisfies the properties 1), 2) and/or 3), different modal logics of distance have been defined. Let M be a subset of IR+; MS(M) denotes the logic of the standard distance spaces, i.e., the distance spaces for which d satisfies property 1). In this case Ra and Rā satisfy the following properties:

R1) Ra ∪ Rā = W × W,
R2) Ra ∩ Rā = ∅,
R3) if xRa y and a ≤ b then xRb y,
R3b) if xRā y and a ≥ b then xRb̄ y,
R4) ∀x, y ∈ W, xR0 y iff x = y,
R4b) ∀x, y, z ∈ W, if xRa y and xRā z then yR0̄ z.

MS^s(M) denotes the logic of distance spaces for which d satisfies properties 1) and 3). In this case Ra and Rā satisfy the additional properties for symmetry:

R5) xRa y iff yRa x,
R5b) xRā y iff yRā x.

MS^t(M) denotes the logic of distance spaces for which d satisfies properties 1) and 2). In this case Ra and Rā satisfy the additional properties for the triangular inequality:

R6) if xRa y and yRb z then xRa+b z,
R7) if xRa y and xR(a+b)‾ z then yRb̄ z.

MS^m(M) denotes the logic of distance spaces for which d satisfies properties 1), 2) and 3). In this case Ra and Rā satisfy the properties R1), R2), R3), R3b), R4), R4b), R5), R5b), R6), R7).

The axiomatic system for the standard distance spaces, denoted by MS(M) [31], consists of the following axioms:

A1: (α → (β → α)),
A2: ((α → (β → γ)) → ((α → β) → (α → γ))),
A3: ((¬α → ¬β) → (β → α)),
KA≤: A≤a (φ → ψ) → (A≤a φ → A≤b ψ) (a, b ∈ M, a ≥ b),
KA>: A>a (φ → ψ) → (A>a φ → A>b ψ) (a, b ∈ M, a ≤ b),
TA≤0: A≤0 φ → φ,
TcA≤0: φ → A≤0 φ,
Diff: E≤a A>0 φ → A>a φ,
U1: □0 φ → □a φ,
U2: □a φ → □0 φ,
4: □a φ → □a □a φ,
B: φ → □a ♦a φ.

The inference rules are substitution, modus ponens, and the necessitation rules for A≤a and A>a: (N1) from Γ ⊢ φ infer Γ ⊢ A≤a φ, and (N2) from Γ ⊢ φ infer Γ ⊢ A>a φ (a ∈ M).

The axiomatization of MS^s(M) requires the additional axioms:

BA≤: φ → A≤a E≤a φ (a ∈ M),
BA>: φ → A>a E>a φ (a ∈ M).

The axiomatization of MS^t(M) requires the additional axioms:

Tr1: A≤a+b φ → A≤a A≤b φ (a, b ∈ M),
Tr2: E≤a A>b φ → A>a+b φ (a, b ∈ M).
Finally, the axiomatization of MS^m(M) requires the four additional axioms BA≤, BA>, Tr1 and Tr2. The satisfiability problem of the modal logics of distance is decidable and EXPTIME-complete. The notion of "relative nearness" has been introduced by van Benthem [55]: y is nearer to x than z is if the distance from y to x is smaller than the distance from z to x. Van Benthem defined a modal logic capturing this notion [55, 1]. The relative nearness notion stems from the distance between two positions; such an attempt is to be related to the modal logics defined above for qualitatively representing the notion of distance.
2.5 Modal Logic of Geometry

Different modal logics have been proposed for a qualitative representation of incidence, collinearity and parallelism between points and lines in geometric space. The modal logic of incidence, introduced by Balbiani [4, 3] and Venema [58], distinguishes two kinds of spatial entities: points and lines. For each of them the modalities □ and ♦ are introduced, denoted by □P, ♦P for the points and □d, ♦d for the lines, respectively. Let A be a point and a be a line; the formula □P a expresses that "each point is incident with a", the formula ♦P a expresses that "there exists a point which is incident with a", the formula □d A expresses that "each line lies on A", and the formula ♦d A expresses that "there exists a line which lies on A". Moreover, Venema [58] and Balbiani [3] have proposed a modal logic of projective geometry stemming from the modal logic of incidence. In order to propose a modal logic of parallelism, Balbiani [5] introduced two new modalities □∥ and ♦∥ to capture the notion of parallelism between lines. Moreover, he proposed a modal logic of affine geometry with the modalities □P, ♦P for the points, □d, ♦d for the lines, and the modalities □∥, ♦∥ for parallelism. For more details see [5].
The satisfiability problem in the modal logics of geometry is decidable [58]; the complexity is NEXPTIME-complete for the modal logic of incidence, NP-complete for the modal logic of parallelism, and NEXPTIME-hard for the modal logic of affine geometry.
3 Handling Properties Associated to Regions

Clearly, spatial relations between regions are not the only type of spatial information to be represented; many pieces of information refer to a single region. In fact, it is important to notice that different types of properties can be associated to a region. We can distinguish at least between global properties, which apply to the region as a whole (e.g., the population of the region is greater than half a million inhabitants, the maximal altitude of the region is 1500 m), and properties that are "localized". By localized, we mean that the property holds at least in some points (or subparts) of the region (e.g., the landcover of this area is made of woods). This distinction is justified by the fact that the two kinds of properties do not have the same inheritance mechanisms (w.r.t. subparts). Global properties, which usually refer to measurements of some kind, require some numerical structure to be processed (e.g., [63]) and are not considered in the following. Localized properties may hold everywhere in a subregion or only somewhere. Moreover, the association of a property to a region can be stated in an absolute or a relative way. In the second case, the localization is implicitly made with respect to the position of the agent that states the piece of information. This latter situation is usually encountered in the modal logic-based approaches that are surveyed in the next subsection. The first case explicitly uses named regions and is the topic of the second subsection.
3.1 Modal Logics of Localization

Several modal logics have been proposed that focus on the localization of a property. In the following we survey three proposals: i) von Wright's logic of "elsewhere", able to represent the fact that a property holds in every (or some) region accessible from where we are; ii) a logic of "nearness", semantically based on an accessibility relation whose characteristics are distinct from those of the relation underlying the elsewhere logic; iii) Jeansoulin and Mathieu's modal logic of spatial inference, which aims at expressing that a property holds everywhere (or somewhere) inside the region where we are, or in the regions in contact with the region where we are.

3.1.1 The Logic of Elsewhere
Thirty years ago, von Wright [60] proposed a modal logic of place linking modalities and spatial concepts. The formula □A is to be read
as "elsewhere A holds", i.e., everywhere else A holds, and the formula ♦A is read as "somewhere else A holds". This logic extends propositional logic with the modalities □ and ♦ and is called the "logic of elsewhere", denoted by E or L≠. Its axiomatics is an extension of the modal logic system K (for more details see [50]) and consists of the following axioms:

A1: (α → (β → α)),
A2: ((α → (β → γ)) → ((α → β) → (α → γ))),
A3: ((¬α → ¬β) → (β → α)),
K: (□(α → β) → (□α → □β)),
A: A ∧ □A → □□A,
S: A → □♦A.

The proposed semantics for E is a relational semantics in Kripke style [30]: a modal formula is evaluated within a "universe" of possible worlds linked by an accessibility relation. A model is a triple M = (W, R, v) where W is the set of interpretations of E, i.e., the set of possible worlds, R is an accessibility relation, and v is a valuation, i.e., a mapping from the set of propositions P to the power set of W. Informally, the formula □A is true at the current position if A is true at every other position; dually, the formula ♦A is true at the current position if A is true at some other position. M, ω |= A denotes that A holds in the possible world ω for the model M, and the truth values of formulas are formally defined as follows. Let M = (W, R, v) be a model:

• M, ω |= p iff ω ∈ v(p), ∀p ∈ P.
• M, ω |= ¬A iff M, ω ⊭ A.
• M, ω |= A → B iff M, ω ⊭ A or M, ω |= B.
• M, ω |= A ∧ B iff M, ω |= A and M, ω |= B.
• M, ω |= □A iff ∀ω′ ∈ W\{ω}, M, ω′ |= A.
• M, ω |= ♦A iff ∃ω′ ∈ W\{ω}, M, ω′ |= A.
The accessibility relation between worlds is such that ωRω′ iff ω ≠ ω′ [50]; this relation is symmetric and weakly transitive, i.e., if ωRω′ and ω′Rω″ then ωRω″ or ω = ω″. The logic E is complete and decidable. The potential expressivity of this logic may be more completely understood once we notice that the idea of "everywhere" can be captured by the formula A ∧ □A ("everywhere" equals "here and elsewhere"). Indeed, two new modalities have been introduced, denoted by U and E respectively: the universal modality U A = A ∧ □A expresses that A holds everywhere, and the existential modality E A = A ∨ ♦A expresses that A holds somewhere (including here).
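The semantics of the elsewhere modalities, and the derived universal modality U A = A ∧ □A, can be checked on a small finite model; the worlds and valuation below are invented for illustration:

```python
# Evaluator for von Wright's logic of elsewhere over a finite set of
# worlds: the accessibility relation is "being a different world".
W = ["w1", "w2", "w3"]
v = {"A": {"w2", "w3"}}

def sat(w, phi):
    op = phi[0]
    if op == "var":
        return w in v[phi[1]]
    if op == "not":
        return not sat(w, phi[1])
    if op == "and":
        return sat(w, phi[1]) and sat(w, phi[2])
    if op == "box":  # elsewhere: at every OTHER world
        return all(sat(u, phi[1]) for u in W if u != w)
    if op == "dia":  # somewhere else
        return any(sat(u, phi[1]) for u in W if u != w)
    raise ValueError(op)

A = ("var", "A")
print(sat("w1", ("box", A)))              # True: A holds at w2 and w3
print(sat("w2", ("box", A)))              # False: A fails at w1
# The universal modality U A = A and box A ("here and elsewhere"):
print(sat("w2", ("and", A, ("box", A))))  # False: A fails elsewhere, at w1
```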
3.1.2 The Logic of Nearness
As the previous one, this logic extends propositional logic with the modalities □ and ♦; its axiomatics is an extension of system K (for more details see [50]) and consists of the following axioms:

A1: (α → (β → α)),
A2: ((α → (β → γ)) → ((α → β) → (α → γ))),
A3: ((¬α → ¬β) → (β → α)),
K: (□(α → β) → (□α → □β)),
S: A → □♦A.

As for the previous logic, the proposed semantics is a relational semantics in Kripke style [30]. Informally, the formula □A is true at the current position if A is true at every nearby accessible position; dually, the formula ♦A is true at the current position if A is true at some nearby accessible position. M, ω |= A denotes that A holds in the possible world ω for the model M. The truth values of formulas are formally defined as follows. Let M = (W, R, v) be a model:

• M, ω |= p iff ω ∈ v(p), ∀p ∈ P.
• M, ω |= ¬A iff M, ω ⊭ A.
• M, ω |= A → B iff M, ω ⊭ A or M, ω |= B.
• M, ω |= A ∧ B iff M, ω |= A and M, ω |= B.
• M, ω |= □A iff ∀ω′ ∈ W such that ωRω′, M, ω′ |= A.
• M, ω |= ♦A iff ∃ω′ ∈ W such that ωRω′, M, ω′ |= A.
The relation R is reflexive and symmetric but not transitive. The expressivity of this logic has been considered not very rich from a geometrical point of view, and its suitability to represent spatial concepts has been criticized. Indeed, Lemon and Pratt [35] gave the following criterion in order to determine the adequacy of a logic for space: "A logic is suitable for space if there exists a configuration of objects of the standard Cartesian space that satisfies the formulas of this logic". For the logic of nearness, they gave an example showing that this logic does not capture the notion of nearness. Let us consider the formula of the logic of nearness ∧_{1≤i≤n} ♦♦(p_i ∧ ¬(∨_{1≤j≤n, j≠i} p_j)), saying that there are n locations near to nearby locations such that in each location i the formula p_i is true and no other formula p_j is true. This formula is consistent. However, when n ≥ 6, considering that "near" means "strictly less than distance d away", there does not exist any configuration of points in the plane that can be an interpretation of the formula, as illustrated in Figure 2. The reason is that one would have to fit at least 6 non-overlapping circles of radius d around a circle of radius strictly less than d. For more details see [35].
Fig. 2 Limit of the logic of nearness: example of configuration
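The packing argument behind Figure 2 can be checked numerically: by the law of cosines, two non-overlapping circles of radius d whose centres both lie strictly within 2d of a common centre subtend an angle greater than 60° there, so six of them cannot surround the central circle. A small sketch of this computation (parameters illustrative):

```python
# Numeric check of the packing argument: non-overlapping circles of radius
# d have centres >= 2d apart; if both centres lie strictly within 2d of the
# origin, the angle they subtend at the origin exceeds 60 degrees, so six
# such angles cannot fit around 360 degrees.
import math

def min_angle(r1, r2, sep):
    """Smallest possible angle at the origin between two centres at
    distances r1, r2 from it, with |c1 - c2| >= sep (law of cosines)."""
    cos = (r1 ** 2 + r2 ** 2 - sep ** 2) / (2 * r1 * r2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

d = 1.0
# The worst case is both centres as far out as allowed, just under 2d:
for r in [1.2, 1.5, 1.8, 1.99, 1.999]:
    a = min_angle(r, r, 2 * d)
    print(f"r = {r}: angle > {a:.2f} deg, so six angles sum to {6 * a:.1f} > 360")
```

As r approaches 2d the minimal angle approaches 60° from above, which is exactly why five circles fit but six never do.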
3.1.3 Modal Logic of Spatial Inference
The logic of spatial inference has been proposed by Jeansoulin and Mathieu [28] for representing and reasoning about incomplete geographic information. This logic introduces two kinds of modalities, for representing inclusion and contact. The formula □i A can be read as "everywhere inside, A holds" and the formula ♦i A as "somewhere inside, A holds". The formula □v A can be read as "everywhere in contact, A holds" and the formula ♦v A as "somewhere in contact, A holds". The extension of the propositional calculus with the modalities □i, ♦i is called the logic of inclusion, and the extension with the modalities □v, ♦v is called the logic of contact. The axiomatics of the logic of inclusion is that of S4, because the accessibility relation Ri is reflexive and transitive. The axioms of the logic of inclusion are:

A1: (α → (β → α)),
A2: ((α → (β → γ)) → ((α → β) → (α → γ))),
A3: ((¬α → ¬β) → (β → α)),
K: (□i (α → β) → (□i α → □i β)),
T: □i α → α,
4: □i α → □i □i α.

The axiomatics of the logic of contact is that of K, because the accessibility relation Rv is not reflexive but is symmetric. The axioms of the logic of contact are:
A1: (α → (β → α)),
A2: ((α → (β → γ)) → ((α → β) → (α → γ))),
A3: ((¬α → ¬β) → (β → α)),
K: (□v (α → β) → (□v α → □v β)),
S: A → □v ♦v A.

The proposed semantics for this logic is a relational semantics in Kripke style [30]. A model is a triple M = (W, {Ri, Rv}, v) where W is the set of spatial regions, i.e., open sets of IR × IR, the relations Ri and Rv are the accessibility relations corresponding to the notions of inclusion and contact respectively, and v is the valuation, i.e., a mapping from the set of propositions P to 2^W. Informally, the formula □i A is true for a region r if A holds everywhere inside r, and the formula ♦i A is true for a region r if A is true somewhere inside r. Similarly, the formula □v A is true for a region r if A is true for every other region in contact with r, and the formula ♦v A is true for a region r if A is true for some region in contact with r. M, ω |= A expresses that A is true for the region ω in the model M; the truth values of formulas are formally defined as follows. Let M = (W, {Ri, Rv}, v) be a model:

• M, ω |= p iff ω ∈ v(p), ∀p ∈ P.
• M, ω |= ¬A iff M, ω ⊭ A.
• M, ω |= A → B iff M, ω ⊭ A or M, ω |= B.
• M, ω |= A ∧ B iff M, ω |= A and M, ω |= B.
• M, ω |= □i A iff ∀ω′ ∈ W such that ωRi ω′, M, ω′ |= A.
• M, ω |= ♦i A iff ∃ω′ ∈ W such that ωRi ω′, M, ω′ |= A.
• M, ω |= □v A iff ∀ω′ ∈ W such that ωRv ω′, M, ω′ |= A.
• M, ω |= ♦v A iff ∃ω′ ∈ W such that ωRv ω′, M, ω′ |= A.
The accessibility relation Ri is reflexive, antisymmetric and transitive. The accessibility relation Rv is irreflexive, symmetric, transitive, reproductive (i.e., ∀x ∃y(xRy)) and weakly dense (i.e., ∀x ∀y(xRy → ∃z(xRz ∧ zRy))). The logic of inclusion and the logic of contact correspond to the modal logics S4 and K respectively; they are complete and decidable. The expressivity of these logics has been criticized by Lemon and Pratt [35]. According to their adequacy criterion, i.e., the existence of a configuration of objects of the standard Cartesian space that satisfies the formulas of the logic, they show that there are formulas of S4 that are consistent but whose interpretation leads to a cyclic graph where the vertices are labelled by regions and the edges are labelled by inclusion relations (for more details see [35]).
3.2 Handling Spatial Information by Means of Attributive Formulas

In [25], Dupin de Saint-Cyr and Prade have argued that the core of spatial information is attributive in nature. Indeed, in geographical information systems, pieces of spatial information generally express a link between the description of a parcel and some property. This contrasts with the modal logics considered above, where there was no explicit reference to named places where a property holds. Attributive logic is based on two vocabularies, one for the properties and one for the regions (or parcels). Attributive formulas are then viewed as pairs (property, region), and this writing can be regarded as a kind of spatial reification, as often done with time. This also allows us to associate each of the two vocabularies with some ontology-like relations between their formulas. The use of a distinct vocabulary for regions also agrees with the need for a specific handling of their geometric / mereo-topological relations. Moreover, attributive formulas provide a convenient format for dealing with uncertain properties attached to regions. Besides, the attribution of a property may hold everywhere, or only somewhere, in a set of parcels. We now survey these different points.

3.2.1 Basic Attributive Formulas
Attributive formulas use two propositional logical languages: Li (i stands for "information"), based on a vocabulary Vi designed for describing properties, and Ls (s stands for "spatial"), based on a vocabulary Vs designed for describing regions or parcels. The representational language is then built on ordered pairs of formulas of Li × Ls, here denoted (ϕ, p). Such formulas should be understood as formulas of Li reified by association with a set of parcels described by a formula of Ls. In other words, to each formula is attached a set of parcels where this formula applies. More precisely, (ϕ, p) expresses that ϕ is true for each parcel satisfying p. Alternatively, (ϕ, p) can be seen as equivalent to the material implication ¬p ∨ ϕ in the language based on the union of the two vocabularies Vi and Vs. In a first order logic view, this may also be understood as ∀x, p(x) → ϕ(x); here p(x) means that the parcel x satisfies p, equating the formula p with the union of elementary parcels x satisfying p. More formally, an attributive formula f, denoted by a pair (ϕ, p), is a formula of the propositional language based on the vocabulary Vi ∪ Vs, where p contains only variables of the vocabulary Vs (p ∈ Ls) and ϕ contains only variables of the vocabulary Vi (ϕ ∈ Li). At the semantic level, f holds if and only if ¬p ∨ ϕ holds. The intuitive meaning of f = (ϕ, p) is that for the set of elementary parcels that satisfy p, the formula ϕ is true. Note that any propositional formula involving the two vocabularies can be rewritten in conjunctive normal form, and then each clause of this formula can be expressed by an attributive formula.
However, the rewriting of the whole propositional formula may require more than one attributive formula. From the above definition of (ϕ, p) as being equivalent to ¬p ∨ ϕ, several inference rules straightforwardly follow from classical logic:

1. (¬ϕ ∨ ϕ′, p), (ϕ ∨ ϕ″, p′) ⊢ (ϕ′ ∨ ϕ″, p ∧ p′)
2. (ϕ, p), (ϕ′, p) ⊢ (ϕ ∧ ϕ′, p)
3. (ϕ, p), (ϕ, p′) ⊢ (ϕ, p ∨ p′)
4. if p′ ⊢ p then (ϕ, p) ⊢ (ϕ, p′)
5. if ϕ ⊢ ϕ′ then (ϕ, p) ⊢ (ϕ′, p)

Note that due to 5, the converse of 2 holds: (ϕ ∧ ϕ′, p) ⊢ (ϕ, p), (ϕ′, p). Note also that due to 5 and 3, (ϕ, p), (ψ, p′) ⊢ (ϕ ∨ ψ, p ∨ p′) holds as well. Due to 2 and 4, we have (ϕ, p), (ψ, p′) ⊢ (ϕ ∧ ψ, p ∧ p′).

Example: Let us consider the two facts (Cereals, p1) and (Orchards, p1 ∨ p2) where p1 and p2 are two elementary parcels and Cereals and Orchards are two literals of Vi. Note that (Orchards, p1 ∨ p2) means that Orchards is true for p1 and for p2, at least. In particular, the second formula entails (Orchards, p1) and thus we have (Cereals ∧ Orchards, p1). Assume now that we also have the formula (¬Orchards ∨ ¬Cereals, ⊤). This formula means that for any parcel, Orchards and Cereals are mutually exclusive (due to some practical knowledge stating that there is no parcel containing both cereals and orchards). Then one can deduce (⊥, p1), which expresses inconsistency of the information pertaining to p1. Note that the inconsistency in the above example does not affect the information (Orchards, p2) pertaining to p2: the reification allows us to keep inconsistency local. Besides, we can observe in the above example that (Orchards, p1 ∨ p2) does not mean that there are orchards in p1 or in p2; this latter piece of information would be expressed by the disjunction (Orchards, p1) ∨ (Orchards, p2). Similarly, (¬Orchards, p1 ∨ p2) means that neither in p1 nor in p2 are there orchards, while ¬(Orchards, p1 ∨ p2) means that at least one parcel of {p1, p2} has no orchards.
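The Cereals/Orchards example can be replayed mechanically by encoding each attributive formula (ϕ, p) as the clause ¬p ∨ ϕ over the joint vocabulary and brute-forcing propositional entailment. This is our own toy sketch of that reading, not an implementation from [25]:

```python
# Each model assigns truth values over the joint vocabulary; an attributive
# formula (phi, p) is read as the material implication p -> phi.
from itertools import product

VARS = ["Cereals", "Orchards", "p1", "p2"]

def holds(f, m):  # f is a nested tuple formula, m a dict var -> bool
    op = f[0]
    if op == "var": return m[f[1]]
    if op == "not": return not holds(f[1], m)
    if op == "or":  return holds(f[1], m) or holds(f[2], m)
    if op == "and": return holds(f[1], m) and holds(f[2], m)
    raise ValueError(op)

def entails(kb, goal):
    models = (dict(zip(VARS, bits)) for bits in product([False, True], repeat=len(VARS)))
    return all(holds(goal, m) for m in models if all(holds(f, m) for f in kb))

def attr(phi, p):  # attributive formula (phi, p) == not p or phi
    return ("or", ("not", p), phi)

P1, P2 = ("var", "p1"), ("var", "p2")
Cer, Orc = ("var", "Cereals"), ("var", "Orchards")

kb = [attr(Cer, P1),                       # (Cereals, p1)
      attr(Orc, ("or", P1, P2)),           # (Orchards, p1 or p2)
      ("or", ("not", Orc), ("not", Cer))]  # mutual exclusion, for every parcel

# The inconsistency is local to p1: (bottom, p1), i.e. not p1, is derivable...
print(entails(kb, ("not", P1)))    # True
# ...but the information about p2 is untouched:
print(entails(kb, attr(Orc, P2)))  # True
```

Note that the knowledge base itself stays satisfiable; only the information attached to p1 is contradictory, which is the "local inconsistency" point made above.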
3.2.2 Uncertain Attributive Formulas
The attributive formula-based language has been extended in a possibilistic logic manner, by allowing uncertainty on properties. Let us recall that a standard propositional possibilistic formula [19] is a pair made of a logical proposition (which can be only true or false), associated with a certainty level. More precisely, the semantic counterpart of a possibilistic formula (ϕ, α) is a constraint N (ϕ) ≥ α that expresses available knowledge by stating that α is a lower bound of the necessity measure N [20] of logical formula ϕ. Possibilistic logic has been proved to be sound and complete with respect to a semantics expressed in terms of the greatest possibility distribution π underlying N
An Exploratory Survey of Logic-Based Formalisms
(N(ϕ) = 1 − sup{π(ω) | ω ⊨ ¬ϕ}). This distribution rank-orders interpretations according to their plausibility [19]. Note that a possibilistic formula (ϕ, α) can be viewed at the meta level as being only true or false, since either N(ϕ) ≥ α or N(ϕ) < α. Thus, a possibilistic formula is introduced instead of a propositional formula inside an attributive pair, which leads to the following definition. An uncertain attributive formula is a pair ((ϕ, α), p) meaning that for the set of elementary parcels that satisfy p, the formula ϕ is certain at least at level α. The inference rules of possibilistic logic [19] straightforwardly extend into the following rules for reasoning with uncertain attributive formulas:
1. ((¬ϕ ∨ ϕ′, α), p), ((ϕ ∨ ϕ″, β), p′) ⊢ ((ϕ′ ∨ ϕ″, min(α, β)), p ∧ p′)
2. ((ϕ, α), p), ((ϕ′, β), p) ⊢ ((ϕ ∧ ϕ′, min(α, β)), p)
3.A. ((ϕ, α), p), ((ϕ, β), p′) ⊢ ((ϕ, min(α, β)), p ∨ p′)
3.B. ((ϕ, α), p), ((ϕ, β), p′) ⊢ ((ϕ, max(α, β)), p ∧ p′)
4. if p ⊢ p′ then ((ϕ, α), p′) ⊢ ((ϕ, α), p)
5. if ϕ ⊢ ϕ′ then ((ϕ, α), p) ⊢ ((ϕ′, α), p)
Rules 3.B and 3.A correspond respectively to the fact that either i) we locate ourselves in the parcels that satisfy both p and p′, and then the certainty level of the formula ϕ can reach the maximal upper bound of the certainty levels known in p or in p′, or ii) we consider any parcel in the union of the models of p and p′, and then the certainty level is only guaranteed to be greater than the minimum of α and β. Note that this representation formalism is able, in particular, to express a greater uncertainty about a rather specific label for a parcel than about a more general label, as in the following example.
Example: In order to express that parcel p1 has either orchards or ornamental trees, and more plausibly orchards, we can use the two uncertain attributive formulas with p1: ((Orchards, α1), p1) and ((Ornamental trees ∨ Orchards, α2), p1), where α1 ≤ α2.
At the semantic level, this is represented by the possibility distribution associated with p1:
π(ω) = 1 if ω ⊨ Orchards,
π(ω) = 1 − α1 (< 1) if ω ⊨ Ornamental trees ∧ ¬Orchards,
π(ω) = 1 − α2 otherwise.
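The greatest possibility distribution compatible with a set of constraints N(ϕ) ≥ α can be computed mechanically: each formula violated by a world ω caps π(ω) at 1 − α. The sketch below reproduces the three-case distribution above; the numeric values α1 = 0.4 and α2 = 0.8 are assumed for illustration.

```python
from itertools import product

# Uncertain attributive formulas attached to parcel p1, as (phi, alpha)
# pairs, phi being a Boolean function over {Orchards, Ornamental}.
A1, A2 = 0.4, 0.8  # alpha1 <= alpha2, illustrative values
FORMULAS = [
    (lambda w: w["Orchards"], A1),                     # ((Orchards, a1), p1)
    (lambda w: w["Ornamental"] or w["Orchards"], A2),  # ((Ornamental v Orchards, a2), p1)
]

def pi(w):
    """Greatest possibility distribution satisfying N(phi) >= alpha for
    every formula: each violated formula caps pi(w) at 1 - alpha."""
    return min([1.0] + [1 - a for phi, a in FORMULAS if not phi(w)])

for orch, orn in product([True, False], repeat=2):
    w = {"Orchards": orch, "Ornamental": orn}
    print(w, pi(w))
```

Worlds satisfying Orchards get possibility 1, worlds with only ornamental trees get 1 − α1, and the remaining worlds get 1 − α2, as in the text.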
3.2.3 Localization of Attributive Knowledge
Attributive information may have two different intended meanings, namely when stating (ϕ, p) one may want to express: • that everywhere in each parcel satisfying p, ϕ holds as true. For instance, in such a case, (Vegetation, p) cannot be consistent with (Lakes, p) if "Vegetation" and "Lakes" are considered mutually exclusive.
• that somewhere in each parcel satisfying p, ϕ holds as true. In that case, the two previous formulas should not be considered as inconsistent, since in each parcel of p there may exist different regions covered by "Vegetation" and "Lakes" respectively. Note that the two above situations should not be confused with still another case where two distinct mutually exclusive labels such as "Vegetation" and "Lakes" might be attached to the same region because they are intimately mixed in this region, as in a "swamp". More formally, viewing an elementary parcel p as a collection of more elementary objects o, asserting (ϕ, p, somewhere) really means that ∃o ∈ p, ϕ(o). If the parcel p is not elementary, then the formula (ϕ, p, somewhere) has to be understood as: for all elementary p′ such that p′ ⊢ p, (ϕ, p′, somewhere) holds. Thus, it should be clear that the above inference rules, which hold for the "everywhere" understanding, no longer necessarily hold in the "somewhere" reading. Indeed, the inference rule 2, (ϕ, p), (ψ, p) ⊢ (ϕ ∧ ψ, p), is no longer compatible with this reading of attributive formulas, since ∃o ∈ p, ϕ(o) and ∃o′ ∈ p, ψ(o′) does not entail ∃o″ ∈ p, ϕ(o″) ∧ ψ(o″). More generally, here are the counterparts of the previous inference rules that hold for the "somewhere" reading:
1′. (¬ϕ ∨ ϕ′, p ∧ p′, e), (ϕ ∨ ϕ″, p′, s) ⊢ (ϕ′ ∨ ϕ″, p ∧ p′, s)
2′. (ϕ, p, s), (ϕ′, p, e) ⊢ (ϕ ∧ ϕ′, p, s)
3′. (ϕ, p, s), (ϕ, p′, s) ⊢ (ϕ, p ∨ p′, s)
4′. if p′ ⊢ p then (ϕ, p, s) ⊢ (ϕ, p′, s)
5′. if ϕ ⊢ ϕ′ then (ϕ, p, s) ⊢ (ϕ′, p, s)
where (ϕ, p, s) stands for: for all elementary p′ such that p′ ⊢ p, ∃o ∈ p′, ϕ(o), and (ϕ, p, e) for: ∀o ∈ p, ϕ(o). Moreover, we have the following relation between "somewhere" and "everywhere" formulas:
6′. ¬(ϕ, p, s) ≡ (¬ϕ, p, e)
Note that when two properties ϕ and ψ are mutually exclusive, this does not prevent having (ϕ, p, s) ∧ (ψ, p, s), since the latter means ∃o ∈ p, ϕ(o) ∧ ∃o′ ∈ p, ψ(o′), and, in general, p may contain at least two distinct objects o and o′. The above treatment of "somewhere" and "everywhere" is somewhat reminiscent of Jeansoulin and Mathieu's logic of spatial inference [28]. In this latter logic, regions only appear in the semantics, and not in the language as above. Indeed, attributive formulas enable us to explicitly refer to regions. Since the relations between regions are stated by propositions, this only allows us to express intersection, union, and inclusion of regions, thus providing an expressivity somewhat similar to that of the "logic of inclusion". One might even think of handling contact, or nearness, by using special predicates, possibly graded, which only apply to the spatial vocabulary, by dealing with extended attributive formulas of the form, e.g., ¬near(p, p′) ∨ ¬(ϕ, p) ∨ (ϕ, p′), or even with a possibilistic attributive formula of the form ¬(ϕ, p) ∨ ((ϕ, nearness(p, p′)), p′), where 'nearness' returns a degree.
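The two readings can be contrasted on a toy parcel viewed as a collection of labelled objects (data and names assumed for illustration): mutually exclusive labels can coexist under the "somewhere" reading but not under the "everywhere" reading, and on an elementary parcel rule 6′ is just the duality between ∃ and ∀.

```python
# Objects of one elementary parcel, each labelled with the properties
# that hold at that object (toy data).
PARCEL = [{"Vegetation"}, {"Lakes"}, {"Vegetation"}]

def somewhere(prop, parcel):
    """(prop, p, s): some object of the parcel satisfies prop."""
    return any(prop in obj for obj in parcel)

def everywhere(prop, parcel):
    """(prop, p, e): every object of the parcel satisfies prop."""
    return all(prop in obj for obj in parcel)

# Mutually exclusive labels may both hold "somewhere" in one parcel...
print(somewhere("Vegetation", PARCEL) and somewhere("Lakes", PARCEL))    # True
# ...while the "everywhere" reading makes them clash:
print(everywhere("Vegetation", PARCEL) and everywhere("Lakes", PARCEL))  # False

# Rule 6' on an elementary parcel: not-(phi, p, s) coincides with (not-phi, p, e).
neg_somewhere = not somewhere("Lakes", PARCEL)
everywhere_neg = all("Lakes" not in obj for obj in PARCEL)
print(neg_somewhere == everywhere_neg)                                   # True
```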
Generally speaking, attributive formulas are suitable for expressing properties pertaining to definite regions of space. However, as said above, it is also possible to represent topological and geometrical properties of parcels in this formalism. Indeed, this kind of property can be encoded by formulas of the spatial vocabulary language Ls. A formula p ∈ Ls can be encoded by a particular attributive formula of the form (⊥, ¬p), meaning that "parcels that do not respect the topological formula p are not consistent". Hence, formulas about topology and geometry can be handled jointly with attributive information. Besides, it is possible to express an ontological relation between, e.g., two properties ϕ and ψ by means of particular attributive formulas of the form (¬ϕ ∨ ψ, ⊤). Attributive formulas have proved useful for handling spatial information fusion problems [25], [24].
4 Concluding Discussion
Many logical formalisms have been proposed for addressing different reasoning tasks with spatial information. What is still missing is the design of tractable fragments of representation languages that enable us to mix spatial relation reasoning and attributive information. For instance, knowing that regions p and p′ overlap, and that ϕ is true everywhere in p, one should be able to conclude that ϕ is true (at least) somewhere in p′. Such representation languages should also handle uncertainty about spatial relations, as well as about properties held as true for some regions. Indeed, as advocated by Bennett in Chapter 2, uncertainty and vagueness are crucial features of spatial information. Most of the logic-based formalisms for spatial information surveyed in this paper either do not deal with uncertainty, or address it only implicitly, through notions such as somewhere or nearness. Recently, some approaches handle uncertainty directly, in terms of fuzzy mereologies [48] or attributive formulas [25], which seems to open a new direction of investigation: directly addressing uncertainty within logic-based formalisms. Uncertainty could apply to various spatial concepts: distance, nearness, localization, as well as to regions or the existence of regions. Uncertain attributive formulas allow for uncertainty on properties, but do not deal with uncertainty on regions. Fuzzy RCC deals with uncertain spatial relations between regions, but does not address spatial relations between fuzzy regions. Moreover, how should uncertainty about distances between regions, or distances between fuzzy regions, be handled? What are the difficulties to overcome when combining uncertainty and logic-based formalisms for spatial information? To what extent does dealing with spatial uncertainty require logic-based formalisms? Another important concern is knowledge representation capabilities in terms of data analysis and data mining.
This concern is not specific to spatial information handling, but it is clearly of particular interest in this
setting. Modal decision logic languages [54], whose models are collections of data tables consisting of a finite set of objects (e.g. regions) described by a finite set of properties, serve such a purpose. These languages often have close connections with rough sets, as was already the case for DAL, a pioneering logic in data analysis proposed by Fariñas del Cerro and Orlowska [26], which was directly inspired by Orlowska and Pawlak's work [40]. In this latter approach, the intersection and the transitive closure of pairs of indiscernibility relations R and S (which are still equivalence relations, obtained from the equivalence classes of R and S by taking their intersections and their unions respectively) can be processed in the logic. More generally, see Demri and Orlowska's monograph [16] for an overview of formal methods for data analysis and inference inspired by the concept of rough set. In practice, for instance, from pieces of data stating on the one hand that regions p1, ..., pn are covered with oaks for some of them, and with fir-trees for the others, and on the other hand that these regions are in contact with regions that are lakes, one should be able to conclude that regions in contact with lakes are woody. Besides, formal concept analysis [27] (which enables the conceptual identification of regions in terms of a set of characteristic properties, when possible) has recently been related to rough set concerns and possibility theory by Dubois, Dupin de Saint-Cyr and Prade [18], which may also be of interest in a data analysis perspective. Lastly, let us also mention spatial information fusion and the handling of temporal spatial information as problems raising new representation and reasoning issues. See [24] for an approach and discussions about the fusion of uncertain spatial information provided by different sources that use different ways of partitioning the same space, and different vocabularies for expressing properties associated with regions.
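The operations on indiscernibility relations mentioned above have a direct partition-based reading: the intersection R ∩ S is the common refinement of the two partitions, and the transitive closure of R ∪ S is their join. A minimal sketch, with toy objects and attribute values assumed for illustration:

```python
# Two indiscernibility (equivalence) relations over objects, each given
# by the attribute value an object takes (illustrative data).
OBJECTS = ["o1", "o2", "o3", "o4"]
R = {"o1": "oak", "o2": "oak", "o3": "fir", "o4": "fir"}  # tree cover
S = {"o1": "wet", "o2": "dry", "o3": "dry", "o4": "wet"}  # soil

def meet_classes(objs, r, s):
    """Intersection R ∩ S: objects equivalent iff equivalent in both,
    i.e. the common refinement of the two partitions."""
    out = {}
    for o in objs:
        out.setdefault((r[o], s[o]), set()).add(o)
    return sorted(map(sorted, out.values()))

def join_classes(objs, r, s):
    """Transitive closure of R ∪ S: merge classes sharing an object
    (a small union-find over class representatives)."""
    parent = {o: o for o in objs}
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for rel in (r, s):
        rep = {}
        for o in objs:
            v = rel[o]
            if v in rep:
                parent[find(o)] = find(rep[v])
            else:
                rep[v] = o
    groups = {}
    for o in objs:
        groups.setdefault(find(o), set()).add(o)
    return sorted(map(sorted, groups.values()))

print(meet_classes(OBJECTS, R, S))  # [['o1'], ['o2'], ['o3'], ['o4']]
print(join_classes(OBJECTS, R, S))  # [['o1', 'o2', 'o3', 'o4']]
```

Here the refinement separates all four objects, while the join collapses them into one class, since the two partitions chain them together.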
Acknowledgments. This work was funded by the Conseils Régionaux of Midi-Pyrénées and of Provence-Alpes-Côte d'Azur in the framework of the Inter-Regional Action Project no. 05013992 "GEOFUSE: Fusion d'informations géographiques incertaines". The authors thank Robert Jeansoulin for fruitful discussions on spatial information representation.
References
1. Aiello, M., van Benthem, J.: A modal walk through space. Journal of Applied Non-Classical Logics 12(3-4), 319–364 (2002)
2. Asher, N., Vieu, L.: Toward a Geometry of Common Sense: A Semantics and a Complete Axiomatization of Mereotopology. In: International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, pp. 846–852. Morgan Kaufmann Publishers, San Francisco (1995)
3. Balbiani, P.: The modal multilogic of geometry (1998) (manuscript)
4. Balbiani, P., Fariñas del Cerro, L., Tinchev, T., Vakarelov, D.: Modal logics for incidence geometries. Journal of Logic and Computation 7(1), 59–78 (1997)
5. Balbiani, P., Goranko, V.: Logics for parallelism, orthogonality, and affine geometries. Journal of Applied Non-Classical Logics 12(3-4), 365–398 (2002)
6. Bennett, B.: Modal logics for qualitative spatial reasoning. Bulletin of the Interest Group in Pure and Applied Logic 3(7), 1–22 (1995)
7. Bennett, B.: A categorical axiomatisation of region-based geometry. Fundamenta Informaticae 36(1-2), 145–158 (2001)
8. Bennett, B., Cohn, A.: Consistency of topological relations in the presence of convexity constraints. In: Proceedings of the 'Hot Topics in Spatio-Temporal Reasoning' Workshop, IJCAI 1999, Stockholm (1999)
9. Bloch, I.: Fuzzy spatial relationships for image processing and interpretation: a review. Image and Vision Computing 23(2), 89–110 (2005)
10. Borgo, S., Guarino, N., Masolo, C.: A pointless theory of space based on strong connection and congruence. In: Carlucci Aiello, L., Doyle, J. (eds.) Principles of Knowledge Representation and Reasoning: Proc. 5th Intl. Conf. (KR 1996), pp. 220–229. Morgan Kaufmann, San Francisco (1996)
11. Bunge, M.: On null individuals. The Journal of Philosophy 63(24), 776–778 (1966)
12. Casati, R., Varzi, A.: Holes and Other Superficialities. MIT Press, Cambridge (1994)
13. Clarke, B.L.: Individuals and points. Notre Dame Journal of Formal Logic 26, 61–75 (1985)
14. Clarke, B.L.: A calculus of individuals based on connection. Notre Dame Journal of Formal Logic 22, 204–218 (1981)
15. Cungen, C., Yuefei, S., Zaiyue, Z.: Rough Mereology in Knowledge Representation. In: Wang, G., Liu, Q., Yao, Y., Skowron, A. (eds.) RSFDGrC 2003. LNCS (LNAI), vol. 2639. Springer, Heidelberg (2003)
16. Demri, S., Orlowska, E.: Incomplete Information: Structure, Inference, Complexity. Springer, New York (2002)
17. Donnelly, M.: An Axiomatic Theory of Common-Sense Geometry. The University of Texas at Austin (2001)
18. Dubois, D., Dupin de Saint-Cyr, F., Prade, H.: A possibility-theoretic view of formal concept analysis. Fundamenta Informaticae 75, 195–213 (2007)
19. Dubois, D., Lang, J., Prade, H.: Possibilistic logic. In: Gabbay, D.M., Hogger, C.J., Robinson, J.A. (eds.) Handbook of Logic in Artificial Intelligence and Logic Programming, vol. 3, pp. 439–513. Clarendon Press, Oxford (1994)
20. Dubois, D., Prade, H.: Possibility Theory. Plenum Press, New York (1988)
21. Dugat, V., Gambarotto, P., Larvor, Y.: Qualitative geometry for shape recognition. Applied Intelligence 17(3), 253–263 (2002)
22. Düntsch, I.: Contact relation algebras. In: Orlowska, E., Szalas, A. (eds.) Relational Methods in Algebra, Logic, and Computer Science, pp. 113–134. Physica-Verlag, Heidelberg (2001)
23. Düntsch, I., Orlowska, E., Wang, H.: Algebras of approximating regions. Fundamenta Informaticae 46, 71–82 (2001)
24. Dupin de Saint-Cyr, F., Jeansoulin, R., Prade, H.: Fusing uncertain structured spatial information. In: Greco, S., Lukasiewicz, T. (eds.) SUM 2008. LNCS (LNAI), vol. 5291, pp. 174–188. Springer, Heidelberg (2008)
25. Dupin de Saint-Cyr, F., Prade, H.: Logical handling of uncertain, ontology-based, spatial information. Fuzzy Sets and Systems (Advances in Intelligent Databases and Information Systems) 159(12), 1515–1534 (2008)
26. Fariñas del Cerro, L., Orlowska, E.: DAL – a logic for data analysis. Theoretical Computer Science 36, 251–264 (1985)
27. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Springer, Heidelberg (1999)
28. Jeansoulin, R., Mathieu, C.: Une logique des inférences spatiales. Revue internationale de géomatique 4(3-4), 369–384 (1994)
29. McKinsey, J.C.C., Tarski, A.: The algebra of topology. Annals of Mathematics 45, 141–191 (1944)
30. Kripke, S.: Semantical analysis of intuitionistic logic I. In: Crossley, J., Dummett, M. (eds.) Formal Systems and Recursive Functions. North-Holland, Amsterdam (1963)
31. Kutz, O., Sturm, H., Suzuki, N., Wolter, F., Zakharyaschev, M.: Axiomatizing distance logics. Journal of Applied Non-Classical Logics 12(3-4), 425–440 (2002)
32. Kutz, O., Sturm, H., Suzuki, N.-Y., Wolter, F., Zakharyaschev, M.: Logics of metric spaces. ACM Transactions on Computational Logic (TOCL) 4(2), 260–294 (2003)
33. de Laguna, T.: Point, line and surface as sets of solids. The Journal of Philosophy 19, 449–461 (1922)
34. Le Ber, F., Ligozat, G., Papini, O.: Raisonnements sur l'Espace et le Temps: des Modèles aux Applications. Hermès-Lavoisier (2007)
35. Lemon, O., Pratt, I.: On the incompleteness of modal logics of space. In: Advances in Modal Logic, pp. 115–132. CSLI Publications, Stanford (1998)
36. Lesniewski, S.: Sur les fondements de la mathématique. Traduit du polonais par Kalinowski. Hermès, Paris (1989)
37. Martin, R.: Of time and null individuals. The Journal of Philosophy 62, 723–736 (1965)
38. Marx, M., Reynolds, M.: Undecidability of compass logic. Journal of Logic and Computation 9(6), 897–914 (1999)
39. Nicod, J.: La géométrie dans le monde sensible. English translation in: Geometry and Induction (1969), Presses Universitaires de France, Routledge and Kegan Paul (1962)
40. Orlowska, E., Pawlak, Z.: Expressive power of knowledge representation systems.
International Journal of Man-Machine Studies 20, 485–500 (1984)
41. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht (1991)
42. Polkowski, L.: Rough mereology: A rough set paradigm for unifying rough set theory and fuzzy set theory. Fundamenta Informaticae 54, 67–88 (2003)
43. Polkowski, L.: Rough mereology as a link between rough set and fuzzy set theories. A survey. In: Peters, J.F., Skowron, A., Dubois, D., Grzymala-Busse, J.W., Inuiguchi, M., Polkowski, L. (eds.) Transactions on Rough Sets II. LNCS, vol. 3135, pp. 253–277. Springer, Heidelberg (2004)
44. Polkowski, L., Skowron, A.: Rough mereology: A new paradigm for approximate reasoning. International Journal of Approximate Reasoning 15, 333–365 (1996)
45. Randell, D.A., Cui, Z., Cohn, A.: Naive topology: modeling the force pump. In: Faltings, B., Struss, P. (eds.) Recent Advances in Qualitative Reasoning. MIT Press, Cambridge (1992)
46. Randell, D.A., Cui, Z., Cohn, A.: A spatial logic based on regions and connection. In: Nebel, B., Rich, C., Swartout, W. (eds.) KR 1992. Principles of Knowledge Representation and Reasoning: Proceedings of the Third International Conference, San Mateo, California, pp. 165–176. Morgan Kaufmann, San Francisco (1992)
47. Renz, J., Nebel, B.: On the complexity of qualitative spatial reasoning: A maximal tractable fragment of the region connection calculus. Artificial Intelligence 108(1-2), 69–123 (1999)
48. Schockaert, S.: Reasoning about Fuzzy Temporal and Spatial Information from the Web. PhD dissertation. Universiteit Gent, Gent, Belgium (2008)
49. Schockaert, S., De Cock, M., Kerre, E.: Spatial reasoning in a fuzzy region connection calculus. Artificial Intelligence 173(2), 258–298 (2009)
50. Segerberg, K.: A note on the logic of elsewhere. Theoria 47, 183–187 (1981)
51. Smith, B., Varzi, A.C.: Fiat and bona fide boundaries. Philosophy and Phenomenological Research 60(2), 401–420 (2001)
52. Sturm, H., Suzuki, N.-Y., Wolter, F., Zakharyaschev, M.: Semi-qualitative reasoning about distances: preliminary report. In: Brewka, G., Moniz Pereira, L., Ojeda-Aciego, M., de Guzmán, I.P. (eds.) JELIA 2000. LNCS (LNAI), vol. 1919, pp. 37–56. Springer, Heidelberg (2000)
53. Tarski, A.: Logique, sémantique, méta-mathématique, Vol. 1. Armand Colin (1972)
54. Tuan-Fang, F., Churn-Jung, L., Yiyu, Y.: On modal and fuzzy decision logics based on rough set theory. Fundamenta Informaticae 52, 323–344 (2002)
55. van Benthem, J.: The Logic of Time. Synthese Library, vol. 156. Kluwer Academic Publishers, Dordrecht (1983) (Reidel; revised and expanded in 1991)
56. Varzi, A.: Parts, wholes, and part-whole relations: The prospects of mereotopology. Data and Knowledge Engineering 20, 259–286 (1996)
57. Venema, Y.: Expressiveness and completeness of an interval tense logic. Notre Dame Journal of Formal Logic 31(4), 529–547 (1990)
58. Venema, Y.: Points, lines and diamonds: a two-sorted modal logic for projective planes. Journal of Logic and Computation 9(5), 601–621 (1999)
59. Vieu, L.: Sémantique des relations spatiales et inférences spatio-temporelles. PhD dissertation. Université Paul Sabatier, Toulouse (1991)
60. von Wright, G.H.: A modal logic of place.
In: The Philosophy of Nicholas Rescher, pp. 65–73 (1979)
61. Wolter, F., Zakharyaschev, M.: Spatial Reasoning in RCC-8 with Boolean Region Terms. In: Horn, W. (ed.) Proceedings of the 14th European Conference on Artificial Intelligence (ECAI 2000), Berlin, pp. 244–250. IOS Press, Amsterdam (2000)
62. Cristani, M.: The Complexity of Reasoning about Spatial Congruence. Journal of Artificial Intelligence Research (JAIR) 11, 361–390 (1999), http://dx.doi.org/10.1613/jair.641
63. Gerevini, A., Renz, J.: Combining topological and size information for spatial reasoning. Artificial Intelligence 137, 1–42 (2002)
Revising Geographical Knowledge: A Model for Local Belief Change Omar Doukari, Robert Jeansoulin, and Eric Würbel
Abstract. The revision problem is known to be very hard: no revision algorithm is able to handle a large amount of data. Geographical information systems (GIS) are characterized by huge amounts of data, often gathered from different sources of information, which are imperfect and whose quality can differ. Therefore GIS require the definition, and the use, of belief revision. The literature on the topic shows that all known revision operators (i.e., operators working from a global point of view) are unable to solve problems of this size. In this chapter, we show how to take advantage of the geographical context to define local revision operators that can be combined to handle the global revision problem. For this purpose, we define a postulate that may be assumed with geographical data — the containment assumption — and we show how this postulate can be captured by a new knowledge representation model, the G-structure model. Then we define a revision operation on this model, which can be run locally, and we apply this operation in a real experiment, with real data, which we succeeded in processing correctly, whereas global revision always failed.

Omar Doukari
CNRS UMR 6168 LSIS, Campus de Saint-Jérôme, Avenue Escadrille Normandie-Niemen, 13397 Marseille Cedex, France
e-mail: [email protected]

Robert Jeansoulin
CNRS UMR 4089 LabInfo IGM, Université Paris-Est Marne-la-Vallée, 77454 Marne-la-Vallée Cedex, France (on leave at the Embassy of France, Washington, DC)
e-mail: [email protected]

Eric Würbel
CNRS UMR 6168 LSIS, Université du Sud Toulon-Var, 83957 La Garde Cedex, France
e-mail: [email protected]

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 165–188.
© Springer-Verlag Berlin Heidelberg 2010, springerlink.com
1 Introduction
Agents facing incomplete, uncertain, and inaccurate information must use rational belief revision in order to manage belief changes. Belief revision restores consistency while modifying initial information as little as possible. This sounds pretty useful for geographic information, because many different actors are collecting many different pieces of information, and merging them often reveals inconsistencies. Alas, in the general case, the theoretical complexity of revision is high [13, 23, 24]. The rare applications which have been developed for belief revision are highly inefficient [33, 35]. Since formal complexity cannot be reduced, reducing the set of beliefs to a manageable size seems to be an interesting approach. In most geographical applications, the inconsistency is due to the presence of a "few" contradictory pieces of information among a huge amount of data, most of which do not conflict, since they are not related. Hence, revision should be restricted to local, relevant parts of the belief base: this idea of containment goes back to S. Jaśkowski [18], and relates to Relevance-Sensitivity (R-S) approaches. R-S can be introduced by means of the logic: a local inference considers only the relevant part of the belief base with respect to a given formula, the "compartment" [17, 5, 18, 22, 14, 32]. Although intuitively adequate, and requiring no extra entities, finding a relevant part of the belief base is computationally as hard as belief revision itself. R-S can also be introduced as an addition to the logic, so that one belief set can induce different relevant compartments. Parikh [25] has defined the language splitting (LS) model; in practice, since beliefs do have some overlap, the partition of the main set of beliefs cannot be so strict. Parikh's original model, and its followers [6, 26, 21], have been extended by allowing for such overlap in the B-structures model [7, 8].
This model relies on a notion of partial language splitting and tolerates some amount of inconsistency while retaining classical logic locally. While satisfactory, this model does not guarantee global consistency: revision soundness is not guaranteed. The B-structures model also lacks a semantic characterization so far. In order to circumvent these problems, a new model for belief representation and local belief revision has been proposed: the C-structure model [12]. It allows some overlap between the different belief subsets and preserves all the desirable properties of the language splitting model. We propose an adaptation of the C-structure model to the geographical context: the G-structure model, presented in Section 2. The "containment assumption" [11] is an additional constraint that allows us to define compartments and overlaps from a spatial viewpoint. Section 3 defines a revision operator on G-structures.
Section 4 presents an experiment on a real-world application. The G-structure model can be applied because the containment assumption is realistic. Indeed, the geographical data collection process, when done by professionals, tends to limit the spreading of conflicting records. Either confidence is high: data are rather precise and errors are sparse, so conflicts do not spread too far; or confidence is low: data are rather imprecise, hence seldom false, and conflicts do not occur, though the overall information is weak. In both cases, it is reasonable to think that conflicts should be "spatially contained".
2 Local Revision of Geographical Information
This chapter deals with information revision in Geographic Information Systems (GIS). GIS use incomplete and uncertain information leading to inconsistency, and revision becomes mandatory. Most belief revision operations are highly complex, and practically intractable, since GIS use large amounts of data. What can we do to perform a revision when the size of the data disqualifies any available algorithm? We have two choices:
• process the problem as a whole, using approximate revision [9, 19, 35, 34];
• split the problem into sub-problems, and process them separately [20, 36].
In both cases, an exact solution (respecting consistency and the minimal change principle [2, 16]) is generally not available. We cannot perform a revision in most real applications, even with a modest-size database. That is why we want to know whether local representation frameworks can help in such a difficult situation. The case of GIS is particularly interesting for two reasons:
• It presents a revision problem, because information is always somewhat uncertain, imprecise or incomplete: data are inherently an interpretation of reality, beliefs rather than facts. In this respect, a GIS can be globally inconsistent.
• It possesses a spatial nature. The geographical aspect of the information provides a hard constraint, the uniqueness of the geographical space. Besides, it is rather easy to define a notion of local information, which can be implemented by means of relatively simple models of space, depending on the application: topological model, Euclidean space, etc.
When we process real data corresponding to geophysical phenomena, social behaviours, etc., we claim that it is reasonable to make an assumption on the distribution of the imperfection (imprecision, uncertainty) in the data collection process.
Assumption: the imperfection has no reason to be distributed so remotely that it hides inconsistencies which would have been detected under almost any random distribution. In other words, the specialist in charge of the data collection should be able to say: if the data gathered in a sufficiently large area do not show inconsistencies, nor
with their neighbourhood, then data gathered farther away will not either. The term "sufficiently large area" denotes the limit beyond which new data — even inconsistent ones — should not influence the revision of data in the area on which we focus. Claiming the existence of such a limit is, admittedly, a postulate, which we introduce as a supplemental constraint on the actual data in the database. We name this postulate the "containment assumption", as proposed in [11]. Let us now define a new model for the representation and the revision of information in GIS, a model that fully takes into account the spatial nature of this information: the G-structure model.
2.1 The G-Structure Model in a Geographic Framework
Geographic information is information represented in a language, e.g. propositional logic, allowing one to say "this is true within a certain zone" (point, land parcel p, etc.). Being "true in p" can be represented by different means (modality, reification) that we do not address here. What is worth noticing is:
1. any given zone¹ Z identifies a subset of literals SB(Z) involved in these parcels,
2. given an information subset, it delineates a spatial footprint, and any minimal zone² Z encompassing this footprint is said to "map the subset".
Therefore, it makes sense to talk about performing a "revision within Z". In the sequel, we say Z is a spatial extent of SB(Z), and we use set inclusion as the preorder relation between spatial extents. The intuition behind the construction of the G-structure model is that:
• an inconsistent subset of formulas can be mapped by a spatial extent containing all [space-located] literals involved in these formulas;
• the spatial extent of a minimal inconsistent subset gives the size of the zone where to perform the revision;
• then, if we can bound (contain) the spatial extent of all minimal inconsistent sets, this gives us the largest size for a local revision (containment range).
This is a slightly more formal version of the "containment assumption". Let us introduce the objects illustrated by Figure 1:
• geographical space ξ (background);
• Core: a connected area in ξ, containing at least one piece of information;
• Covering: a topological neighbourhood of the Core, containing at least one extra piece of information, not in the Core.
¹ We use the word zone to denote a connected set of parcels, of any shape.
² Not necessarily unique, but if SB(Z1) = SB(Z2), Z1 and Z2 are similar in size (class of equivalence).
Fig. 1 Core, Covering and Containment range
The containment assumption can be phrased as follows. Under the postulate that a finite containment range exists (assumption):
• if a Core [the set of encompassed information] is consistent (locally consistent);
• if the Core does not contradict any formula of its Covering;
• if the formulas of the Covering that are external to the Core are consistent;
then the Core is consistent with any other information (globally consistent). We say for short: the Core is "contained".
Note: the Covering must be larger than (or equal to) the containment range, but the Core can be smaller, hence improving the tractability of its revision, but increasing the number of Cores to process over ξ.
Example 1. Figure 2 illustrates a partition of ξ.
Fig. 2 Covering of the plane by means of juxtaposed cores
The property that we want to prove is: under the containment assumption, if the cores B1, B2 and B3 are contained, then SB(B1) ∪ SB(B2) ∪ SB(B3) is consistent. Note: the information which lies outside one core, but inside its covering, will be processed later with another core.
170
O. Doukari, R. Jeansoulin, and E. Würbel
Fig. 3 Graph induced by the external connection relation
Any geographical application deals with finite information, and a finite number of parcels. For the sake of simplicity, we consider these parcels as elements of a partition of ξ. Without loss of generality, we also consider that all cores and coverings can keep the same size throughout ξ. Parcels are indexed with an order given by spatial adjacency, based on the external connection relation as defined in [3, 29, 30].

Definition 1 (Spatial adjacency relation). Let pi, pj be parcels in ξ. We define the spatial adjacency relation ℜm (m ≥ 0) as follows:
1. ℜm(pi, pj) if and only if ∃p1, p2, ..., pm ∈ ξ such that ℜ0(pi, p1), ℜ0(p1, p2), ..., ℜ0(pm−1, pm), and ℜ0(pm, pj);
2. ℜ0(pi, pj) if and only if pi and pj are externally connected3 (direct neighbours).
If ℜm(pi, pj), then pi and pj are called m-neighbours.

Example 2. In Figure 3, p1, p2, ..., p5 are parcels such that ℜ0(p1, p2), ℜ0(p2, p3), ..., and ℜ1(p1, p3), ..., and ℜ3(p1, p5).

Definition 2 (Gdistance). Let pi, pj be parcels in ξ.
Gdist(pi, pj) = 0 if pi = pj;
Gdist(pi, pj) = min{ m | ℜm(pi, pj) } if pi and pj are m-neighbours for some m;
Gdist(pi, pj) = ∞ otherwise.
Gdist trivially verifies the properties:
• Gdist(pi, pj) ≥ 0 (positivity);
• Gdist(pi, pj) = 0 if and only if pi = pj (separation);
• Gdist(pi, pj) = Gdist(pj, pi) (symmetry);
• Gdist(pi, pj) ≤ Gdist(pi, pk) + Gdist(pk, pj) (triangle inequality).

Remark 1. In practice, we use the shortest path between pi and pj in the graph induced by the external connection relation, with parcels as nodes and unweighted edges (Fig. 3).

Definition 3 (G-core). Let p be a parcel of ξ. B = {pi ∈ ξ | Gdist(pi, p) ≤ r} is called the G-core of radius r (r ≥ 0) and center p.

3 In the RCC formalism, which is used for spatial information representation, this relation is denoted by EC.
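The shortest-path computation of Remark 1 can be sketched as a breadth-first search over the parcel adjacency graph (a minimal illustration; the function name and parcel encoding are ours, not the chapter's):

```python
from collections import deque

def gdist(adj, p, q):
    """Shortest-path distance between parcels p and q in the
    (undirected, unweighted) external-connection graph `adj`.
    Returns float('inf') when q is unreachable from p."""
    if p == q:
        return 0
    seen, frontier, d = {p}, deque([p]), 0
    while frontier:
        d += 1
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for n in adj[node]:
                if n == q:
                    return d
                if n not in seen:
                    seen.add(n)
                    frontier.append(n)
    return float('inf')
```

On the chain of parcels of Example 2 (p1 adjacent to p2, p2 to p3, etc.), this returns 0 for a parcel and itself, 1 for direct neighbours, and ∞ for disconnected parcels, in line with the properties listed under Definition 2.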
Revising Geographical Knowledge: A Model for Local Belief Change
171
Definition 4 (G-covering). GCovk(B) = {pi ∈ ξ | r < Gdist(pi, p) ≤ k} is called the G-covering of thickness k of the G-core B of radius r and center p.

Definition 5 (G-structure). Let Gi = (Bi, GCovk(Bi), SB(GCovk(Bi))) be a triple made of a G-core, its G-covering, and the related sub-base. G = {G1, ..., Gn} is a G-structure of ξ defined as follows:
1. {B1, ..., Bn} are G-cores forming a partition of ξ (subspaces);
2. GCovk(Bi) are their G-coverings of maximal thickness k;
3. SB(GCovk(Bi)) are the associated information subsets (sub-bases).
Informally, a G-covering of Bi should be a zone where the minimal inconsistent subsets (MIS) containing at least one element of SB(Bi) can intersect MIS containing no literal from SB(Bi): any information beyond either is related to one of these MIS, or is not in conflict with SB(Bi). We denote by I(Bi) the collection of the MIS of SB(Bi), and by I(ξ) the collection of all the MIS. The thickness k of the G-coverings of G must be (slightly) greater than the spatial extent of the largest MIS of I(ξ), in order to catch them all.
2.2 Maximal Spatial Extent of the Minimal Inconsistent Subsets

The containment assumption has been qualified as "reasonable" for an engineer. Let us think in terms of information quality: in a reasonable model, if we detect an inconsistency between formulas, it is because at least one of them is based on bad quality information. If the quality cannot be improved, the engineer increases the uncertainty attached to this information, hence increasing the number of acceptable cases. Sure enough, the number of solutions, and the complexity of the problem, rise accordingly.
Consider the proposition "an inconsistent subset has a maximal spatial extent": there is no reason to think that this is true. But the proposition "a minimal inconsistent subset has a maximal spatial extent" looks reasonable. Why? Because its negation implies that two contradictory beliefs can be infinitely distant, without being inconsistent with any other belief in between! In other words, an inconsistent subset can be very large, but then it is reasonable to think that there exist intermediate faulty beliefs explaining this spreading, and the inconsistent subset is not minimal. One can be both precise and wrong, but not over too large a distance: a good engineer would rather be imprecise than wrong. A fair assessment of the distribution of data quality over ξ helps him figure out the maximal extent of the minimal inconsistent subsets. This is the basic postulate of our approach.

Definition 6 (Maximal spatial extent of a MIS). Let M be a MIS; V(M) is the set of all variables involved in M. The maximal spatial extent of M is:
GExt(M) = max{ Gdist(pi, pj) | pi, pj ∈ ξ, V(M) ∩ SB({pi}) ≠ ∅ and V(M) ∩ SB({pj}) ≠ ∅ }.
The only assumption made by the G-structure model is that all MIS have a spatial extent, one of them being the largest. Here is the new version of the containment assumption: let G be a G-structure; the thickness k of the G-coverings of G must verify: for every MIS M, GExt(M) ≤ k.
Algorithm 1 builds G-structures for a geographical space ξ. Starting with an empty G-structure, each iteration of loop 2–17 adds a (G-core, G-covering) couple: loop 5–9 computes a G-core B of radius r centred on a parcel p; loop 10–14 computes a G-covering of thickness k of B. Finally, the new couple is added to the structure (line 15), and the used zone is removed from the remaining space ξ0. The random choice of the initial p allows different G-structures to be built for the same radius r. If ξ contains m parcels, this algorithm has a complexity of O(m⁵). The most expensive calls are those to Gdist, which relies on a Bellman-Ford algorithm4 computing the shortest path between parcels p and pi.
Algorithm 1: BuildG
Data: ξ – set of parcels building up the space; r – radius of the G-cores; k – thickness of the G-coverings
1  G ← ∅; ξ0 ← ξ;
2  while ξ0 ≠ ∅ do
3      randomly choose a parcel p of ξ0;
4      B ← ∅; GCovk(B) ← ∅;
5      foreach pi of ξ0 do
6          if Gdist(p, pi) ≤ r then
7              B ← B ∪ {pi};
8          end
9      end
10     foreach pi of ξ0 do
11         if r < Gdist(p, pi) ≤ k then
12             GCovk(B) ← GCovk(B) ∪ {pi};
13         end
14     end
15     G ← G ∪ {Structure(B, GCovk(B))};
16     ξ0 ← ξ0 \ B;
17 end
18 return G;

4 The Bellman-Ford algorithm's complexity is O(nm) for a graph with m vertices and n arcs. In our case, the graph of the external connection relation is not directed, but we can consider that each edge pi–pj corresponds to two opposite arcs; hence the complexity is O(m³) for simple dense graphs.
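Algorithm 1 can be sketched in Python as follows (a minimal illustration; parcels are keyed in an adjacency dict, and `bfs_dist` stands in for the Gdist calls — names are ours, not the chapter's):

```python
import random
from collections import deque

def bfs_dist(adj, p):
    """Distances from parcel p to every reachable parcel
    in the unweighted external-connection graph."""
    dist, frontier = {p: 0}, deque([p])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def build_g(adj, r, k, seed=None):
    """Partition the parcels into G-cores of radius r, each paired with
    a G-covering of thickness k (sketch of Algorithm 1, BuildG)."""
    rng = random.Random(seed)
    remaining = set(adj)          # the shrinking space ξ0
    g_structure = []
    while remaining:
        p = rng.choice(sorted(remaining))
        dist = bfs_dist(adj, p)
        core = {q for q in remaining if dist.get(q, float('inf')) <= r}
        cover = {q for q in remaining
                 if r < dist.get(q, float('inf')) <= k}
        g_structure.append((core, cover))
        remaining -= core         # ξ0 ← ξ0 \ B
    return g_structure
```

As in the chapter, different seeds give different G-structures for the same radius r, since the initial parcel of each core is chosen at random.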
In the sequel, for the sake of simplicity, we omit the radius r and the thickness k; we assume that k > max(GExt(M)) and r ≈ k/2; and we illustrate the different notions with examples on a one-dimensional space segmented into parcels.
3 Local Revision Based on the G-Structure Model

Let S1, S2 be two information sources. We define an external revision operator on the G-structure model:
• perform an expansion S1 + S2, i.e. the (possibly inconsistent) union S1 ∪ S2;
• build a G-structure G on S1 ∪ S2;
• perform a local contraction of the G-structure G by ¬S2.
Acronyms: MIS for "minimal inconsistent sets", MHS for "minimal hitting sets"5. Moreover, we say that a MIS M is "relevant" for a zone Z if V(M) ⊂ SB(Z).
3.1 Processing the Minimal Inconsistent Sets of One Structure

Let Gi = (Bi, GCov(Bi), SB(Bi ∪ GCov(Bi))) be a structure of a G-structure of ξ. We distinguish three types of relations between the MIS of each Gi, depending on the type of intersection with GCov(Bi): (1) space independence, (2) information independence, (3) dependency. Space independence is a particular case of (2).

3.1.1 Space Independent Minimal Inconsistent Sets
In Figure 4, no MIS intersects both B1 and GCov(B1). We say that the MIS of I(B1) are space independent. Their MHS can be computed independently of any other MIS; then we can safely ignore the structure (B1, GCov(B1), SB(B1 ∪ GCov(B1))), and safely remove B1 from ξ.
Fig. 4 Space independence: Conflicts= {{a,b,c}, {g,f}}
We design REMlocal, adapted from the REM procedure [35], with the same worst-case complexity, in order to compute the MHS of I(Bi).

5 MHS: the smallest subsets which intersect all the sets of a given collection of sets.
Definition 7 (REMlocal procedure). REMlocal computes the MHS of the collection of MIS of a set of clauses C1 ∪ C2 that contain at least one clause from C2.

The next lemma shows how to detect the space independence of I(B1).

Lemma 1. Let F be a collection of MIS, and E be the collection of MHS of F. Then, ∀F ∈ F, ∀f ∈ F, ∃H ∈ E such that f ∈ H.

Proof. Let F ∈ F and f ∈ F. Suppose that ∀H ∈ E, {f} ∩ H = ∅. Then ∀H ∈ E, (F \ {f}) ∩ H ≠ ∅. Consequently, (F \ {f}) ∈ F: contradiction! Because F is a collection of MIS, F and F \ {f} cannot both be in F.

Thus, for any clause f in a MIS F ∈ F, there exists a MHS H ∈ E such that f ∈ H. So, in order to detect the space independence of I(Bi ∪ GCov(Bi)):
1. we compute the MHS of the MIS of I(Bi ∪ GCov(Bi)) containing at least one clause from SB(Bi);
2. if the intersection of these MHS with SB(GCov(Bi)) is empty, then, by Lemma 1, we infer that the MIS of I(Bi ∪ GCov(Bi)) are space independent.

Proposition 1. Let Gi be a structure of G. The MIS of I(Bi) are space independent if and only if ∀H ∈ REMlocal(SB(GCov(Bi) ∪ Bi), SB(Bi)), H ∩ SB(GCov(Bi)) = ∅.

Proof. Trivial.
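The two-step test above can be sketched in Python; `mhs` below is a brute-force stand-in for the REM/REMlocal procedures, for illustration only (the names are ours, not the chapter's):

```python
from itertools import combinations

def mhs(collection):
    """All minimal hitting sets of a collection of sets
    (brute force, exponential; illustration only)."""
    universe = sorted(set().union(*collection, set()))
    hitting = [set(c) for n in range(len(universe) + 1)
               for c in combinations(universe, n)
               if all(set(c) & s for s in collection)]
    # keep only inclusion-minimal hitting sets
    return [h for h in hitting if not any(g < h for g in hitting)]

def space_independent(conflicts, core_vars, cover_vars):
    """Proposition 1 (sketch): the MIS touching the core are space
    independent iff none of their MHS meets the covering's sub-base."""
    relevant = [m for m in conflicts if m & set(core_vars)]
    return all(not (h & set(cover_vars)) for h in mhs(relevant))
```

On Figure 4's conflicts {{a,b,c}, {g,f}}, with {a,b,c} in B1 and {g,f} in the covering, the MHS of the relevant MIS are {a}, {b}, {c}; none meets {f,g}, so space independence is detected.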
Algorithm 2: IndependenceS
Data: Gi – a structure (Bi, GCov(Bi), SB(Bi ∪ GCov(Bi))) of G; Conf – a set of clauses containing already processed MIS
Var: EMB – backup variable of the MHS
1 EMB ← REMlocal(SB(GCov(Bi) ∪ Bi) \ Conf, SB(Bi));
2 if ∃ emb ∈ EMB such that emb ∩ SB(GCov(Bi)) ≠ ∅ then
3     return false;
4 else
5     return true;
6 end
Algorithm 2 implements Proposition 1; its complexity is O(C³ × 2^{2×C}):
• line 1: worst case O(C³ × 2^{2×C}), where C is the cardinality of SB(GCov(Bi) ∪ Bi). This is exactly the complexity of the REM algorithm;
• line 2: the test is at most O(C × 2^C), because the maximum size of EMB is 2^C, and the size of each of its elements is at most C. Negligible compared with line 1.

3.1.2 Information Independent Minimal Inconsistent Sets
In Figure 5, some MIS intersect both B1 and GCov(B1), but these MIS do not intersect any MIS of the collection I(B1).
Fig. 5 Information independence: Conflicts= {{a,b,c}, {d,e,f}, {g,h,i}}
We say that the MIS of the collection I(B1) are "information independent". Computing the MHS of I(B1) can be done independently of the other MIS. Moreover, we can ignore (B1, GCov(B1), SB(B1 ∪ GCov(B1))), because any remaining MIS which may intersect B1 will be checked later, during the processing of another structure of G. Nevertheless, we cannot remove B1 from ξ, because relevant MIS may still exist. In order to detect this kind of independence, we propose the following lemma.

Lemma 2. Let F1, F2 be two collections of MIS, and E1, E2 be respectively the two collections of MHS of F1 and F2. ∀F1 ∈ F1, ∀F2 ∈ F2, F1 ∩ F2 = ∅ if and only if ∀H1 ∈ E1, ∀H2 ∈ E2, H1 ∩ H2 = ∅.

Proof. We conduct the proof in the two directions of the implication. ⇒: trivial. ⇐: suppose F1, F2 dependent: ∃F1 ∈ F1, F2 ∈ F2 s.t. F1 ∩ F2 ≠ ∅. Let f ∈ F1 ∩ F2. By Lemma 1, ∃H1 ∈ E1, ∃H2 ∈ E2 s.t. f ∈ H1 ∧ f ∈ H2. Then, H1 ∩ H2 ≠ ∅.

Lemma 2 states that the independence of two collections of MIS is implied by the independence of their respective collections of MHS.

Proposition 2. Let Gi be a structure of G. The MIS of I(Bi) are information independent if and only if ∀H ∈ REM(SB(Bi)), ∀H′ ∈ REMlocal(SB(GCov(Bi) ∪ Bi), SB(GCov(Bi))), H ∩ H′ = ∅.

Proof. Trivial using Lemma 2.
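Proposition 2's test can be sketched on Figure 5's conflicts, with the two MHS collections written out by hand (the variable names are ours, for illustration):

```python
# Figure 5: Conflicts = {{a,b,c}, {d,e,f}, {g,h,i}}
# MHS of the MIS inside B1 (REM(SB(B1)) in the text):
e_b1 = [{'a'}, {'b'}, {'c'}]
# MHS of the MIS touching GCov(B1) (REMlocal in the text):
e_cov_b1 = [{'d'}, {'e'}, {'f'}]

def information_independent(e_core, e_cover):
    """Lemma 2 / Proposition 2 (sketch): the two MIS collections are
    independent iff their MHS collections are pairwise disjoint."""
    return all(not (h1 & h2) for h1 in e_core for h2 in e_cover)

print(information_independent(e_b1, e_cov_b1))  # True
```

The counterexample of Remark 2 below gives a dependent case: E1 = {{b},{a,c}} and E2 = {{f},{a,d}} share the clause a, so the test returns False there.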
Algorithm 3 implements the detection of information independence:
• line 1 is a REM call; its worst-case complexity is O(D³ × 2^{2×D}), where D is the cardinality of the set of clauses SB(Bi);
• line 2: the worst-case complexity is O(C³ × 2^{2×C}), where C is the cardinality of the set of clauses SB(GCov(Bi) ∪ Bi);
• line 3: the test is performed in O(C × 2^D × 2^C), because the maximum size of EMB is 2^D, the maximum size of EMS is 2^C, and their elements have size at most C in the worst case.
Thus, the complexity of Algorithm 3 is O(C³ × 2^{2×C}).
Algorithm 3: IndependenceI
Data: Gi – a structure (Bi, GCov(Bi), SB(Bi ∪ GCov(Bi))) of G; Conf – set of clauses representing already processed MIS
Var: EMB – backup variable of the collection of MHS
1 EMB ← REM(SB(Bi) \ Conf);
2 EMS ← REMlocal(SB(GCov(Bi) ∪ Bi) \ Conf, SB(GCov(Bi)));
3 if ∃ emb ∈ EMB, ems ∈ EMS such that emb ∩ ems ≠ ∅ then
4     return false;
5 else
6     return true;
7 end
3.1.3 Processing Independent MIS

Checking space and information independence between two collections of MIS makes it easier to compute the MHS of their union: we compute the hitting sets of each collection, and then we perform a pairwise concatenation of the results (see Figure 6). More formally, this operation is based on the following proposition.

Proposition 3. Let F be a collection of MIS, and E the collection of their MHS. If F = ∪_{i=1}^{n} Fi s.t. ∀F ∈ Fi, ∀F′ ∈ Fj (i ≠ j), F ∩ F′ = ∅ (i.e. the Fi are pairwise information independent), then E = {H1 ∪ ... ∪ Hn | (H1, ..., Hn) ∈ E1 × ... × En}, where Ei is the collection of MHS of Fi, respectively.

Proof. In order to simplify the proof, we restrict it to two collections F1, F2.
– Soundness. Let H ∈ E. By definition of E, ∃H1 ∈ E1, ∃H2 ∈ E2 s.t. H = H1 ∪ H2. Moreover, ∀F ∈ F1 ∪ F2, either F ∈ F1, and F ∩ H ≠ ∅ because F ∩ H1 ≠ ∅, or F ∈ F2, and F ∩ H ≠ ∅ because F ∩ H2 ≠ ∅. Then, ∀F ∈ F1 ∪ F2, F ∩ H ≠ ∅, which implies that H is a hitting set of F1 ∪ F2. Suppose ∃H′, a hitting set of F1 ∪ F2, s.t. H′ ⊂ H. We write H′ = H′1 ∪ H′2, with H′1 = H′ ∩ F1 and H′2 = H′ ∩ F2; then H′1 ∩ H′2 = ∅. ∀F1 ∈ F1, F1 ∩ H′1 ≠ ∅, because H′ = H′1 ∪ H′2 is a hitting set of F1 ∪ F2 and F1 ∩ H′2 = ∅; i.e. H′1 is a hitting set of F1. The same applies to H′2: it is a hitting set of F2. H′ ⊂ H implies (H′1 ∪ H′2) ⊂ (H1 ∪ H2), which in turn implies (H′1 ⊂ H1) or (H′2 ⊂ H2): contradiction, because H1 and H2 are MHS of F1 and F2, respectively.
– Completeness. Let H be a MHS of F1 ∪ F2. Write H = H1 ∪ H2, s.t. H1 = H ∩ F1 and H2 = H ∩ F2. From the above, we know that H1, H2 are hitting sets of F1, F2 respectively. Their minimality is easy, because F1 and F2 are independent: if we suppose that ∃H′1 ⊂ H1 (resp. H′2 ⊂ H2), a hitting set of F1 (resp. F2), it contradicts the minimality of H, since (H′1 ∪ H2) ⊂ H (resp. (H1 ∪ H′2) ⊂ H) while H = H1 ∪ H2 is a minimal hitting set of F1 ∪ F2. So, ∀H, minimal hitting set of F1 ∪ F2, H ∈ E.
Remark 2. The reciprocal proposition does not hold: for two collections F1, F2 and their collections of MHS E1, E2, such that the collection of MHS of F1 ∪ F2 is E = {H1 ∪ H2 | (H1, H2) ∈ E1 × E2}, we cannot conclude that F1 and F2 are independent.
Counterexample: F1 = {{a,b}, {b,c}} and F2 = {{f,d}, {a,f}}. We have E1 = {{b}, {a,c}} and E2 = {{f}, {a,d}}. The collection of MHS of F1 ∪ F2 is E = {{b,f}, {b,a,d}, {a,c,f}, {a,c,d}}, but F1 and F2 are not independent.

Example 3. Let us use the collection Conflicts = {{a,b,c}, {d,e,f}, {g,h,i}} from Fig. 5, and proceed as follows:
• To test the independence between the MIS in B1 and those in GCov(B1): apply REMlocal(SB(B1 ∪ GCov(B1)), SB(GCov(B1))), which gives the collection of MHS of the MIS in B1 ∪ GCov(B1) containing at least one clause from GCov(B1): EGCov(B1) = {{d}, {e}, {f}}. Then apply REM(SB(B1)), which gives the MHS of the MIS in B1: EB1 = {{a}, {b}, {c}}. Clearly, EGCov(B1) and EB1 are independent. Likewise, EB2 = {{g}, {h}, {i}} and EGCov(B2) = {{d}, {e}, {f}} are independent.
• By Proposition 3, the set of MHS of the two collections of MIS of B1 and B2 is the pairwise concatenation of the elements of EB1 and EB2:
E = {{a,g}, {a,h}, {a,i}, {b,g}, {b,h}, {b,i}, {c,g}, {c,h}, {c,i}}
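Proposition 3's pairwise concatenation, applied to Example 3, can be sketched as follows (the function name is ours, for illustration):

```python
from itertools import product

def concat(*mhs_collections):
    """Proposition 3 (sketch): the MHS of a union of pairwise
    independent MIS collections are the pairwise unions of the
    local MHS."""
    return [set().union(*combo) for combo in product(*mhs_collections)]

e_b1 = [{'a'}, {'b'}, {'c'}]   # MHS of the MIS in B1
e_b2 = [{'g'}, {'h'}, {'i'}]   # MHS of the MIS in B2
e = concat(e_b1, e_b2)
# 3 x 3 = 9 global MHS, as listed in Example 3
```

By Proposition 3, no minimality check is needed here: independence guarantees that every concatenation is already minimal.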
Fig. 6 Search tree of example 3
Did we solve the problem? The result does not take into account the MIS {d,e,f}, which mixes clauses from B1 and from B2. The MIS which overlap several G-cores are not part of this decomposition. We must shift the G-cores, that is, generate all the possible G-structures and process them one after the other.
In a second partitioning G′ (Figure 7), we do not detect any MIS in B′1, B′2 or B′3. In B′1 and its covering GCov(B′1), we detect a space independence, which allows us to remove B′1 in the sequel (shown in grey), because SB(B′1) does not intersect any MIS of the other structures of G′. Concerning the remaining G-cores, we detect an information independence.
In the third partitioning (Figure 8), we obtain EB′2 = {{d}, {e}, {f}}, the collection of MHS of the MIS contained in B′2. Space independence is detected on these three
Fig. 7 A second partitioning for example 3, Conflicts’= {{a,b,c}, {b,d}, {d,e,f}, {f,h,i}}
Fig. 8 A third partitioning for example 3, Conflicts”= {{d,e,f}}
G-cores: we can withdraw them, and we can say that all MIS have been processed. So, the collection of global MHS is :
E = {H1 ∪ H2 ∪ H3 | (H1, H2, H3) ∈ EB1 × EB2 × EB′2}
= {{a,g,d}, {a,g,e}, {a,g,f}, {a,h,d}, {a,h,e}, {a,h,f}, {a,i,d}, {a,i,e}, {a,i,f}, {b,g,d}, {b,g,e}, {b,g,f}, {b,h,d}, {b,h,e}, {b,h,f}, {b,i,d}, {b,i,e}, {b,i,f}, {c,g,d}, {c,g,e}, {c,g,f}, {c,h,d}, {c,h,e}, {c,h,f}, {c,i,d}, {c,i,e}, {c,i,f}}
This example illustrates the result of "shifting" the G-cores, i.e. of performing all possible partitionings of space: (1) it takes into account the MIS overlapping several G-cores; (2) it detects possible independences between the collections of MIS which could be missed by a bad partitioning. Clearly, the computation of the global MHS from the local MHS is performed without worrying about minimality, which is preserved by the concatenation. But in the following section, we have a case where we are forced to explicitly verify minimality after the processing of each G-structure.
The worst-case complexity of each task (independence detection, concatenation of local MHS) is bounded by O(C³ × 2^{2×C}), with C = |SB(GCov(Bi) ∪ Bi)|, where Bi is the G-core containing the greatest volume of data. Thus, the global complexity is O(NPP × C³ × 2^{2×C}), with NPP the number of possible partitionings of the G-structure. In a two-dimensional space ξ, NPP is equal to the square of the maximal spatial extent of the MIS of S1 ∪ S2, which we denote T²max [10].
Fig. 9 Dependent MIS: Conflicts’= {{d,e,f}}
3.2 Processing Dependent MIS

Figure 9 illustrates MIS that are relevant both for B1 and for GCov(B1), and whose intersection with the MIS of I(B1) is not empty. We call them dependent: they need special processing.
Processing dependent MIS relies on a simple approach: computing the MHS of the MIS relevant for each G-core of the G-structure G builds the MHS of all the MIS in G. Indeed, ∀(Bi, GCov(Bi), SB(Bi ∪ GCov(Bi))) ∈ G, we compute the MHS of the MIS of I(Bi ∪ GCov(Bi)) containing at least one clause of SB(Bi), ignoring MIS already processed. Then, we concatenate these MHS with the collection previously computed. Finally, we verify minimality, and keep only the minimal hitting sets. Thus, the global MHS is built incrementally, by the computation of the local MHS in the different structures. The following result proves that this approach is valid.

Proposition 4. Let F be a collection of MIS, and {F1, ..., Fn} a partition of F. If E1, ..., En are the collections of MHS of F1, ..., Fn respectively, then the collection of MHS of F is E = min{H1 ∪ ... ∪ Hn | (H1, ..., Hn) ∈ E1 × ... × En}6.
Proof. Let H be a MHS of F. If we set H = ∪_{i=1}^{n} Hi, with Hi = (∪_{F∈Fi} F) ∩ H, then:
1. either ∀i, Hi ∈ Ei, and in this case H ∈ E1 × ... × En;
2. or ∃i ∈ {1..n} s.t. Hi ∉ Ei, and Hi is a non-minimal hitting set of Fi. So, ∃H′i ∈ Ei s.t. H′i ⊂ Hi. We must have H1 ∪ ... ∪ H′i ∪ ... ∪ Hn = ∪_{i=1}^{n} Hi = H, otherwise we contradict the minimality of H. So ∃(H1, ..., H′i, ..., Hn) ∈ E1 × ... × Ei × ... × En s.t. H = H1 ∪ ... ∪ H′i ∪ ... ∪ Hn, and then H ∈ E1 × ... × En.
Consequently, the collection of global MHS E is such that E ⊆ {H1 ∪ ... ∪ Hn | (H1, ..., Hn) ∈ E1 × ... × En}.
Definition 8 (The • operator). Let F be a collection of sets of clauses, and F a set of clauses. The • operator concatenates a set with a collection of sets: F • F = {C | C = F′ ∪ F, F′ ∈ F}.

Algorithm 4 is used when a dependency is detected among the MIS of G:
line 1: worst case O(m⁵), with m = |ξ| (complexity of Algorithm 1);
loop 3–14 iterates at most m times; this happens when r = 0, where each G-core contains one parcel, so that the number of structures in G is m;

6 In this context, minimality is understood in the sense of inclusion.
Algorithm 4: ProcessingD
Data: G – a G-structure of ξ; Tmax – maximal geographical extent of the MIS of the collection I(ξ); Conf – set of clauses representing already processed MIS
1  G ← BuildG(ξ, Tmax, Tmax);
2  EMG ← ∅;
3  foreach (Bi, GCov(Bi), SB(Bi ∪ GCov(Bi))) in G do
4      EMT ← ∅;
5      EMB ← REMlocal(SB(GCov(Bi) ∪ Bi) \ Conf, SB(Bi));
6      foreach emg ∈ EMG do
7          EMT ← EMT ∪ (emg • EMB);
8      end
9      if (∃ emt, emt′ ∈ EMT) and (emt ⊂ emt′) then
10         delete emt′;
11     end
12     EMG ← EMT;
13     G ← G \ {(Bi, GCov(Bi), SB(Bi ∪ GCov(Bi)))};
14 end
15 return EMG;
line 5: worst case O(C³ × 2^{2×C}), where C is the cardinality of SB(Bi ∪ GCov(Bi));
loop 6–8 iterates at most 2^{|S1∪S2|} times, the maximal number of MHS in EMG;
line 7: at most 2^C, the maximal cardinality of EMB (complexity of the • operator);
line 9: at most O(|S1 ∪ S2| × 2^{2×|S1∪S2|}), because EMT contains at most 2^{2×|S1∪S2|} elements whose size is at most |S1 ∪ S2|.
Then, the worst-case global complexity of Algorithm 4 is, after simplification: O(m × |S1 ∪ S2| × 2^{2×|S1∪S2|}).
3.3 An Algorithm for Local Revision

First, we try to detect space and information independence: it may reduce the number of MIS left for the second step, but not its complexity. Second, we process the remaining (dependent) MIS using the dedicated Algorithm 4. Then we concatenate the resulting MHS.

Example 4. Look back at Figure 9, which depicts a G-structure of ξ:
G = {(B1, GCov(B1), SB(B1 ∪ GCov(B1))), (B2, GCov(B2), SB(B2 ∪ GCov(B2)))}
Conflicts = {{a,b,c}, {b,d}, {d,e,f}, {f,h,i}} is the collection of MIS of S1 ∪ S2. We compute the collection of MHS of Conflicts as follows (search tree in Fig. 10):
• run REMlocal(SB(B1 ∪ GCov(B1)), SB(B1)) and obtain the collection of minimal hitting sets E1 = {{a,d}, {b,d}, {b,e}, {b,f}, {c,d}} of B1;
• remove (B1, GCov(B1), SB(B1 ∪ GCov(B1))) from G: Conflicts′ = {{f,h,i}}, and proceed with the remaining structure to obtain E2 = {{f}, {h}, {i}}.
Fig. 10 Search tree for example 4
Now we compute the MHS of the MIS which are relevant for the sub-space delimited by B1 ∪ B2 (i.e. ξ as a whole). This boils down to:
1. Concatenating all the elements H1 and H2 such that (H1, H2) ∈ E1 × E2. We obtain {{a,d,f}, {b,d,f}, {b,e,f}, {b,f}, {c,d,f}, {a,d,h}, {b,d,h}, {b,e,h}, {b,f,h}, {c,d,h}, {a,d,i}, {b,d,i}, {b,e,i}, {b,f,i}, {c,d,i}}.
2. Deleting the non-minimal elements (nodes "×" in Figure 10), to obtain the collection of global MHS (nodes "φ" in Figure 10):
E = {{a,d,f}, {b,f}, {c,d,f}, {a,d,h}, {b,d,h}, {b,e,h}, {c,d,h}, {a,d,i}, {b,d,i}, {b,e,i}, {c,d,i}}
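The two steps of Example 4 (concatenation, then deletion of non-minimal elements, as in Algorithm 4) can be reproduced with a brute-force sketch; `local_mhs` is an illustrative stand-in for REMlocal, not the chapter's procedure:

```python
from itertools import combinations, product

def local_mhs(conflicts):
    """Brute-force minimal hitting sets of a collection of conflict
    sets (exponential; illustration only)."""
    universe = sorted(set().union(*conflicts, set()))
    hits = [set(c) for n in range(len(universe) + 1)
            for c in combinations(universe, n)
            if all(set(c) & s for s in conflicts)]
    return [h for h in hits if not any(g < h for g in hits)]

def processing_d(structures):
    """Algorithm 4 (sketch): per-structure MHS are concatenated with
    the running collection, then non-minimal sets are deleted.
    `structures` is a list of conflict collections, one per G-core."""
    emg = [set()]
    for conflicts in structures:
        emb = local_mhs(conflicts)
        emt = [e | b for e, b in product(emg, emb)]   # the • operator
        emg = [h for h in emt if not any(g < h for g in emt)]
    return emg
```

On Example 4's data, the first structure contributes the MHS of {{a,b,c}, {b,d}, {d,e,f}} and the second those of {{f,h,i}}; after the minimality filter, the 11 global MHS of E remain, e.g. {b,f} survives while {b,d,f} is deleted.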
3.4 The G-structureRevision Algorithm, and Its Complexity

Let m = |ξ| be the cardinality of ξ, and C = |S1 ∪ S2| the quantity of information. The complexity of the algorithm can be summed up as follows:
loop 2–22 iterates at most T²max times, the maximal number of G-structures that can be built out of ξ;
line 3: worst case O(m⁵), the complexity of Algorithm 1;
loop 4–21 iterates at most m times; this happens when r = 0, with one parcel per G-core, so that the number of structures in G is m;
loop 6–8 iterates at most 2^C times, the maximal number of MHS in EMG;
line 7: worst case O(2^D), with D = |SB(Bi)|, because the maximal cardinality of EMB is 2^D;
loop 13–16 is equivalent to loop 6–8;
loop 25–27 iterates at most 2^C times, the maximal cardinality of EMG;
line 26: worst case O(2^C), the complexity of the • operator.

Algorithm 5: G-structureRevision
Data: ξ – geographical space; Tmax – maximal spatial extent of the MIS of I(ξ)
1  I ← 0; EMG ← ∅; Conf ← ∅; EMB ← ∅;
2  repeat
3      G ← BuildG(ξ, Tmax, Tmax);
4      foreach Ci = (Bi, GCov(Bi), SB(Bi ∪ GCov(Bi))) in G do
5          if IndependenceS(Ci, Conf, EMB) then
6              foreach em ∈ EMG do
7                  EMG′ ← EMG′ ∪ (em • EMB);
8              end
9              EMG ← EMG′;
10             ξ ← ξ \ Bi;
11         else
12             if IndependenceI(Ci, Conf, EMB) then
13                 foreach em ∈ EMG do
14                     EMG′ ← EMG′ ∪ (em • EMB);
15                     Conf ← Conf ∪ EMB;
16                 end
17                 EMG ← EMG′;
18             end
19         end
20     end
21     I ← I + 1;
22 until ξ = ∅ or I ≥ NBDecompositionsPoss;
23 if ξ ≠ ∅ then
24     EMT ← ProcessingD(ξ, Conf, Tmax);
25     foreach em ∈ EMG do
26         EMG′ ← EMG′ ∪ (em • EMT);
27     end
28     EMG ← EMG′;
29 end
30 return EMG;

After simplification, the worst-case complexity is:
Proposition 5. The worst-case complexity of the algorithm G-structureRevision is O(m × C × 2^{2×C}), with C = |S1 ∪ S2|.
It is possible to further improve the global efficiency by improving the REMlocal algorithm. A proposition presented in [35] ensures that there exist minimal hitting sets M of I(S1 ∪ S2) such that M ∩ S2 ≠ ∅. This proposition allows local pruning in the search tree of the minimal hitting sets of each structure of G. Moreover, we must remember that this is the worst-case complexity: in real applications, the distribution of independent versus dependent MIS can actually reduce the computation time. A more detailed study could provide several possible scenarios.
4 Experimentation

We implemented G-structureRevision on an extensively studied application: the assessment of water heights in the flooded valley of the Hérault river (southern France, 1994). The initial study is by Raclot and Puech [27] from CEMAGREF7; it was then extended in an artificial intelligence perspective by the PhD theses of Würbel [34] and Khelfallah [19], during the European project REVIGIS [1, 31, 15]. One peculiarity is that water heights behave linearly, due to gravity: we can apply linear algorithms [28, 19]. Another solution, answer set programming [4], uses stratification, which is also, somehow, a local approach in the information domain, based on an application-dependent priority given to subsets of clauses. But if we voluntarily ignore these application-dependent aspects8, the basic revision problem is intractable [34]. For instance, for a zone of 47 parcels, the propositional logic encoding requires 1753 propositional variables, and creates about 55,000 clauses. Think about an exponent of such a magnitude!
4.1 The Flooding Problem

The flood of the Hérault river was studied using two information sources (Fig. 11):
• S1: assessments of the water levels in the flooded parcels (mostly vineyards), using a priori knowledge of the vegetation height. The visibility of the vegetation informs about minimal or maximal water height values. The quality of S1 is questionable.
• S2: a graph of the hydraulic relations observed from an aerial picture. Between adjacent parcels, we can note: an oriented flux, a hydraulic equilibrium, or nothing. Quality issues: uncertainty (vegetation presence, type, probable height), imprecision (estimated heights), incompleteness (unobservable flux). But the overall quality of S2 is rather good.
In addition, the data must obey some constraints: (i) a flux p1 → p2 imposes that the maximal water level in p1 be greater than the minimal level in p2; (ii) an equilibrium imposes that both maximal levels be above both minimal levels. Each source is consistent, but S1 ∪ S2 is not, as has been observed. Indeed, an inconsistency arises when assessments from S1 contradict relations from S2.

7 Centre for the management of forest and water resources, http://www.cemagref.fr
8 The linear behaviour of one variable does not impact the general structure of this problem, which remains relevant for the demonstration.
Fig. 11 Estimation of water level and hydraulic flux at parcels p1 and p2
Example 5. Figure 12 shows a (transitive) flux from parcel pm to parcel pn, and their respective water level assessments: pm → pn implies hm_max > hn_min, which contradicts the assessment hm_max < hn_min.
Fig. 12 Flux and water heights at parcels pm and pn are inconsistent, and how to detect it
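Constraint (i) and the conflict of Example 5 can be sketched as a simple interval check (the height intervals below are illustrative, not taken from the actual data):

```python
def flux_consistent(h_up, h_down):
    """Constraint (i): a flux p_up -> p_down requires the maximal
    water level upstream to exceed the minimal level downstream.
    Heights are (h_min, h_max) intervals."""
    return h_up[1] > h_down[0]

# Example 5: a flux pm -> pn, but the assessment says hm_max < hn_min.
pm = (1.0, 2.0)   # (hm_min, hm_max), hypothetical values
pn = (2.5, 3.0)   # (hn_min, hn_max), hypothetical values
print(flux_consistent(pm, pn))  # False: {flux, assessments} form a MIS
```

As in the chapter, uncertainty widens the intervals: replacing pm's assessment with (1.0, 3.0) restores consistency, at the cost of precision.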
Since hydraulic relations are more reliable than height assessments [27], S1 is revised by S2, which results either in increasing hm_max, or in decreasing hn_min.
The uncertainty about water height, i.e. about the intervals of minimal and maximal heights, is rather randomly distributed. But errors about this uncertainty9 have no reason to be random. Sometimes a random feature can lead to such errors, but such

9 Here, error means recording supposedly certain data, i.e. a tiny interval, based on a wrong guess.
second-degree errors remain local. In the general case, uncertainty just translates into a wider interval, which cannot cause a conflict. Moreover, the way the data have been collected by the engineers provides an empirical assessment of Tmax. In this application, Tmax = 3 (beyond neighbours of neighbours of neighbours, there is no involvement in a conflict: the MIS are bounded by 3).
4.2 Experimental Results

For this experiment, we ran different G-structures, gradually increasing the radius of the G-cores and the thickness of the G-coverings. Tests were conducted on an Intel® Core™ Duo computer, 2.53 GHz, 4 Mb RAM.

Table 1 Results obtained by the G-structureRevision algorithm for the flooding problem

radius r   thickness k   # structures   Max. size   # MHS   computing time (s)
   0            0             47            1          0          44.23
   0            1             47            8        256         463.15
   1            1             18            8         64         169.40
   1            2             18           16        256         826.57
   2            3              8           23        256        1221.96

r: radius of the G-cores; k: thickness of the G-coverings; # structures: number of structures in the G-structure; Max. size: size (in number of parcels) of the biggest structure in the G-structure; # MHS: number of minimal hitting sets generated.
At the local level, we have the same performance as the algorithm REM [35]. We start with 0-radius G-cores and 0-thickness G-coverings, i.e., each parcel is its own core and covering. Consequently, the sub-bases of this G-structure are disjoint. Then, no minimal hitting set is generated, because S2 (the hydraulic relations) is simply ignored. This confirms that S1 is consistent, but it is not a revision solution. Notice that for (r = k = 1), the number of MHS is less than the number found with (r = 0, k = 1). As a consequence, the number of MIS we were able to process is also smaller. This is explained by the fact that we consider independent structures (that is, r = k). Because of this choice, some MIS, which lie between two structures, are missed. Only the cases (r = 0, k = 1), (r = 1, k = 2), (r = 2, k = 3) are interesting, because they allow overlapping. In this application, the estimated Tmax value is 3, according to the spatial distribution of uncertainty errors. Therefore, if Tmax = 3 is a postulate, the result obtained for (r = 2, k = 3) is the same as the result that we would have obtained with a global revision. The flooding problem has been "revised" within about 20 minutes, which would have been out of reach with a global approach.
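The # MHS column above counts minimal hitting sets of the minimal inconsistent subsets (MIS): a minimal hitting set designates one minimal set of formulas whose removal restores consistency. The following is only an illustrative brute-force sketch of that notion (the chapter's own algorithm is far more structured); the formula names are hypothetical.

```python
from itertools import combinations

def minimal_hitting_sets(families):
    """Enumerate all minimal hitting sets of a family of sets.

    A hitting set intersects every set in the family; it is minimal if no
    proper subset is also a hitting set. Candidates are tried by increasing
    cardinality, so any superset of an already-found set is rejected.
    Only suitable for small instances."""
    universe = sorted(set().union(*families))
    hits = []
    for size in range(1, len(universe) + 1):
        for cand in combinations(universe, size):
            s = set(cand)
            if all(s & f for f in families):        # hits every set
                if not any(h <= s for h in hits):   # keep only minimal ones
                    hits.append(s)
    return hits

# Two MIS sharing the assessment 'h2' (hypothetical formula names):
mis = [{"h1", "h2"}, {"h2", "h3"}]
print(minimal_hitting_sets(mis))  # two repair choices: remove h2, or remove h1 and h3
```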
O. Doukari, R. Jeansoulin, and E. Würbel
Containment Condition. Let Tmax be the maximal spatial extent of the MIS of a set of spatial information S1 ∪ S2, and let kmax be the maximal thickness which characterizes a tractable G-structure built on S1 ∪ S2. The local revision based on the G-structure model is tractable, sound and complete if and only if Tmax ≤ kmax.
5 Conclusion

A global approach to revision on a real application is a very difficult task, due to:
1. the huge volume of information contained in real spatial applications,
2. the exponential complexity of the available revision algorithms.
Several solutions have been proposed. The salient ones are based on: (a) partial processing, (b) decomposition of the problem into sub-problems. In this chapter, we have proposed a new model for the representation and the local revision of spatial information: the G-structure model. We define a very efficient local revision strategy, which however requires one additional hypothesis: that the minimal inconsistencies are local. This strategy addresses the problem of the huge volume of information, and respects the minimal change principle. We compare the complexity of our local revision, O(m × |S1 ∪ S2| × 2^(2×|S1∪S2|)), to that of a global revision [35], O(|S1 ∪ S2|^3 × 2^(2×|S1∪S2|)), with m the cardinality of the geographical space, and |S1 ∪ S2| the quantity of information attached to this space.
The improvement factor is r = |S1∪S2|^2 / m. The gain is in cardinality, not in complexity, for r is polynomial; but it is quite significant when we are reaching the tractability threshold, in applications whose data can be split into parts smaller than the containment range. The experiment gives preliminary results which are encouraging. They show that the application falls into a tractable case, though it was previously impossible to process by a regular revision operation. This opens the door to further investigations. In particular, the algorithm generating the minimal hitting sets uses a procedure which generates arbitrary inconsistent subsets. We think that this can be dramatically improved by the use of recent results in the domain of SAT solvers.
References
1. European project revigis, phase 2, no ist–1999–14189, revision of uncertain geographic information (2000-2004), http://www.cmi.univ-mrs.fr/revigis/ 2. Alchourrón, C.E., Gärdenfors, P., Makinson, D.: On the logic of theory change: Partial meet contraction and revision functions. J. Symb. Logic 50(2), 510–530 (1985) 3. Asher, N., Vieu, L.: Toward a geometry of common sense: A semantics and a complete axiomatization of mereotopology. In: International Joint Conference on Artificial Intelligence, IJCAI 1995, Montréal, Canada, pp. 846–852. Morgan Kaufmann, San Francisco (1995), http://www.mkp.com/, ftp://ftp.irit.fr/IRIT/LILAC/AV-ijcai95.pdf
4. Ben-Naim, J., Benferhat, S., Papini, O., Würbel, E.: An answer set programming encoding of prioritized removed sets revision: application to GIS. Applied Intelligence 32(1), 60–87 (2010) 5. Benferhat, S., Dubois, D., Prade, H.: Some syntactic approaches to the handling of inconsistent knowledge bases: A comparative study part 1: The flat case. Studia Logica 58(1), 17–45 (1997) 6. Chopra, S., Georgatos, K., Parikh, R.: Relevance sensitive non-monotonic inference on belief sequences. J. of Applied Non-Classical Logics 11(1-2), 131–150 (2001) 7. Chopra, S., Parikh, R.: An inconsistency tolerant model for belief representation and belief revision. In: IJCAI 1999: Proc. of the 16th International Joint Conference on Artificial Intelligence, pp. 192–199. Morgan Kaufmann Publishers Inc., San Francisco (1999) 8. Chopra, S., Parikh, R.: Relevance sensitive belief structures. Annals of Mathematics and Artificial Intelligence 28(1-4), 259–285 (2000) 9. Chopra, S., Parikh, R., Wassermann, R.: Approximate belief revision. Logic J. IGPL 9(6), 755–768 (2001) 10. Doukari, O.: Révision de l'information spatiale par confinement: Heuristiques pour hitting-sets. Master thesis, Université de Provence, Marseille (2006) 11. Doukari, O., Jeansoulin, R.: Space-contained conflict revision, for geographic information. In: 10th AGILE International Conference on Geographic Information Science, Aalborg, Denmark (2007) 12. Doukari, O., Würbel, E., Jeansoulin, R.: A new model for belief representation and belief revision based on inconsistencies locality. In: 19th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2007, Patras, Greece, pp. 262–269. IEEE Computer Society, Los Alamitos (2007) 13. Eiter, T., Gottlob, G.: On the complexity of propositional knowledge base revision, updates, and counterfactuals. In: PODS, pp. 261–273. ACM Press, New York (1992) 14. Fagin, R., Halpern, J.Y.: Belief, awareness, and limited reasoning. Artif. Intell. 34(1), 39–76 (1987) 15. 
Fisher, P., Frank, A.U., Gorte, B., Jeansoulin, R., Lingham, J., Molenaar, M., Navratil, G., Roy, A.J., Papini, O., Stell, J., Timpf, S., van der Vlugt, M., Worboys, M.F., Wurbel, E.: Revigis project: First step final report. Tech. rep., Esprit project 27781 (1999), http://www.cmi.univ-mrs.fr/REVIGIS 16. Gärdenfors, P., Makinson, D.: Revisions of knowledge systems using epistemic entrenchment. In: TARK 1988: Proc. of the 2nd conference on Theoretical aspects of reasoning about knowledge, USA, pp. 83–95. Morgan Kaufmann Publishers Inc., San Francisco (1988) 17. Hansson, S.O., Wassermann, R.: Local change: A preliminary report. In: Fourth Symposium on Logical Formalizations of Commonsense Reasoning (1998) 18. Jaśkowski, S.: Propositional calculus for contradictory deductive systems. Studia Logica 24(1), 143–157 (1969) 19. Khelfallah, M.: Révision et fusion d'informations à base de contraintes linéaires: Application à l'information géographique et temporelle. PhD thesis, Université de Provence, Marseille (December 2005) 20. Khelfallah, M., Benhamou, B.: A local fusion method of temporal information. In: Godo, L. (ed.) ECSQARU 2005. LNCS (LNAI), vol. 3571, pp. 477–488. Springer, Heidelberg (2005) 21. Kourousias, G., Makinson, D.: Parallel interpolation, splitting, and relevance in belief change. J. Symb. Logic 72(3), 994–1002 (2007) 22. Lewis, D.: Logic for equivocators. Noûs 16(3), 431–441 (1982)
23. Liberatore, P., Schaerf, M.: The complexity of model checking for belief revision and update. In: AAAI/IAAI, vol. 1, pp. 556–561 (1996) 24. Nebel, B.: How hard is it to revise a belief base? In: Dubois, D., Prade, H. (eds.) Handbook of Defeasible Reasoning and Uncertainty Management Systems. Belief Change, vol. 3, pp. 77–145. Kluwer Academic Publishers, Dordrecht (1998) 25. Parikh, R.: Beliefs, belief revision, and splitting languages. Logic, language and computation 2, 266–278 (1999) 26. Peppas, P., Chopra, S., Foo, N.Y.: Distance semantics for relevance-sensitive belief revision. In: 9th International Conference on Principles of Knowledge Representation and Reasoning, KR 2004, Canada, pp. 319–328 (2004) 27. Raclot, D., Puech, C.: Photographies aériennes et inondations: globalisation d'informations floues par un système de contraintes pour définir les niveaux d'eau en zone inondée. Revue internationale de géomatique 8(1-2), 191–206 (1998) 28. Raclot, D., Puech, C.: What does AI contribute to hydrology? Aerial photographs and flood levels. Applied Artificial Intelligence 17(1), 71–86 (1998) 29. Randell, D.A., Cohn, A.G.: Modelling topological and metrical properties of physical processes. In: Brachman, R.J., Levesque, H.J., Reiter, R. (eds.) 1st International Conference on Principles of Knowledge Representation and Reasoning, KR 1989, pp. 357–368. Morgan Kaufmann, Los Altos (1989) 30. Randell, D.A., Cui, Z., Cohn, A.: A spatial logic based on regions and connection. In: Nebel, B., Rich, C., Swartout, W. (eds.) 3rd International Conference on Principles of Knowledge Representation and Reasoning, KR 1992, pp. 165–176. Morgan Kaufmann, San Mateo (1992) 31. REVIGIS Rapport du projet REVIGIS (2002), http://www.univ-tln.fr/papini/sources/REVIGIS-YEAR2/REVIGIS-Y2.htm 32. Tennant, N.: Perfect validity, entailment and paraconsistency. Studia Logica 43(1-2), 181–200 (1984) 33. Williams, M.A.: Applications of belief revision. 
In: ILPS 1997: International Seminar on Logic Databases and the Meaning of Change, Transactions and Change in Logic Databases, pp. 287–316. Springer, London (1998) 34. Würbel, E.: Révision de connaissances géographiques. PhD thesis, Université de Provence, Marseille (2000) 35. Würbel, E., Papini, O., Jeansoulin, R.: Revision: an application in the framework of GIS. In: 7th International Conference on Principles of Knowledge Representation and Reasoning, KR 2000, pp. 505–516, Breckenridge, Colorado (2000) 36. Würbel, E., Papini, O., Jeansoulin, R.: Spatial information revision: A comparison between three approaches. In: Benferhat, S., Besnard, P. (eds.) ECSQARU 2001. LNCS (LNAI), vol. 2143, pp. 454–465. Springer, Heidelberg (2001)
Merging Expressive Spatial Ontologies Using Formal Concept Analysis with Uncertainty Considerations Olivier Curé
Abstract. In this chapter, we present a solution to the problem of merging structures that represent the conceptual layer of some information systems. The kind of structures we are studying correspond to expressive ontologies formalized in Description Logics. The proposed approach creates a merged ontology which captures the knowledge of a set of source ontologies. A main requirement of our solution is that instances associated with the concepts of the source ontologies are available; it is then possible to apply the techniques associated with Formal Concept Analysis. The main contributions of this work are (i) enabling the creation of concepts not originally in the source ontologies, (ii) providing a definition of these concepts in terms of elements of the source ontologies and (iii) handling the creation of merged ontologies based on the uncertainties encountered at the object and alignment levels. This approach is particularly useful in domains where ontologies are intensively exploited. This is typically the case for spatial information, where, for instance, the nature of land parcels can be characterized by a geographical ontology.
1 Introduction

The information stored in Geographical Information Systems (GIS) usually needs to be exchanged and integrated between multiple applications. These tasks raise several important issues, due to format and semantics heterogeneity, but also due to the several forms of uncertainty that are encountered. For instance, we can distinguish between uncertainties at the application domain level and uncertainties at the integration/exchange level. Concerning the
Olivier Curé
Université Paris-Est, IGM Terre Digitale, Marne-la-Vallée, France
e-mail:
[email protected]
R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 189–209. springerlink.com
© Springer-Verlag Berlin Heidelberg 2010
O. Curé
first type of uncertainty, the information stored in a GIS is almost always sampled. Some sampling uncertainties are related to geodesy's positional accuracy, or to semantic accuracy when characterizing the nature of a sample. Uncertainty occurring in matching and mapping operations is considered an important issue, and several solutions have already been proposed; see [16] for a survey. In this chapter, we are also interested in the semantic issues. The integration of ontologies within the information system, usually to represent its conceptual layer, has been one approach to respond to these concerns. This is the case in GIS and the spatial information domain in general, where several ontologies have emerged recently. For instance, the Semantic Web for Earth and Environmental Terminology (SWEET) [29] provides an upper-level ontology for Earth system science. In the context of space parcels, the CORINE land cover [15] and the ATKIS catalogue [1] are terminologies for characterizing land-use types. Application designers also frequently generate their own ontologies to meet special needs. These ontologies may be created from scratch, or by alignment and extension of existing ontologies. Hence, it is well-known that many ontologies coexist in some specific domains, e.g. geographical information and medicine. With so many ontologies being produced, it is inevitable that some of their content overlaps and possibly disagrees on some concepts. In order to support ontology interoperability, it is required that these ontologies can be semantically related. Thus ontology mediation [14] becomes a main concern. Ontology mediation enables data to be shared between heterogeneous knowledge bases, and allows applications to reuse data from different knowledge bases. Ontology mediation takes two distinct forms: (i) ontology mapping, where the correspondences between elements of two ontologies are stored separately from the ontologies.
The correspondences are generally represented using axioms formulated in a dedicated mapping language. (ii) ontology merging, which consists in creating a new ontology from the union of the source ontologies. The merged ontology is supposed to capture all the knowledge of the sources. Ontology mediation is an active research field where many kinds of solutions have been proposed: schema-based, instance-based, machine learning-inspired, and hybrid approaches; see [20], [16] for surveys of this domain. The methods used in ontology mediation usually depend on the kind of information one can access about the local ontologies. For instance, the availability of instance datasets is highly desirable and generally ensures good mediation results. But the efficiency of these methods also depends on the kind of source ontologies the system is dealing with. In [23], the author presents an ontology spectrum which characterizes the expressiveness of several ontology solutions. In this chapter, we are interested in declarative and logic-based formalisms to represent ontologies. In fact, we consider one of the currently most popular formalisms, i.e. Description Logics (DLs). They are popular because DLs underpin the Web Ontology Language (OWL) [7] proposed by the World
Merging Expressive Spatial Ontologies Using FCA
Wide Web Consortium. Thus this language is being used to represent a large number of ontologies in domains as diverse as social networking, medicine, bioinformatics and spatial information. Part of this popularity is also due to the availability of a number of ontology tools, such as editors, e.g. Protégé, Swoop, and reasoners, e.g. Pellet, Racer, FaCT, HermiT, which infer, usually with sound and complete methods, implicit knowledge from the explicitly represented one. In this chapter, we propose a solution to the ontology merging problem which is based on the techniques of Formal Concept Analysis (FCA) [17]. It extends [8] by dealing with expressive ontologies and their concept descriptions. FCA algorithms are machine learning techniques that enable the creation of a common structure, which may reveal some associations between elements of the original structures. This requires that some elements from the source ontologies can be attached to the same observable items. Starting from this assumption, the processing of our FCA-based algorithms provides a merged ontology. Our solution extends existing FCA-based systems for ontology merging in the following ways: (i) we provide a method to create concepts not originally in the source ontologies, (ii) we define emerging concepts in terms of elements of the source ontologies and (iii) we handle the creation of merged ontologies based on the uncertainty underlying the extension and alignment of source concepts. Step (i) is the classical approach, named ontology alignment in the FCA literature. Steps (ii) and (iii) are an extension of this alignment and exploit concept descriptions, DL reasoner functionalities and notions from possibility theory. The paper is organized as follows: in Section 2, we present some basic notions about FCA, the ALC description logic and possibilistic logic. In Section 3, we detail our method which enables the creation of an expressive merged ontology.
The main steps are: concept generation, axiomatization of emerging concepts and optimization of the resulting ontology. Section 4 proposes a solution to deal with the different forms of uncertainty encountered. Section 5 relates our work to existing systems in ontology merging and to collaborations between FCA methods and DLs. Section 6 concludes this chapter.
2 Background

2.1 Formal Concept Analysis and Galois Connection

FCA is the process of abstracting conceptual descriptions from a set of objects described by attributes [17]. We use some of the methods associated with FCA to merge geographical ontologies. Intuitively, this means that we merge several ontologies in a context consisting of a set of objects (the extent), a set of attributes (the intent), one for each ontology, and a set of correspondences between objects and attributes. FCA is based on the notion of a formal context.
Definition 1. A formal context is a triple K = (G, M, I), where G is a set of objects, M is a set of attributes and I is a binary relation between G and M, i.e. I ⊆ G × M. For an object g and an attribute m, (g, m) ∈ I is read as "object g has attribute m". Given a formal context, we can define the notion of formal concepts: Definition 2. For A ⊆ G, we define A′ = {m ∈ M | ∀g ∈ A : (g, m) ∈ I} and for B ⊆ M, we define B′ = {g ∈ G | ∀m ∈ B : (g, m) ∈ I}. A formal concept of K is defined as a pair (A, B) with A ⊆ G, B ⊆ M, A′ = B and B′ = A. The hierarchy of formal concepts is formalized by (A1, B1) ≤ (A2, B2) ⟺ A1 ⊆ A2 and B2 ⊆ B1. The concept lattice of K is the set of all its formal concepts with the partial order ≤. This hierarchy of formal concepts obeys the mathematical axioms defining a lattice, and is called a concept lattice (or Galois lattice), since the relation between the sets of objects and attributes is a Galois connection. We now introduce the notion of Galois connection, which is related to the idea of order and plays an important role in lattice theory, universal algebra and, recently, in computer science [5]. Let (P, ≤) and (Q, ≤) be two partially ordered sets (posets). An (antitone) Galois connection between P and Q is a pair of mappings (Φ, Ψ) such that Φ : P → Q, Ψ : Q → P and:
• x ≤ x′ implies Φ(x′) ≤ Φ(x),
• y ≤ y′ implies Ψ(y′) ≤ Ψ(y),
• x ≤ Ψ(Φ(x)) and y ≤ Φ(Ψ(y)),
for x, x′ ∈ P and y, y′ ∈ Q. Several algorithms have been proposed to compute a concept lattice; some optimized ones are proposed in [6]. Intuitively, such an algorithm starts with the complete lattice of the power set of all individuals (the extent), respectively of all attributes (the intent), and retains only the nodes closed under the connection. That is, beginning with a set of attributes, the algorithm determines the corresponding set of objects, which itself provides an associated set of attributes.
If this set is the initial one, then it is closed and preserved; otherwise the node is removed from the lattice.
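The closure test just described follows directly from Definition 2. The fragment below is an illustrative sketch (the function names and the toy context are ours, not part of any FCA tool mentioned later): it enumerates attribute sets, closes each pair under the two derivation operators, and keeps the closed pairs.

```python
from itertools import combinations

def derive_attrs(objs, I, M):
    """A': the attributes shared by all objects in objs."""
    return frozenset(m for m in M if all((g, m) in I for g in objs))

def derive_objs(attrs, I, G):
    """B': the objects that have all attributes in attrs."""
    return frozenset(g for g in G if all((g, m) in I for m in attrs))

def formal_concepts(G, M, I):
    """All pairs (A, B) with A' = B and B' = A, i.e. the nodes of the
    concept lattice. Naive enumeration over attribute sets; adequate
    for small contexts like the chapter's examples."""
    concepts = set()
    for r in range(len(M) + 1):
        for B in combinations(sorted(M), r):
            A = derive_objs(frozenset(B), I, G)
            concepts.add((A, derive_attrs(A, I, M)))  # close the pair
    return concepts

# Toy context: object 1 has attributes a, b; object 2 has b, c.
G, M = {1, 2}, {"a", "b", "c"}
I = {(1, "a"), (1, "b"), (2, "b"), (2, "c")}
lattice = formal_concepts(G, M, I)
print(len(lattice))  # 4 formal concepts, e.g. ({1, 2}, {b}) and ({1}, {a, b})
```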
2.2 The Description Logic ALC

DLs are a family of knowledge representation formalisms that allow reasoning over domain knowledge in a formal and well-understood way. Central DL notions are concepts (unary predicates), roles (binary predicates) and individuals. A concept represents a set of individuals, while a role determines a binary relationship between concepts. DLs are a fragment of first-order logic, and thus concepts and roles are designed according to a syntax and a semantics. Some of the main assets of this family of formalisms are decidability, efficient reasoning algorithms and the ability to propose a hierarchy of languages with various expressive power.
A key notion in DLs is the separation of the terminological (or intensional) knowledge, called the TBox, from the assertional (or extensional) knowledge, called the ABox. The TBox is generally considered to be the ontology. Together, a TBox and an ABox represent a Knowledge Base (KB), denoted KB = ⟨TBox, ABox⟩. The TBox is composed of "primitive concepts", which are ground descriptions that are used to form more complex descriptions, and "defined concepts", which are designed using a set of constructors of the description language, e.g. conjunction (⊓), disjunction (⊔), negation (¬), universal (∀) and existential (∃) value quantifiers, etc. The description language we are using in this paper corresponds to ALC (Attributive Language with Complements). Concept descriptions in this language are formed according to the following syntax rule, where the letter A is used for atomic concepts, the letter R for atomic roles and the letters C and D for concept descriptions:

C, D ::= ⊥ | ⊤ | A | ¬C | C ⊓ D | C ⊔ D | ∃R.C | ∀R.C

The terminological axioms accepted by ALC are sentences of the form C ⊑ D, which are called General Concept Inclusions (GCI). The semantics generally adopted for the ALC language is based on Tarski-style semantics. An interpretation I is a pair I = (Δ, ·^I), where Δ is a non-empty set called the domain of the interpretation and ·^I is the interpretation function. The interpretation function maps:
• each atomic concept A to a subset A^I of Δ,
• each atomic role R to a subset R^I of Δ × Δ,
• each object name a to an element a^I of Δ.
The interpretation function can be inductively extended to concept descriptions as follows:
• ⊥^I = ∅
• ⊤^I = Δ^I
• (C ⊓ D)^I = C^I ∩ D^I
• (C ⊔ D)^I = C^I ∪ D^I
• (¬C)^I = Δ^I \ C^I
• (∃R.C)^I = {a ∈ Δ^I | ∃b ∈ Δ^I : (a, b) ∈ R^I ∧ b ∈ C^I}
• (∀R.C)^I = {a ∈ Δ^I | ∀b ∈ Δ^I : (a, b) ∈ R^I → b ∈ C^I}
In DLs, the basic reasoning service on concept expressions is subsumption, written C ⊑ D. This inference service checks whether the first concept always denotes a subset of the set denoted by the second one. We use this service in the optimization of merged ontologies. Another service that we use intensively is consistency checking of a knowledge base: an ABox A is consistent with respect to a TBox T if there is an interpretation that is a model of both A and T [2].
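The inductive interpretation function above can be coded directly over a finite domain. The following is our own illustrative sketch (tuple-based concept syntax; the names F, C, M and veg are hypothetical), not a DL reasoner:

```python
def interp(c, domain, conc, role):
    """Inductively extend the interpretation function to ALC descriptions.
    Concepts are nested tuples, e.g. ("and", C, D) for conjunction."""
    if c == "top":
        return set(domain)
    if c == "bottom":
        return set()
    op = c[0]
    if op == "atom":
        return set(conc[c[1]])
    if op == "not":
        return set(domain) - interp(c[1], domain, conc, role)
    if op == "and":
        return interp(c[1], domain, conc, role) & interp(c[2], domain, conc, role)
    if op == "or":
        return interp(c[1], domain, conc, role) | interp(c[2], domain, conc, role)
    if op == "exists":  # {a | some R-successor of a is in C}
        r, ext = role[c[1]], interp(c[2], domain, conc, role)
        return {a for a in domain if any((a, b) in r and b in ext for b in domain)}
    if op == "forall":  # {a | every R-successor of a is in C}
        r, ext = role[c[1]], interp(c[2], domain, conc, role)
        return {a for a in domain if all((a, b) not in r or b in ext for b in domain)}
    raise ValueError(f"unknown constructor {op}")

# One parcel p1 whose vegetation contains a conifer t1 and a magnoliophyta t2.
domain = {"p1", "t1", "t2"}
conc = {"F": {"p1"}, "C": {"t1"}, "M": {"t2"}}
role = {"veg": {("p1", "t1"), ("p1", "t2")}}

# F ⊓ ∃veg.C ⊓ ∃veg.M, in the spirit of the forest concepts used later
mixed = ("and", ("atom", "F"),
         ("and", ("exists", "veg", ("atom", "C")),
                 ("exists", "veg", ("atom", "M"))))
print(interp(mixed, domain, conc, role))  # {'p1'}
```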
Both domains, FCA and DL ontologies, use the term concept. In the rest of this paper, concepts in the context of FCA (resp. of a DL ontology) are named formal concepts (resp. DL concepts). To clarify the distinction between them, we can state that DL concepts correspond to the attributes of K.
2.3 Possibilistic Logic

Possibilistic logic, or possibility theory [12], provides an efficient solution for handling uncertain or prioritized formulas and coping with inconsistency. In this theory, each formula is associated with a real value in [0,1]. The notion of a possibility distribution π is fundamental to define this logic's semantics; it is defined as π : Ω → [0, 1], where Ω represents the set of all classical interpretations. From a possibility distribution, two important measures can be computed: (i) the possibility degree of a formula φ, defined as Π(φ) = max{π(ω) | ω |= φ}, where π(ω) is the degree of compatibility of interpretation ω with the available beliefs; (ii) the certainty degree of a formula φ, defined as N(φ) = 1 − Π(¬φ). A possibilistic formula is a pair (φ, α) where φ is a logic formula and α expresses a degree of certainty. A set of possibilistic formulas, also called a possibilistic knowledge base (PKB), has the form {(φi, αi)} with 1 ≤ i ≤ n. The classical knowledge base (CKB) associated with a PKB corresponds to {φi | (φi, αi) ∈ PKB}. A PKB is consistent if and only if its CKB is consistent. Given a PKB and α ∈ [0, 1], the α-cut of PKB is: PKB≥α = {φ ∈ CKB | (φ, β) ∈ PKB and β ≥ α}. The inconsistency degree of PKB, denoted Inc(PKB), is defined as Inc(PKB) = max{αi | PKB≥αi is inconsistent}. Recently, possibilistic logic has been studied in the context of DLs [13], [22], [27]. In Section 4, we exploit some of these results.
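The α-cut and the inconsistency degree can be illustrated with a small sketch in which each formula is represented extensionally by its set of models, an assumption we make only for the example (a real implementation would call a consistency checker):

```python
def alpha_cut(pkb, alpha):
    """PKB_{>=alpha}: the classical formulas whose weight is >= alpha."""
    return [phi for phi, beta in pkb if beta >= alpha]

def consistent(formulas, omega):
    """Formulas are given by their model sets; a set of formulas is
    consistent iff some interpretation in omega satisfies all of them."""
    return any(all(w in models for models in formulas) for w in omega)

def inconsistency_degree(pkb, omega):
    """Inc(PKB) = max{alpha_i | PKB_{>=alpha_i} is inconsistent}; 0 if
    every cut is consistent. Cuts grow as alpha decreases, so the first
    inconsistent cut found while scanning weights downwards is maximal."""
    for a in sorted({beta for _, beta in pkb}, reverse=True):
        if not consistent(alpha_cut(pkb, a), omega):
            return a
    return 0.0

# Two interpretations; phi1 only holds in w1, phi2 only in w2:
omega = {"w1", "w2"}
pkb = [(frozenset({"w1"}), 0.8),   # (phi1, 0.8)
       (frozenset({"w2"}), 0.5),   # (phi2, 0.5)
       (frozenset(omega), 0.3)]    # a tautology with weight 0.3
print(inconsistency_degree(pkb, omega))  # 0.5
```

The 0.8-cut contains only phi1 and is consistent; adding phi2 at level 0.5 makes the cut unsatisfiable, so Inc(PKB) = 0.5.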
3 Ontology Merging Using FCA

In this section, we present the process of merging two source ontologies. In fact, the method can be used for several ontologies, as long as these ontologies share elements of their datasets, that is, the ABoxes of these ontologies contain assertions about the same objects.
3.1 Source TBoxes

Let us consider two geographical applications that manipulate space parcel data. Each application uses an independent ontology formalism to represent the concepts related to its data. Also, the teams of experts that designed each ontology may not agree on the semantics of some concepts.
Nevertheless, the two applications need to exchange information, and thus require that some correspondences are discovered between their DL concepts. The following two ontology extracts, O1 and O2, are used all along this paper. In order to ease the understanding and reading of our example, all concepts and roles are subscripted with the number of their respective ontology, i.e. '1' for O1 and '2' for O2. Terminological axioms of ontology O1:
(1) CF1 ≡ F1 ⊓ ∃vegetation1.C1
(2) BLF1 ≡ F1 ⊓ ∃vegetation1.M1
(3) C1 ⊓ M1 ⊑ ⊥
This extract of ontology O1 defines two concepts, CF1, standing for Coniferous Forest, and BLF1, standing for Broad Leaved Forest, in terms of the concepts F1 (Forest), C1 (Coniferophyta) and M1 (Magnoliophyta). Line #1 states that the coniferous forest concept is defined as the intersection of the concept Forest of O1 and the concept of having at least one vegetation that is a coniferophyta. Line #2 defines the concept of a broad leaved forest accordingly, with magnoliophyta. Line #3 states that the concepts coniferophyta and magnoliophyta are disjoint. Terminological axioms of ontology O2:
(4) CF2 ≡ F2 ⊓ ∀vegetation2.C2 ⊓ ∃vegetation2.C2
(5) BLF2 ≡ F2 ⊓ ∀vegetation2.M2 ⊓ ∃vegetation2.M2
(6) MF2 ≡ F2 ⊓ ∃vegetation2.C2 ⊓ ∃vegetation2.M2
(7) C2 ⊓ M2 ⊑ ⊥
The study of O2 emphasizes that its designers do not entirely agree on the semantics of the forest-related concepts of O1. On line #4, the concept of a coniferous forest is defined as being a forest composed of at least one coniferophyta vegetation, and exclusively of this kind of vegetation. Line #5 defines the concept of broad leaved forest accordingly, with magnoliophyta. In order to represent other kinds of forests, the designers of O2 define a mixed forest concept as the intersection of being a forest with at least one coniferophyta vegetation and at least one magnoliophyta vegetation. Finally, line #7 states that the concepts coniferophyta and magnoliophyta of O2 are disjoint. Merging the ontologies O1 and O2 with some other ontologies would require that the TBoxes of these new ontologies are available and are no more expressive than ALC.
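The disagreement between the two TBoxes can be made concrete by translating the axioms above into simple membership tests over a parcel's set of vegetation kinds. This is an illustrative simplification of the DL semantics (function names and the flat vegetation-set encoding are ours):

```python
def classify_o1(is_forest, veg):
    """O1's loose definitions: some vegetation of the kind suffices."""
    labels = set()
    if is_forest and "coniferophyta" in veg:
        labels.add("CF1")   # axiom (1): forest with some coniferophyta
    if is_forest and "magnoliophyta" in veg:
        labels.add("BLF1")  # axiom (2): forest with some magnoliophyta
    return labels

def classify_o2(is_forest, veg):
    """O2's stricter definitions: CF2/BLF2 demand exclusively one kind."""
    labels = set()
    if is_forest and veg == {"coniferophyta"}:
        labels.add("CF2")   # axiom (4): at least one, and only, coniferophyta
    if is_forest and veg == {"magnoliophyta"}:
        labels.add("BLF2")  # axiom (5): at least one, and only, magnoliophyta
    if is_forest and {"coniferophyta", "magnoliophyta"} <= veg:
        labels.add("MF2")   # axiom (6): at least one of each kind
    return labels

# A forest parcel with both kinds of vegetation is classified differently:
veg = {"coniferophyta", "magnoliophyta"}
print(classify_o1(True, veg))  # CF1 and BLF1 under O1
print(classify_o2(True, veg))  # only MF2 under O2
```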
3.2 Source ABoxes

Given the kind of TBoxes presented in the previous section, i.e. ALC, we consider DL knowledge bases with non-empty ABoxes. As a first step, we map the information of the two ABoxes onto a common set of observed objects.
The information of these ABoxes can be stored in a structured or unstructured format. It is interesting to note that the activity of several research teams in the DL and Semantic Web community focuses on studying cooperations between the domains of databases and knowledge bases represented in a DL. For instance, the authors of [26] recently claimed that the ideal solution would be to have the individuals of the ABox stored in a relational database and to represent the schema of this database in a DL TBox. Also tackling this same objective, the team supporting the Pellet reasoner, one of the most popular OWL reasoners, recently released OWLgres, which its creators define as a 'scalable reasoner for OWL 2' (the latest version of OWL). A main objective of this tool is to provide a conjunctive query answering service using SPARQL, together with the performance properties of relational database management systems. Hence, using such an approach, the set of observed objects may be retrieved from existing relational database instances. The mapping we propose between both ontologies can be represented by a matrix, either generated by a specific tool and/or by interactions with end-users. In order to map concepts of both ontologies via the selected set of observed objects, a reference reconciliation tool may be used [10]. Using an approach that exploits a relational database as the data container for the ontology ABox enables the use of existing FCA tools. This is the case of the ToscanaJ suite [4], which provides features for database connectivity. We present a sample of this mapping in Table 1: the rows correspond to the objects of K, i.e. common instances of the KBs' ABoxes, and are identified by integer values from 1 to 6 in our example. In the context of geographical information, these values identify spatial parcels. The columns correspond to FCA attributes of K, i.e. concept names of the two TBoxes.
In the same table, we present, side by side, the formal concepts coming from our two ontologies, i.e. CF1, BLF1, F1 from O1, and CF2, BLF2, MF2, F2 from O2. Thus this matrix characterizes the type of spatial parcels in terms of two different ontologies. Merging more than two ontologies would require that the individuals of the ABox belong to the extension of the concepts of these ontologies. That is, concepts from a third ontology can be added to the columns of Table 1, and objects of the ABox (rows of the table) are instances of these new concepts.
3.3 Generation of the Galois Connection Lattice

The matrix is built using the information stored in the TBoxes and ABoxes of both ontologies:
- first, for each row, mark the columns where a specific instance is observed, e.g. the object on line #1 is an instance of the CF1 and CF2 concepts. Thus, ABox information is used in this step.
- then, complete the row with the transitive closure of the subsumption relation between ontology concepts, e.g. line #1 must also be marked for the DL concepts F1 and F2, as the respective ontologies entail that CF1 ⊑ F1 and CF2 ⊑ F2. Here, the concept hierarchy of the TBoxes is exploited.

Table 1 Sample dataset for our ontology merging example

    CF1  BLF1  F1  CF2  BLF2  MF2  F2
1    x         x    x              x
2    x         x    x              x
3    x         x              x    x
4         x    x         x         x
5         x    x         x         x
6         x    x              x    x
It is interesting to note that lines #3 and #6 reflect different assumptions for their respective parcels. For instance, the parcel corresponding to line #3 has been defined as a coniferous forest using the classification of O1 while, possibly due to a vegetation not limited to coniferophyta, it has been defined as a mixed forest using O2. The same kind of approach applies to the parcel associated with line #6. Using Table 1 with the Galois connection method [9], we obtain the lattice of Figure 1, where each node contains two sets: a set of objects (identified by the integer values of the first column of our matrix) from K (the extension), and a set of DL concepts from the source ontologies (the intension), identified by the concept labels of the source ontologies.
3.4 Dealing with Emerging Concepts

In order to concentrate solely on the intensional aspect of the lattice, i.e. the TBox, we now remove the extensional part of each node of the lattice. Hence, the only set present in each node corresponds to concept names (Figure 2). Considering that the relationship holding between two nodes in this lattice corresponds to an inheritance property, it is possible to minimize each node's set by removing concept names that are present in an inherited node. The method we propose consists in deleting repeated occurrences of a given concept name along a path of the lattice, thus obtaining a minimal set of concept names for each node. We next define this notion of minimality:

Definition 3. Let N be a node in the Galois connection lattice and S a set of concept symbols contained in its intension fragment. We consider that S is minimal for N if and only if there is no S′ for N such that |S′| < |S|, where |S| denotes the size of S.

Due to the lattice structure obtained by applying the Galois connection method, we can proceed by a top-down navigation, i.e. starting from
198
O. Cur´e
Fig. 1 Galois connection lattice
the top concept (⊤) of the merged ontology. Basically, this algorithm (named optimizeLabel and presented in Algorithm 1) proceeds as follows: for a given formal concept C of the lattice, it computes all its children c (line #1) and checks whether a concept symbol used to characterize C also appears in the concept name set of c (line #2). If this is the case, the symbol is removed from the set of c (line #3); otherwise, the set of symbols of c remains unchanged. Finally, the method is applied recursively to each concept c until all concepts are processed (line #5).

Algorithm 1 optimizeLabel(Concept C)
1 FOR EACH child c of C DO
2   IF label(C) ∈ label(c) THEN
3     remove label(C) from label(c)
4   END IF
5   optimizeLabel(c)
6 END DO

Processing this algorithm on our running example yields Figure 2, where lattice nodes contain singleton sets, corresponding to concept names from some of the source ontologies or newly introduced symbols, e.g. α, which replace empty sets. Several kinds of nodes, in terms of the size of a name set,
Fig. 2 Galois connection lattice with “empty nodes”
can be generated with this method. Basically, it is important to distinguish between the following three kinds of nodes:
1. a singleton, i.e. the name of a concept from one of the source ontologies, because it can be distinguished from any of its successors by this specific name; e.g. this is the case for the {CF1} lattice node.
2. an empty set, denoted by a variable (e.g. α), because it cannot be directly distinguished from any of its possible successors. We have two such nodes in Figure 2, namely α and β.
3. a set of several concept symbols, all belonging to source ontologies, because the mediation based on the given ABoxes has not been able to split the concepts into several nodes. Indeed, it is as if the two names are glued together in a single concept name. In our running example, we have one such node, with concept set {F1, F2}.
All singletons are maintained in the resulting merged ontology, and we now aim to provide a concept description for the remaining concepts, i.e. cases 2 and 3 of our node categorization. The first step toward our solution is to expand the concepts of the merged ontology according to their respective TBoxes [2]. That is, we replace each occurrence of a name on the right-hand side of a definition by the concepts it stands for. A prerequisite of this approach is that we are dealing with acyclic TBoxes; the process thus terminates, and the resulting descriptions contain only primitive concepts on the right-hand side.
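Algorithm 1, as printed, removes only the direct parent's label from each child; to obtain the singleton labels of Figure 2 along deeper paths, the sketch below threads the set of names accumulated along the path through the recursion, so that names inherited from any ancestor are removed. The node structure and names are illustrative assumptions:

```python
class Node:
    def __init__(self, label, children=()):
        self.label = set(label)          # intension: set of concept names
        self.children = list(children)

def optimize_label(node, inherited=frozenset()):
    """Remove from each node the concept names already seen on the path
    from the top (the repeated occurrences targeted by Algorithm 1)."""
    node.label -= inherited
    for child in node.children:
        optimize_label(child, inherited | node.label)

# A path of the lattice of Figure 1: {F1,F2} -> {CF1,F1,F2} -> {CF1,CF2,F1,F2}
leaf = Node({"CF1", "CF2", "F1", "F2"})
mid = Node({"CF1", "F1", "F2"}, [leaf])
top = Node({"F1", "F2"}, [mid])
optimize_label(top)
# labels are now {"F1","F2"}, {"CF1"}, {"CF2"}: the singletons of Figure 2
```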
We first deal with the nodes which are formed of several concept symbols, denoted σi, e.g. the node labeled F1, F2 in Figure 2. Because they result from the generation of the Galois connection lattice [9], these nodes appear at the top of the lattice and do not have multiple inheritance to concepts that are not of this form. Thus we adopt a top-down approach from the top concept (⊤) of our merged ontology. We consider the associated concepts to be equivalent, e.g. F1 ≡ F2, since they have exactly the same extension. We also propose a single concept symbol σ, e.g. F (Forest) for F1, F2, and associate information with this concept stating that it is equivalent to the original concepts, for interoperability reasons, e.g. F ≈ F1 and F ≈ F2. All occurrences of the concepts σi are then replaced by the concept symbol σ in the concept descriptions of the merged ontology. We can now concentrate on the nodes with empty sets, e.g. α and β. According to the Galois based lattice creation, these nodes cannot be at the root of the lattice. This means that they inherit from some other concept(s), and we use the descriptions of these inherited concepts to provide a description. Using this method, the concepts α and β of Figure 2 have the following descriptions:

α ≡ CF1 ⊓ MF2 ≡ F ⊓ ∃vegetation1.C1 ⊓ ∃vegetation2.C2 ⊓ ∃vegetation2.M2
β ≡ BLF1 ⊓ MF2 ≡ F ⊓ ∃vegetation1.M1 ⊓ ∃vegetation2.C2 ⊓ ∃vegetation2.M2

All concepts of the merged ontology have now been associated with a concept description, except of course the primitive concepts. Alignments between primitive concepts and roles of the source ontologies make it possible to refine the merged ontology. Later in this section, we will propose solutions for finding these alignments and for dealing with their uncertainty, but we first present the impact of providing such correspondences between TBox elements.
Suppose that we are provided with the following alignments: C1 ≡ C2, M1 ≡ M2 and even vegetation1 ≡ vegetation2. We can then introduce concept symbols to simplify the different equivalences:

(8) C ≡ C1 ≡ C2, M ≡ M1 ≡ M2 and vegetation ≡ vegetation1 ≡ vegetation2

We are then able to modify the descriptions of the merged ontology, and we denote this TBox as Om1:

(9)  CF1 ≡ F ⊓ ∃vegetation.C
(10) BLF1 ≡ F ⊓ ∃vegetation.M
(11) CF2 ≡ CF1 ⊓ ∀vegetation.C ⊓ ∃vegetation.C
(12) BLF2 ≡ BLF1 ⊓ ∀vegetation.M ⊓ ∃vegetation.M
(13) MF2 ≡ F ⊓ ∃vegetation.C ⊓ ∃vegetation.M
(14) α ≡ F ⊓ ∃vegetation.C ⊓ ∃vegetation.M
(15) β ≡ F ⊓ ∃vegetation.C ⊓ ∃vegetation.M
(16) C ⊓ M ⊑ ⊥
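The redundancy among these expanded descriptions can be spotted mechanically by flattening each right-hand side into its set of conjuncts. The following toy sketch is a purely syntactic check (strings stand in for DL constructors, and `alpha`/`beta` name the α and β concepts); it is no substitute for a DL reasoner:

```python
# Conjunct sets of the Om1 descriptions above (purely syntactic encoding).
om1 = {
    "CF1":   {"F", "∃vegetation.C"},
    "BLF1":  {"F", "∃vegetation.M"},
    "CF2":   {"CF1", "∀vegetation.C", "∃vegetation.C"},
    "BLF2":  {"BLF1", "∀vegetation.M", "∃vegetation.M"},
    "MF2":   {"F", "∃vegetation.C", "∃vegetation.M"},
    "alpha": {"F", "∃vegetation.C", "∃vegetation.M"},
    "beta":  {"F", "∃vegetation.C", "∃vegetation.M"},
}

# Concepts whose descriptions coincide with that of MF2.
same_as_mf2 = sorted(n for n, conj in om1.items() if conj == om1["MF2"])
# same_as_mf2 == ['MF2', 'alpha', 'beta']
```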
We can notice that the descriptions of the concepts α, β and MF2 are the same. Thus we can state that MF2 ≡ α ≡ β. Finding such equivalences, or subsumption relationships, is easily handled by a DL reasoner. This result is supported by the fact that, starting from the ontologies O1 and O2 and the alignments of (8), any DL reasoner is able to produce the ontology Om1, assuming that we have the alignment F ≡ F1 ≡ F2 (which has been deduced from our Galois lattice). The lattice corresponding to this new ontology is depicted in Figure 3.
Fig. 3 Lattice corresponding to merged ontology Om1
Of course, alignments different from (8) can be proposed between the primitive concepts and roles of O1 and O2. For instance, if we consider the alignments in (17), then the optimized merged ontology again corresponds to Figure 3:

(17) M2 ⊑ M1, C ≡ C1 ≡ C2 and vegetation ≡ vegetation1 ≡ vegetation2

Concentrating on the relationships between M1 and M2, alignments other than (8) and (17) can generate different merged ontologies. Let us consider the alignments in (18), where the only difference with (8) and (17) is that M1 is now a subconcept of M2:

(18) M1 ⊑ M2, C ≡ C1 ≡ C2 and vegetation ≡ vegetation1 ≡ vegetation2

Then, looking at the descriptions of BLF1 and BLF2 (respectively (2) and (4) in Section 3), we can no longer state that BLF1 ⊒ BLF2. We consider that the alignments of (18) do not contradict our FCA-based method but instead refine the constructed lattice of Figure 2. In fact, this lattice is the
result of applying a Galois connection based algorithm to a given dataset. This dataset can be considered to be a model of the merged ontology, but it is only one of the possible models for this ontology. The statements in (18) say that instances of BLF2 need not be instances of BLF1, a situation that was not present in our dataset (Table 1). Moreover, the statements of (18) allow us to state that MF2 ⊑ CF1 and α ≡ MF2, but prevent us from saying that MF2 ⊑ BLF1. The lattice corresponding to this new merged ontology, which we denote Om2, is presented in Figure 4.
Fig. 4 Lattice corresponding to merged ontology Om2
In terms of the DL model-theoretic semantics presented in Section 2.2, the axiom MF2 ⊑ ¬BLF1 makes the ABox represented in Table 1 inconsistent with the merged ontology of Figure 4. Recall that an ABox A is consistent with respect to a TBox T if there is an interpretation that is a model of both A and T. Intuitively, MF2 ⊑ ¬BLF1 states that it is not possible to be an instance of both BLF1 and MF2 in a given model, which is not the case for object #6 in our dataset. This raises the issue of the confidence one has in the existence of an object, in an alignment, and in their relationship. For instance, we can have greater confidence in the statements of (18) than in the existence of object #6. This would yield a merged ontology similar to the one presented in Figure 4, but without the β concept. We will provide details on the notion of confidence values when introducing our solution for dealing with uncertainties in Section 4. In summary, we can generate different ontologies based on the fact that we are able to propose different alignments, to assign them confidence values
and to assign confidence values to some objects of our sample dataset matrix. In order to provide alignments between source ontologies, we consider the following two approaches: the alignments either originate from external ontologies or are provided by the end-user.

3.4.1 Alignments Originating from External Knowledge
The alignments of primitive concepts and roles can be provided by an external knowledge source. This is in fact frequently the case when designing ontologies. Early in the design process, a background ontology, preferably one recognized as a standard in the application domain, is identified and imported into the source ontologies. It is likely that the source ontologies we consider for fusion import some common parts of a given background ontology; in spatial information, for instance, this can be the case with the SWEET ontology. The alignment of some imported primitive concepts and roles is then straightforward and less subject to uncertainty, since their interpretations are identical.

3.4.2 End-User Defined Alignments
In cases where alignments cannot be provided by some background knowledge, end-users can define their own correspondences between concepts and roles. In such a situation, different end-users may provide differing alignments. Also, an end-user may not be totally confident in an alignment she is providing. This uncertainty aspect needs to be handled by the system in order to propose the most adequate merged ontology.
4 Dealing with Uncertainty

In the previous section, we highlighted several situations characterized by some form of uncertainty. In particular, we highlighted uncertainties at the 'object level', that is, we are not totally confident in the correctness of some of our dataset objects. We also emphasized uncertainties at the 'alignment level', that is, one can be more or less confident in the correspondences established between concepts and roles of the source ontologies. In order to deal with these uncertainties, we use possibilistic logic to encode both object and alignment confidences within a DL knowledge base context. Concerning the setting of confidences on objects of the source datasets, we do not believe that an automatic solution can produce reliable and relevant confidence values. Hence, it is necessary to integrate the end-user, generally a domain expert, into the process of setting these certainty levels. Two solutions can be envisioned: (i) ask the end-user to assign confidence values to all the tuples of the dataset; (ii) assume that the dataset is sound and ask the end-user to set certainty degrees only on the tuples that are causing inconsistencies. Solution (i) cannot realistically be implemented, since the dataset may be very large and the end-user may not have the time and knowledge to assign a
confidence value to each tuple. From this perspective, solution (ii) is much more realistic and efficient, since we ask the end-user to study only a subset of the dataset objects. It is based on the assumption that the data contained in practical databases is sound and that only a subset of it may be erroneous. Hence, this approach requires that all objects are first set to a default confidence value of 1, i.e. assuming soundness (recall that confidence values are set in [0,1]). It also implies that the system provides a solution to check the consistency of the knowledge base and is able to identify the objects responsible for inconsistencies. Such a solution is already implemented in several DL reasoners, e.g. Pellet. Once the knowledge base has been detected as inconsistent, we invite the end-user to refine the confidence value of each object responsible for the inconsistency. The next question to ask ourselves is: when should the consistency of the (merged) knowledge base be checked? In fact, this knowledge base can only be detected as inconsistent after the application of some alignments, because the merged ontology computed by our Galois connection based solution is consistent by construction. We will come back to this inconsistency aspect, but first we would like to make the definition of uncertainties on the alignments precise. We consider that alignments originating from some external knowledge, or deduced by our FCA solution (e.g. F1 ≡ F2), are set with a default value of 1. This assumption is motivated by the following facts:
• the generally high quality of the external ontologies imported into specific ontologies. That is, we consider the import of an ontology fragment as a strong end-user commitment which ensures the adequacy and quality of this external ontology.
• in practice, our FCA solution only computes concept equivalences on large concept extensions, which are likely to be correct.
Nevertheless, the end-user has the ability to refine the confidence value of any alignment.
Each alignment proposed by the end-user requires a confidence value, which can only be defined manually. Considering our running example and the alignments of (18), we can define the following set of possibilistic formulas for our alignments: {(F1 ≡ F2, 1), (M1 ⊑ M2, 0.5), (C1 ≡ C2, 0.9), (vegetation1 ≡ vegetation2, 1)}. That is, we are totally confident in the following alignments: F1 ≡ F2 and vegetation1 ≡ vegetation2. But, to a certain extent, we are not totally confident in the correctness of C1 ≡ C2 and M1 ⊑ M2, since their degrees of certainty are 0.9 and 0.5, respectively. The process of generating a consistent merged ontology with respect to a set of alignments and some certainty levels can be defined by the following algorithm.
Algorithm 2 createOntology(Ontology O, Alignment Al, Dataset D)
1 create an ontology O′ from O and Al
2 create an ABox A from the objects of D
3 WHILE (⟨O′, A⟩ is inconsistent) DO
4   I(D) = set of objects making ⟨O′, A⟩ inconsistent
5   ask the end-user to set confidence values on assertions involving I(D)
6 END WHILE
7 classify O′
8 return O′

The understanding of this algorithm is relatively straightforward. Our FCA-based solution generates a merged ontology, which is then refined by a set of alignments (step 1). Moreover, the object matrix of our source ontologies is transformed into a unique ABox (step 2). All axioms are associated with a certainty degree, which makes this knowledge base a possibilistic one. Initially, these certainty levels correspond to the value 1 for all concepts and objects of the knowledge base, and the axioms of the alignments can be refined by the end-user. We now need to clarify the notion of consistency checking of step 3 in the context of a possibilistic logic theory. The notion of consistency of a possibilistic knowledge base (PKB) is related to its possibility distribution, already presented in Section 2, which is denoted πPKB. This possibility distribution over the set I of all classical DL interpretations makes it possible to define the semantics of a DL PKB. In this DL context, a PKB corresponds to ⟨PTBox, PABox⟩, where PTBox and PABox are respectively a possibilistic TBox and ABox. The set of classical DL axioms associated with PTBox is TBox, i.e. {φi | (φi, αi) ∈ PTBox} (ABox is defined similarly), and KB = ⟨TBox, ABox⟩. With α ∈ [0, 1], the α-cut of PTBox is (defined similarly for PABox): PTBox≥α = {φ ∈ TBox | (φ, β) ∈ PTBox and β ≥ α}. Thus, PKB≥α = ⟨PTBox≥α, PABox≥α⟩.
The possibility degree of an interpretation I of PKB, denoted πI, can be defined as follows:

πI = 1 if ∀(φi, αi) ∈ PKB, I |= φi
πI = 1 − max{αi | I ⊭ φi, (φi, αi) ∈ PKB} otherwise

Then a PKB is consistent if and only if there is an interpretation I such that πI = 1, and we are now able to check the consistency necessary for step 3 of our algorithm. In order to identify the instances responsible for the inconsistencies, we use the instance checking inference service, which states that an individual a is a plausible instance of a concept C w.r.t. a PKB if PKB>Inc(PKB) |= C(a), where PKB>α, the strict α-cut, is defined as follows: {φ ∈ KB | (φ, β) ∈ PKB and β > α}.
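The definition of πI translates directly into code. The sketch below treats formulas as opaque labels and assumes a model-checking callback `satisfies` is supplied; the possibilistic KB shown is the one of the running example, with the certainty degrees used later in the worked example:

```python
def pi(pkb, satisfies):
    """Possibility degree of an interpretation, given a model-checking
    callback: 1 if every formula holds, otherwise 1 minus the highest
    degree among the violated formulas."""
    violated = [alpha for (phi, alpha) in pkb if not satisfies(phi)]
    return 1.0 if not violated else 1.0 - max(violated)

pkb = [("MF2 ⊑ ¬BLF1", 0.5), ("β ⊑ BLF1", 1.0),
       ("MF2(obj#6)", 1.0), ("β(obj#6)", 0.3)]

# an interpretation violating only the assertion β(obj#6) has degree 1 - 0.3
degree = pi(pkb, lambda phi: phi != "β(obj#6)")
```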
In the context of our running example with alignments (18), the first two lines of the createOntology algorithm generate the Om2 ontology (Figure 4), with an ABox containing the 6 objects of Table 1. For this first iteration of the algorithm's loop, all certainty degrees are set to the value 1. Hence we are in the context of a classical DL knowledge base, which is inconsistent since the intersection of MF2 and BLF1 is not empty. In fact, using instance checking, we are able to identify object #6 (denoted obj#6) as a source of this inconsistency. In step (5) of our algorithm, the end-user is asked to modify the certainty level associated with obj#6 for the most specific concepts it is an instance of in Om2, that is, to set new certainty levels for both β and MF2 on this object. Suppose that, for some reason, the end-user is aware that some parcels are not precisely classified in some source ontologies. Then she has the opportunity to modify the certainty levels of this object to a value of 1 for MF2 and 0.3 for β. Let α = 0.3; we have PKB≥0.3 = ⟨PTBox≥0.3, PABox≥0.3⟩, where the formulas MF2 ⊑ ¬BLF1 and β ⊑ BLF1 are contained in PTBox≥0.3, and PABox≥0.3 contains the assertions MF2(obj#6) and β(obj#6) (respectively stating that obj#6 is an instance of MF2 and β). It is clear that PKB≥0.3 is inconsistent. Now let α = 0.5; then PKB≥0.5 = ⟨PTBox≥0.5, PABox≥0.5⟩, where PTBox≥0.5 contains the formula MF2 ⊑ ¬BLF1 and PABox≥0.5 contains the assertion MF2(obj#6). PKB≥0.5 is clearly consistent. Therefore Inc(PKB) = 0.3. Hence, our method makes it possible to compute the inconsistency degree of a possibilistic DL knowledge base and to derive a consistent merged ontology based on the different certainty levels set on objects and alignments.
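The computation of Inc(PKB) for this example can be sketched as follows. The consistency test is hard-coded for this particular toy clash; a real implementation would delegate it to a DL reasoner such as Pellet:

```python
def inconsistent(axioms):
    """Hard-coded for the toy example: the cut clashes iff it contains the
    disjointness axiom, the subsumption, and both assertions about obj#6."""
    clash = {"MF2 ⊑ ¬BLF1", "β ⊑ BLF1", "MF2(obj#6)", "β(obj#6)"}
    return clash <= set(axioms)

def inc(pkb):
    """Largest certainty degree α such that the α-cut is still inconsistent."""
    worst = 0.0
    for alpha in sorted({d for (_, d) in pkb}):
        cut = [phi for (phi, d) in pkb if d >= alpha]
        if inconsistent(cut):
            worst = alpha
    return worst

pkb = [("MF2 ⊑ ¬BLF1", 0.5), ("β ⊑ BLF1", 1.0),
       ("MF2(obj#6)", 1.0), ("β(obj#6)", 0.3)]
# inc(pkb) == 0.3: the 0.3-cut contains all four axioms and clashes,
# while the 0.5-cut drops β(obj#6) and is consistent
```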
5 Related Work

In this section, we survey related work on ontology mediation and, in particular, we present some solutions which exploit the extensions of the ontologies, i.e. ABoxes. In the literature, two distinct approaches to ontology merging have been distinguished. In the first approach, the merged ontology captures all the knowledge of the source ontologies and replaces them. An example of such a system is presented in [25] with the PROMPT tool. In the second approach, the source ontologies are not replaced by the merged ontology; rather, a so-called 'bridge ontology' is created. The bridge ontology imports the original ontologies and defines the correspondences using axioms which are called "bridge axioms". An example of such an approach is the Ontomerge solution, which has been described in [11]. The most relevant work related to our solution is the FCA-merge system [28]. It uses instances of ontology classes to exploit an FCA algorithm. The FCA-merge system produces a lattice of concepts which relates concepts from the source ontologies. This concept lattice is then handed to the domain expert in order to generate the merged ontology. Thus we can consider
FCA-merge to be a semi-automatic solution, while our solution aims to generate the merged ontology automatically. The main differences are that FCA-merge is unable to propose concepts emerging from the fusion of the source ontologies and does not propose a label generation solution. Also, without the help of domain experts, the FCA-merge system is not able to refine the merged ontology. The Ontex (Ontology Exploration) method, presented in [19], also tackles the tasks of creating and merging ontologies, using the knowledge acquisition technique of Attribute Exploration [18] encountered in FCA. For both ontology creation and merging, the Ontex method concentrates on providing a high-quality conceptual hierarchy of the top-level concepts. Considering the merging task, Ontex provides an interactive knowledge acquisition technique for the top-level concepts; the other concepts of the merged ontology can be created using heuristics-based approaches. Comparatively, our approach does not limit concept processing to particular levels of the hierarchy. Considering works involving FCA methods and DLs, it is interesting to study [3]. In this paper, the authors are concerned with the completeness quality dimension of TBoxes, i.e. they propose techniques to help ontology engineers check whether all the relevant concepts of an application domain are present in a TBox. Like our approach, one of their concerns is to minimize interactions with domain experts. Hence, FCA techniques are used to filter out trivial questions that would otherwise be asked of experts in the case of incomplete TBoxes. The approach we presented in this paper is more concerned with the generation and optimization of a mediated ontology. We can also consider that our approach is more involved in the soundness quality dimension and tackles the issue of generating different forms of merged ontologies.
6 Conclusion

In this paper, we presented an approach to merging DL ontologies based on the methods of FCA. Our main contribution enables the creation of concepts not originally in the source ontologies and the description of these concepts in terms of elements of the source ontologies. Moreover, through the management of several forms of uncertainty (at the object and alignment levels), with DL extended to possibility theory, we are able to easily handle the creation of different merged ontologies. We have presented this approach in the geographical domain, but it can be exploited in all fields where ontologies are used. We are currently testing its usefulness in the context of life science applications, i.e. medicine and pharmacology. Future work on this system is related to automatically extracting an optimized set of instances from ABoxes for the Galois connection matrix. In particular, we would like to provide a notion of weights to objects of the
matrix, i.e. the number of tuples from the source ABox that satisfy a given distribution over the source ontology concepts. For instance, the weight of object #1 in Table 1 would be 2, and we could remove object #2 from the matrix. This approach would make it possible to load more compact matrices into our solution. Moreover, we would like to pursue investigations into using possibility theory in the context of ontology mediation. Another direction for future work consists in studying more expressive DLs, i.e. going beyond the ALC language. For instance, we aim to study the SHIF and SHOIN DLs, which underpin the OWL Lite and OWL DL species. This would open our solution to an important number of existing and widely used ontologies designed for the Web.
References
1. AdV: Amtliches Topographisch-Kartographisches Informationssystem ATKIS. Technical report, Landesvermessungsamt NRW, Bonn (1998)
2. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.: The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, New York (2003)
3. Baader, F., Ganter, B., Sertkaya, B., Sattler, U.: Completing Description Logic Knowledge Bases Using Formal Concept Analysis. In: Proc. IJCAI 2007, pp. 230–235 (2007)
4. Becker, P., Correia, J.-H.: The ToscanaJ suite for implementing conceptual information systems. In: Formal Concept Analysis: State of the Art. Springer, Heidelberg
5. Birkhoff, G.: Lattice Theory, 3rd edn. American Mathematical Society Colloquium Publications (1973)
6. Choi, V.: Faster Algorithms for Constructing a Concept (Galois) Lattice. CoRR abs/cs/0602069 (2006)
7. Cuenca Grau, B., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P., Sattler, U.: OWL 2: The next step for OWL. J. Web Semantics 6(4), 309–322 (2008)
8. Curé, O., Jeansoulin, R.: An FCA-based Solution for Ontology Mediation. In: Proc. ONISW 2008, pp. 39–46 (2008)
9. Davey, B., Priestley, H.: Introduction to Lattices and Order. Cambridge University Press, New York (2002)
10. Dong, X., Halevy, A., Madhavan, J.: Reference reconciliation in complex information spaces. In: Proc. SIGMOD 2005, pp. 85–96 (2005)
11. Dou, D., McDermott, D., Qi, P.: Ontology translation by ontology merging and automated reasoning. In: Proc. EKAW 2002, pp. 3–18 (2002)
12. Dubois, D., Lang, J., Prade, H.: Possibilistic logic. In: Handbook of Logic in Artificial Intelligence and Logic Programming, pp. 439–513. Oxford University Press, Oxford (2004)
13. Dubois, D., Mengin, J., Prade, H.: Possibilistic uncertainty and fuzzy features in description logics. A preliminary discussion. In: Capturing Intelligence: Fuzzy Logic and the Semantic Web, pp. 101–113. Elsevier, Amsterdam (2006)
14. Ehrig, M.: Ontology Alignment: Bridging the Semantic Gap. Springer, Heidelberg (2006)
15. Corine Land Cover, Technical Guide. Technical report, European Environment Agency, ETC/LC, European Topic Centre on Land Cover
16. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Heidelberg (2007)
17. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, New York (1999)
18. Ganter, B.: Attribute Exploration with Background Knowledge. TCS 217(2), 215–233 (1999)
19. Ganter, B., Stumme, G.: Creation and Merging of Ontology Top-Levels. In: Proc. ICCS 2003, pp. 131–145 (2003)
20. Kalfoglou, Y., Schorlemmer, M.: Ontology mapping: the state of the art. Knowledge Engineering Review 18(1), 1–31 (2003)
21. Kanellakis, P.C.: Elements of relational database theory. In: Handbook of Theoretical Computer Science, vol. B: Formal Models and Semantics, pp. 1073–1156. MIT Press, Cambridge (1990)
22. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description logics for the Semantic Web. J. Web Semantics 6(4), 291–308 (2008)
23. McGuinness, D.: Ontologies Come of Age. In: Fensel, D., Hendler, J., Lieberman, H., Wahlster, W. (eds.) Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, Cambridge (2003)
24. Motik, B., Horrocks, I., Sattler, U.: Bridging the gap between OWL and relational databases. In: Proc. WWW 2007 (2007)
25. Noy, N., Musen, M.: PROMPT: Algorithm and tool for automated ontology merging and alignment. In: Proc. AAAI 2000 (2000)
26. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking Data to Ontologies. Journal of Data Semantics 10, 133–173 (2008)
27. Qi, G., Pan, J., Ji, Q.: A Possibilistic Extension of Description Logics. In: Proc. DL 2007 (2007)
28. Stumme, G., Maedche, A.: FCA-MERGE: Bottom-Up Merging of Ontologies. In: Proc. IJCAI 2001, pp. 225–234 (2001)
29. Semantic Web for Earth and Environmental Terminology (SWEET), http://sweet.jpl.nasa.gov/ontology/
Generating Fuzzy Regions from Conflicting Spatial Information

Steven Schockaert and Philip D. Smart

Steven Schockaert: Ghent University, Department of Applied Mathematics and Computer Science, Krijgslaan 281, 9000 Gent, Belgium, e-mail: [email protected]
Philip Smart: Cardiff University, School of Computer Science, 5 The Parade, Roath, Cardiff, UK, e-mail: [email protected]

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 211–239. © Springer-Verlag Berlin Heidelberg 2010, springerlink.com

Abstract. Applications such as geographic information retrieval need to deal with spatial information of a very diverse nature, typically involving a mixture of qualitative spatial relations and quantitative geographic constraints. Since existing approaches to spatial reasoning cannot easily be adapted to such a setting, we argue for a more direct approach, in which available information is viewed as constraints on possible representations of spatial regions, and we propose a specific implementation based on genetic algorithms. In the second part of this chapter, we turn our attention to the problem of inconsistencies. As no complete procedures for consistency checking are, in general, available, instead of trying to repair the given knowledge base, we try to find representations of regions that satisfy the available information to the best extent possible. To avoid arbitrary decisions, and to obtain solutions that emphasize, rather than ignore, the conflicting nature of different views, we advocate the use of fuzzy sets to represent regions. Finally, the results of a number of experiments are presented to demonstrate the effectiveness of the overall approach.

1 Introduction

In most application domains, precise boundaries of geographic regions are either not available at all, or only available through expensive licensing. On the other hand, freely available information about the approximate location of regions abounds on
the web, in the form of natural language statements (mostly topological, distance, and orientation relations), semi-structured knowledge bases such as wikipedia1 (mostly topological relations and centroids), user-contributed content such as geonames2 and wikimapia3 , and even genuine gazetteers containing topological relations (the Alexandria gazetteer4 ) or, more commonly, centroids (Alexandria gazetteer, Tiger gazetteer5 ). The spatial information that can be acquired from these sources can be further augmented by spatial information that is asserted implicitly, such as geotagged photographs on flickr6 , geographic coordinates that are obtained by geocoding addresses from web documents, etc. For instance, the coordinates of a photo that has been tagged with lovely restaurant in the Quartier Latin, Paris are most probably contained in the region of Paris called Quartier Latin. While each of the aforementioned sources does not, individually, contain sufficient information to reconstruct reasonable approximations of the boundaries of a given region, together they may provide a surprisingly clear picture of where each region is roughly situated. Besides gathering all the relevant spatial information from the web, two important problems need to be addressed, if we want to construct approximate boundaries in this way. First, we need a technique to generate polygons that satisfy a range of very heterogeneous spatial constraints. For a given region A, we may know, for instance, that it contains points p1 , . . . , pn (whose coordinates are given), that its minimal bounding box is equal to a given rectangle R, that it is located North-East of B, adjacent to C, and within a 2 kilometer radius from D. Complete algorithms to check if such a description is consistent do not currently exist, let alone techniques that are guaranteed to find region boundaries (polygons) that satisfy all the available constraints. 
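The quantitative constraints in this example can already be scored against a candidate polygon with elementary geometry. The following pure-Python sketch (polygons as vertex lists of (x, y) pairs; function names and the scoring scheme are illustrative assumptions) checks point containment and bounding-box constraints; the qualitative relations (adjacency, orientation) would require separate tests:

```python
def contains(poly, p):
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def bbox(poly):
    xs, ys = zip(*poly)
    return (min(xs), min(ys), max(xs), max(ys))

def satisfied(poly, points, required_bbox):
    """Fraction of the hard constraints met by a candidate polygon."""
    checks = [contains(poly, p) for p in points]
    checks.append(bbox(poly) == required_bbox)
    return sum(checks) / len(checks)

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
score = satisfied(square, [(1, 1), (2, 3), (5, 5)], (0, 0, 4, 4))
# (5, 5) lies outside the square, so score == 3/4
```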
This problem was first addressed in [41, 43], where it is proposed to discretize the problem using a Delaunay triangulation, and to solve the resulting combinatorial optimization problem using a combination of genetic algorithms (GAs; [13]) and ant colony optimization (ACO; [8]), two well-known metaheuristics. The second problem is related to the lack of guarantees about the correctness and accuracy of the acquired spatial information. As a result, the combined set of spatial knowledge we start from is very likely to be inconsistent. When this methodology is applied to vernacular regions, the situation becomes even worse: as different people are unlikely to agree exactly on the boundaries of such regions, different assertions about them may have originated from different points of view, making inconsistencies even more likely. However, in [41] and [43], the problem of potential conflicts is only addressed in a limited way. Most importantly, to avoid conflicts it is proposed in [43] to use point data in an indirect way only. The main reason is that such information is usually very error-prone. Indeed, point data is often obtained using recall-oriented heuristics, whose precision is known to be suboptimal [20, 37]. To
1 http://www.wikipedia.org/
2 http://www.geonames.org/
3 http://wikimapia.org/
4 http://www.alexandria.ucsb.edu/
5 http://www.census.gov/cgi-bin/gazetteer
6 http://www.flickr.com/
Generating Fuzzy Regions from Conflicting Spatial Information
cope with such noisy point data, all points that are assumed to lie in a given region are converted to a kernel density surface [3], and this surface is used as heuristic information in the ACO part of the algorithm. In this way, point data is still used to generate better polygons, but information about an individual point is not treated as a hard constraint. The aim of this chapter is to investigate how generating fuzzy regions, rather than crisp polygons, may help to make the overall algorithm more robust in the face of inconsistencies. The use of fuzzy regions is motivated by the observation that, given an inconsistent set of spatial constraints, it is usually possible to find fuzzy regions satisfying each of the constraints to some non-zero degree. This has several important advantages. First, when the regions involved are vague (e.g. the centre of Toulouse, the area around the Eiffel tower, Southern England, the Alps, etc.), it is quite likely that each of the conflicting constraints is valid to some extent, even though together they are classically inconsistent. In other words, because the boundaries of the regions involved are vague, one may take different points of view regarding the spatial relationships between them. In this case, it is clearly more natural to represent regions as fuzzy sets than to ignore some of the given relations. Second, when inconsistencies are not caused by vagueness, the resulting fuzzy sets serve as a compact representation of various possible worlds. In general, having many fuzzy regions in the solution then indicates that the given information contains a large number of inconsistencies. The fuzzy regions that will be obtained are all representable as a small number of nested polygons, i.e. we do not deal with fuzzy regions with continuous membership functions. As will become clear, such regions are easier to obtain in this context, and they are easier to manipulate afterwards (e.g.
to store them in a spatially indexed database). Moreover, most of these fuzzy regions will consist of only one or two such polygons. In the first case, we simply have a classical region representation. The second case, where a fuzzy region is represented by two nested polygons, is also very common in the spatial reasoning literature [6, 10, 4, 2]. This in particular makes it easier to interpret the fuzzy regions that are obtained. This chapter is organized as follows. In the next section, we give an overview of different types of spatial inconsistencies, with pointers to solutions that have been proposed in the literature. Next, in Section 3, we recall how spatial regions satisfying a set of available constraints may be generated using a genetic algorithm, and propose some modifications to make this approach applicable in the face of inconsistencies. Among other things, we propose a new algebraic closure algorithm to this end. Section 4 subsequently deals with the generation of fuzzy regions, focusing on the definition of the fitness function and on changes that need to be made to the representation of chromosomes. Furthermore, we stress the fact that using fuzzy regions may substantially increase the size of the search space, and we present a solution which borrows an idea from cooperative coevolution techniques. Finally, an experimental evaluation is presented in Section 5.
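Such a nested-polygon representation of a fuzzy region is easy to realize as a data structure. The following sketch is purely illustrative (the class name, the use of cell ids instead of actual polygons, and the membership degrees are all invented for this example): a fuzzy region is a small stack of crisp levels, and the membership degree of a location is the degree of the innermost level that contains it.

```python
class FuzzyRegion:
    """A fuzzy region as a small number of nested crisp levels.

    levels: list of (degree, cells) pairs, where each level is a set of
    cell (or triangle) ids; inner levels carry higher membership degrees.
    """
    def __init__(self, levels):
        self.levels = sorted(levels)  # sort by degree, ascending

    def membership(self, cell):
        deg = 0.0
        for d, cells in self.levels:
            if cell in cells:
                deg = d  # keep the highest degree whose level contains cell
        return deg

# A fuzzy region with a core (degree 1.0) nested inside a wider support.
r = FuzzyRegion([(1.0, {3, 4}), (0.5, {2, 3, 4, 5})])
print(r.membership(3))  # 1.0: in the core
print(r.membership(5))  # 0.5: in the support only
print(r.membership(9))  # 0.0: outside the region
```

With only one level this degenerates to a classical crisp region, matching the most common case mentioned above.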
2 Spatial Inconsistency

A major challenge to the realisation and success of using geographic information (geo-information) from the web is the reliability and consistency of the geo-information that is being shared and used. Inaccuracy or error in geographic data can accumulate at different stages of handling and using the data [19, 24], from the data collection phase to maintenance and update processes on stored data. Inconsistencies can arise between information within a single dataset and between information from diverse datasets. Geo-information has unique requirements not found in other domains [18]; for example, geographic objects have a spatial extent (object location) in addition to thematic properties. Geographic constraints can then be formalised on an object's thematic properties, such as population or feature type, or on its spatial properties, such as topology or size [23]. Constraints on spatial properties can be further subdivided into those that deal with the geometric representation of objects and those that deal with the semantics of spatial objects [35]. The following categorisation of spatial consistency constraints was proposed in [5]:
• Geometric constraints. Concerned with maintaining the accuracy of object geometry or location, for example, checking that a polygon has 3 or more sides, or looking for polygon overshoots and slivers.
• Semantic integrity constraints. Concerned with the meaning of geographical features and how they should legally be allowed to interact. For example, a topo-semantic constraint defines legal interactions between objects' spatial configurations, e.g. a road cannot pass through a river, a house cannot be contained within a lake. Other spatial relations can also be considered, for example size-semantic constraints, where a city cannot be smaller than the union of its member neighborhoods.
• User-defined constraints, which can be either semantic or geometric. These are analogous to user-defined business rules.
In addition to constraints over single, self-contained geo-information sources, constraints are also necessary to maintain consistency when integrating geo-information from multiple diverse data sources. That is, similar information from each source must be effectively matched [9], and any logical, attribute, positional or temporal contradictions between them must be resolved [15]. Typically, errors in integration arise from differences in the reliability, accuracy and scale of representation between sources [35]. Furthermore, imperfections between sources can arise regarding objects which are inherently vague [28] and have differing vernacular definitions [48]. Errors, or interpretational and observational differences, in the qualitative or quantitative description of the location and shape of geo-objects between sources can propagate to what appear to be inconsistencies in the spatial relations between those objects. Consequently, without proper treatment, an agent (human or machine) would either need to deal with multiple possible views of these relations, or assume them inconsistent and unreliable. For example, information from administrative sources may show officially that one region is adjacent to another, whereas information from user-contributed sources
may be vague and show the same two regions to be overlapping. A question here arises as to whether one interpretation is valid and the other should be thrown away, or whether both are, to some extent, valid. Within the past two decades, a number of works have emerged that deal specifically with methods to maintain the integrity of spatial knowledge bases. For example, rule-based spatial integrity constraints have been applied to numerous existing spatio-temporal databases, often together with mechanisms to 'clean' or rectify errors [29, 49]. More recently, the idea of spatial integrity constraints has been brought to the area of web ontologies, where ontologies themselves are seen as a step toward maintaining geographic information [12]. An approach to maintaining the integrity of qualitative relations between diverse datasets was presented in [9], where the authors try to maintain consistency of qualitative topological information across different sources using qualitative spatial reasoning techniques. This approach was extended in [45], where a hybrid ontology and rule framework is developed to help maintain the consistency of ontological knowledge from a number of different geographical ontologies, again using spatial reasoning techniques. However, all aforementioned works assume there is a gold standard of consistency which must be adhered to, where from the classical standpoint 'Consistency describes the absence of any logical contradictions within a model of reality' [26]. In this sense, information is either consistent with a set of constraints and heuristics, or not. However, as noted, certain relations are inherently vague and subject to different interpretations. In this context, a number of different approaches have been developed that deal natively with uncertain or vague information.
For example, one way to deal effectively with vague spatial objects with indeterminate boundaries is through a fuzzification of spatial data types along with metrics that deal with degrees of possibility [36, 50]. In this work, we deal with inconsistencies by treating topological relations as vague constraints that can be partially satisfied. In effect, this allows different sources to have different interpretations of the relations that hold between geographical regions.
3 A Genetic Algorithm for Crisp Regions

The main aim of this section is to familiarize the reader with the GA approach to generating spatial scenarios, which was introduced in [43, 41]. At the same time, we highlight some changes that are needed when the initial information is inconsistent. As already mentioned in the introduction, the general idea is to discretize the problem, such that only a finite number of polygons needs to be considered for each region, and to use a simple but configurable algorithm to generate polygons that satisfy a number of spatial constraints. A genetic algorithm is then proposed to find optimal configurations. To keep the discussion focused, we will not consider the extension to ACO in this chapter. This extension is mainly useful in the face of quantitative information, and it is largely orthogonal to the extension we propose here. The interested reader is referred to [43].
3.1 Discretizing Space

In principle, an (uncountably) infinite number of polygons can be considered as candidate boundaries for a given region. To make the approach tractable, it is therefore desirable to apply some form of discretization. The most obvious choice would be to represent the plane as an infinite grid (i.e. the digital plane Z2), and to only consider polygons that can be represented as a contiguous set of cells within some bounded fragment of this grid. It is interesting to note that reasoning with topological relations is not genuinely affected by discretizing the plane in this way [22]. Another advantage is that processing geometries in such a grid representation is relatively easy and fast. An important disadvantage of this technique, however, is that there may be some geographical information available a priori, e.g. in the form of polygons, which cannot be exactly represented in a grid representation. Therefore, we propose to use a discretization based on triangles instead of grid cells, and to choose the triangles in such a way that all the initial information is representable. Specifically, we choose a finite set of points P, which will correspond to the vertices of these triangles. The set P should at least contain all the vertices of the polygons in the input data. In addition, some points should be chosen within the area of interest, either randomly or uniformly. The more points are chosen, the larger the search space becomes and the slower the algorithm will converge. However, if too few points are chosen, it is possible that constraints that are classically consistent can no longer be satisfied. For instance, in the extreme case where only three vertices are chosen, there will be only one triangle and all regions will be equal. Once the set of points P has been constructed, the actual discretization of the plane is calculated.
Although this can be done in many different ways, it seems natural to use the Delaunay triangulation to this end7 [7]. Recall that among all possible triangulations of P, the Delaunay triangulation is the one in which the triangles are as equiangular as possible. The Delaunay triangulation is often used because it satisfies many interesting properties, which intuitively ensure that the triangulation is the most natural one. For example, the circumscribing circle of any triangle of the Delaunay triangulation contains no other points than the vertices of the triangle itself (Empty Circle property). Furthermore, every internal edge E is locally optimal, i.e. for Q the quadrilateral composed of the two triangles sharing E, replacing E in the triangulation by the other diagonal of Q does not increase the minimum of the six internal angles. The Delaunay triangulation has been well investigated and a number of algorithms exist that compute it in O(n log(n)) time [21, 11]. As an example, assume that information is available about the polygons v1, v2 and v3 from Figure 1(a). A possible Delaunay triangulation is shown in Figure 1(b). Note that v1, v2 and v3 all remain representable. The set of vertices P has been obtained deterministically here. However, rather than choosing these points completely uniformly, more points have been chosen in areas about which we have more quantitative information. Specifically, because there are polygons in the upper-left corner,
Note that if some of the polygons v1, . . . , vk are not representable in the triangulation, this can easily be remedied by splitting some of the triangles into smaller triangles.
Fig. 1 Regions v1, v2, v3 (a) and the resulting Delaunay triangulation (b)
a finer granularity is used there. The reason is that the more quantitative information we have about a certain area, the more sensible it becomes to use a highly detailed representation.
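The Empty Circle property mentioned above is easy to verify computationally. The sketch below (pure Python, with made-up coordinates; a real implementation would of course use an off-the-shelf O(n log n) Delaunay routine) tests whether a given triangulation of a point set is Delaunay by checking that no point lies strictly inside the circumcircle of any triangle:

```python
def circumcircle(a, b, c):
    """Return (center, squared radius) of the circle through a, b, c."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2

def is_delaunay(points, triangles):
    """Empty Circle property: no point lies strictly inside the
    circumcircle of any triangle of the triangulation."""
    eps = 1e-9
    for tri in triangles:
        (ux, uy), r2 = circumcircle(*(points[i] for i in tri))
        for j, (px, py) in enumerate(points):
            if j not in tri and (px - ux) ** 2 + (py - uy) ** 2 < r2 - eps:
                return False
    return True

# Four points; the two possible triangulations differ in the diagonal used.
pts = [(0.0, 0.0), (3.0, 0.0), (3.0, 2.0), (0.0, 3.0)]
good = [(0, 1, 2), (0, 2, 3)]  # diagonal p0-p2
bad = [(0, 1, 3), (1, 2, 3)]   # diagonal p1-p3

print(is_delaunay(pts, good))  # True: this diagonal choice is Delaunay
print(is_delaunay(pts, bad))   # False: p2 lies inside circumcircle of (p0, p1, p3)
```

In the second triangulation, flipping the diagonal p1-p3 to p0-p2 (the locally optimal edge described above) restores the Delaunay property.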
3.2 Spatial Reasoning

Basic Notions

Although the GA approach can easily be applied to various types of spatial information, including unary (e.g. region A is convex), binary (e.g. region A is located north of region B), and ternary constraints (e.g. region A is located between regions B and C), we will restrict the examples and experiments in this chapter to topological relations, for clarity. Topological relations are often modeled using the region connection calculus (RCC; [32]). In this framework, topological relations are defined starting from a symmetric and reflexive relation C, called connection. Intuitively, C holds between two regions when they have at least one point in common. Table 1 shows how other topological relations can be defined in terms of C. The default interpretation of some of the relations is illustrated in Figure 2. In practice, spatial relations are often expressed using the following eight base relations: EQ, TPP, NTPP, TPP−1, NTPP−1, PO, EC and DC, where .−1 denotes the inverse relation, e.g. NTPP−1(u, v) ≡ NTPP(v, u). The term RCC-8 is used to refer to this choice of base relations. Note that there is a total of 2^8 = 256 possible topological relations that can be expressed using these base relations, i.e. one for each subset of the eight base relations. For example, u {EQ, TPP, NTPP} v means that either the relation EQ, TPP or NTPP holds between the regions u and v8. Also note
Throughout this chapter, we use lowercase letters to refer to regions when they are treated as variables, and uppercase letters when they are treated as geometrical entities (typically polygons).
Table 1 Definition of topological relations in the original RCC and the fuzzy RCC for regions a and b [39]

Name | Relation | RCC Definition | Fuzzy RCC Definition
Disconnected | DC | ¬C(a,b) | 1 − C(a,b)
Part | P | (∀c ∈ U)(C(c,a) ⇒ C(c,b)) | inf_{c∈U} I_T(C(c,a), C(c,b))
Proper Part | PP | P(a,b) ∧ ¬P(b,a) | min(P(a,b), 1 − P(b,a))
Equals | EQ | P(a,b) ∧ P(b,a) | min(P(a,b), P(b,a))
Overlaps | O | (∃c ∈ U)(P(c,a) ∧ P(c,b)) | sup_{c∈U} T(P(c,a), P(c,b))
Discrete | DR | ¬O(a,b) | 1 − O(a,b)
Partially Overlaps | PO | O(a,b) ∧ ¬P(a,b) ∧ ¬P(b,a) | min(O(a,b), 1 − P(a,b), 1 − P(b,a))
Externally Connected | EC | C(a,b) ∧ ¬O(a,b) | min(C(a,b), 1 − O(a,b))
Non-Tangential Part | NTP | (∀c ∈ U)(C(c,a) ⇒ O(c,b)) | inf_{c∈U} I_T(C(c,a), O(c,b))
Tangential PP | TPP | PP(a,b) ∧ ¬NTP(a,b) | min(PP(a,b), 1 − NTP(a,b))
Non-Tangential PP | NTPP | PP(a,b) ∧ NTP(a,b) | min(1 − P(b,a), NTP(a,b))

[Figure 2 depicts the eight RCC-8 base relations between two regions A and B: DC (disjoint), EC (meets), PO (partial overlapping), TPP (tangential proper part / coveredBy), NTPP (non-tangential proper part / inside), TPP−1 (tangential proper part inverse / covers), NTPP−1 (non-tangential proper part inverse / contains) and EQ (equal).]

Fig. 2 Intuitive meaning of the RCC-8 base relations
that the eight base relations are pairwise disjoint and jointly exhaustive: between every pair of regions, exactly one of the base relations holds. In practice, we write e.g. PP(a, b) as an abbreviation of a {NTPP, TPP} b. Most reasoning tasks of interest in RCC-8 can be polynomially reduced to satisfiability checking, which is known to be an NP-complete problem [34]. To support practical reasoning, a large number of (maximal) tractable subfragments have been identified [34, 16, 25, 33], i.e. (maximal) subfragments of the 2^8 expressible relations for which reasoning only requires polynomial time. For example, in [25], it was shown that reasoning in RCC-8 is tractable provided only base relations are used (i.e. no indefinite information). This subfragment is not maximal, however, and can be extended in several ways. In particular, it has been shown that there exist exactly three maximal tractable subfragments that contain these eight base relations [34, 33]. In all three subfragments of RCC-8, satisfiability can be checked using an O(n^3) algebraic closure algorithm, similar to Allen's algorithm for temporal reasoning [1].
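Indefinite RCC-8 relations can be represented directly as sets of base relations, with the inverse operation applied element-wise. A minimal sketch (the identifiers TPPi and NTPPi are our own shorthand for TPP−1 and NTPP−1):

```python
# The eight RCC-8 base relations.
BASE = {"EQ", "TPP", "NTPP", "TPPi", "NTPPi", "PO", "EC", "DC"}

# Inverse of each base relation; an indefinite relation (a set of base
# relations) is inverted element-wise.
INVERSE = {"EQ": "EQ", "TPP": "TPPi", "TPPi": "TPP",
           "NTPP": "NTPPi", "NTPPi": "NTPP",
           "PO": "PO", "EC": "EC", "DC": "DC"}

def inverse(rel):
    return {INVERSE[b] for b in rel}

# u {EQ, TPP, NTPP} v  is equivalent to  v {EQ, TPPi, NTPPi} u:
print(inverse({"EQ", "TPP", "NTPP"}))

# One expressible relation per subset of the base relations:
print(2 ** len(BASE))  # 256
```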
Algebraic Closure

Let v1, . . . , vk be variables, and let Θ be a set of spatial relations involving (only) these variables. As a preprocessing step, information that follows implicitly from the relations in Θ should be made explicit, as this will facilitate finding suitable spatial scenarios, i.e. instantiations of the variables by polygons. When Θ is consistent, this simply amounts to calculating the algebraic closure of Θ. When Θ contains only topological relations, this can be done in O(n^3) time, n being the number of variables, by repeatedly applying composition rules (or transitivity rules). For example, from the RCC-8 composition table [32], we know that NTPP(a, c) holds as soon as NTPP(a, b) and TPP(b, c) hold for some region b. Therefore, if NTPP(a, b) and TPP(b, c) are both contained in Θ, we can add NTPP(a, c) as well. When it is no longer possible to add new relations to Θ in this way, Θ is called algebraically closed. We will write Θ̂ to denote the algebraic closure of Θ. Note, however, that the algebraic closure algorithm will not necessarily find all consequences of Θ, for example when the topological relations used are not all contained in a tractable subfragment of RCC-8, or when a combination of topological and other types of spatial relations is used. Completeness is not of prime importance, however, as our approach should be applicable in situations where complete algorithms do not even exist. The focus here is to make available to the GA information which follows easily from what we already know. A key advantage of the algebraic closure algorithm is that it can easily be adapted to other types of spatial relations. When Θ is inconsistent, the algebraic closure procedure will typically9 lead to at least one conflict, e.g. a pair of regions (a, b) for which both {TPP, NTPP} and {EC} can be derived. Our solution, in this case, is to give priority to the relation that was derived earliest.
For instance, if {TPP, NTPP} is obtained by composing two of the initial spatial relations, while we need to compose three relations to obtain {EC}, the latter will be ignored. The intuition behind this approach is that most spatial conflicts are local, and we want to restrict the consequences of a given conflict to the variables that are most directly involved. If {TPP, NTPP} and {EC} are obtained during the same iteration, the union {TPP, NTPP, EC} is added, as we can make no reasonable choice between the two conflicting relations in this case. In the latter case, we will also tag the relation between a and b as conflicting and make no further inferences with it. This avoids a blow-up of the possible relations in the face of inconsistencies. The details of the proposed algorithm are presented in Procedure Closure. In this procedure, ◦ denotes relational composition, i.e. R(vi, vj) ◦ R(vj, vk) is the set of RCC-8 base relations that may hold between vi and vk, knowing that R(vi, vj) are the possible base relations between vi and vj and R(vj, vk) are the possible base relations between vj and vk. Furthermore, All denotes the union of all eight RCC-8 base relations. Intuitively, in each step, S corresponds to a refinement of the relation between vi and vk, which follows from the knowledge base. When this refinement is consistent (i.e. S is not the empty set), and when no inconsistencies were found
This will always be the case when all relations from Θ are contained within one of the three aforementioned, maximal tractable subfragments of RCC-8.
Procedure Closure

Data: For each pair of regions (vi, vj), let R(vi, vj) be the set of RCC-8 relations that may hold between vi and vj. The set of all regions under consideration is given by {v1, . . . , vn}.

1   todo ← {(vi, vj, vk) | i ≠ j ≠ k ∧ i < k}
2   while todo ≠ ∅ do
3       todo1 ← todo
4       todo ← ∅
5       for (vi, vj) do
6           R′(vi, vj) ← R(vi, vj)
7       for (vi, vj, vk) in todo1 do
8           if ¬taggedInconsistent(vi, vj) and ¬taggedInconsistent(vj, vk) then
9               S ← R(vi, vk) ∩ (R′(vi, vj) ◦ R′(vj, vk))
10              if S ⊂ R(vi, vk) and S ≠ ∅ and ¬taggedInconsistent(vi, vk) then
11                  R(vi, vk) ← S
12                  R(vk, vi) ← (R(vi, vk))−1
13                  todo ← todo ∪ {(i, k, l) | l ≠ i ≠ k}
14                              ∪ {(l, i, k) | l ≠ i ≠ k}
15              else if (S = ∅ or taggedInconsistent(vi, vk)) and R′(vi, vk) = All then
16                  R(vi, vk) ← R(vi, vk) ∪ (R′(vi, vj) ◦ R′(vj, vk))
17                  R(vk, vi) ← (R(vi, vk))−1
18                  tag R(vi, vk) as inconsistent
19                  tag R(vk, vi) as inconsistent
earlier w.r.t. (vi, vk), this refinement is added to the knowledge base (lines 10-13). When S is the empty set, however, this means that we have discovered a conflict. In this case, or in the case where an (unresolved) conflict w.r.t. (vi, vk) was discovered earlier, the composition of R(vi, vj) and R(vj, vk) is ignored (i.e. we give priority to the earlier derived information), except if nothing was known yet about R(vi, vk) at the beginning of the iteration (lines 15-19). Note that this latter situation occurs when the conflict is caused only by inferences that were drawn during the current iteration. An important characteristic of this procedure is that the initial relations from Θ are never removed (although they may have been refined). This is important, because we are interested in spatial configurations that satisfy as many as possible of the initial relations. Whether or not the derived relations are satisfied is irrelevant; the algebraic closure algorithm is only used to help the GA find suitable solutions. Also note that the algorithm does not attempt to repair inconsistencies: if the initial knowledge base is inconsistent, then so will be the result. The idea behind the algebraic closure procedure is further clarified in the following example.

Example 1. Let Θ = {EC(u, v), TPP(w, u), TPP(w, v), NTPP(z, u)}, where we have omitted the inverse relations for clarity, i.e. it is implicitly assumed that also EC(v, u), TPP−1(u, w), etc. are known. Clearly this knowledge base is inconsistent: TPP(w, u) and TPP(w, v) imply that u and v should overlap, whereas EC(u, v)
implies that u and v cannot overlap. In particular, during the first iteration of the algebraic closure algorithm, the composition rules are applied as follows:

TPP(w, u) ∧ EC(u, v) ⇒ w {EC, DC} v
TPP(w, v) ∧ EC(v, u) ⇒ w {EC, DC} u
TPP−1(v, w) ∧ TPP(w, u) ⇒ v {PO, EQ, TPP, TPP−1} u
NTPP(z, u) ∧ EC(u, v) ⇒ DC(z, v)
NTPP(z, u) ∧ TPP−1(u, w) ⇒ z {DC, EC, PO, TPP, TPP−1} w

These rules lead to several conflicts: w {EC, DC} v contradicts TPP(w, v), w {EC, DC} u contradicts TPP(w, u), and v {PO, EQ, TPP, TPP−1} u contradicts EC(u, v). As priority is given to the relations that were initially in the knowledge base, after the first iteration we have
Θ = {EC(u, v), TPP(w, u), TPP(w, v), NTPP(z, u), DC(z, v), z {DC, EC, PO, TPP, TPP−1} w}

In the second iteration, the following two applications of composition rules allow us to further refine the relation between z and w:

NTPP(z, u) ∧ TPP−1(u, w) ⇒ z {DC, EC, PO, TPP, NTPP} w
DC(z, v) ∧ TPP−1(v, w) ⇒ DC(z, w)

Hence, we discover no further inconsistencies, and the final specification of the knowledge base is given by

Θ̂ = {EC(u, v), TPP(w, u), TPP(w, v), NTPP(z, u), DC(z, v), DC(z, w)}
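The composition-based refinement underlying this example can be sketched in a few lines. The fragment below is only illustrative: the composition table contains just the entries needed here (taken from the derivations above; the full RCC-8 table has 8 × 8 entries), and the conflict handling is reduced to its simplest form, where an empty intersection signals a conflict and the earlier-derived relation keeps priority.

```python
# Hand-coded fragment of the RCC-8 composition table
# (base-relation pair -> set of possible base relations).
COMP = {
    ("TPP", "EC"): {"DC", "EC"},
    ("NTPP", "EC"): {"DC"},
    ("DC", "TPPi"): {"DC"},
}

def compose(r1, r2):
    """Compose two indefinite relations: union of the table entries."""
    out = set()
    for b1 in r1:
        for b2 in r2:
            out |= COMP[(b1, b2)]
    return out

def refine(current, derived):
    """Intersect current knowledge with a derived relation. An empty
    intersection signals a conflict; the earlier relation then keeps
    priority and the derived one is ignored."""
    s = current & derived
    if not s:
        return current, True  # conflict detected, keep earlier relation
    return s, False

# From Example 1: TPP(w, u) and EC(u, v) entail w {DC, EC} v ...
derived_wv = compose({"TPP"}, {"EC"})
print(derived_wv == {"DC", "EC"})  # True

# ... which contradicts the initial relation TPP(w, v):
relation_wv, conflict = refine({"TPP"}, derived_wv)
print(conflict, relation_wv)  # True {'TPP'}: the initial relation is kept
```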
3.3 Generating Polygons

The main idea of the algorithm is to instantiate the variables (regions) by polygons (boundaries) one by one, using previously instantiated regions as spatial constraints on the possible instantiations of the subsequent regions. In particular, assume that v1, v2, . . . , vk have already been instantiated by a polygon, and let u be the next region to consider. Among all relations from Θ̂ involving u, only those involving exclusively regions from the set {u, v1, v2, . . . , vk} are considered at this point. As an example, consider again the three regions v1, v2, v3 and the corresponding triangulation from Figure 1. As possible instantiations of u, we only consider polygons that can be defined as a set of contiguous triangles from the triangulation10. To find a polygon, or indeed a contiguous set of triangles T, which satisfies the
Note that the restriction that the triangles be contiguous is only needed if self-connected regions are desired. In some scenarios, it may be desirable to allow instantiations of variables with finite unions of disjoint polygons (e.g. to model an archipelago).
spatial constraints in Θ̂, we translate topological relations into a number of simple set-theoretic restrictions on T. In general, topological relations may result in the following types of constraints on T:

1. Upper bound: The set T should be a subset of T0, i.e. T ⊆ T0.
2. Lower bound: The set T should be a superset of T0, i.e. T0 ⊆ T.
3. Overlap: The set T should contain at least one triangle from T0, i.e. T ∩ T0 ≠ ∅.
4. Partial exclusion: The set T may contain some, but not all, triangles from T0, i.e. T0 ⊄ T.
5. Boundary: The polygon defined by T should share at least one boundary point with the polygon defined by T0.
where T0 may be an arbitrary set of triangles from the triangulation. For boundary constraints, we furthermore require that T0 is a contiguous set of triangles, or, equivalently, that T0 defines a polygon. When the given relations are inconsistent, the result is determined by the order in which the relations are considered. In particular, at each step, the constraints corresponding to a given relation are ignored when they are in conflict with the earlier derived constraints. To illustrate this procedure, consider the following example.

Example 2. Let v1, v2 and v3 be defined as in Figure 1, and let Θ be defined by
Θ = {EC(v1, u), PO(v2, u), TPP(u, v3)}

Furthermore, let us denote the sets of triangles corresponding to the polygons for v1, v2 and v3 by T1, T2 and T3, respectively. First, EC(v1, u) implies that (the polygons defined by) T and T1 should share at least one boundary point, and furthermore, it induces the following constraint on T (upper bound):

T ⊆ U \ T1

where U denotes the set of all triangles in the Delaunay triangulation. Next, PO(v2, u) implies that (overlap, partial exclusion, and again overlap):

T ∩ T2 ≠ ∅
T2 ⊄ T
T ∩ (U \ T2) ≠ ∅

and finally, TPP(u, v3) means that T and T3 share at least one boundary point, and that (upper bound and partial exclusion):

T ⊆ T3
T3 ⊄ T

Note that other types of spatial relations, such as orientation relations, lead to similar constraints, and hence could be treated in the same way. Once all spatial relations from Θ̂ have been translated into constraints on the set of triangles T, we need to determine the actual instantiation of u. To this end, we use
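The set-theoretic constraints derived in Example 2 can be checked mechanically for any candidate triangle set T. In the sketch below, the triangle ids and the sets U, T1, T2, T3 are invented for illustration (they do not correspond to Figure 1), and the boundary-sharing constraints are omitted since they are geometric rather than set-theoretic:

```python
# Hypothetical triangle ids: U is the full triangulation, T1, T2, T3 the
# triangle sets of the polygons for v1, v2, v3 (made-up data).
U = set(range(12))
T1 = {0, 1}
T2 = {4, 5, 6}
T3 = {4, 5, 6, 7, 8, 9}

def check(T):
    """Set-theoretic constraints induced by EC(v1,u), PO(v2,u), TPP(u,v3)."""
    return bool(T <= U - T1          # upper bound from EC(v1, u)
                and T & T2           # overlap from PO(v2, u)
                and not T2 <= T      # partial exclusion from PO(v2, u)
                and T & (U - T2)     # overlap (again) from PO(v2, u)
                and T <= T3          # upper bound from TPP(u, v3)
                and not T3 <= T)     # partial exclusion from TPP(u, v3)

print(check({4, 5, 7}))  # True: all six set-theoretic constraints hold
print(check(T3))         # False: T2 and T3 are both subsets of T
```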
a simple randomized greedy algorithm. First, we select an initial triangle t^0 from U which satisfies all upper bound constraints on T:

1. if at least one lower bound constraint has been obtained from Θ̂, t^0 is chosen randomly from the triangles in the available lower bounds;
2. otherwise, if at least one overlap constraint has been obtained, t^0 is chosen randomly from the triangles involved;
3. otherwise, t^0 is chosen randomly among all triangles that satisfy the available upper bounds.

After t^0 has been chosen, the algorithm repeatedly adds a neighboring triangle to the current set of triangles. In particular, let T^0 = {t^0}, and let T^i be the set of triangles that has been obtained after i steps (|T^i| = i + 1). To find T^{i+1} from T^i, we consider the set {t1, t2, . . . , ts} containing all triangles t satisfying:

1. triangle t is bordering on one of the triangles in T^i, but t is itself not contained in T^i;
2. adding t to T^i does not violate any of the available constraints (e.g. t is contained in all the available upper bounds);
3. T^i ∪ {t} does not correspond to a polygon with a hole (if polygons without holes are desired).

If a triangle t in {t1, t2, . . . , ts} is contained in one of the available lower bounds, this triangle is chosen, i.e. T^{i+1} = T^i ∪ {t}. If none of the triangles occurs in a lower bound, a triangle from {t1, t2, . . . , ts} is chosen randomly. The algorithm stops when no candidate triangles satisfying the three criteria above can be found (failure), or when a solution has been found that satisfies all constraints (success). As an example, Figure 3 shows a typical outcome of our algorithm for Θ defined as in Example 2. After a correct solution has been found, we may choose to add further triangles (or remove some of the earlier assigned triangles) to obtain a more desirable polygon, e.g. with a more natural shape.
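The seed-and-grow procedure above can be sketched as follows. This is a simplified illustration, not the chapter's actual implementation: it models only upper and lower bound constraints, and the "triangulation" is a made-up strip of six triangles where each triangle neighbors the ones next to it.

```python
import random

def grow_region(adjacency, upper, lower, rng=None):
    """Randomized greedy growth of a contiguous set of triangles.

    adjacency: dict mapping each triangle id to the set of its neighbors
    upper: triangles the region may use (upper bound constraint)
    lower: triangles the region must contain (lower bound constraint)
    """
    rng = rng or random.Random(0)
    if not lower <= upper:
        return None  # the lower bound conflicts with the upper bound
    # Step 1: pick the seed triangle, preferring lower-bound triangles.
    t0 = rng.choice(sorted(lower or upper))
    region = {t0}
    # Step 2: repeatedly add a neighboring admissible triangle, preferring
    # triangles that are still required by the lower bound.
    while not lower <= region:
        frontier = {t for r in region for t in adjacency[r]
                    if t not in region and t in upper}
        if not frontier:
            return None  # failure: no admissible triangle left to add
        region.add(rng.choice(sorted(frontier & lower or frontier)))
    return region

# Toy "triangulation": six triangles in a strip, neighbors left and right.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
region = grow_region(adj, upper=set(range(5)), lower={0, 3})
print(region)  # a contiguous set containing 0 and 3, avoiding triangle 5
```

By construction the result is contiguous and respects both bounds; as in the text, overlap, partial exclusion, hole and boundary checks would be added as further admissibility tests on each candidate triangle.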
3.4 A Genetic Algorithm

An important characteristic of the algorithm from Section 3.3 is that its chances of being successful critically depend on the order in which the variables are processed. This is illustrated in the next example.

Example 3. Let Θ be given by
Θ = {TPP(v2, v1), EC(v3, v1), EC(v3, v2), EC(v4, v1), EC(v4, v2), DC(v4, v3)}

and assume that first v1, v2 and v3 are instantiated. Then it is quite likely that a scenario such as the one displayed in Figure 4(a) is obtained. In that case, no solution can be found, since v4 should share at least one boundary point with v1 and v2 (from EC(v4, v1) and EC(v4, v2)). However, as all boundary points that are common to v1 and v2 are also included in v3, this would violate the requirement that DC(v4, v3). Such problems could easily be avoided if the variables were processed in the order
S. Schockaert and P.D. Smart
Fig. 3 Possible definition of u satisfying EC(v1, u), PO(v2, u) and TPP(u, v3)
Fig. 4 (a) After v1 , v2 and v3 have been instantiated, no polygon v4 can be found anymore such that EC(v1 , v4 ), EC(v2 , v4 ) and DC(v3 , v4 ); (b) After v1 , v3 and v4 have been instantiated, a suitable polygon for v2 can always be found
v1 , v3 , v4 , v2 . After v1 , v3 and v4 have been instantiated, a suitable polygon for v2 can always be found (see Figure 4(b)). Similarly, when the given relations are inconsistent, the result will depend heavily on the order in which the relations are processed. To find these optimal orderings of variables and relations, we propose a genetic algorithm. Genetic algorithms are a popular metaheuristic, mimicking the evolution of species to tackle combinatorial optimization problems. Because of this analogy, candidate solutions are called chromosomes. A set of such chromosomes, called the population, is maintained by the algorithm, and allowed to evolve using two problem-specific operators: crossover and mutation. Typically, a crossover operator takes two chromosomes as input (the parents) and recombines these to obtain one or more new chromosomes (the offspring), whereas a mutation operator takes one chromosome as input and alters it in some way. The sets of solutions that are
repeatedly obtained are called generations of the population. To improve the average quality of the solutions found, the fitness of each chromosome, i.e. the quality of the candidate solutions, is evaluated and used to select the parents for the next generation: fitter chromosomes are more likely to be selected as parent than others. Note that the chromosomes (candidate solutions), in this case, do not correspond to spatial configurations, as one might intuitively expect, but to configurations of the algorithm from Section 3.3. Specifically, we will use (complex) chromosomes consisting of the following two components: 1. an ordering of the variables; 2. for each variable, an ordering by which the relations involving that variable should be processed. As the use of GAs to optimize orderings is well–established [30, 46], a large number of order–preserving crossover operators have been proposed in the literature, including Order Crossover (OX), Order Crossover 2 (OX2), Cycle Crossover (CX), Position Based Crossover (PBX) and Partially Mapped Crossover (PMX); we refer to [14, 27, 47] for more details on these operators. In [43], experimental evidence is presented, which supports the use of PMX for our problem. Because this observation is, moreover, consistent with results that have been obtained in the context of facility layout problems [17] — where also optimal variable orderings are sought to generate spatial scenarios — we will only consider this particular crossover operator in our experiments below. Two natural candidates emerge as mutation operators [44]. Either we can randomly swap two elements in an ordering, or we can change the position of a given element to a random new position in the ordering. In the experiments below, we have applied the latter operator, as it seems intuitively more appropriate. Furthermore, initial experiments with these two operators have not revealed any significant differences in performance. 
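A minimal sketch of this position-shift mutation (the function name and signature are our own):

```python
import random

def shift_mutation(ordering, p_mut, rng=random):
    """Move each element, independently with probability p_mut, to a
    randomly chosen new position in the ordering."""
    result = list(ordering)
    for item in list(ordering):
        if rng.random() < p_mut:
            result.remove(item)
            result.insert(rng.randrange(len(result) + 1), item)
    return result
```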
The mutation operator is applied to every element of the ordering, i.e. every element is shifted to a different position with a small probability p1mut for the ordering of the variables, and p2mut for the ordering of the relations. To evaluate the fitness of a chromosome chi, we apply the algorithm from Section 3.3 using the corresponding configuration. This results in some spatial scenario Schi which defines a polygon for each variable of interest. The fitness (or quality) f(chi) of chromosome chi is then defined as the percentage of spatial relations from Θ that are satisfied in scenario Schi:

f(chi) = |{γ | γ ∈ Θ ∧ Schi ⊨ γ}| / |Θ|
Note that in this way, due to the randomness of the procedure from Section 3.3, the fitness function is non-deterministic. To cope with this, and to avoid the computational overhead of calculating too many scenarios, only one scenario is generated for every chromosome; this scenario is used to define the chromosome's fitness until it is discarded from the population. Besides determining which is the most desirable solution found by the GA, the fitness function is mainly used to choose suitable parents to generate the offspring of
the next generation. In general, selection of parents should balance between favoring fitter chromosomes (since we want the average quality of solutions to increase after each generation) and maintaining sufficient diversity in the population (since we want to avoid getting stuck in local optima). A common selection scheme is roulette wheel selection, by which the probability that a chromosome is chosen is proportional to its fitness. Such a scheme is not optimal here, however, as the absolute difference in fitness between the best and worst solution will typically be quite small. A useful alternative, which interprets fitness scores in an ordinal fashion, is tournament selection. Let ntour be an integer that is smaller than the size of the population and let ptour be in ]0, 1[. To select a parent using tournament selection, first ntour chromosomes are chosen from the current population at random. With probability ptour, the best chromosome among these ntour chromosomes is selected, with probability ptour(1 − ptour) the second-best chromosome is selected, with probability ptour(1 − ptour)² the third-best chromosome, etc. Once two parents have thus been selected, offspring are generated as follows. With probability pcross the crossover operator is applied to the selected chromosomes. Subsequently, the mutation operator is applied to the result of the crossover (or to the original chromosomes, when crossover was not used). Finally, various options exist to govern how offspring are added to the population. Below, we consider two different alternatives, which are referred to as Generational GA (GGA) and Steady-State GA (SSGA; [51]). When the GGA strategy is used, in each generation spop offspring are generated (where spop denotes the size of the population), which are used to replace all the chromosomes from the current generation.
As an exception, the fittest chromosome from the current generation is used to replace the least fit chromosome from the next generation (elitism). When the SSGA strategy is used, in each generation one offspring is generated and added to the current population. Subsequently, the worst chromosome from the current generation is removed (to keep the total number of chromosomes fixed). Note that this might either be the newly added offspring, or one of the chromosomes of the previous generation. Experiments in [43] strongly suggest that SSGA is more suitable in this context. In our experiments, we will therefore only consider the SSGA strategy.
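The tournament selection scheme described above can be sketched as follows (a minimal illustration with hypothetical names; when every earlier contender is rejected, the last one is returned):

```python
import random

def tournament_select(population, fitness, n_tour, p_tour, rng=random):
    """Pick one parent: sample n_tour chromosomes, then return the best with
    probability p_tour, the second best with probability p_tour*(1-p_tour), etc."""
    contenders = sorted(rng.sample(population, n_tour), key=fitness, reverse=True)
    for chromosome in contenders[:-1]:
        if rng.random() < p_tour:
            return chromosome
    return contenders[-1]  # fallback when all earlier picks were rejected
```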
4 A Genetic Algorithm for Fuzzy Regions

In this section, we discuss how an extension of the GA approach can be used to generate fuzzy spatial scenarios. However, we still assume that the input information is presented in the form of crisp spatial relations, since this is the kind of information we are likely to deal with in practical applications. Fuzziness is introduced as a means to deal with potential inconsistencies. This entails that whenever the initial relations in Θ are consistent, we still prefer to instantiate the variables (regions) by crisp polygons. When the initial information is conflicting, on the other hand, we assume that solutions are preferred in which variables are instantiated by fuzzy regions. Note that it is natural to model spatial relations between such fuzzy regions as fuzzy relations, as there may be situations in which we want to indicate that a
given spatial relationship is only partially satisfied. In Section 4.1, we briefly recall how such fuzzy degrees can be defined in the case of topological information. There are at least two reasons why we may prefer fuzzy regions in applications when the available information is conflicting. First, the boundaries of many geographic regions are inherently ill-defined, and as a consequence, spatial relations involving these regions are ill-defined as well. When the input relations are all acquired from reliable sources, it is therefore plausible that none of the given relations is completely wrong, and that the conflicting information is merely a result of different points of view. This is exemplified in [42], for instance, where a technique is introduced to mine topological relations from web documents. When applied to the city of Cardiff, strong evidence was found for the following two, conflicting relations:

EC(Cardiff Bay, Butetown)    P(Cardiff Bay, Butetown)

Ignoring either the fact that Cardiff Bay can be considered adjacent to Butetown, or the fact that it can be considered contained in it, would be an unacceptable solution, as both points of view can be justified. In this case, it is clearly more desirable to have a solution in which the spatial extent of Butetown, for instance, is modeled as a fuzzy set, such that both relations hold to some non-zero degree. Second, when the conflicts are due to actual errors, and the amount of support for all relevant relations is sufficiently strong, crisp scenarios can only be obtained by making an arbitrary choice of which relations to ignore. Using fuzzy regions then allows us to make a softer decision in which none of the relations is ignored completely. In this case, fuzzy region boundaries correspond to varying degrees of compatibility with the given input information, rather than an inherent gradualness of the boundary.
4.1 Fuzzy Topological Relations

An intuitive way to generalize the region connection calculus is to replace C by a fuzzy relation (still being reflexive and symmetric), i.e. a mapping from region pairs to the unit interval [0, 1], and to replace the first-order definitions of the other spatial relations by corresponding fuzzy connectives. However, this leads to a wide variety of possibilities, as different fuzzy connectives may result in very different behavior. Moreover, replacing the definitions of the spatial relations by expressions that are classically equivalent may lead to very different fuzzifications as well. In [39], the definitions shown in the right column of Table 1 were advocated, where T is a left-continuous t-norm and IT is its residual implication¹¹. This particular choice of fuzzy implication is important to ensure that transitivity properties such as T(P(a, b), P(b, c)) ≤ P(a, c) hold, which generalize the property that for crisp
¹¹ Recall that a t-norm is a symmetric, associative and increasing [0, 1]² → [0, 1] mapping T which satisfies the boundary condition T(1, x) = x for all x in [0, 1]. T-norms are commonly used to model conjunction in fuzzy logics. Given a left-continuous t-norm T, its residual implicator IT is defined as IT(x, y) = sup{λ | λ ∈ [0, 1] ∧ T(x, λ) ≤ y} for all x and y in [0, 1].
regions a, b and c, whenever a is a part of b and b is a part of c, we have that a is a part of c. Note that the definitions of the original RCC relations in Table 1 correspond to the first-order expressions that were generalized. They are (classically) equivalent, but not identical, to the original definitions from [32]. The properties of the fuzzy RCC relations that were shown in [39] hold for an arbitrary reflexive and symmetric fuzzy relation C. In applications, on the other hand, a specific interpretation of C, as well as of the other spatial relations, is required. Different choices of C will lead to different interpretations of fuzzy spatial relations, which may be useful in different applications. In [38], for instance, connection is defined in terms of nearness between fuzzy regions, and a characterization of all resulting fuzzy RCC relations was shown. In [40], reasoning problems such as satisfiability checking were investigated for these fuzzy RCC relations, in the specific case where T is the Łukasiewicz t-norm, defined by T(a, b) = max(0, a + b − 1). This choice of t-norm guarantees that all consistent sets of fuzzy RCC relations can be satisfied by fuzzy regions that are bounded and, moreover, only take membership degrees from a particular finite set Mk = {0, 1/k, 2/k, ..., 1} with k ∈ N \ {0}. This property, which is fundamental for the approach we take in this paper, does not hold in general for other popular t-norms such as the minimum or the product. In particular, when we are looking for a fuzzy spatial scenario, we thus only need to consider fuzzy regions with a (typically small) finite number of different α-levels. This will be extremely useful in extending the genetic algorithm, as each fuzzy region can then simply be seen as a finite collection of nested crisp regions.
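A fuzzy region over Mk can thus be stored simply as its list of α-level sets, ordered from core to support; a minimal sketch (regions are modeled as sets of triangle ids, and all names are our own):

```python
def make_fuzzy_region(levels):
    """A fuzzy region with degrees in Mk as a list of nested crisp regions,
    one per alpha-level, from the highest level (core) to the lowest (support)."""
    for inner, outer in zip(levels, levels[1:]):
        assert inner <= outer, "alpha-level sets must be nested"
    return levels

def membership(levels, k, element):
    """Membership degree: (number of alpha-level sets containing the element) / k."""
    return sum(element in level for level in levels) / k
```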
Furthermore, when the fuzzy regions A and B only take a finite number of different membership degrees, fuzzy RCC relations between A and B can be defined by looking at the pairs of α-levels (A_{α1}, B_{α2}) for which the corresponding classical RCC relation holds¹². In particular, assume that only fuzzy regions in two-dimensional Euclidean space are considered, which only take membership degrees from Mk, and whose α-level sets are all regularly closed¹³. Let the fuzzy relation C for two fuzzy regions A and B be defined as

C(A, B) = max{λ | λ ∈ Mk ∧ (C(A_1, B_λ) ∨ C(A_{1−1/k}, B_{λ+1/k}) ∨ ··· ∨ C(A_λ, B_1))}   (1)

Then it can be shown that [40]

O(A, B) = max{λ | λ ∈ Mk ∧ (O(A_1, B_λ) ∨ O(A_{1−1/k}, B_{λ+1/k}) ∨ ··· ∨ O(A_λ, B_1))}   (2)

P(A, B) = min{λ | λ ∈ Mk ∧ P(A_1, B_λ) ∧ P(A_{1−1/k}, B_{λ−1/k}) ∧ ··· ∧ P(A_{1−λ+1/k}, B_{1/k})}   (3)

NTP(A, B) = min{λ | λ ∈ Mk ∧ NTPP(A_1, B_λ) ∧ NTPP(A_{1−1/k}, B_{λ−1/k}) ∧ ··· ∧ NTPP(A_{1−λ+1/k}, B_{1/k})}   (4)

¹² Recall that for all α in ]0, 1], the α-level set A_α of a fuzzy set A in a universe X is defined as the crisp set A_α = {x | x ∈ X ∧ A(x) ≥ α}.
¹³ Recall that a set is regular closed if it is equal to the topological closure of its interior. This condition avoids degenerate regions, such as points and lines in a two-dimensional space.
It is furthermore easy to see that the value of all other fuzzy RCC relations can be obtained by combining the degrees to which C, O, P and NTP hold, using the minimum and standard negation (i.e. the complement w.r.t. 1). The particular choice of C in (1) is motivated by the fact that the Łukasiewicz t-norm is used in the definition of the fuzzy spatial relations. For other choices of t-norms, alternative definitions of C would be required. Also note that this approach is very similar to a generalization of the Egg-Yolk calculus [6]. Rather than defining degrees to describe a certain situation, however, the Egg-Yolk calculus defines a new relation for every possible combination of RCC relations between the α-levels. As a consequence, the Egg-Yolk calculus can discriminate between more situations than the fuzzy RCC relations. This, however, makes Egg-Yolk relations difficult to handle, especially when more than two α-levels need to be considered. Note in particular that a quantitative assessment of how well a given situation corresponds to a given topological relation will be needed to define a fitness function. In the remainder of this paper, we will therefore use the fuzzy topological relations defined by (1)–(4).
Fig. 5 Three different possibilities for fuzzy regions A, B, and C satisfying the relations {EC(A, B), NTPP(C, A), NTPP(C, B)} to some degree. These scenarios correspond to the optimal scenarios using (a) two-valued, (b) three-valued and (c) four-valued representations
Example 4. Assume that the relations Θ = {EC(a, b), NTPP(c, a), NTPP(c, b)} are given as a partial description of the location of regions a, b and c. The set Θ is clearly
inconsistent, as NTPP(c, a) and NTPP(c, b) entail that a and b are overlapping, and thus in particular ¬EC(a, b). In the classical case, where a, b and c can only be instantiated by crisp regions, at most two of the three relations from Θ can be satisfied. Such a scenario is depicted in Figure 5(a), in which a, b and c are interpreted as the regions A, B and C. Clearly NTPP(c, a) and EC(a, b) are then satisfied, whereas NTPP(c, b) is not. Now assume that regions can be instantiated by fuzzy sets which only use membership degrees from M2 = {0, 0.5, 1}, i.e. regions are modeled as a pair of nested crisp regions. Then it is possible to satisfy all three relations from Θ to a non-zero degree. This is illustrated in Figure 5(b), where regions a and c are still interpreted as crisp regions A and C, and region b is interpreted as a three-valued fuzzy set B. Since B only takes membership degrees from M2, it is completely defined by its α-level sets B_1 and B_0.5. Under the interpretation from Figure 5(b), we find

C(a, b) = 1        O(a, b) = 0.5
NTP(c, a) = 1      P(a, c) = 0
NTP(c, b) = 0.5    P(b, c) = 0

which leads to

EC(a, b) = min(C(a, b), 1 − O(a, b)) = 0.5
NTPP(c, a) = min(NTP(c, a), 1 − P(a, c)) = 1
NTPP(c, b) = min(NTP(c, b), 1 − P(b, c)) = 0.5

Finally, assume that fuzzy sets may take membership degrees from the set M3 = {0, 0.33, 0.66, 1}. In this case, it is possible to satisfy each of the three relations to an equal degree, as shown in Figure 5(c). This is in some sense desirable, as there is no a priori reason to favor one of the relations from Θ over the others, e.g. omitting any of the three relations would make Θ consistent. In the scenario from Figure 5(c), c is interpreted as a crisp region C, as before, and a and b are interpreted as the four-valued fuzzy sets A and B. As it turns out, in this case we may take A_0.33 = A_0.66 and B_0.33 = B_0.66. We find

C(a, b) = 0.66      O(a, b) = 0.33
NTP(c, a) = 0.66    P(a, c) = 0
NTP(c, b) = 0.66    P(b, c) = 0

which leads to

EC(a, b) = min(C(a, b), 1 − O(a, b)) = 0.66
NTPP(c, a) = min(NTP(c, a), 1 − P(a, c)) = 0.66
NTPP(c, b) = min(NTP(c, b), 1 − P(b, c)) = 0.66
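The combination rule used above (minimum together with standard negation) is easy to check mechanically; a small sketch reproducing the three-valued scenario of Figure 5(b), with function names of our own:

```python
def ec_degree(c_deg, o_deg):
    """Degree of EC(x, y) from the degrees of C(x, y) and O(x, y)."""
    return min(c_deg, 1 - o_deg)

def ntpp_degree(ntp_deg, p_inv_deg):
    """Degree of NTPP(x, y) from the degrees of NTP(x, y) and P(y, x)."""
    return min(ntp_deg, 1 - p_inv_deg)

# Degrees read off the three-valued scenario of Figure 5(b):
ec_ab = ec_degree(1.0, 0.5)      # EC(a, b)
ntpp_ca = ntpp_degree(1.0, 0.0)  # NTPP(c, a)
ntpp_cb = ntpp_degree(0.5, 0.0)  # NTPP(c, b)
```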
4.2 Modifications to the GA

The definition of the fuzzy topological relations suggests an extension of the genetic algorithm from Section 3: replace each region by a collection of regions, and add constraints to ensure that the regions in these collections are nested. After suitable polygons have been found, each of these collections can then be interpreted as a fuzzy set, defined in terms of its α-levels. Hence, from a conceptual point of view, generating fuzzy regions works in the same way as generating crisp regions, by interpreting the α-levels as different, nested regions. However, in addition to the order in which variables (regions) should be processed, and the order in which the relations involving a given variable should be processed, the chromosomes now need to encode also:

1. for each region R occurring in Θ, the number of different α-levels that should be considered (i.e. the size of the collection of regions that will replace R);
2. for each relation in Θ, the degree to which it should be satisfied.

This extension requires changes in the crossover and mutation operators, and in the definition of the fitness function. Moreover, some care will have to be taken to cope with the substantially increased size of the search space.

Evolutionary operators

As mutation operator, with probabilities p1mut and p2mut, the order of, respectively, the variables and relations is altered, as before. In addition, with probability p3mut, the number of α-levels of a randomly selected region is changed. In particular, the number of α-levels is then either increased or decreased by one (each with equal probability), provided that the number of α-levels remains at least one. In practice, we also impose a maximum αmax on the number of α-levels, to reduce the search space and to keep the computation time reasonable. Finally, the degree to which each of the relations from Θ is required to be satisfied is changed with some probability p4mut. Initially, this degree is 1 for all relations in Θ, i.e. none of the available information is ignored, not even partially. When the mutation operator is applied, the current degree for a given relation is either increased or decreased by 1/αmax (both with equal probability), provided of course that this degree remains in the interval [0, 1].

There is one subtle issue about the order of the variables that has an important impact on the algorithm's performance. Recall that the GA from Section 3.4 maintains an ordering of the regions/variables to determine when they should be instantiated. As we are now essentially generating α-levels, we should maintain an ordering on the α-levels instead of the actual regions. The problem now is that the number of α-levels that should be considered continually changes for each region (with probability p3mut in each iteration). In particular, every time the number of considered α-levels is decreased, an element should be removed from the ordering, and when the number of α-levels is increased, an extra element should be added to it. This would significantly complicate the crossover operation, however.
More fundamentally, every time an α-level is removed from the list, we lose potentially useful, learned information. To avoid these problems, we maintain an ordering which involves αmax different α-levels for each region, from the start. Based on the number of α-levels that is considered at a given point, some of the elements of the list will be ignored, but they will never be removed. Given the previous remark, the crossover operations from Section 3.4 can still be applied on the variable and relation orderings. In addition, the degree to which a given relation from Θ is required to be satisfied in each of the children is obtained using a standard two-point crossover operator. In particular, a fixed (but otherwise arbitrary) order is imposed on the relations from Θ, and two numbers N− and N+ between 0 and |Θ| are randomly selected; let N− be the smallest of these two numbers. The degree to which a given relation should be satisfied in the first child corresponds to the degree in the first parent for the first N− relations, as well as for the last |Θ| − N+ relations, and to the degree in the second parent for the remaining relations (and conversely for the second child).

A straightforward way to adapt the fitness function to the fuzzy setting would be:

f(chi) = (Σγ∈Θ Schi(γ)) / |Θ|

where Schi(γ) is the degree to which relation γ is satisfied in the scenario corresponding to chromosome chi. This fitness function, however, does not necessarily lead to the desired behavior. For example, again considering the scenarios from Figure 5, we find that the fitness function would evaluate to 2/3 in all three cases. As we have argued, however, the second and third scenarios seem more desirable, since they satisfy each of the relations to an equal degree. As an alternative, we therefore propose the following fitness function:

f(chi) = 1 − (Σγ∈Θ (1 − Schi(γ))²) / |Θ|
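Both fitness functions can be compared on the three scenarios of Figure 5; a minimal sketch, where each scenario is summarized by the list of degrees to which the three relations are satisfied (degree lists read from Example 4 and Figure 5, function names our own):

```python
def linear_fitness(degrees):
    """Original fitness: average satisfaction degree over the relations in Theta."""
    return sum(degrees) / len(degrees)

def quadratic_fitness(degrees):
    """Alternative fitness: penalize squared shortfalls, which favors satisfying
    every relation to some degree over an all-or-nothing split."""
    return 1 - sum((1 - d) ** 2 for d in degrees) / len(degrees)

crisp       = [1, 1, 0]          # Fig. 5(a): two relations fully satisfied
three_level = [0.5, 1, 0.5]      # Fig. 5(b)
four_level  = [2/3, 2/3, 2/3]    # Fig. 5(c)
```

The linear fitness evaluates to 2/3 in all three cases, whereas the quadratic variant ranks the three scenarios in increasing order of desirability.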
This fitness function will, in effect, favor solutions in which more relations are satisfied to some degree over solutions in which some relations are completely satisfied and some are not. It is moreover easy to see that for scenarios with crisp regions, this fitness function coincides with the original fitness function. Returning to the scenarios from Figure 5, we now find f(chi) = 2/3 ≈ 0.66 in the case of Fig. 5(a), f(chi) = 5/6 ≈ 0.83 in the case of Fig. 5(b), and f(chi) = 8/9 ≈ 0.88 in the case of Fig. 5(c).

Managing the Size of the Search Space

To keep the complexity of the search space at a manageable level, it is useful to note that the genetic operators modify chromosomes in two, largely orthogonal ways:

1. Changing the order in which variables are processed, called Mutation 1 henceforth, is mostly about the fact that some regions may be more constrained than
others, in a way that is difficult to quantify. Instantiating these variables first typically makes it easier to find appropriate spatial scenarios, but it has no theoretical influence on what spatial scenarios could possibly result.
2. Changing the order in which relations are considered for each variable, called Mutation 2 henceforth, is about making the right choices when inconsistencies are present. Changing these orderings will not make it easier to find optimal spatial scenarios, but it will influence which relations the algorithm tries to satisfy. A similar behavior is observed when changing the degrees to which spatial relations are required to be satisfied, which we will call Mutation 4.

The remaining mutation operator, i.e. changing the number of α-levels for a given region (Mutation 3), is relevant to both aspects. First note that Mutation 2 and Mutation 4 serve the same goal. In practice, it will therefore be beneficial to only consider one of these operators (i.e. either p2mut = 0 or p4mut = 0). A further reduction of the search space, and a consequent gain in performance, could be expected by explicitly decomposing the problem into the two aforementioned subproblems. Note, however, that because the two subproblems are not completely independent, we cannot evolve solutions to both subproblems completely separately. As an alternative, a technique known as cooperative coevolution is often used in such a case [31]. This solution essentially boils down to evolving (in this case) two separate populations, each involving a different type of chromosomes, and combining chromosomes from both populations to evaluate their fitness. In the simplest case, this could be applied to our problem as follows.
Chromosomes from the first population encode the variable ordering as well as the number of α-levels to consider, whereas chromosomes from the second population encode either the ordering of the relations per variable (Mutation 2) or, for each relation, the degree to which it should be satisfied (Mutation 4). To evaluate fitness in the first population, chromosomes are combined with the best chromosome of the second population (and symmetrically for the second population). In addition, we will experiment with a simpler approach, based on the same idea, which we will refer to as the alternating GA. In the first n1 generations, only Mutation 1 and Mutation 3 are applied as mutation operators. In the next n2 generations, either Mutation 2 or Mutation 4 is applied, as well as Mutation 3. This sequence is repeated until the population converges, or until a predefined number of generations has been generated. In addition to its simplicity, this scheme also has the advantage that Mutation 3, which is relevant to both aspects of our problem, can be applied in both cases, i.e. together with Mutation 1, as well as with Mutation 2/Mutation 4.
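The alternating scheme amounts to a simple cyclic schedule over the mutation operators; a sketch (the operator labels and the function name are our own):

```python
def active_mutations(generation, n1, n2, relation_mutation="M4"):
    """Mutation operators active in a given generation of the alternating GA:
    n1 generations of Mutation 1 + Mutation 3, then n2 generations of
    Mutation 2 (or Mutation 4) + Mutation 3, repeated cyclically."""
    phase = generation % (n1 + n2)
    if phase < n1:
        return {"M1", "M3"}
    return {relation_mutation, "M3"}
```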
5 Experimental Results

It is clear from the definition of the fitness function that the fitness of optimal fuzzy models will be higher than the fitness of optimal crisp models. On the other hand, the substantially increased search space when fuzzy models are considered may make it harder to find optimal solutions, and may consequently result in a degradation of performance. To investigate whether this is the case in practice, this section
Fig. 6 Performance of various configurations of the genetic algorithm on 50 random data sets involving 20 regions
reports on the results of a number of experiments that were conducted on randomly generated data sets. Specifically, we generated 50 synthetic sets of topological relations involving 20 regions. To generate each of these sets, the following procedure was applied. Using a heuristic optimization technique, we selected an RCC-8 base relation for each pair of regions, such that together they are classically consistent, and such that there is approximately an equal number of occurrences of each base relation. Subsequently, we selected 50 pairs of regions and changed the corresponding relation randomly to another RCC-8 base relation. Note that the resulting set of spatial relations is classically inconsistent (with a very high probability). Figure 6 displays the average fitness per generation for a number of different configurations of the algorithm. In each case an SSGA is used with a population of 20 chromosomes and tournament selection parameters ntour = 3 and ptour = 0.5; crossover was applied with probability 0.9. In those configurations that apply the corresponding mutation operators, mutation probabilities were chosen as p1mut = 0.1, p2mut = 0.1, p3mut = 0.02 and p4mut = 0.02. A good performance for these parameters was observed in a number of preliminary experiments on separate test data. The extremely high values for p1mut and p2mut are remarkable: on average 10% of the elements are given a new position in the ordering after every mutation (recall that the mutation operators are applied to every element of the variable and relation orderings). The reason for this is that these orderings contain a lot of redundant information, i.e. we are not really interested in finding one specific optimal ordering (as in e.g. the traveling salesman problem), but rather in some ordering which satisfies some loose, but unknown, constraints.
The configurations considered are specified as follows:
• Crisp-Basic only uses crisp regions, and only applies Mutation 1 and Mutation 2.
• Crisp-Restricted only uses crisp regions, and only applies Mutation 1.
• Crisp-Alternating only uses crisp regions, and repeatedly applies only Mutation 1 for 33 generations, followed by 17 generations in which only Mutation 2 is applied.
• Fuzzy-Basic uses fuzzy regions involving a number of α-levels between 1 and 4, and only applies Mutation 1, Mutation 3, and Mutation 4.
• Fuzzy-Alternating4 uses fuzzy regions involving a number of α-levels between 1 and 4, and repeatedly applies Mutation 1 and Mutation 3 for 33 generations, followed by 17 generations of Mutation 4 and Mutation 3.
• Fuzzy-Alternating2 uses fuzzy regions involving a number of α-levels between 1 and 4, and repeatedly applies Mutation 1 and Mutation 3 for 33 generations, followed by 17 generations of Mutation 2 and Mutation 3.

The most important observation from Figure 6 is that the three configurations which generate fuzzy models result in a substantially higher fitness than the configurations that only generate crisp models. Hence, the different configurations appear to be successful in coping with the larger search space. Next, both in the fuzzy case and in the crisp case, the alternating variants result in a slightly higher fitness than the basic variants. This suggests that the idea of successively optimizing different aspects in a cyclic manner is useful. Although the amount by which the fitness function is increased is limited, we can expect this amount to be larger when more complex data is used (e.g. using more regions, involving a higher degree of inconsistency, or involving more expressive spatial information than only topological relations); a detailed investigation of this is left as future work. Interestingly, the approach based on cooperative coevolution resulted in a worse performance.
This result is not shown in Figure 6, because more spatial scenarios are calculated per generation, so we cannot simply compare the fitness after a given number of generations. In particular, a population of 20 chromosomes was used to find optimal variable orderings, and a population of 10 to find optimal degrees to which each of the relations should be satisfied. As 30 spatial scenarios are thus calculated in each generation (instead of 20), the result after 166 generations should be compared against the result of the other configurations after 250 generations. This result was 0.694, which is less than the result of Fuzzy-Basic (0.706), Fuzzy-Alternating2 (0.708) and Fuzzy-Alternating4 (0.712). A final surprising observation is that Mutation 2 does not perform significantly worse than Mutation 4, even though the latter involves a more informed decision based on how fuzzy spatial relations are evaluated in terms of the α-level sets.
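The alternating schedule used by the Crisp-Alternating and Fuzzy-Alternating configurations (33 generations with one set of mutation operators, followed by 17 generations with another, repeated) can be sketched as follows. This is an illustrative sketch only; the function and constant names are not taken from the chapter.

```python
# Sketch of the alternating mutation schedule (assumed names), illustrating
# the 33 + 17 generation cycle of e.g. Fuzzy-Alternating4.
PHASE_A_LENGTH = 33   # generations applying Mutation 1 and Mutation 3
CYCLE_LENGTH = 50     # 33 + 17 generations per full cycle

def active_mutations(generation):
    """Return the set of mutation operators applied at a given generation."""
    if generation % CYCLE_LENGTH < PHASE_A_LENGTH:
        return {"Mutation 1", "Mutation 3"}   # first phase of the cycle
    return {"Mutation 4", "Mutation 3"}       # second phase of the cycle
```

Generations 0 to 32 then use the first mutation set, generations 33 to 49 the second, after which the cycle repeats.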
6 Concluding Remarks

The use of genetic algorithms (GAs) to generate spatial scenarios is appealing, as it can naturally cope with the diverse types of spatial information that are often
S. Schockaert and P.D. Smart
encountered in practice. With the aim of making such an approach more tolerant to inconsistencies in the input data (a requirement of prime importance if we are to apply it to web-mined data), we have introduced an extension of our earlier work, in which fuzzy regions rather than crisp regions are generated. We have proposed suitable genetic operators to implement this idea, and we have discussed how the increased size of the search space can be managed using a variant of cooperative coevolution which we called alternating GAs. There are several directions in which we intend to extend this work. A first direction concerns the application to real-world data. Although the experimental results in this chapter demonstrate that using the fuzzy region based GA may result in higher fitness scores, it is not entirely clear how such fuzzy regions could be used in practice, and indeed whether the resulting fuzzy boundaries are likely to have a reasonable geographic interpretation (e.g. are vernacular regions more likely to result in fuzzy regions than well-defined administrative regions?). A second direction relates to the specifics of the genetic algorithm itself, and its relation to cooperative coevolution. Although our problem can be naturally divided into two subcomponents (ordering variables and choosing relations), it appears that the subtle interplay between these subcomponents provides an interesting challenge for (variants of) approaches to cooperative coevolution.
Acknowledgement. Steven Schockaert was funded as a postdoctoral fellow of the Research Foundation – Flanders.
References

1. Allen, J.F.: Maintaining knowledge about temporal intervals. Communications of the ACM 26(11), 832–843 (1983)
2. Bittner, T., Stell, J.G.: Vagueness and rough location. Geoinformatica 6(2), 99–121 (2002)
3. Brunsdon, C.: Estimating probability surfaces for geographical point data: An adaptive kernel algorithm. Computers & Geosciences 21(7), 877–894 (1995)
4. Cicerone, S., Di Felice, P.: Cardinal relations between regions with a broad boundary. In: Proceedings of the 8th ACM International Symposium on Advances in Geographic Information Systems, pp. 15–20 (2000)
5. Cockcroft, S.: A taxonomy of spatial data integrity constraints. Geoinformatica 1(4), 327–343 (1997)
6. Cohn, A.G., Gotts, N.M.: The 'egg-yolk' representation of regions with indeterminate boundaries. In: Burrough, P.A., Frank, A.U. (eds.) Geographic Objects with Indeterminate Boundaries, pp. 171–187. Taylor and Francis Ltd., Abingdon (1996)
7. Delaunay, B.N.: Sur la sphère vide. Izvestia Akademia Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk, VII Seria, 793–800 (1934)
8. Dorigo, M., Di Caro, G.: The ant colony optimization meta-heuristic. In: Corne, D., et al. (eds.) New Ideas in Optimization, pp. 11–32. McGraw-Hill Ltd., New York (1999)
9. El-Geresy, B.A., Abdelmoty, A.I.: A qualitative approach to integration in spatial databases. In: Quirchmayr, G., Bench-Capon, T.J.M., Schweighofer, E. (eds.) DEXA 1998. LNCS, vol. 1460, pp. 280–289. Springer, Heidelberg (1998)
10. Erwig, M., Schneider, M.: Vague regions. In: Proceedings of the 5th International Symposium on Advances in Spatial Databases, pp. 298–320 (1997)
11. Fortune, S.: A sweepline algorithm for Voronoi diagrams. Algorithmica 2(2), 153–174 (1987)
12. Frank, A.U.: Tiers of ontology and consistency constraints in geographic information systems. International Journal of Geographical Information Science (7), 667–678 (2001)
13. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Amsterdam (1989)
14. Goldberg, D.E., Lingle, R.: Alleles, loci and the traveling salesman problem. In: Proceedings of the First International Conference on Genetic Algorithms and their Applications, pp. 154–159 (1985)
15. Gong, P., Mu, L.: Error detection through consistency checking (2004), http://www.cnr.berkeley.edu/gong/PDFpapers/GongMulanError.pd
16. Grigni, M., Papadias, D., Papadimitriou, C.: Topological inference. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 901–906 (1995)
17. Hamamoto, S., Yih, Y., Salvendy, G.: Development and validation of genetic algorithm-based facility layout: a case study in the pharmaceutical industry. International Journal of Production Research 37(4), 749–768 (1999)
18. Hart, G., Dolbear, C.: What's so special about spatial? In: The Geospatial Web: How Geobrowsers, Social Software and the Web 2.0 are Shaping the Network Society, pp. 39–44. Springer, London (2007)
19. Hunter, G.J.: Management issues in GIS: accuracy and data quality. In: Conference on Managing Geographic Information Systems for Success, pp. 95–101 (1996)
20. Jones, C.B., Purves, R.S., Clough, P.D., Joho, H.: Modelling vague places with knowledge from the Web. International Journal of Geographical Information Science 22(10), 1045–1065 (2008)
21. Lee, D.T., Schachter, B.J.: Two algorithms for constructing a Delaunay triangulation. International Journal of Computer and Information Sciences 9(3), 219–242 (1980)
22. Li, S.: On topological consistency and realization. Constraints 11(1), 31–51 (2006)
23. Louwsma, J., Zlatanova, S., van Lammeren, R., van Oosterom, P.: Specifying and implementing constraints in GIS, with examples from a geo-virtual reality system. Geoinformatica 10(4), 531–550 (2006)
24. Marble, D.F.: The extended data dictionary: A critical element in building viable spatial databases. In: 11th Annual ESRI User Conference (1990)
25. Nebel, B.: Computational properties of qualitative spatial reasoning: first results. In: Proceedings of the 19th German Conference on Artificial Intelligence, pp. 233–244 (1994)
26. Egenhofer, M.J., Tryfona, N.: Multi-resolution spatial databases: Consistency among networks. In: Integrity in Databases: Sixth International Workshop on Foundations of Models and Languages for Data and Objects, September 1996, pp. 119–132 (1996)
27. Oliver, I.M., Smith, D.J., Holland, J.R.C.: A study of permutation crossover operators on the traveling salesman problem. In: Proceedings of the Second International Conference on Genetic Algorithms and their Applications, pp. 224–230 (1987)
28. Parsons, S.: Current approaches to handling imperfect information in data and knowledge bases. IEEE Transactions on Knowledge and Data Engineering 8(3), 353–372 (1996)
29. Pernici, B., Mirbel, I., Brisaboa, N.R.: Constraints in spatio-temporal databases: A proposal of classification, June 4 (1998)
30. Poon, P.W., Carter, J.N.: Genetic algorithm crossover operators for ordering applications. Computers and Operations Research 22(1), 135–147 (1995)
31. Potter, M.A., De Jong, K.A.: Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evolutionary Computation 8(1), 1–29 (2000)
32. Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connection. In: Proceedings of the 3rd International Conference on Knowledge Representation and Reasoning, pp. 165–176 (1992)
33. Renz, J.: Maximal tractable fragments of the Region Connection Calculus: A complete analysis. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence, pp. 448–454 (1999)
34. Renz, J., Nebel, B.: On the complexity of qualitative spatial reasoning: A maximal tractable fragment of the Region Connection Calculus. Artificial Intelligence 108(1-2), 69–123 (1999)
35. Rodríguez, A.: Inconsistency issues in spatial databases. In: Hunter, A., Bertossi, L., Schaub, T. (eds.) Inconsistency Tolerance. LNCS, vol. 3300, pp. 237–269. Springer, Heidelberg (2005)
36. Schneider, M.: Metric operations on fuzzy spatial objects in databases. In: Li, K.-J., Makki, K., Pissinou, N., Ravada, S. (eds.) Proceedings of the 8th ACM Symposium on Advances in Geographic Information Systems (GIS 2000), November 10-11, 2000, pp. 21–26. ACM Press, New York (2000)
37. Schockaert, S., De Cock, M.: Neighborhood restrictions in geographic IR. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 167–174 (2007)
38. Schockaert, S., De Cock, M., Cornelis, C., Kerre, E.E.: Fuzzy region connection calculus: an interpretation based on closeness. International Journal of Approximate Reasoning 48, 332–347 (2008)
39. Schockaert, S., De Cock, M., Cornelis, C., Kerre, E.E.: Fuzzy region connection calculus: representing vague topological information. International Journal of Approximate Reasoning 48, 314–331 (2008)
40. Schockaert, S., De Cock, M., Kerre, E.E.: Spatial reasoning in a fuzzy region connection calculus. Artificial Intelligence 173(2), 258–298 (2009)
41. Schockaert, S., Smart, P.D.: Evolutionary strategies for expressive spatial reasoning. In: Proceedings of the Workshop on Soft Methods for Statistical and Fuzzy Spatial Information Processing, pp. 26–39 (2008)
42. Schockaert, S., Smart, P.D., Abdelmoty, A.I., Jones, C.B.: Mining topological relations from the web. In: Proceedings of the 19th International Workshop on Database and Expert Systems Applications (FlexDBIST), pp. 652–656 (2008)
43. Schockaert, S., Smart, P.D., Twaroch, F.A.: An evolutionary approach to expressive spatial information processing (submitted)
44. Shi, G.: A genetic algorithm applied to a classic job-shop scheduling problem. International Journal of Systems Science 28, 25–32 (1997)
45. Smart, P.D., Abdelmoty, A.I., El-Geresy, B.A., Jones, C.B.: A framework for combining rules and geo-ontologies. In: Marchiori, M., Pan, J.Z., de Sainte Marie, C. (eds.) RR 2007. LNCS, vol. 4524, pp. 133–147. Springer, Heidelberg (2007)
46. Starkweather, T., McDaniel, S., Mathias, K., Whitley, D., Whitley, C.: A comparison of genetic sequencing operators. In: Proceedings of the Fourth International Conference on Genetic Algorithms, pp. 69–76 (1991)
47. Syswerda, G.: Schedule optimization using genetic algorithms. In: Davis, L. (ed.) A Handbook of Genetic Algorithms, pp. 332–349. Van Nostrand Reinhold (1991)
48. Twaroch, F.A., Jones, C.B., Abdelmoty, A.I.: Acquisition of a vernacular gazetteer from web sources. In: LOCWEB 2008: Proceedings of the First International Workshop on Location and the Web, pp. 61–64. ACM, New York (2008)
49. Ubeda, T., Egenhofer, M.J.: Topological error correcting in GIS. In: Scholl, M., Voisard, A. (eds.) SSD 1997. LNCS, vol. 1262, pp. 283–297. Springer, Heidelberg (1997)
50. Usery, E.L.: A conceptual framework and fuzzy set implementation for geographic features. In: Burrough, P.A., Frank, A.U. (eds.) Geographic Objects with Indeterminate Boundaries, pp. 71–86. Taylor & Francis, Abingdon (1996)
51. Whitley, D., Kauth, J.: GENITOR: A different genetic algorithm. In: Proceedings of the Rocky Mountain Conference on Artificial Intelligence, pp. 118–130 (1988)
Part 3: Prediction and Interpolation
Fuzzy Methods in Image Mining

Alfred Stein
Abstract. This paper presents the use of soft methods in image mining. Image mining covers the chain from the identification of objects, corresponding to natural or man-made processes, on remote sensing images, through modelling, tracking over a series of images and prediction, towards communication to stakeholders. Attention is given to image mining for vague and uncertain objects. Aspects of up- and downscaling are addressed. We further consider in this paper both spatial interpolation and decision making. The paper is illustrated with several case studies.

Keywords: Image mining, uncertain objects, spatial data quality, decision making, scale.
1 Introduction

Remote sensing images are available at an increasing frequency and spatial resolution, and from a large variety of sensors. Spatial resolution is down to 1 m for the modern Quickbird and Ikonos satellites, as well as for the Indian Cartosat-2 satellite, whereas images are becoming available every 15 minutes, e.g. from the Meteosat second generation satellites MSG-1 and MSG-2, although at spatial resolutions of 1 to 3 km. Further, spectral resolution is becoming further refined; for example, the Hyperspectral Mapper (HyMap) is a 126-band imaging spectrometer of reflected solar radiation within the 0.4 to 2.5 μm wavelength region of the electromagnetic spectrum, with spectral coverage nearly continuous in the visible-to-near-infrared (VNIR) and shortwave-infrared (SWIR) regions. Despite this development, uncertainty is still inherent in images. The resolution in space and time is restricted, whereas the spectral

Alfred Stein
ITC, PO Box 6, 7500 AA Enschede, The Netherlands
e-mail:
[email protected]

R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 243–268. © Springer-Verlag Berlin Heidelberg 2010. springerlink.com
A. Stein
resolution is limited as well. For example, the spatial configuration of the HyMap sensor accounts for an instantaneous field of view (IFOV) of 2.5 mrad along track and 2.0 mrad across track, resulting in a pixel size of 3 to 5 m. Further, with the increasing quality of images, the questions to be answered by using those images are changing as well. In the recent past, a simple land cover classification was important, e.g. to compare it with a previous classification to discover changes. These days attention focuses on much more complicated activities, like the monitoring of objects, super-resolution mapping, assessing biodiversity and deriving values for agricultural and environmental models.

Image mining is a relatively new development focusing on extracting relevant information from large sets of remote sensing images. Image mining concerns classification and segmentation, either in space or in space-time. In space we consider textural image segmentation as a first step, whereas no clear procedures exist in space-time. The main objective of image mining is to reduce uncertainty, allowing better decisions to be made. Because of the high temporal resolution, images often are not (and cannot be) properly validated. We may therefore doubt the quality of the derived information, requiring additional efforts for skilful mathematical and logical modelling. Fuzzy methods, statistical methods and probabilistic procedures are important in order to deal with the various characteristics, and with proper attention to aspects of data quality.

Image mining has been developed as an integrated approach to catch and follow objects discernable on remote sensing images. It concerns on the one hand objects that can be identified on an individual image, and on the other hand objects that are present on a series of images and thus constitute an object in the space-time domain. So far, attention has focused mainly on crisp objects, i.e.
objects with a sharp boundary and a well-defined and homogeneous content. We may realize, however, that many real-world objects are inherently vague. This can be due to a poor object definition, or to the existence of gradual transition zones. For that reason, probabilistic and fuzzy methods are required. Important steps during image mining are the identification of spatial patterns and the testing of their significance, dimension reduction to improve the informative content, and the use of the existing information as well as possible, by making realistic assumptions, including all determining factors and making quantitative statements that include the uncertainties. These are in general issues of spatial data quality. Spatial data quality, however, also depends upon the required decision and hence on the stakeholders' interests.

Interesting in image mining is the issue of scale. Scale reflects the size of an object in space and in time. But scale is also a matter of semantics. An object at one scale can consist of various objects with a different meaning at another scale. A forest consists of trees, paths, open spaces and heather fields; a farm consists of fields, farmhouses, connecting roads, etc. These objects can all be crisp or fuzzy, depending upon the spatial resolution and the interest of stakeholders.

At the moment, we can identify the following new developments. First, it is of increasing importance to be able to handle vague and fuzzy objects identifiable on remote sensing images in an efficient and effective way. Methodology has broadly
been available, but a solid mathematical foundation has been missing. Second, spatial data quality is an emerging field of science, covering aspects that are inherent in image mining. Positional accuracy, attribute accuracy and semantic accuracy, but also temporal accuracy, are of concern. Third, new methodology is available to make the step from point data to areal statements, i.e. to link point observations through interpolated maps with remote sensing images. This requires the interpolation of model results and raises related questions on optimal sampling and network configurations. Fourth, issues of changes in scale are receiving increasing attention. Statistical methods are able to deal with crisp and vague objects in this sense.

The aim of this paper is to present some general findings for image mining on uncertain objects. Several examples are included that show the dynamic behavior of uncertain spatial objects.
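As a side note on the sensor characteristics mentioned in this section: an IFOV translates into a ground pixel size via the small-angle approximation, pixel size ≈ IFOV (in radians) × distance to the ground. The sketch below is illustrative only, and the flight altitude is an assumed value; HyMap's quoted 3 to 5 m pixels correspond to altitudes of roughly 1.5 to 2 km.

```python
def ground_pixel_size(ifov_mrad, altitude_m):
    """Ground sample distance in metres, by the small-angle approximation."""
    return ifov_mrad * 1e-3 * altitude_m

# At an assumed flight altitude of 2000 m, HyMap's IFOV gives:
along_track = ground_pixel_size(2.5, 2000)   # 5.0 m along track
across_track = ground_pixel_size(2.0, 2000)  # 4.0 m across track
```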
2 A Short Introductory Example

We consider an object that is characterized by a reflectance that deviates from that of its neighborhood on a series of images [35]. Here we focus on flooding of the Tonle Sap Great Lake in Cambodia, part of the Mekong river (Fig. 1). Flooding due to the Mekong river and its tributaries is a recurrent problem that occurs almost every year. Seasonal flooding characterizes most rivers in the tropical region. This flood causes considerable damage to human settlements, agricultural activities and infrastructure of the surrounding area. The Tonle Sap Great Lake, which is located at the heart of the Mekong river system, extends from a small area in the dry season to three to four times that extent in the wet season.

The study area lies in the lower part of the Mekong region of the Cambodian floodplain, following the Mekong river. The main study area is bounded within the geographic coordinates of latitude 12°06′25″ N to 13°55′56″ N and longitude 102°29′13″ E to 104°28′51″ E. The country is characterized by five distinct topographic features: the sandstone Dangrek Range in the north, which forms the border with Thailand, the granite Cardamom Mountains with peaks of over 1500 m to the south-west, the Darlac Plateau in the north-east, which rises to over 2700 m, and the Central Plains between 10 and 30 m above sea level, which form 75 percent of the land area. The Mekong River flows over a distance of 486 km in Cambodia. Most of the Cambodian rivers and streams flow into the Mekong Great Lake basin.

Flooding of the Mekong river was monitored during the rainy season in 2001. Starting from May 2001 to January 2002, a time series of nine Landsat 7 ETM+ multi-spectral images was acquired, the moments being dictated by visibility of the area; only those images that contained less than 15% clouds were useful. Depending on the time of image acquisition, the size of the area extent of the lake varies from image to image.
Images were collected on the following dates:
• At t1 = May 31st, early flood, we observe n1 objects: one large object and a series of n1 − 1 small objects. These are characterized as O1,i, i = 1, . . . , n1.
Fig. 1 Flooding objects identified in the Tonle Sap Great Lake area at the nine moments of observations
• At t2 = June 16th, early flood, there is little change. The objects O1,i can simply be tracked to objects O2,i, equating objects at the same position but at different moments in time with each other.
• At t3 = July 2nd, start of rising flood, we notice that some of the smaller objects expand, and that some objects merge. Hence a careful tracking is required to relate the n3 objects at t3 with the n2 objects at t2. We also notice the birth of some new objects.
• At t4 = July 18th, rising flood, we notice that the large object is now increasing further and that one of the objects at t3 is decreasing and splitting into some smaller objects.
• Changes from t4 to t5, rising to peak flood, are small, but we observe a decrease in the size of the largest object.
• At t6 = September 4th, peak flood, we notice that the objects have now merged into a single large object, labeled e.g. O6,1.
• This object we also observe as the single object O7,1 at t7 = September 20th, peak to falling flood, although of a somewhat smaller size.
• During the successive period the flooding apparently reduces and at t8 = November 23rd, falling flood stage, we notice the reduction in the size of O7,1 to become O8,1 and the birth of several, say n8 − 1, smaller objects.
• At t9 = January 10th, end of flood, we notice that the largest object, now labeled O9,1, is of a comparable size as at t1, but of a somewhat different shape.

Flooding apparently is one of the phenomena that we can characterize using image mining. It starts at some moment in time, it may be poorly visible at several moments in time because of (partial) cloud cover, it increases in size, it may split, several objects may merge, and after the river withdraws, the flood ends and the object reduces to its original size. The image shows the water bodies as crisp objects, whereas in fact they are uncertain objects with membership values between 0 and 1. The boundaries between flooded and non-flooded land are difficult to draw, if possible at all, and (partial) cloud cover may prohibit their precise and detailed observation. Flooding can further be interpreted from various perspectives, like a natural, an ecological, an environmental or a socio-economical perspective. This makes the phenomenon inherently fuzzy. For example, flooding covering an area A may have an effect on the economy and the infrastructure of a much larger area. Also, environmental effects of flooding may extend well beyond the edge of the flooded area and remain present for years. Finally, the ecological effects can be large as well, like the creation of new habitats as a consequence of flooding.

In an analysis of flooding in space and time, we are facing the following issues:
• What characterizes a flooded object?
• Can we obtain a reliable estimate of the number of flooded objects?
• What is the size of a flooded object?
• Do we notice the splitting (and merging) of flooded objects in time?
• What is happening to flooded objects between moments of observation?
• Can we predict the future of the flooding beyond the last observation time, hence beyond t9 (and before t1)?
• Can we reliably predict whether a similar problem will occur in the next year?
• What determines the amount of flooding?
• And, most fundamentally: do we understand the process called flooding?

We will not be able to answer all the questions just raised (which most likely could be complemented by many more), but we will sketch a procedure based on fuzzy theory that will help us along the road. For that we first turn towards fuzzy theory.
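Several of the questions above (splitting, merging, birth and death of objects between observation moments) can be operationalized by linking objects at successive times through spatial overlap. The sketch below, with objects represented as sets of pixel coordinates, is a minimal illustration under that assumption, not the method used in [35].

```python
# Overlap-based linking of objects between two observation times.
# One-to-many links indicate a split; many-to-one links indicate a merge;
# unmatched objects at t+1 are births, unmatched objects at t are deaths.
def track(objects_t, objects_t1):
    """Return pairs (i, j) whenever object i at time t overlaps object j at t+1."""
    return [(i, j)
            for i, a in enumerate(objects_t)
            for j, b in enumerate(objects_t1)
            if a & b]

t_objects = [{(0, 0), (0, 1), (1, 0)}, {(5, 5)}]        # two objects at t
t1_objects = [{(0, 0)}, {(1, 0), (1, 1)}, {(9, 9)}]     # three objects at t+1
links = track(t_objects, t1_objects)
# object 0 splits into objects 0 and 1; object 1 dies; object 2 is born
```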
3 Fuzzy Theory

Mathematical modeling by means of rationalization requires the transformation of vague concepts. Precise data are not always available. For that reason, from the nineteen-sixties onward the theory of fuzzy logic has been developed [36], with good and insightful applications in e.g. [3] and [15]. In the flooding example, the object 'flooded area' may not be properly defined, as a gradual transition most likely occurs between inundated and non-inundated area, reaching from moist soil, through vegetation on the water, to vegetation extending above water, thus leading to vague or imprecise data. This applies a fortiori for observations on flooded land, where images are limited in their crispness both because of their limited spatial
resolution and because of their limited spectral signature, i.e. water is not always equally simply discernable from land.

The common procedure is that objects are characterized by their membership functions. Membership functions in a 2-dimensional space X are functions μA(x), taking values between 0 and 1, which specify the degree to which the location given by the 2-dimensional vector x is characterized by A [2]. A membership function is formally defined by a function η(z) for z ∈ [0, 1], which transforms through a homeomorphism towards the membership function μA(x) for x ∈ X, with A being the set of interest. To best understand this, we realize that a crisp object is topologically equivalent to the unit interval [0, 1], i.e., there is a homeomorphism h from [0, 1] to the object O in R². A crisp object thus is the image of the [0, 1] interval by the homeomorphism h: {h(t) = (x(t)) | t ∈ [0, 1]}. We can build a fuzzy set Q in [0, 1] that satisfies the continuity properties for membership functions, then transfer its membership values to the crisp object O via the homeomorphism h. This leads to a fuzzy object. For the construction of a vague object Õ from Q via the homeomorphism h, the vague object Õ is built from the extension principle as the image h(η) of the fuzzy set Q. The difference with the fuzzy object is then that both the set Q is fuzzy and we consider a fuzzy object Õ. For some objects that have to satisfy specific conditions, we may put additional constraints. For example, [11] restricts homeomorphisms for vague line objects to (0, 1) to allow looped lines, thus requiring continuity at the end points 0 and 1. This definition applies both to discrete and to continuous images, i.e. to images in which the approach is in general pixel based, as well as to images that we may see (or handle) as a continuum.
The choice for an interval [0, 1] is arbitrary, though, and the theory of fuzzy sets continues for any set L, say elements of a lattice, and sometimes even a finite set Lm = {k/(m−1) | 0 ≤ k ≤ m−1} ⊆ [0, 1]. Such membership functions are characterized by the steepness of their slopes, i.e. showing how rapidly they increase from zero to one, by their homogeneity, by their support, i.e. those x ∈ X for which μA(x) > 0, by their core, i.e. those x ∈ X for which μA(x) = 1, and by their centroid, being the point of gravity of a fuzzy set. These membership functions characterize the following properties of fuzzy sets:
• The support of the fuzzy set A: Supp(A) = {x ∈ X | μA(x) > 0}. If Supp(A) is one element, then it is called a fuzzy singleton (or a τ-singleton if μA(x) = τ < 1).
• The height of a fuzzy set is defined as hgt(A) = sup_{x∈X} μA(x) > 0.
• α-cuts are defined as A≥α = {x ∈ X | μA(x) ≥ α}; strong α-cuts are defined as A>α = {x ∈ X | μA(x) > α}. Notice that Supp(A) equals A>0 and that core(A) equals A≥1. The famous representation theorem states that each fuzzy set uniquely determines its α-cuts, and that conversely all α-cuts uniquely determine a fuzzy set.
• The centroid of the membership function for a two-dimensional vector x is defined as ∫_X x · μA(x) dx.
In the image mining process described below, one of the main challenges will be to derive membership functions from images.
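For a discrete fuzzy set, e.g. per-pixel membership values derived from an image, the notions above translate directly into code. The following is a minimal sketch with hypothetical names (the chapter also treats continuous membership functions); the centroid is computed here as the normalized point of gravity.

```python
# A discrete fuzzy set as a dict mapping pixel coordinates to memberships.
def support(mu):            # Supp(A) = {x | mu_A(x) > 0}
    return {x for x, m in mu.items() if m > 0}

def core(mu):               # core(A) = {x | mu_A(x) = 1}
    return {x for x, m in mu.items() if m == 1.0}

def alpha_cut(mu, alpha):   # {x | mu_A(x) >= alpha}
    return {x for x, m in mu.items() if m >= alpha}

def cardinality(mu):        # card(A) = sum of memberships (discrete case)
    return sum(mu.values())

def centroid(mu):           # normalized point of gravity of the fuzzy set
    total = cardinality(mu)
    return tuple(sum(x[k] * m for x, m in mu.items()) / total for k in range(2))
```

For example, for mu = {(0, 0): 1.0, (0, 1): 0.6, (1, 0): 0.2, (1, 1): 0.0}, the support has three elements, the core one, and the 0.5-cut two.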
Fig. 2 A fuzzy set on a 2-dimensional space in two perspectives
A special type of fuzzy sets are the normal sets. They emerge from an ordinary fuzzy set by dividing all membership values by hgt(A). The reasons for doing so can be manifold. For example, an interest may exist in relating fuzzy sets to each other, irrespective of their degree of truth as specified by the height of the membership function. Doing so does not affect the support or the cardinality, defined as the sum or the integral of a fuzzy set. Further, one commonly encounters the cardinality of a fuzzy set, defined as card(A) = Σ_x μA(x) for discrete fuzzy sets and by card(A) = ∫_A μA(x) dx for continuous fuzzy sets. Relative cardinality is the fraction of the whole set occupied by the fuzzy set, i.e. card(A)/card(X): the cardinality of A divided by the cardinality of the whole set (the image).

For spatial data quality of uncertain objects we consider on the one hand the relation between the set A of interest, as expressed by the membership function, and on the other hand the spatial precision. The relation between A and the membership function is given by the homeomorphism h from a function η on the set A towards a set in R¹ or R² of interest. Important questions in this regard are the representation of A (does A really reflect the property of interest), the membership values η (are the membership values a proper reflection of the properties) and the behavior of the function h (is the correspondence between content and object properly represented). The second issue deals with the positional precision of the objects to be studied in R¹ and R². Coordinate precision, spatial extension and unambiguity in the definition all play a role here.

On fuzzy sets various operations are defined in terms of their membership functions. Considering two fuzzy sets A and B, we distinguish the following basic rules:
• μ¬A(x) = 1 − μA(x): the membership function of the complement of A is 1 minus the membership function of A.
• A ∨ ¬A = μX(x), i.e. the membership function of A or its negation is the membership function of the whole image.
• If A = B then μA(x) = μB(x).
• A ∪ B corresponds to μA∪B(x) = max(μA(x), μB(x)), i.e. to taking the maximum of the two membership functions of the constituting sets.
• A ∩ B corresponds to μA∩B(x) = min(μA(x), μB(x)), i.e. to taking the minimum of the two membership functions of the constituting sets.
• If A ⊇ B then μA(x) ≥ μB(x): a set B contained in A has a membership function that is nowhere higher than the membership function of A.
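The basic connectives listed above can be sketched for discrete fuzzy sets as follows (illustrative code only; dicts map locations to membership degrees, and missing locations are treated as membership 0).

```python
def complement(mu):
    """mu_{not A}(x) = 1 - mu_A(x)."""
    return {x: 1.0 - m for x, m in mu.items()}

def union(mu_a, mu_b):
    """mu_{A union B}(x) = max(mu_A(x), mu_B(x))."""
    return {x: max(mu_a.get(x, 0.0), mu_b.get(x, 0.0))
            for x in mu_a.keys() | mu_b.keys()}

def intersection(mu_a, mu_b):
    """mu_{A intersect B}(x) = min(mu_A(x), mu_B(x))."""
    return {x: min(mu_a.get(x, 0.0), mu_b.get(x, 0.0))
            for x in mu_a.keys() | mu_b.keys()}

a = {"p": 0.7, "q": 0.2}
b = {"p": 0.4, "q": 0.9}
# union(a, b) gives {"p": 0.7, "q": 0.9}; intersection gives {"p": 0.4, "q": 0.2}
```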
This can be further generalized, e.g. by mathematically identifying fuzzy sets with their membership functions, which can be iterated, leading to fuzzy sets of higher type (e.g. memberships that are themselves fuzzy sets). For image analysis it may be appropriate to have degrees of intensity chosen from [0, 1] for each band, or for a well-defined combination of bands, such as the vegetation index or the wetness index. For n colors, for example, L = [0, 1]^n, intensities can be normalized as membership degrees over a universe of wavelengths. Membership degrees can also be interpreted as degrees of truth of a sentence, leading to a many-valued logic: μA(x) = [x ∈ A], where [H] denotes the truth degree of a statement H. This type of interpretation opens the door to analogies between notions and results related to fuzzy sets.
4 Image Mining

Image mining is defined as the analysis of (often large sets of) observational images to find (un)suspected relationships and to summarize the data in novel ways that are both understandable and useful to stakeholders (Fig. 3). Objects discernable on those images can be either crisp, or fuzzy and vague. We distinguish five important steps in image mining: identification, modelling, tracking, prediction and communication with stakeholders. All these processes will be briefly discussed below. On top of this, we notice aspects of spatial data quality in each of these steps.
Fig. 3 Image mining of uncertain objects: images at t1,…,tn feed into object identification and modelling (with external factors), followed by tracking in time and prediction towards t0, with quality control throughout and communication to stakeholders
Fuzzy Methods in Image Mining
4.1 Object Identification

Focusing on uncertain objects, we consider first their identification. The typical issue is that a pattern observed in an image is equated with a set of signals that could relate to one of the objects of interest. In the typical example of a flooding, the concern is with an area in which flooding may occur and in which suddenly a homogeneous set of pixel values appears within a sub-area of the image. This need not correspond to flooding, however, as other changes in land cover may also result in a homogeneous set of pixel values, such as snow cover, fires, and human-induced land-cover change (large-scale planting of trees, a cropped field, etc.).
4.2 Extracting Objects from Images

The next step concerns the extraction of the uncertain objects from images. This is usually done by means of a segmentation, followed by a classification. Extraction of objects makes the step from raster to objects. Typically, extraction is done by applying a segmentation routine, in which both the object and the uncertainty are modeled. Various procedures for image segmentation are well documented, and include procedures based on mathematical morphology, on edge detection and on identifying homogeneity in one band or in a set of bands [16]. Modelling of uncertainty has been done in the past by using, e.g., a confusion index, whereas traditionally a discriminant analysis could be applied, honoring the presence of different spectral bands and yielding posterior probabilities for the non-selected classes. Recently, interesting results have been obtained by applying a texture-based segmentation, based upon the papers of Ojala and Pietikäinen [24] for single bands, and Lucieer et al. [23] for multiple bands. This leads to improved segmentation, again including uncertainty values. The result of this operation for an image at moment t thus is a series of nt objects Ot,i, i = 1, . . . , nt, that are characterized by similar pixel values, which are different from pixel values in the vicinity. We distinguish in this document three approaches: the semantic import model, fuzzy classification and the use of kernel functions. The Semantic Import Model The semantic import model [2] is based on an existing base of characteristics, stored in a knowledge base. If an identified object corresponds to these characteristics up to a pre-specified precision, then the object is identified as being of that type. Examples are abundant, and are encountered in a range of geological and land cover studies. Kernel Smoothers We first consider kernel smoothers, as in [26] and [27], and define them on a single band.
An image X of size N = Ni · Nj is defined as a set of pixels ∪s xs, where the index s = 1, . . . , N refers to the vertices of the image. Kernel smoothers are based upon a
specific pixel, say x0. They use local weights to produce the estimate at each pixel of the image, with weights decreasing at increasing distance from the pixel under consideration. Several kernel functions are available, all with a volume that integrates to 1 [19]. A commonly used kernel function is the Gaussian function. It has a weight function based on the Gaussian density function (equation 1) and assigns weights that decrease exponentially with the squared Euclidean distance (x0 − xs)² from the pixel x0 towards other positions xs:

Kλ(x0, xs) = (1/λ) exp(−(x0 − xs)²/(2·λ))   (1)

Its rate of decrease is described by the parameter λ, also termed the bandwidth or the smoothing parameter. A large value of λ implies a low variance, with the kernel averaging more observations, but it also implies a larger bias, whereas a small value of λ implies a relatively large variance and a small bias. For kernels with small λ, i.e. narrow kernels, the largest influence comes from nearby observations, whereas kernels with a large value of λ, i.e. wide kernels, receive more influence from distant observations. For the Gaussian kernel λ corresponds to the variance of the Gaussian density, and in that sense controls the width of the neighborhood. Narrow kernels emphasize small-scale details of the data structure, whereas wide kernels reveal more global properties of the distribution. When λ is chosen optimally, the smoothed image has a larger signal-to-noise ratio than the original image. The simplest form of kernel estimate is the Nadaraya-Watson weighted average. This estimator approximates a pixel value x0 by an approximate value x̂0, using relations with neighboring pixels xs. The form of the relation is given by the kernel function in equation 1:
x̂0 = ∑_{s=1}^{N} xs Kλ(x0, xs) / ∑_{s=1}^{N} Kλ(x0, xs) = ∑_{s=1}^{N} xs hs(x0)   (2)
in which the term hs(x0) abbreviates the dependence between x0 and the xs. The estimator can be viewed as an expansion in renormalized radial basis functions, with a basis function hs located at the pixel location x0 and coefficients xs, s = 1, . . . , N [19]. As both the numerator and the denominator in equation 2 are convolutions, this operation is computationally rather complex. Instead of representing an image as an array of pixel values, the image can be represented as the sum of sine waves of different frequencies, amplitudes and directions, i.e. in the frequency domain [16]. The Fourier transform takes an image from the spatial domain into the frequency domain, whereas the inverse Fourier transform takes an image from the frequency domain back into the spatial domain. A convolution operation in the spatial domain becomes simply a point-by-point multiplication in the frequency domain. Therefore an alternative way of applying a smoothing kernel is to multiply the Fourier transforms of the convolved elements and to perform the inverse Fourier transform of the obtained product.
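A minimal sketch of the Nadaraya-Watson smoother of equations (1) and (2) on a 1-D signal may look as follows; the test signal, noise level and bandwidth are illustrative choices, not values from the chapter:

```python
import numpy as np

def gaussian_kernel(d2, lam):
    """Kernel of equation (1): weight decaying with squared distance d2."""
    return np.exp(-d2 / (2.0 * lam)) / lam

def nadaraya_watson(x, lam):
    """Smooth a 1-D signal: each pixel becomes a kernel-weighted average
    of all pixels, the ratio of the two sums in equation (2)."""
    s = np.arange(x.size)
    d2 = (s[:, None] - s[None, :]) ** 2      # squared distances between pixels
    K = gaussian_kernel(d2, lam)
    return (K @ x) / K.sum(axis=1)           # numerator / denominator of (2)

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 2 * np.pi, 64))
noisy = signal + 0.3 * rng.standard_normal(64)
smoothed = nadaraya_watson(noisy, lam=4.0)
```

For large images the dense distance matrix is impractical; as the text notes, both sums in (2) are convolutions, so they would in practice be evaluated via FFT-based multiplication in the frequency domain.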
Fuzzy Classification

The next way of deriving membership values (rather than membership functions) is to carry out a fuzzy classification on a multiband image. This procedure is as follows. Fuzzy classification has been used for posterior classification of soil (Burrough, 1989; Burrough et al., 1992; McBratney and De Gruijter, 1992a; McBratney and De Gruijter, 1992b; Odeh et al., 1992). The main benefits of fuzzy classification over hard classification are:
• it can handle inherently vague concepts, such as those that one may have to deal with in soil or geological studies, or that are identifiable from remote sensing images;
• it takes gradual changes into account, as in climate and vegetation studies;
• it allows a user-controlled variable for improved classification;
• it does not require the presence of a representative profile.
Fuzzy classification has been used successfully in soil contamination studies, where representative profiles are not defined, where gradual changes occur, and where a statistical classification technique is required [20]. In remote sensing we consider an image that contains p different bands and we suppose that the data are collected in a matrix Xp of size N × p, with as before N the number of pixels. A classification of the individuals gives a matrix of membership values M = {mic} of size N × k, where k is the number of classes, and a matrix of class centroids C = {cjc} of size p × k. The value of mic is the membership value of observation i to class c. Fuzzy classification of N individuals into k classes requires that the sum of the memberships for each individual is equal to 1:
∑_{c=1}^{k} mic = 1, 1 ≤ i ≤ N   (3)
that each class has at least one participant, i.e., empty classes do not exist:

∑_{i=1}^{N} mic > 0, 1 ≤ c ≤ k   (4)
and that each observation may belong to different classes:

0 ≤ mic ≤ 1, 1 ≤ i ≤ N, 1 ≤ c ≤ k   (5)
Fuzzy classification has been carried out by making use of the fuzzy k-means algorithm. The fuzzy k-means algorithm generalizes minimization of the within-class sum of squares JB(M,C):

JB(M,C) = ∑_{i=1}^{N} ∑_{c=1}^{k} mic^φ dic²   (6)
where dic² is the squared distance between observation point i and class center c, and φ is an exponent with φ > 1 that can be chosen by the user. Minimization results in the following formulas for the membership values and the class centers:

mic = dic^{−2/(φ−1)} / ∑_{c=1}^{k} dic^{−2/(φ−1)}, 1 ≤ i ≤ N, 1 ≤ c ≤ k   (7)

Cc = ∑_{i=1}^{n} mic^φ yi / ∑_{i=1}^{n} mic^φ, 1 ≤ c ≤ k   (8)
In these equations, Cc is the centroid of class c and yi contains the measured values for the variables at observation point i. In fact, the class centroid is, for each variable, the weighted average of the observed values at the points which participate in this class. The weights are determined by the membership values raised to the power φ. If φ equals 1, then the soft classification turns into a hard classification. With increasing φ, the relative participation of high membership values increases. This means that a minimum for JB(M,C) is reached for a distribution of membership values that is increasingly equal. The value of JB(M,C) is higher if the distance between observation point and class center is higher. If the value of φ is higher, then higher membership values are punished more severely. For very high values of φ this results in membership values close to 1/k for all points in all classes; if φ approaches infinity, observations tend to have an equal weight over all classes, resulting in a high degree of fuzziness. By means of a Picard iteration the minimum value of JB(M,C) can be obtained. After deriving the membership values and the class centroids, commonly a decision is made to allocate each pixel to a single class. This means that the fuzzy classification turns into a hard classification. The most natural candidate for the final classification of a pixel is the class with the highest membership value. The estimated final class CF(x0) to which a certain grid point x0 belongs is thus cF(x0) = argmaxc(m(x0)c), where m(x0)c denotes the membership to class c for the pixel at location x0. Additional information, however, is available as well in terms of the membership values to the other classes. Such information is useful to express the degree of certainty of the fuzzy classification. One way to do so is by the so-called map impurity (mi), defined as mi = 1 − maxc(m(x0)c).
If a pixel is classified with a high degree of certainty, then maxc(m(x0)c) is likely to be close to 1, and hence mi is close to 0. If the classification is highly uncertain, meaning that all classes are about equally likely, then maxc(m(x0)c) is close to the reciprocal of the number of classes, i.e. 1/k, and thus mi is close to 1 − 1/k [3]. Fuzzy classification thus allows us to make a final classification of an image while still having an index of uncertainty (the map impurity) to express the fuzziness of this classification.
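The Picard iteration alternating equations (7) and (8), together with the map impurity, can be sketched as follows; the random initialization, toy data and iteration count are assumptions for the example, not the chapter's procedure:

```python
import numpy as np

def fuzzy_k_means(y, k, phi=2.0, n_iter=50, seed=0):
    """Picard iteration alternating equations (7) and (8).
    y: (N, p) data matrix; returns memberships M (N, k), centroids C (k, p)."""
    rng = np.random.default_rng(seed)
    N = y.shape[0]
    M = rng.dirichlet(np.ones(k), size=N)        # random memberships, rows sum to 1
    for _ in range(n_iter):
        W = M ** phi
        C = (W.T @ y) / W.sum(axis=0)[:, None]   # equation (8): weighted centroids
        d2 = ((y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)               # guard against division by zero
        inv = d2 ** (-1.0 / (phi - 1.0))
        M = inv / inv.sum(axis=1, keepdims=True) # equation (7)
    return M, C

# Two well-separated 1-D clusters; the data are purely illustrative.
y = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
M, C = fuzzy_k_means(y, k=2)
impurity = 1.0 - M.max(axis=1)                   # map impurity per observation
```

For well-separated clusters the impurity stays near 0; for ambiguous pixels it approaches 1 − 1/k, as described in the text.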
4.3 Modelling Modelling of identified fuzzy objects Ot,i requires the selection of a limited number of parameters, thus relating them to a class O of interest ([29]; [11]). This will facilitate modelling during the next stage and help to predict or find the cause of the spatio-temporal object.
In a fuzzy modelling effort, characteristics of membership functions are recorded and stored at those moments that images are available. The centroid, for example, is available at each moment t of observation, and if present on, say, n images it can be recorded at the moments t1, . . . , tn. The centroids at the different moments can then be plotted and a trajectory can be fitted. As an alternative one could consider modelling by Gaussian functions that approximate the kernel densities ([27]; [26]). Gaussian functions in 2 dimensions are fully specified by 5 parameters, and have the property that a parametric shape emerges, showing a smoothness that may be absent in the data but that can be interpreted and communicated to stakeholders. In [27] we proceeded as follows. First, the maximum value of the membership function on each image in a series of images was identified. This pixel xC = (xc1, xc2), corresponding to the centroid, is taken as the center point of a Gaussian function. Next, estimates for the spread of the object along the two axes (σx1 and σx2, respectively) are identified. Based on these values a Gaussian bivariate function (Equation 9) characterizing this area is generated:

f(x1, x2) = 1/(2π σx1 σx2 √(1−ρ²)) · exp( −1/(2(1−ρ²)) · [ ((x1−xc1)/σx1)² + ((x2−xc2)/σx2)² − 2ρ (x1−xc1)(x2−xc2)/(σx1 σx2) ] )   (9)
This models one single object, and a grid layer containing this model is obtained by filling in the values for x1 and x2 throughout the image. The process of modelling can be repeated on the residual image, obtained by subtracting the model layer from the main image. This residual image is then considered as the new input to the previously defined steps. One may repeat such modelling as often as required, e.g. until the maximum value of the remaining image is below a threshold (such as 5%) of the maximum value of the image layer. This threshold value allows estimation at a sufficiently high accuracy. All model layers thus obtained can then be combined to form a new image. Alternatively, uncertainty is modeled by stochastic and probabilistic objects. At this stage we may mention two examples, without the intention of being exhaustive. The first approach is modeling by means of wavelets, also leading to a reduced number of parameters [14]. The second approach concerns random sets [33], focusing on the analysis of shapes and forms that are subject to random influences. Statistical methods are used for that purpose and current progress in this field is promising.
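The iterative procedure, fit a Gaussian at the image maximum, subtract it, and repeat until the residual drops below the threshold, can be sketched as below. The fixed spread guess of 3 pixels stands in for the spread estimation step, which is an assumption of this sketch rather than the estimator used in the chapter:

```python
import numpy as np

def bivariate_gaussian(x1, x2, xc1, xc2, s1, s2, rho):
    """Bivariate Gaussian surface of equation (9), specified by 5 parameters."""
    z = (((x1 - xc1) / s1) ** 2 + ((x2 - xc2) / s2) ** 2
         - 2 * rho * (x1 - xc1) * (x2 - xc2) / (s1 * s2))
    norm = 2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2)
    return np.exp(-z / (2 * (1 - rho ** 2))) / norm

def model_objects(image, threshold=0.05, max_objects=10):
    """Fit a Gaussian layer at the current maximum, subtract it, and repeat
    until the residual maximum falls below `threshold` times the original
    maximum. Returns the model layers and the final residual."""
    x1, x2 = np.meshgrid(np.arange(image.shape[1]), np.arange(image.shape[0]))
    residual = image.astype(float).copy()
    layers = []
    stop = threshold * residual.max()
    while len(layers) < max_objects and residual.max() > stop:
        r, c = np.unravel_index(residual.argmax(), residual.shape)
        peak = residual[r, c]
        g = bivariate_gaussian(x1, x2, c, r, 3.0, 3.0, 0.0)  # fixed spread guess
        layer = peak * g / g.max()       # scale the model to the observed peak
        layers.append(layer)
        residual = residual - layer
    return layers, residual

# Synthetic image with two Gaussian-shaped objects (illustrative only).
x1g, x2g = np.meshgrid(np.arange(40), np.arange(40))
img = (bivariate_gaussian(x1g, x2g, 10, 10, 3.0, 3.0, 0.0)
       + 0.5 * bivariate_gaussian(x1g, x2g, 25, 25, 3.0, 3.0, 0.0))
layers, residual = model_objects(img)
```

Summing the extracted layers reconstructs the parametric model image described in the text.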
4.4 Tracking in Time

After parametrization of the objects, the next stage is to model their behavior in space and time [12]. As long as an object is characterized by a limited set of parameters, such tracking may be relatively straightforward. However, treatment that was relatively straightforward for crisp objects becomes somewhat more complicated for uncertain objects: splitting and merging of objects, and their birth and death, require some special attention. Let us consider the object Ok,1 at moment tk. The
splitting of this object leads to two new objects Ok+1,1 and Ok+1,2 at moment tk+1. Both objects require membership functions, which are defined on the basis of the membership function at tk as well as the characteristics of the new objects. The inverse operation is the merging of two objects at moment tk into one single object at moment tk+1. This requires the membership functions to be combined into one new membership function. Also the two existing centroids have to be combined into one new centroid, possibly by a weighted averaging of the centroids of the original objects. Tracking of objects thus requires a proper modelling of both splitting and merging. In addition, tracking requires careful consideration of the birth and death of objects. When developing algorithms for tracking, care has to be taken to properly take these dynamics into account.
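A minimal sketch of the merging step follows. The max rule for combining membership functions and the cardinality weights for averaging centroids are illustrative choices of this sketch; the chapter leaves the exact combination open:

```python
import numpy as np

def merge_objects(mu1, mu2):
    """Combine two membership grids into one, here with the max rule."""
    return np.maximum(mu1, mu2)

def merged_centroid(c1, w1, c2, w2):
    """Weighted average of two centroids, weighted e.g. by fuzzy cardinality."""
    return (w1 * np.asarray(c1) + w2 * np.asarray(c2)) / (w1 + w2)

# Two small membership grids and their centroids (illustrative values).
mu1 = np.array([[1.0, 0.5], [0.0, 0.0]])
mu2 = np.array([[0.0, 0.0], [0.5, 1.0]])
mu_merged = merge_objects(mu1, mu2)
new_c = merged_centroid([0.0, 0.0], mu1.sum(), [1.0, 1.0], mu2.sum())
```

Splitting would run in the opposite direction, e.g. restricting the membership function at tk to each of the two regions identified at tk+1.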
4.5 Prediction

If the tracking routines are successful, one may venture into the next stage of image mining, i.e. prediction of the object in the space-time domain. When tracking a single uncertain object over time, one way to proceed is to define a parametric curve for the centroid, and possibly for other parameters of the membership function, and then to predict at which location the curve will most likely be at a moment t0 beyond the moments of observation so far. Rajasekar et al. [27] showed the use of a linear statistical model, whereas particle filtering methods may be of use here as well. In prediction, one may consider a future event, i.e. a real prediction, or a moment prior to image availability, like predicting the moment that the object is born. Also, prediction is sometimes required of the object between two moments tk and tk+1. Prediction results in values of the membership function, of the centroid, and of the associated uncertainties. Predicted values are typically of interest to stakeholders, who may thus have a tool to support their decision making.
4.6 Stakeholders

Communication to stakeholders can take many forms. It may range from simple visualization tools to assessments of costs and benefits. Techniques from decision support are typically required here. Recent developments have focused on the use of Bayesian methods, in particular in a fuzzy set context [34]. In this study, we focused on the slope of a 1-dimensional membership function. We took for this study a triangular-type (trapezoidal) function: equal to 0 outside [a, d], equal to 1 between b and c, with line fragments of positive and negative slope between a and b and
Fuzzy Methods in Image Mining
257
between c and d, respectively. We took a prior from the normal distribution, with the mean denoting the midpoint and the variance corresponding to the variance for the first line fragment and to the opposite of the variance for the second line fragment. The likelihood was obtained from a high resolution remote sensing image, yielding a posterior estimate that includes both the prior knowledge and validation data. A fuzzy decision tree is built by considering a range of objects which, when combined, produce the objects of interest; in the case of that paper these were beach objects, whereas in the flooding study described above this would be the delineation of the flooded objects. Decision making applied in this way provided essential information in a spatial context. We found the combination of a probabilistic approach with fuzzy methods interesting, as relating real-world objects to remote sensing images leads to improved object delineation that can be communicated to stakeholders, including the uncertainties.
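Assuming a trapezoidal shape with sloped line fragments between a and b and between c and d (a reconstruction for illustration; the exact function of the cited study is not reproduced here), such a membership function can be computed as:

```python
import numpy as np

def trapezoidal_membership(x, a, b, c, d):
    """Membership rising linearly on [a, b], equal to 1 on [b, c],
    falling linearly on [c, d], and 0 elsewhere (requires a < b <= c < d)."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)       # slope of the first line fragment
    falling = (d - x) / (d - c)      # slope of the second line fragment
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

mu = trapezoidal_membership([0.0, 1.0, 1.5, 2.0, 3.0, 4.0], 1.0, 2.0, 3.0, 4.0)
```

A triangular function is the special case b = c, where the plateau shrinks to a single point.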
5 Spatial Data Quality

Spatial data quality refers to various aspects of data quality as they can be identified for geographical objects, where quality is defined as the totality of characteristics of a product that bear on its ability to satisfy stated and implied needs [21]. Spatial data quality is thus close to fitness for use, i.e. "the ability to satisfy stated and implied needs". Data quality may require that the truth is available, i.e. the data should reflect the truth. We may question, however, the role of spatial data quality, as abstraction by data or models implies that no true values exist and that therefore measurement of error is impossible [17]. The answer to this question is twofold.
• Measurement of error is possible. There is agreement upon a set of rules (the ontology), the contents of which are called the semantics, that together define the membership function. By considering its support, or by applying an appropriate α-cut, the boundary of the object is defined. Then an error analysis can be carried out by applying the same set of rules in the collection of observations and reference data. After applying the membership function for the object to both observations and reference data it is possible to evaluate, for example, positional errors in the test data and summarize these as positional accuracy. The resulting accuracy will be relevant to those users who consider the semantics of their application sufficiently similar to the semantics of the data set.
• Measurement of error is not possible. Then the goal may no longer be to identify the exact position of the boundary of, say, a mountain and the errors therein. Instead, fuzzy set theory represents its location including the uncertainties, and an exact position is no longer required, as it may not be possible to determine it at all [15].
In our treatment of the various aspects of spatial data quality for image mining from a fuzzy logic point of view we base ourselves on Van Oort [25].
5.1 Positional Accuracy

Positional accuracy is the accuracy of address and coordinate values. From a fuzzy logic perspective, either the membership function is uncertain, or the set of reference is uncertain. Both will lead to an uncertain centroid, following Section 3. We may assume, though, that positional accuracy will have an effect on the membership function μA(x), which changes under the influence of positional uncertainty into μ̃Ã(x). This effect could be quantified if clear information about the positional accuracy were available. The coordinates of the centroid of an object may then change into a set of fuzzy numbers, or otherwise be specified. A distinction can be made between absolute positional accuracy, i.e. the accuracy relative to a given coordinate reference system, and relative positional accuracy, i.e. the accuracy relative to other data in a test data set. Relative positional accuracy is sufficient to calculate the variance in area and the variance in perimeter or diameter by means of error propagation analysis [5]. More generally, relative positional accuracy is sufficient for an error propagation analysis on a single spatial data set. If data sets are to be combined and an error propagation analysis is needed, however, then the absolute positional accuracy needs to be known. In the flooding example, where data are mainly derived from remote sensing images, positional accuracy is a major problem, as no precise ground references are available. A relation with clearly identifiable objects may be made, but aspects like spatial resolution, the curvature of the earth, inclination and elevation differences play a role here. Positional accuracy has not been fully exploited in an image mining context, however, where the fuzziness and vagueness of an object may prohibit quantitative statements.
5.2 Attribute Accuracy

Attribute accuracy is the accuracy of all attributes other than the positional attributes of a spatial data set. Attributes can be measured either at the numeric or at the nominal measurement scale. Nominal attributes are unordered, like for example geological or soil units. The accuracy of nominal attributes can be described by means of the fraction of observations that correspond to the hypothesized truth. Combined for different nominal units this leads to an error matrix [6]. The accuracy of numerical attributes can be described with the root mean square error (RMSE) and other similar measures. In fuzzy aspects of image mining, attribute accuracy may be difficult to resolve, and may often even be inherently unsolvable. However, it will have an effect on the membership function μA(x), which now turns into an approximate function μÃ(x). In Figure 4 we illustrate several degrees of uncertainty in the membership function, obtained by applying a distortion d drawn from different uniform distributions. With increasing uncertainty the object becomes increasingly unrecognizable. In [34], however, an attempt was made for at least one type of object (land use forms) to model and quantify uncertain boundaries. We notice in passing that attribute and positional accuracy are difficult to separate in observational studies.
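An experiment in the spirit of Figure 4 can be imitated as follows; the 1-D crisp object and the particular noise levels are assumptions made for this illustration:

```python
import numpy as np

def distort_membership(mu, d, seed=0):
    """Perturb membership values with noise drawn uniformly from [-d, d]
    and clip the result back to [0, 1]."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-d, d, size=np.shape(mu))
    return np.clip(np.asarray(mu) + noise, 0.0, 1.0)

# A crisp 1-D object of width 40 on a transect of 100 pixels.
mu = np.where(np.abs(np.arange(100) - 50) < 20, 1.0, 0.0)
slightly = distort_membership(mu, 0.05)   # small distortion: object still clear
heavily = distort_membership(mu, 0.5)     # large distortion: object degrades
```

As the distortion level d grows, the perturbed membership function departs further from the original, mirroring the increasing unrecognizability described in the text.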
Fig. 4 Effects of different distortions d to the membership functions: from top to bottom, d = 0.05, d = 0.1, d = 0.2 and d = 0.5, respectively
Clearly, positional accuracy and attribute accuracy are closely related to each other. For uncertain objects, positional accuracy is dealt with by characteristics of the membership function, such as its support, its shape and characteristics of its α-cuts. The attribute accuracy is identified by the content of the membership function, i.e. its relation to the object under study. It basically answers the question to what degree the membership function expresses the concepts that are displayed.
5.3 Logical Consistency

Logical consistency is the fidelity of relationships encoded in the data structure. It refers, for example, to the fact that the lowest parts in an area are flooded, depending on the elevation, but also to processes that reflect a changing area on a sequence of images, e.g. that the spread of the flood increases during the rainy season and decreases during the dry season. For example, the change from μA(x) at time ti should be logically related to μA(x) at time ti+1 or ti−1. In an image mining context with uncertain objects, logical consistency may be translated into differences in membership functions and relations of membership functions with external variables. The inclusion operator defined in Section 3 may be useful in this context.
In a spatial fuzzy object analysis, [34] has shown that the sequence from sea to land can be logically divided into objects that, although fuzzy, logically and consistently follow each other, as in the sequence 'Sea → Beach → Foredune → Dune'.
5.4 Completeness

Completeness is a measure of the absence of data and the presence of excess data. Brassel et al. [4] distinguish two kinds of completeness: data completeness and model completeness. Data completeness corresponds with the definition given in the ISO standards, whereas model completeness is a measure of how well the semantics of a data set correspond with the semantics of an application of the data set. Fitness for use assessment in fact considers the roles of both the producer and the user, as a producer cannot assess model completeness, which would require knowledge of the semantics of the applications of data sets by the users. The producer's role is thus to document and report the semantics of his own data sets and then to allow users to assess model completeness. Clearly, completeness plays a role when considering the different points of view in the flooding example. In fact, considering the topographic, ecological, environmental and socio-economic perspectives, the flooding may at each moment be described with at least four membership functions, say μA1(x), . . . , μA4(x), whereas the definition (and observation) of each of the membership functions may be limited to a different degree. Moreover, images are available only at those moments that a satellite is passing by, is functioning well, and circumstances are cloud-free. An issue of concern is the representativity of individual images for intervals of time. In particular, one has to decide whether an image is representative for the moment of observation, for the period between major changes (requiring a decision on which changes are major), for the moments that an observation can be taken at all, or for even longer (or shorter) periods of time. Such decisions usually require a stakeholder's point of view.
5.5 Lineage

Lineage provides a description of the source material from which the data were derived and the methods of derivation, including all transformations involved in the production process. It thus summarizes the history of a geographic data set. The dataset in the flooding example was collected between 2001 and 2002 using Landsat 7 ETM+ multi-spectral images. To properly understand the images and the features present on them, a good understanding of the properties of this satellite is required. For example, spectral features like the number of bands and their spectral properties reveal information about the different forms of land cover and thus support an interpretation of which areas we can identify as flooded area and which as the ordinary lake and nearby ponds.
5.6 Semantic Accuracy

Semantic accuracy is defined by Salgé in [18] as: "The quality with which geographical objects are described in accordance with the selected model". It includes concepts usually known as completeness, consistency, currency and attribute accuracy, whereas in addition it includes the "ability of abstraction". One could argue that a proper definition of the class A in the membership function μA(x) is a critical issue here. In terms of communication this is an essential issue, i.e. does the user properly understand what the producer intended? Stein and De Beurs [30] used semantic accuracy in an approach to evaluate various map indices. As discussed above, semantic accuracy is an important and wide concept that governs the choice of membership functions.
5.7 Temporal Quality

Temporal quality refers to the aspects of time in the data set. Several sub-elements of temporal accuracy have been distinguished, some of which are based on measurements of error. We can distinguish:
• the accuracy of the time: is the measurement referring to a single instant (like an image), a daily average value (like a rainfall value), average values over multiple days (like maximum temperature), maximum values occurring over a period of time (as in remote sensing missing-value replacement procedures), or even multiple years?
• temporal validity (the validity with respect to time, sometimes also called currency): when comparing several images and other layers of information, large discrepancies may exist, where images of one year are considered to be comparable with data layers from previous moments.
• temporal consistency (correctness of the order of events): the basic issue here is whether the order of events reflects the process of interest.
• last update, i.e. is the spatial information timely for the fitness of use.
• rate of change, i.e. answering the question how rapidly we may observe a change in the process of interest. In an urbanization study such updates are sometimes required more often than, for example, in some geological studies.
• temporal lapse (the average time between a change on the nominal ground and its representation in the data).
All these issues may affect the membership functions, either through their values or through the set A: μA(x) may change into an approximate membership function μ̃Ã(x). A full description of these effects would require an extension of the membership function towards the space-time domain, i.e. by replacing μA(x) by a function μA(t)(x,t), allowing both the function itself and the set A to be based in the space-time domain. In all, temporal quality can be treated in a similar way to spatial data quality, but differences are noted when considering process-based information.
5.8 Other Issues of Spatial Data Quality

Fitness for use assessment is closely related to decision support. Aspects of usage like costs and accuracy [1], [9] are important for assessing the fitness for use of any data and of any classification in terms of fuzzy logic. Fitness for use is also one of the key elements in recent publications in the field of spatial data quality [32]. The communication aspects of fuzzy procedures remain a challenge. These aspects also occur in current metadata standards. Variability or homogeneity is defined as a textual and qualitative description of the expected or tested uniformity of quality parameters in a geographic data set. In fact, the quality can vary within space, time and attribute, whereas its documentation is often limited to one of these three domains. For example, the mean squared error may represent positional accuracy and its variability in the positional domain, but not in the attribute or temporal domain. Even a seemingly homogeneous flooded surface may show variability representing differences in depth, water quality, waves and atmospheric distortions between the object and the sensor.
6 From Point to Area - Scale

In a fuzzy image mining context the issue of scale arises at (at least) two different instances: during object identification, concerning the resolution of the images, i.e. the supply aspects of the information, and during communication to stakeholders, i.e. the demand aspects of the information. Resolution is defined as the smallest discernable unit or the smallest unit represented. Small-scale images usually have a lower resolution than large-scale images. Although this relationship no longer automatically holds, scale is still often encountered as an indicator of resolution, and scale and resolution are used as indicators of the resolution component in spatial data quality. A high resolution may correspond to a high accuracy, but this relationship does not automatically hold either. In a fuzzy image mining context the resolution of a dataset can simply be increased, thereby suggesting a higher precision or higher accuracy. The term resolution can also be used if error and accuracy are not an issue. It is then the smallest unit represented, which may indicate fuzziness of the data. On the demand side, care should be taken that the correct information is supplied. One of the main challenges here is to communicate the uncertainty. So far, our experience with most stakeholders allows us to communicate relatively crude and simple aspects of uncertainty like 'areas with a low probability of flooding', 'areas with a high probability of flooding' and 'intermediate areas'. Probability values are hard to communicate, in particular to the general public. Possibly, geostatistical methods can be of value in this context.
We base ourselves on the consideration that inclusion of field and point information, either by observations or by deterministic models, facilitates interpretation of fuzzy and uncertain objects recorded by remote sensing images, and thus the various mining aspects of remotely sensed images. Standard geostatistical procedures
Fuzzy Methods in Image Mining
263
have been developed since the nineteen-sixties, but despite many successes, their application has been difficult because of unrealistic assumptions and the requirement of using relatively large data sets. Recently, the advance of model-based geostatistics has been of great help. The procedures developed there are able to deal with counts and with skewed distributions, and allow an easy inclusion of available information. Basic ideas are expressed in [10]. The spatial data are modeled as

$$X(x) = a_X(x) + S(x) + \varepsilon_X(x), \qquad (10)$$
where, as before, x denotes a location in space, given as the position of a pixel in the image X; a_X(x) is a (deterministic) trend, S(x) a (spatially dependent) signal with zero expectation, and ε_X(x) the spatially independent error part, also with zero expectation. Modelling then proceeds by allowing a range of different distributions for the signal. Thus, a link is created towards the generalized linear model. After georeferencing, grids obtained with geostatistics can simply be combined with remote sensing data available at the same spatial resolution. Model-based geostatistics allows one to combine, for example, crop models or air quality models with remote sensing information. A recent study [22] showed the possibilities of combining a MODIS image with a 250 m resolution, the OPS air quality model applied in the Netherlands with a 5 km resolution, and point data on air quality. Further studies can be anticipated, in which progress in combining these sources of information is to be expected. Scale is one of the most critical aspects in spatial studies. It applies to the data representation, to generalizing statements and to matching data from different sources. In all these aspects, changes in scale are of concern. In this chapter a statistical approach is shown for this purpose. Two general scale models are developed. The first model considers discrete spatial data, i.e. data collected on a lattice. The second model considers continuous spatial data collected according to an arbitrary sampling plan. Both models are able to deal with both upscaling and downscaling, i.e. changing from a fine to a coarse resolution as well as the inverse. Changing scales appears to be sensitive to the shape of the correlation function and to the range of spatial dependence, and to a lesser extent to the prior variance parameters.
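The decomposition X(x) = a_X(x) + S(x) + ε_X(x) of Eq. (10) can be illustrated with a short simulation. This is a hedged sketch, not a reproduction of any study cited here: the linear trend, the exponential covariance of the signal, and all numbers are our own illustrative assumptions.

```python
import numpy as np

# Simulate the model-based decomposition X(x) = a_X(x) + S(x) + eps_X(x)
# on a 1-D transect of pixel locations (all choices are illustrative).
rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 10.0, n)

trend = 1.0 + 0.3 * x                        # deterministic trend a_X(x)

# Spatially dependent signal S(x): zero-mean Gaussian field with an
# exponential covariance C(h) = sigma2 * exp(-|h| / range_par).
sigma2, range_par = 0.5, 1.5
h = np.abs(x[:, None] - x[None, :])
C = sigma2 * np.exp(-h / range_par)
L = np.linalg.cholesky(C + 1e-10 * np.eye(n))  # jitter for numerical stability
signal = L @ rng.standard_normal(n)

noise = 0.1 * rng.standard_normal(n)         # independent error eps_X(x)

X = trend + signal + noise                   # the observed "image" transect
```

The three components can then be compared against the simulated observation X, e.g. by plotting them side by side.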
6.1 A Statistical Model for Scaling Spatial data are tied to locations x_i, i = 1, ..., n in space and their observations are denoted by {y(x_i)}, i = 1, ..., n. We suppose that they are generated by a random field Y(x) for x ∈ D ⊂ ℝ² that is characterized by a deterministic trend μ_Y(x) and random variation ε_Y(x). The trend is assumed to be linear in a set of variables Z_j, j = 1, ..., p with parameters β_j. Possible regressors are coordinates and co-variables that determine the random field. The assumption of second-order stationarity allows us to estimate the parameters (Chilès and Delfiner, 1999). Using spatial dependence requires estimation of either the covariance function K(h_x) or the variogram γ(h_x)
264
A. Stein
for a distance h_x between two locations on the image. This is commonly done by comparing pairs of data separated by approximately the same distance, which, combined with an estimate of the mean, yield an estimate of the empirical covariance function K̂(h_x). To the empirical covariance function estimates a covariance function model is fitted, for example the so-called Matérn correlation function

$$K_\theta(h_x) = \frac{1}{2^{\theta_2 - 1}\,\Gamma(\theta_2)} \left(\frac{|h_x|}{\theta_1}\right)^{\theta_2} K_{\theta_2}\!\left(\frac{|h_x|}{\theta_1}\right) \qquad (11)$$
with K_{θ_2} the Bessel K-function of order θ_2. The parameter θ_1 is a scale parameter, the parameter θ_2 > 0 is a smoothness parameter, and a rescaled version of the scale parameter equals θ_1/(2√θ_2). For image mining studies the locations x_j represent observations with a positive (> 0) support size. The support may be crisp (as in most approaches), but use of more complex point spread functions can be considered as well [28]. The support of such an area is denoted by δ and is represented by either a step function in the crisp case, or by a combination of two line spread functions as in [7]. The support specifies the extent of a data location. In addition we consider the resolution r. The resolution specifies the precision of every reference to the data. A change in scale is equivalent to a change in resolution from r_1 to r_2. If r_1 > r_2 it is termed downscaling; if r_1 < r_2 it is termed upscaling or aggregation.
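The Matérn correlation of Eq. (11) can be sketched in a few lines using the Bessel K-function from SciPy (the function name `matern` and the test values are ours):

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def matern(h, theta1, theta2):
    """Matérn correlation of Eq. (11): theta1 > 0 is the scale,
    theta2 > 0 the smoothness parameter."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    z = np.abs(h) / theta1
    out = np.ones_like(z)                 # correlation is 1 at h = 0
    nz = z > 0
    out[nz] = (2.0 ** (1.0 - theta2) / gamma(theta2)) \
        * z[nz] ** theta2 * kv(theta2, z[nz])
    return out
```

A standard sanity check: for θ_2 = 1/2 the Matérn correlation reduces to the exponential correlation exp(−|h|/θ_1).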
6.2 Mathematical Modeling Changing scales requires two discretizations: one at the level where data are available, and one at the level where scaled data are required. We first consider data at the nodes of a fine-meshed grid T of size m_r by m_c. Pixel values at grid nodes, i.e. data with δ < r, have as support a single node; data representative for a particular support have as support a contiguous cluster of neighboring nodes. Multivariate normal data X(·) are collected as point data x(t) with t ∈ T referring to a single node. A model for changes in scale is constructed as

$$y_i = \sum_j H_{ij}\, x(t_j) + \varepsilon_i, \qquad (12)$$
where the matrix H of size m_r · m_c by n transforms the lattice X towards the observation lattice Y. The x(t_j) represent the nodes of X, with the index j taking its values in some predefined order, and the index i running over Y, also in some predefined order. A natural way to deal with lattice data is by means of independence and Markov modelling ([8]). A second model considers data x(s) with s referring to a point or a contiguous area (in ℝ²). This continuous model is defined as

$$y(s) = \int H(t)\, x(t)\, dt. \qquad (13)$$
To make distributional inferences, we suppose that the field of interest x has a prior distribution x ∼ N(0, (λ·G)⁻¹), where λ is a precision (inverse variance) parameter and G is a
Fig. 5 The support function H_j(s) with a support δ
symmetric and positive definite matrix, and we let 1/κ be the observation error variance. Elements of G are obtained from a correlation function between the points that they represent, i.e. (G)_{ij} = K(h_s). Therefore y ∼ N(Hx, κ⁻¹I). The function H determines the scaling between data at various resolutions. To further investigate the continuous model, we may discretize the space by considering different functions H. One of the functions defined earlier is the step function,

$$H_{i_1 i_2}(t) = \begin{cases} \frac{1}{4\delta^2} & \text{if } t \in [x_{j_1}-\delta,\, x_{j_1}+\delta) \times [x_{j_2}-\delta,\, x_{j_2}+\delta), \\ 0 & \text{otherwise,} \end{cases} \qquad (14)$$

for square lattices. As an alternative for more advanced point spread functions, we may consider exponential functions or other continuous functions, for example the one shown in Fig. 5. Using these functions we may then proceed with the changes in scale by letting m be the number of cells for estimation and A be a functional on X, so that AX is a matrix. Then, similarly to the space X, the space to scale to is discretized and again we can apply an approach similar to equation (14):

$$A_{i_1 i_2}(s) = \begin{cases} m^2 & \text{if } s \in [\frac{i_1-1}{m}, \frac{i_1}{m}) \times [\frac{i_2-1}{m}, \frac{i_2}{m}), \\ 0 & \text{otherwise,} \end{cases} \qquad (15)$$

corresponding to the resolution that we intend to have for the upscaled values. As $A(x) = (\int_{s \in S} A_j(s)\, x(s)\, ds)_{j=1}^{J}$ is the object of inference, the following matrix version for scaling emerges: let B = Cov(HX, HX) be the n × n matrix with covariances of the process at the supply level, C = Cov(HX, AX) be the m × n
matrix whose covariances are those between the supply level and the demand level, and D = Cov(AX, AX) be the m × m matrix with covariances at the demand level. Then a prediction equals C·B⁻¹y and the prediction error variance equals D − CB⁻¹Cᵀ. A similar outcome emerges if the smoother approach of Fig. 5 is used at the level to scale to as well.
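The matrix form of the scaling predictor can be sketched numerically. The following is an illustrative 1-D downscaling example under assumptions of our own (grid sizes, block-average supports, and an exponential prior covariance); only the final two lines, C B⁻¹ y and D − C B⁻¹ Cᵀ, come from the text above.

```python
import numpy as np

rng = np.random.default_rng(1)
nt = 60                                              # fine lattice T
t = np.arange(nt, dtype=float)
K = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)  # prior covariance on T

def block_average(n_fine, n_blocks):
    """Aggregation matrix whose rows average contiguous blocks of nodes."""
    H = np.zeros((n_blocks, n_fine))
    size = n_fine // n_blocks
    for i in range(n_blocks):
        H[i, i * size:(i + 1) * size] = 1.0 / size
    return H

H = block_average(nt, 12)      # supply level: n = 12 observed block means
A = block_average(nt, 20)      # demand level: m = 20 finer blocks (downscaling)

B = H @ K @ H.T                # Cov(HX, HX), n x n, supply level
C = A @ K @ H.T                # covariances between demand and supply, m x n
D = A @ K @ A.T                # Cov(AX, AX), m x m, demand level

x = np.linalg.cholesky(K + 1e-10 * np.eye(nt)) @ rng.standard_normal(nt)
y = H @ x                      # observed supply-level averages

pred = C @ np.linalg.solve(B, y)          # prediction C B^{-1} y
pev = D - C @ np.linalg.solve(B, C.T)     # error variance D - C B^{-1} C^T
```

The prediction error variance matrix is positive semi-definite by construction, which gives a simple sanity check on the implementation.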
7 Concluding Remarks The aim of this chapter has been to present some general findings for image mining on uncertain objects. Several examples have been included that show the dynamic behavior of uncertain spatial objects. More problems than before can be solved, and in particular uncertain objects can now be handled more easily than before. Further progress can be expected in the combination of geostatistics with deterministic models and remote sensing in a broad range of applications. Further, the monitoring of uncertain objects is still a challenge. Progress has been achieved recently, but monitoring usually depends upon a large set of specific object properties, such as the spatial and temporal resolution, the speed of change, detectability and possibilities for modelling. In this field in particular, there are still many issues that deserve to be explored. Image mining is starting to be able to communicate uncertain objects to stakeholders. We may anticipate some further progress here in the future. Also, quite some work has been done on the storage and handling of uncertain objects. Still, several steps have to be made: tracking and monitoring of uncertain objects, making quantitative risk estimates, etc. A further expansion of probabilistic tools could be beneficial in this respect.
References 1. Aronoff, S.: Geographic information systems: a management perspective, p. 294. WDL, Ottawa (1991) 2. Burrough, P.A.: Fuzzy mathematical methods for soil survey and land evaluation. J. Soil Sci. 40, 477–492 (1989) 3. Burrough, P.A., Wilson, J.P., van Gaans, P.F.M., Hansen, A.J.: Fuzzy k-means classification of topo-climatic data as an aid to forest mapping in the Greater Yellowstone area, USA. Landscape Ecology 16, 523–546 (2001) 4. Brassel, K., Bucher, F., Stephan, E.M., Vckovski, A.: Completeness. In: Guptill, S.C., Morrison, J.L. (eds.) Elements of spatial data quality. International Cartographic Association, pp. 81–108. Elsevier Science, Tokyo (1995) 5. Chrisman, N.R.: The error component in spatial data. In: Maguire, D.J., Goodchild, M.F., Rhind, D.W. (eds.) Geographical Information Systems: Principles and Applications, vol. 1, pp. 165–174 (1991) 6. Congalton, R.G., Green, K.: Assessing the accuracy of remotely sensed data: principles and practices. Lewis Publishers, Boca Raton (1999) 7. Cracknell, A.P.: Synergy in remote sensing – what's in a pixel. International Journal of Remote Sensing 19(11), 2025–2047 (1998) 8. Cressie, N.A.C.: Statistics for spatial data. Wiley, New York (1991)
9. Devillers, R., Bédard, Y., Jeansoulin, R.: Multidimensional management of geospatial data quality information for its dynamic use within GIS. Photogrammetric Engineering & Remote Sensing 71(2), 205–215 (2005) 10. Diggle, P.J., Tawn, J.A., Moyeed, R.A.: Model-based geostatistics (with discussion). Applied Statistics 47, 299–350 (1998) 11. Dilo, A., De By, R., Stein, A.: A system of types and operators for handling vague spatial objects. International Journal of Geographical Information Science 21(4), 397–426 (2006) 12. Dubois, D., Jaulent, M.C.: A general approach to parameter evaluation in fuzzy digital pictures. Pattern Recognition Letters 6, 251–259 (1987) 13. Ebert, A., Kerle, N., Stein, A.: Object-oriented image analysis of urban social vulnerability. Natural Hazards (2008) 14. Epinat, V., Stein, A., De Jong, S.M., Bouma, J.: A wavelet characterization of high-resolution NDVI patterns for precision agriculture. International Journal of Applied Earth Observation and Geoinformation 3, 121–132 (2001) 15. Fisher, P., Wood, J., Cheng, T.: Where is Helvellyn? Fuzziness of multi-scale landscape morphometry. Trans. Inst. Br. Geogr. 29, 106–128 (2004) 16. Glasbey, Horgan: Image analysis for the biological sciences. Wiley, Chichester (1995) 17. Goodchild, M.F.: Introduction to part I: Theoretical Models for Uncertain GIS. In: Shi, W., Fisher, P.F., Goodchild, M.F. (eds.) Spatial Data Quality. Taylor and Francis, London (2002) 18. Guptill, S.C., Morrison, J.L. (eds.): Elements of spatial data quality. International Cartographic Association. Elsevier Science, Tokyo (1995) 19. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction, p. 533. Springer, Heidelberg (2001) 20. Hendricks-Fransen, H.J.W.M., Van Eijnsbergen, A.C., Stein, A.: Use of spatial prediction techniques and fuzzy classification for mapping soil pollutants. Geoderma 77, 243–262 (1997) 21. ISO: ISO 19113:2002 Geographic information – Quality principles, p. 29 (2002) 22. Van de Kassteele, J., Koelemeijer, R.B.A., Dekkers, A.L.M., Schaap, M., Homan, C.D., Stein, A.: Statistical mapping of PM10 concentrations over Western Europe using secondary information from dispersion modeling and MODIS satellite observations. Stochastic Environmental Research and Risk Assessment (SERRA) 21(2), 183–194 (2006) 23. Lucieer, A., Stein, A., Fisher, P.: Multivariate texture segmentation of high-resolution remotely sensed imagery for identification of fuzzy objects. International Journal of Remote Sensing 26, 2917–2936 (2005) 24. Ojala, T., Pietikäinen, M.: Unsupervised texture segmentation using feature distributions. Pattern Recognition 32, 477–486 (1999) 25. Van Oort, P.: Spatial data quality: from description to application. PhD thesis, Wageningen University (2006) 26. Quintano, C., Stein, A., Bijker, W., Fernández-Manso, A.: Pattern validation for data mining of burned area objects from MODIS satellite images. International Journal of Remote Sensing (submitted) 27. Rajasekar, U., Stein, A., Bijker, W.: Image mining for modeling of forest fires from Meteosat images. IEEE Transactions on Geoscience and Remote Sensing 45(1), 246–253 (2006) 28. Richards, J.A., Jia, X.: Remote Sensing Digital Image Analysis, 3rd edn. Springer, Berlin (1999)
29. Rosenfeld, A.: The fuzzy geometry of image subsets. Pattern Recognition Letters 12, 311–317 (1984) 30. Stein, A., De Beurs, K.: Map indices to quantify semantic accuracy in segmented Landsat images. International Journal of Remote Sensing 26, 2937–2951 (2005) 31. Stein, A.: Modern developments in image mining. Science in China Series E: Technological Sciences 51 (Suppl. 1), 13–25 (2008) 32. Stein, A., Shi, W., Bijker, W. (eds.): Quality Aspects in Spatial Data Mining. CRC Press, Boca Raton (2008) 33. Stoyan, D., Stoyan, H.: Fractals, random shapes and point fields. John Wiley & Sons, Chichester (1994) 34. Van de Vlag, D., Stein, A.: Uncertainty propagation in hierarchical classification using fuzzy decision trees. IEEE Transactions on Geoscience and Remote Sensing 45(1), 237–245 (2006) 35. Yifru, M.Z.: Stereology for Data Mining. Unpublished MSc thesis, ITC International Institute for Geoinformation Science and Earth Observation, Enschede, the Netherlands (2006) 36. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)
Kriging and Epistemic Uncertainty: A Critical Discussion Kevin Loquin and Didier Dubois
Abstract. Geostatistics is a branch of statistics dealing with spatial phenomena modelled by random functions. In particular, it is assumed that, under some well-chosen simplifying hypotheses of stationarity, this probabilistic model, i.e. the random function describing spatial dependencies, can be completely assessed from the dataset by the experts. Kriging is a method for estimating or predicting the spatial phenomenon at non-sampled locations from this estimated random function. In the usual kriging approach, the data are precise and the assessment of the random function is mostly made at a glance by the experts (i.e. geostatisticians) from a thorough descriptive analysis of the dataset. However, it seems more realistic to assume that spatial data are tainted with imprecision due to measurement errors and that information is lacking to properly assess a unique random function model. Thus, it would be natural to handle the epistemic uncertainty appearing in both the data specification and random function estimation steps of the kriging methodology. Epistemic uncertainty consists of some meta-knowledge about the lack of information on data precision or on the model variability. The aim of this paper is to discuss the pertinence of the usual random function approach to modelling uncertainty in geostatistics, to survey the existing attempts to introduce epistemic uncertainty in geostatistics, and to propose some perspectives for developing new tractable methods that may handle this kind of uncertainty. Keywords: geostatistics; kriging; variogram; random function; epistemic uncertainty; fuzzy subset; possibility theory. Kevin Loquin IRIT, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex 9 e-mail: [email protected] Didier Dubois IRIT, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex 9 e-mail: [email protected] R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 269–305.
© Springer-Verlag Berlin Heidelberg 2010. springerlink.com
270
K. Loquin and D. Dubois
1 Introduction Geostatistics is the application of the formalism of random functions to the reconnaissance and estimation of natural phenomena.
This is how Georges Matheron [42] explains the term geostatistics in 1962 to describe a scientific approach to estimation problems in geology and mining. The development of geostatistics in the 1960s resulted from the industrial and economical need for a methodology to assess the recoverable reserves in mining deposits. Naturally, the necessity to take uncertainty into account in such methods appeared. That is the reason why statisticians were needed by geologists and the mining industry to perform ore assessment consistently with the available information. Today, geostatistics is no longer restricted to this kind of application. It is applied in disciplines such as hydrology, meteorology, oceanography, geography, forestry, environmental monitoring, landscape ecology and agriculture, or for the geographical and dynamic study of ecosystems. Underlying each geostatistical method is the notion of random function [12]. A random function describes a given spatial phenomenon over a domain. It consists of a set of random variables, each of which describes the phenomenon at some location of the domain. By analogy with a random process, which is a set of random variables indexed by time, a random function is a set of random variables indexed by locations. When little information is available about the spatial phenomenon, a random function is only specified by the set of means associated with its random variables over the domain and its covariance structure for all pairs of random variables induced by this random function. These parameters describe, respectively, the spatial trend and spatial dependencies of the underlying phenomenon. The dependence structure assumption underlying most geostatistical methods is based on the intuitive idea that the closer the regions of interest, the more similar the phenomenon in these areas. In most geostatistical methods, the dependencies between the random variables are preferably described by a variogram instead of a covariance structure.
The variogram depicts the variance of the increments of the quantity of interest as a function of the distance between sites. The spatial trend and spatial dependence structure of this model are commonly supposed to be of a given form (typically, linear for the trend and spherical, power exponential or rational quadratic for the covariance or variogram structure) with a small number of unknown parameters. From the specification of these moments, many methods can be derived in geostatistics. By far, kriging is the most popular one. Suppose a spatial phenomenon is partially observed at selected sites. The aim of kriging is to predict the phenomenon at unobserved sites. This is the problem of spatial estimation, sometimes called spatial prediction. Examples of spatial phenomena to be estimated are soil nutrient or pollutant concentrations over a field observed on a survey grid, hydrologic variables over an aquifer observed at well locations, and air quality measurements over an air basin observed at monitoring sites. The term kriging was coined by Matheron in honor of D.G. Krige who published an early account of this technique [39] with applications to estimation of a
mineral ore body. In its simplest form, a kriging estimate of the field at an unobserved location is an optimized linear combination of the data at observed locations. Formally, this method has close links to Wiener optimal linear filtering in the theory of random functions [57], spatial splines [22, 56] and generalized least squares estimation in a spatial context [11]. A full application of a kriging method by a geostatistician involves different steps: 1. An important structural analysis is performed: usual statistical tools, like histograms and empirical cumulative distributions, can be used in conjunction with an analysis of the sample variogram. 2. In place of the sample variogram, which does not respect suitable mathematical properties, a theoretical variogram is chosen. The fitting of the theoretical variogram model to the sample variogram, informed by the structural analysis, is performed. 3. Finally, from this variogram specification (which is an estimate of the dependence structure of the model), the kriging estimate is computed at the location of interest by solving a system of linear equations of the least squares type. Kriging methods have been studied and applied extensively since 1970 and were later adapted, extended, and generalized. Georges Matheron, who founded the "Centre de Géostatistique et de Morphologie Mathématique de l'École des Mines de Paris" in Fontainebleau, proposed the first systematic approach to kriging [42]. Many of his students and collaborators followed in his steps and worked on the development and dissemination of geostatistics worldwide. We can mention here Jean-Paul Chilès, Pierre Delfiner [9] or André G. Journel [34] among others. All of them worked on extending, in many directions, the kriging methodology. However, very few scholars discussed the nature of the uncertainty that underlies the standard Matheronian geostatistics except G.
Matheron himself [45], and even fewer considered alternative theories to probability theory that could more reliably handle epistemic uncertainty in geostatistics. Epistemic uncertainty is uncertainty that stems from a lack of knowledge, from insufficient available information about a phenomenon. It is different from uncertainty due to the variability of the phenomenon. Typically, intervals or fuzzy sets are supposed to handle epistemic uncertainty, while probability distributions are supposed to properly quantify variability. More generally, imprecise probability theories, like possibility theory [21], belief functions [51] or imprecise previsions [55], are supposed to jointly handle those two kinds of uncertainty. Consider the didactic example of a die toss where you have more or less information about the number of facets of the die. When you know that a die has 6 facets, you can easily evaluate the variability of the die toss: 1 chance over 6 for each facet from 1 to 6. But now suppose that you miss some information about the number of facets and just know that the die has either 6 or 12 facets. You cannot propose a unique model of variability of the die toss; you can only propose two: in the first case, 1 chance over 6 for each facet from 1 to 6 and no chance for each facet from 7 to 12; in the second case, 1 chance over 12 for each facet from 1 to 12. This example enables the following simple conceptual
extrapolation: when facing a lack of knowledge or insufficient available information on the studied phenomenon, it is safer to model uncertainty with a family of probability distributions, i.e. to work with sets of probability measures. Such models are generically called imprecise probability models. Bayesian methods address the problem by attaching prior probabilities to each potential model. However, this kind of uncertainty is of purely epistemic origin and using a single subjective probability to describe it is debatable, since it represents much more information than what is actually available. In our die toss example, choosing a probability value for the occurrence of each possible model, even a uniform distribution, i.e. a probability of 1/2 for each possible model, conveys much more information than is actually available about the occurrence of the possible models. Besides, it is not clear that subjective and objective probabilities can be multiplied, as they represent information of a very different nature. This paper proposes a discussion of the standard approach to kriging in relation with the presence of epistemic uncertainty pervading the data or the choice of a variogram. In Section 2 basics of kriging theory are recalled and the underlying assumptions are discussed. Section 3 is mainly dedicated to the variogram or covariance function estimation, which is the major issue in kriging. The sources of epistemic uncertainty in kriging are discussed in Section 4. Then, a survey of some existing intervallist or fuzzy extensions of kriging is offered, respectively in Sections 5 and 6. Finally, in Section 7, a preliminary discussion of the role novel uncertainty theories could play in this topic is provided.
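The die example above amounts to a tiny imprecise-probability computation: the probability of an event is only known to lie in an interval spanned by the two candidate models. A sketch using the standard library (the chosen event is our own illustration):

```python
from fractions import Fraction

def p_event(n_facets, event):
    """Probability of `event` under a fair die with `n_facets` facets."""
    return Fraction(sum(1 for f in range(1, n_facets + 1) if f in event),
                    n_facets)

event = set(range(1, 5))                   # "the outcome is at most 4"
probs = [p_event(n, event) for n in (6, 12)]
lower, upper = min(probs), max(probs)      # lower = 1/3, upper = 2/3
```

A single number (e.g. obtained by averaging the two models with weights 1/2) would convey more precision than the information warrants; the pair (lower, upper) does not.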
2 Some Basic Concepts in Probabilistic Geostatistics Geostatistics is commonly viewed as the application of the "Theory of Regionalized Variables" to the study of spatially distributed data. This theory is not new and borrows most of its models and tools from the concept of stationary random function and from techniques of generalized least-squares prediction. Let D be a compact subset of ℝ^d and Z = {Z(x), x ∈ D} denote a real-valued random function. A random function (or equivalently, a random field) is made up of a set of random variables Z(x), for each x ∈ D. In other words, Z is a set of random variables Z(x) indexed by x. Each Z(x) takes its values in some real interval Γ ⊆ ℝ. In this approach, Z is the probabilistic representation of a deterministic function z : D → Γ. The data consist of n observations Z_n = {z(x_i), i = 1, ..., n}, understood as a realization of the n random variables {Z(x_i), i = 1, ..., n} located at the n known distinct sampling positions {x_1, ..., x_n} in D. Z_n is the only available objective information about Z on D.
2.1 Structuring Assumptions In geostatistics, the spatial dependence between the two random variables Z(x) and Z(x′), located at different positions x, x′ ∈ D, is considered an essential aspect of
the model. All geostatistical models strive to capture such spatial dependence, in order to provide information about the influence of the neighborhood of a point x on the random variable Z(x). Different structuring assumptions on a random field have been proposed. They mainly aim at making the model easy to use in practice. Results of geostatistical methods highly depend on the choice of these assumptions. 2.1.1 The Second-Order Stationary Model
A random function Z is said to be second-order stationary if any two random variables Z(x) and Z(x′) have equal mean values and their covariance only depends on the separation h = x − x′. Formally, for all x, x′ ∈ D, there exist a constant m ∈ ℝ and a positive definite covariance function C : D → ℝ such that

$$E[Z(x)] = m, \quad \mathrm{Cov}(Z(x'), Z(x)) = E[(Z(x) - m)(Z(x') - m)] = C(x - x') = C(h). \qquad (1)$$

Such a model implies that the variance of the random variables Z(x) is constant all over the domain D. Indeed, for any x ∈ D, V(Z(x)) = C(0). In the simplest case, the random function is supposed to be Gaussian, and the correlation function isotropic, i.e. not depending on the direction of the vector x − x′, so that h is a positive distance value h = ‖x − x′‖. A second-order stationary random function will be denoted by SRF in the rest of the paper. 2.1.2 The Intrinsic Model
This model is slightly more general than the previous one: it only assumes that the increments Y_h(x) = Z(x + h) − Z(x), and not necessarily the random function Z itself, form a second-order stationary random function Y_h, for every vector h. More precisely, for each location x ∈ D, Y_h(x) is supposed to have a zero mean and a variance depending only on h, denoted by 2γ(h). In that case, Z is called an intrinsic random function, denoted by IRF in the rest of the paper, and characterized by:

$$E[Y_h(x)] = E[Z(x+h) - Z(x)] = 0, \quad V[Y_h(x)] = V[Z(x+h) - Z(x)] = 2\gamma(h). \qquad (2)$$
γ(h) is the so-called variogram. The variogram is a key concept of geostatistics. It is supposed to measure the dependence between locations, as a function of their distance. Every SRF is an IRF; the converse is not true in general. Indeed, from any covariance function of an SRF, we can derive an associated variogram as:

$$\gamma(h) = C(0) - C(h). \qquad (3)$$
Indeed,

$$\gamma(h) = \tfrac{1}{2}\, V[Z(x+h) - Z(x)] = \tfrac{1}{2}\big(V[Z(x+h)] + V[Z(x)] - 2\,\mathrm{Cov}(Z(x+h), Z(x))\big) = \tfrac{1}{2}\big(2C(0) - 2C(h)\big).$$
In the opposite direction, the covariance function of an IRF is generally not of the form C(h) and cannot be derived from its variogram γ(h). Indeed, the step from the second to the third line of the above derivation shows that equality (3) only holds if the variance of the random function is constant on the domain D. This is the case for an SRF but not for an IRF. For example, unbounded variograms have no associated covariance function. This does not mean that the covariance between Z(x) and Z(x + h), when Z is an IRF, does not exist, but it is not, in general, a function of the separation h. The variogram is a more general structuring tool than the covariance function of the form C(h).
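To make the link between γ(h) and data concrete, here is a sketch of the classical sample (empirical) variogram estimator, γ̂(h) = (1/(2|N(h)|)) Σ_{(i,j)∈N(h)} (z(x_i) − z(x_j))², computed over distance bins. The function name, the synthetic coordinates and the values are our own illustrative choices.

```python
import numpy as np

def sample_variogram(coords, z, bin_edges):
    """Binned sample variogram: for each distance bin, average
    0.5 * (z_i - z_j)^2 over the pairs whose distance falls in the bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)          # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        m = (d >= bin_edges[k]) & (d < bin_edges[k + 1])
        if m.any():
            gamma[k] = sq[m].mean()
    return gamma

rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(80, 2))          # 2-D sampling sites
z = np.sin(coords[:, 0]) + 0.1 * rng.standard_normal(80)
gamma = sample_variogram(coords, z, np.linspace(0.0, 5.0, 6))
```

This is the "comparing pairs separated by approximately the same distance" step mentioned in the kriging workflow; a theoretical variogram model would then be fitted to `gamma`.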
2.2 Simple Kriging Kriging boils down to spatially interpolating the data set Z_n by means of a linear combination of the observed values at each measurement location. The interpolation weights depend on the interpolation location and the available data over a domain of interest. In such a method, for estimating the value of the random function at an unobserved site, the dependence structure of the random function is used. Once the variogram is estimated, the kriging equations are obtained by least squares minimization. Consider a second-order stationary random function Z, i.e. satisfying (1), informed by the data set Z_n = {z(x_i), i = 1, ..., n}. Any particular unknown value Z(x_0), x_0 ∈ D, is supposed to be estimated by a linear combination of the n collected data points {z(x_i), i = 1, ..., n}. This estimation, denoted by z*(x_0), is given by:

$$z^*(x_0) = \sum_{i=1}^{n} \lambda_i(x_0)\, z(x_i). \qquad (4)$$
The computation of z∗ (x0 ) depends on the estimation of the kriging weights Λn (x0 ) = {λi (x0 ), i = 1, . . . , n} at location x0 . In the kriging paradigm, each weight λi (x0 ) corresponds to the influence of the value z(xi ) in the computation of z∗ (x0 ). More precisely, the value z∗ (x0 ) is the linear combination of the data set Zn = {z(xi ), i = 1, . . . , n}, weighted by the set of influence weights Λn (x0 ). Kriging weights are computed by solving a system of equations induced by a least squares optimization method. It is deduced from the minimization of the estimation error variance V[Z(x0 ) − Z ∗ (x0 )], where Z(x0 ) is the random variable underlying the SRF Z at location x0 and Z ∗ (x0 ) = ∑ni=1 λi (x0 )Z(xi ) is the “randomized” counterpart of the kriging estimate (4). The minimization of V[Z(x0 ) − Z ∗ (x0 )] is carried out
under the unbiasedness condition E[Z(x_0)] = E[Z*(x_0)]. This unbiasedness condition has a twofold consequence. First, it induces the following condition on the kriging weights:

$$\sum_{i=1}^{n} \lambda_i(x_0) = 1.$$

Indeed, due to the stationarity of the mean (1), E[Z*(x_0)] = E[Z(x_0)] ⇒ ∑_{i=1}^n λ_i(x_0) m = m ⇒ ∑_{i=1}^n λ_i(x_0) = 1. Second, it implies that minimizing the variance can be rewritten as minimizing a mean squared error. Indeed, V[Z(x_0) − Z*(x_0)] = E[(Z(x_0) − Z*(x_0))²] − (E[Z(x_0) − Z*(x_0)])² and the second term is zero. Thus,

$$V[Z(x_0) - Z^*(x_0)] = E\big[(Z(x_0) - Z^*(x_0))^2\big].$$

Thus, the kriging problem comes down to finding the least squares estimate of Z at location x_0 under the constraint:

$$\sum_{i=1}^{n} \lambda_i(x_0) = 1.$$
In order to obtain the kriging equations, the variance V[Z(x0) − Z∗(x0)] is rewritten as follows:

∑_{i=1}^n ∑_{j=1}^n λi(x0) λj(x0) C(xi − xj) − 2 ∑_{j=1}^n λj(x0) C(x0 − xj) + C(0),   (5)
where C(0) = V[Z(x0)], so that the kriging weights only depend on the covariance function. In order to minimize the above mean squared error, the partial derivative with respect to each kriging weight λi(x0) is computed:

∂V[Z(x0) − Z∗(x0)] / ∂λi(x0) = 2 ∑_{j=1}^n λj(x0) C(xi − xj) − 2 C(x0 − xi), ∀i = 1, . . . , n.
The equations providing the kriging weights are obtained by letting these partial derivatives vanish. The simple kriging equations are thus of the form:

C(x0 − xi) = ∑_{j=1}^n λj(x0) C(xi − xj), ∀i = 1, . . . , n.   (6)
The similarity between equations (4) and (6) is striking. In the simple kriging method, the influence weights are the same weights as the ones that express, for all the locations {xi, i = 1, . . . , n}, the dependence between Z(x0) and Z(xi), quantified by C(x0 − xi), as the weighted average of the covariances C(xi − xj) between Z(xi) and the random variables {Z(xj), j = 1, . . . , n}. It can be summarized by this remark: the influence weights of the kriging estimate are the same as the influence weights of the dependence evaluations. It is clear that some proper dependence assessments
K. Loquin and D. Dubois
should be the basis for any sensible interpolation of the observations. However, there seems to be no direct intuitive explanation of why the observations should be combined (by means of (4)) just like the dependencies (by means of (6)). In the case of ordinary kriging, the random function is supposed to be an IRF, its mean is unknown and the covariance function is replaced by the variogram in the kriging equations (6). Moreover, there is an additional Lagrange parameter to be found, needed to ensure the unbiasedness condition (see [9], Section 3.4).
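Equations (6) and (4) can be sketched in a few lines of Python. Everything below is illustrative: the exponential covariance model, its sill and range, and the one-dimensional data values are hypothetical, and the sketch follows (4) and (6) literally (a zero-mean simple kriging, with no mean term added back).

```python
import math

def cov(h, sill=1.0, rng=4.0):
    """Hypothetical exponential covariance model C(h) = sill * exp(-|h|/rng)."""
    return sill * math.exp(-abs(h) / rng)

def solve(A, b):
    """Naive Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def simple_kriging(xs, zs, x0):
    """Solve eq. (6), C(x0 - xi) = sum_j lambda_j(x0) C(xi - xj), then apply eq. (4)."""
    A = [[cov(xi - xj) for xj in xs] for xi in xs]
    b = [cov(x0 - xi) for xi in xs]
    lam = solve(A, b)
    return sum(l * z for l, z in zip(lam, zs)), lam

# Hypothetical 1-D data set; the estimate at a data point reproduces it exactly.
xs = [0.0, 1.0, 3.0, 4.0]
zs = [1.2, 1.5, 2.1, 2.0]
z_star, lam = simple_kriging(xs, zs, 2.0)
```

When x0 coincides with a data point and there is no nugget, the right-hand side of (6) equals a column of the covariance matrix, so the weights reduce to an indicator and the kriged value reproduces the observation: kriging is an exact interpolator in that case.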
3 Variogram or Covariance Function Estimation

In kriging, the dependence information between observations is taken into account to interpolate the set of points {(xi, Z(xi)), i = 1, . . . , n}. The most popular tool for modelling these dependencies is the variogram, not the covariance function, because the covariance function estimation is biased by the mean. Indeed, if the mean is unknown, which is generally the case, it affects the estimation of the covariance function. Geostatisticians have proposed different functional variogram models to comply with the observations and with the physical characteristics of a spatial domain [9]. In the first part of this section, we present the characteristics of the most popular variogram models. Choosing one model, or even combining several models to propose a new one, is a subjective task requiring the geostatistician's expertise and some prior descriptive analysis of the dataset Zn. The data is explicitly used only when a regression analysis is performed to fit the variogram model parameters to the empirical variogram. An empirical variogram, i.e. a variogram obtained explicitly from the dataset Zn and not by some regression on a functional model, is called a sample variogram in the literature. In this section, we will see that a sample variogram does not fulfil (in its general expression) the conditional negative definiteness requirement imposed on a variogram model. We will briefly discuss this point, which explains why a sample variogram is never used by geostatisticians to carry out an interpolation by kriging.
3.1 Theoretical Models of Variogram or Covariance Functions

For the sake of clarity, we restrict this presentation of variogram models to isotropic models. An isotropic variogram is invariant to the direction of the separation x − x′. Thus an isotropic variogram is a function γ(h), defined for the scalar h = ‖x − x′‖ ≥ 0. Under the isotropy assumption, the variogram models have the following common behavior: they increase with h and, for most models, when h → ∞, they stabilize at a certain level. A non-stabilized variogram models a phenomenon whose variability has no limit at large distances. If, conversely, the variogram converges to a limiting value called the sill, it means that there is a distance, called the range, beyond which Z(x) and Z(x + h) are uncorrelated. In some sense, the range gives some meaning to the concept of area of influence. Another physically interpretable parameter of a variogram is the nugget effect: it is the value taken by the variogram when h tends to 0. A discontinuity at the origin is generally due to geological discontinuities, measurement noise or positioning errors.

Fig. 1 Basic parameters of standard theoretical variogram models

Figure 1 shows a standard theoretical variogram graph where the sill, the range and the nugget effect are represented. Beyond this standard shape, other physical phenomena can be modelled in a variogram. For instance, the hole effect, understood as the tendency for high values to be surrounded by low values, is modelled by bumps on the variogram (or holes in the covariance function). Periodicity, which is a special case of the hole effect, can appear in the variogram. Explicit formulations of many popular variogram and covariance function models can be found in [9]. Usual theoretical variogram models may fail to perfectly match the dependence structure corresponding to the geostatistician's physical intuition and sample variogram analysis. Generally, a linear combination of variograms is used, in order to obtain a more satisfactory fit of the theoretical variogram to both the sample variogram and the geostatistician's intuition. Such a variogram is obtained by:
γ(h) = ∑_{j=1}^J γj(h).
The main reason is that such linear combinations preserve the negative definiteness conditions requested for variograms, as seen in the next subsection. Moreover, when the variogram changes with the direction of the separation x − x′, it is said to be anisotropic. Some particular anisotropic variograms can be derived from marginal models. The simplest procedure to construct an anisotropic variogram on R^ℓ is to compute the product of its marginal variograms, assuming the separability of the anisotropic variogram.
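The sill, range and nugget effect of Figure 1, and a nested (summed) model, can be sketched with a standard spherical variogram. The parameter values below are hypothetical.

```python
import math

def spherical(h, nugget=0.1, sill=1.0, rng=5.0):
    """Spherical variogram: nugget jump at the origin, reaches nugget + sill at h = rng."""
    h = abs(h)
    if h == 0.0:
        return 0.0                 # gamma(0) = 0; the jump to `nugget` is the nugget effect
    if h >= rng:
        return nugget + sill       # beyond the range, the total sill is reached
    r = h / rng
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

def nested(h):
    """Nested model: a sum of admissible variograms is still admissible."""
    return spherical(h, 0.05, 0.6, 3.0) + spherical(h, 0.0, 0.4, 10.0)
```

The discontinuity between γ(0) = 0 and the limit of γ(h) as h → 0+ is exactly the nugget effect discussed above, and `nested` illustrates why linear combinations of admissible models are convenient: admissibility is preserved under summation.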
3.2 Definiteness Properties of Covariance and Variogram Functions

Mathematically, variograms and covariance functions are strongly constrained. Being extensions of the variance, some of its properties are propagated to the mathematical definitions of the covariance and the variogram. In particular, the positive definiteness of the covariance function and, similarly, the conditional negative definiteness of the variogram are inherited from the positivity of variances. The variance of a linear combination ∑_{i=1}^p μi Z(xi) of random variables {Z(xi), i = 1, . . . , p} could become negative if the chosen covariance function model were not positive definite or, similarly, if the chosen variogram model were not conditionally negative definite [2]. When considering an SRF, the variance of linear combinations of random variables {Z(xi), i = 1, . . . , p} is expressed, in terms of the covariance function of the form C(h), by

V[∑_{i=1}^p μi Z(xi)] = ∑_{i=1}^p ∑_{j=1}^p μi μj C(xj − xi).   (7)

Since the variance is positive, the covariance function C should be positive definite in the sense of the following definition:

Definition 1 (Positive definite function). A real function C(h), defined for any h ∈ R^ℓ, is positive definite if, for any natural integer p, any set of real ℓ-tuples {xi, i = 1, . . . , p} and any real coefficients {μi, i = 1, . . . , p},

∑_{i=1}^p ∑_{j=1}^p μi μj C(xj − xi) ≥ 0.
Now, in the case of a general IRF, i.e. an IRF with no covariance function of the form C(h) as in (1), it can be shown [9] that the variance of any linear combination of increments of random variables ∑_{i=1}^p μi (Z(xi) − Z(x0)) can be expressed, under the condition that ∑_{i=1}^p μi = 0, by

V[∑_{i=1}^p μi (Z(xi) − Z(x0))] = V[∑_{i=1}^p μi Z(xi)] = − ∑_{i=1}^p ∑_{j=1}^p μi μj γ(xj − xi).   (8)

Let us remark that for an SRF, under the condition ∑_{i=1}^p μi = 0, expressions (7) and (8) can easily be switched by means of relation (3). Since the variance is positive, the variogram γ should be conditionally negative definite in the sense of the following definition:
Definition 2 (Conditionally negative definite function). A function γ(h), defined for any h ∈ R^ℓ, is conditionally negative definite if, for any choice of p, {xi, i = 1, . . . , p} and {μi, i = 1, . . . , p}, conditionally to the fact that ∑_{i=1}^p μi = 0,

∑_{i=1}^p ∑_{j=1}^p μi μj γ(xj − xi) ≤ 0.
From expression (7), the covariance function of any SRF is necessarily positive definite. Moreover, it can be shown that for any positive definite covariance function there exists a Gaussian random function having this covariance function. But some types of covariance functions are incompatible with some classes of random functions [1]. Note that the same problem holds for variograms and conditional negative definiteness for IRFs. This problem, which is not solved yet, was dubbed “internal consistency of models” by Matheron [46, 47]. Since the covariance function of any SRF is necessarily positive definite, any function that is not positive definite (resp. conditionally negative definite) cannot be the covariance function of an SRF (resp. the variogram of an IRF).
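Positive definiteness in the sense of Definition 1 can be probed numerically: build the Gram matrix [C(xj − xi)] for a set of locations and attempt a Cholesky factorization, which succeeds exactly when the matrix is positive definite. The two candidate functions below are hypothetical: an admissible exponential model, and the function 1 − |h|, which fails the test on these locations (e.g. the coefficients μ = (1, 0, 0, 1) give a negative quadratic form).

```python
import math

def is_positive_definite(M, eps=1e-10):
    """Check positive definiteness by attempting a Cholesky factorization."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = M[i][i] - s
                if d <= eps:       # non-positive pivot: not positive definite
                    return False
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return True

def gram(cov, xs):
    """Gram matrix [C(xi - xj)] for 1-D locations xs."""
    return [[cov(xi - xj) for xj in xs] for xi in xs]

xs = [0.0, 1.0, 2.5, 4.0]
ok = is_positive_definite(gram(lambda h: math.exp(-abs(h)), xs))   # admissible model
bad = is_positive_definite(gram(lambda h: 1.0 - abs(h), xs))       # fails on these points
```

This is only a finite check: passing it for one configuration of points does not prove admissibility, but failing it for any configuration disproves it, which is the direction used in the text.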
3.3 Why Not Use the Sample Variogram?

The estimation of spatial dependencies by means of the variogram or the covariance function is the key to any kriging method. The intuition underlying spatial dependencies is that points x ∼ y that are close together should have close values Z(x) ∼ Z(y), because the physical conditions are similar at those locations. In order to make this idea more concrete, it is interesting to plot the increments |z(xi) − z(xj)|, quantifying the closeness z(xi) ∼ z(xj), as a function of the distance rij = ‖xi − xj‖, which measures the closeness xi ∼ xj. The variogram cloud is among the most popular visualization tools used by geostatisticians. It plots the empirical distances rij on the x-axis against the halved squared increments vij = ½ (z(xi) − z(xj))² on the y-axis. The choice of the halved squared increments is due to the definition of the variogram of an IRF (2). Figure 2 shows the variogram cloud (on the left) obtained with observations taken from the Jura dataset available on the website http://goovaerts.pierre.googlepages.com/. This dataset is a benchmark used throughout Goovaerts' book [28]. It presents concentrations of seven pollutants (cadmium, cobalt, chromium, copper, nickel, lead and zinc) measured in the French Jura region. In Figure 2, the distance is the Euclidean distance in R² and the variogram cloud has been computed from cadmium concentrations at 100 locations. From the variogram cloud it is possible to extract the sample variogram. It is obtained by computing the mean value of the halved squared increments v in classes of distance. The sample variogram can be defined by:
γ̂(h) = (1 / (2|VΔh|)) ∑_{i,j ∈ VΔh} (z(xi) − z(xj))²,

where VΔh is the set of pairs of locations such that ‖xi − xj‖ ∈ [h − Δ, h + Δ], and |VΔh| is the cardinality of VΔh, i.e. the number of pairs in VΔh.
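The binning formula above can be sketched directly. The sketch below works in one dimension for brevity (the Jura data are two-dimensional) and averages halved squared increments over unordered pairs, which is the same quantity up to the pair-counting convention; the observations are hypothetical.

```python
def sample_variogram(points, values, h, delta):
    """gamma_hat(h): mean of halved squared increments over pairs whose
    1-D separation falls in the distance class [h - delta, h + delta]."""
    total, count = 0.0, 0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            r = abs(points[i] - points[j])
            if h - delta <= r <= h + delta:
                total += 0.5 * (values[i] - values[j]) ** 2
                count += 1
    return total / count if count else None

# Hypothetical 1-D observations
xs = [0.0, 0.5, 1.1, 2.0, 3.2, 4.1]
zs = [1.0, 1.1, 1.4, 1.9, 2.4, 2.6]
```

An empty distance class returns `None` rather than a value, which mirrors a practical difficulty of sample variograms: at some lags there may simply be no pairs to average.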
[Scatter plots: “Cadmium Concentration variogram cloud” (left) and “Cadmium Concentration sample variogram” (right); x-axis: distance r (km).]
Fig. 2 Variogram cloud and sample variogram
Figure 2 (right) shows the sample variogram associated with the plotted variogram cloud. It has been computed for 10 sampling locations and for a class radius Δ equal to half the sampling distance. As seen in the previous sections, geostatistics relies on sophisticated statistical models, but, in practice, geostatisticians eventually quantify the dependencies by means of a subjectively chosen theoretical variogram. Why don't they use the empirical variogram in order to quantify the influence of the neighborhood of a point on the value at this point? It turns out that these empirical tools (variogram cloud or sample variogram) generally do not fulfil the conditional negative definiteness requirement. In order to overcome this difficulty, two methods are generally considered: either an automated fitting (by means of a regression analysis on the parameters of a variogram model) or a manual fitting made at a glance. Empirical variograms are considered by geostatisticians only as visualization or preliminary guiding tools.
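The automated fitting route can be sketched as a least-squares search over the parameters of an admissible model, here a hypothetical exponential variogram fitted over a small parameter grid to a synthetic empirical variogram (real fits would use a continuous optimizer and real sample-variogram values).

```python
import math

def exp_variogram(h, sill, rng):
    """Hypothetical exponential variogram model: sill * (1 - exp(-h/rng))."""
    return sill * (1.0 - math.exp(-h / rng))

def fit_by_grid(h_vals, gamma_hat, sills, ranges):
    """Least-squares fit of (sill, range) over a parameter grid."""
    best = None
    for s in sills:
        for a in ranges:
            err = sum((exp_variogram(h, s, a) - g) ** 2
                      for h, g in zip(h_vals, gamma_hat))
            if best is None or err < best[0]:
                best = (err, s, a)
    return best[1], best[2]

# Synthetic "sample variogram" generated from sill 2.0 and range 3.0
hs = [0.5 * k for k in range(1, 11)]
gh = [exp_variogram(h, 2.0, 3.0) for h in hs]
sill, rng = fit_by_grid(hs, gh, [1.0, 1.5, 2.0, 2.5], [1.0, 2.0, 3.0, 4.0])
```

Because the fitted object is always a member of an admissible family, the result automatically satisfies the conditional negative definiteness requirement, unlike the sample variogram itself.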
3.4 Sensitivity of Kriging to Variogram Parameters

The variogram parameters, i.e. range, sill and nugget effect, affect the result of kriging in various ways. For one thing, while the kriging weights sum to 1, they are not necessarily all positive. In particular, the choice of the range of the variogram will affect the sign of the kriging weights. In Figures 3 and 4, we consider a set of data points that form two significantly separated clusters: there are many data points between abscissae 0 and 3.5 with an increasing trend, as well as between 11.5 and 15 with a decreasing trend, but none between 3.5 and 11.5. This configuration suggests an increasing function in one cluster and a decreasing function in the other one. Figure 3 is the result of kriging
Fig. 3 Kriging with a short-ranged variogram
Fig. 4 Kriging with a long-ranged variogram
with a short-ranged variogram (range equal to 7). The area of influence of such a variogram is thus limited to the area of each cluster of points. Figure 4 is the result of kriging with a long-ranged variogram (range equal to 12), thus covering the two clusters. In the first case, the range of the variogram does not cover the gap between the clusters. The kriged values then get closer to the mean value of the data for locations far away from these clusters. This creates a hollow at the center of the gap between the clusters. The kriging weights are then all positive. However, the general trend of the data suggests a hill, which is accounted for by the result of kriging with a long-ranged variogram (Figure 4). It can only be achieved through negative kriging weights between the clusters of data points. A positive nugget effect may prevent the kriged surface from coinciding with the data points. The effect of changing the sill is less significant. Nevertheless, it is clear
that the choice of the theoretical variogram parameters has a non-negligible impact on the kriged surface. This small example is a usual geostatistical case. It suggests that the elicitation of the theoretical variogram parameters relies heavily on the geostatistician's knowledge about the studied phenomenon. Indeed, estimating the variogram parameters from the data visualization tools alone (cloud and sample variogram), which is not presented in this article, would lead any geostatistician not informed by some geological knowledge to choose a range equal to 3. Such a short range leads to a kriging map with a dale that reaches a plateau with a level equal to the mean of the data values. This is a judicious parameterization if the studied quantity really has a dale, but this choice, guided by the visualization tools, is not correct if the studied spatial phenomenon is known to be bell-shaped.
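The fallback-to-the-mean behavior described for short ranges can be checked on a minimal two-point configuration, with hypothetical unit-sill exponential covariances and hypothetical locations: far from the data, a short range drives all simple kriging weights towards zero, so the estimate reverts to the assumed mean, while a long range keeps the data influential.

```python
import math

def cov(h, rng):
    """Hypothetical unit-sill exponential covariance with range parameter rng."""
    return math.exp(-abs(h) / rng)

def kriging_weights(xs, x0, rng):
    """Eq. (6) for two data points, solved in closed form."""
    c12 = cov(xs[0] - xs[1], rng)
    b1, b2 = cov(x0 - xs[0], rng), cov(x0 - xs[1], rng)
    det = 1.0 - c12 * c12          # C(0) = 1 with a unit sill
    l1 = (b1 - c12 * b2) / det
    l2 = (b2 - c12 * b1) / det
    return l1, l2

# Estimation location x0 = 20, far from both data points at 0 and 1.
w_short = kriging_weights([0.0, 1.0], 20.0, rng=1.0)
w_long = kriging_weights([0.0, 1.0], 20.0, rng=15.0)
```

With the short range both weights are numerically negligible, reproducing the hollow-towards-the-mean effect of Figure 3; the long range leaves a substantial total weight on the data, as in Figure 4.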
4 Epistemic Uncertainty in Kriging

The traditional kriging methodology is idealized in the sense that it assumes more information than is actually available. The stochastic machinery of the kriging approach is in some sense too demanding compared to the actual available data, which are scarce. Indeed, the actual data consist of a single realization of the presupposed random function. This issue has been addressed in critiques of the usual kriging methodology. In the kriging estimation procedure, epistemic uncertainty clearly lies in two places: the knowledge of the data points and the choice of the mathematical variogram. One source of global uncertainty is the lack of knowledge on the ideal variogram that is used at all the estimation locations of a kriging application. Such uncertainty is global, in the sense that it affects the random function model over the whole kriging domain. This kind of global uncertainty, to which Bayesian approaches can be applied, contrasts with the local uncertainty that may pervade the observations. In the usual approaches (Bayesian or not), these observations are supposed to be perfect, because they are modelled as precise values. However, in the 1980s, some authors were concerned by the fact that epistemic uncertainty also pervades the available data, which are then modelled by means of intervals or fuzzy intervals. Besides, the impact of epistemic uncertainty on the kriged surface should not be confused with the measure of precision obtained by the kriging variance V[Z(x0) − Z∗(x0)]. This measure of precision just reflects the lack of statistical validity of kriging estimates at locations far from the data, under the assumption that the real spatial phenomenon is faithfully captured by a random function (which is not the case). The fact that the kriging variance does not depend on the measured data in a direct way makes it totally inappropriate to account for epistemic uncertainty on measurements.
Moreover epistemic uncertainty on variogram parameters leads to uncertainty about the kriging variance itself.
4.1 Imprecision in the Variogram

Sample variograms (see for instance Figure 2) are generally far from the ideal theoretical variogram models (see for instance Figure 1) fulfilling the conditional negative definiteness condition. Whether the fitting is automatic (by means of a regression analysis on the parameters of a model) or manual and made at a glance, an important epistemic transfer can be noticed. Indeed, whatever the method, the geostatistician tries to summarize his or her knowledge about the studied field and some objective information (the sample variogram) by means of a unique, subjectively chosen dependence model: the theoretical variogram. As pointed out by A. G. Journel [36]: Any serious practitioner of geostatistics would expect to spend a good half of his or her time looking at all faces of a data set, relating them to various geological interpretations, prior to any kriging.
Except in [5, 6], this fundamental step of the kriging method is never quite discussed in terms of the epistemic uncertainty it creates. Intuitively, however, there is a lack of information to properly assess a single variogram. This lack of information is a source of epistemic uncertainty, by definition [32]. As the variogram model plays a critical role in the calculation of the reliability of a kriging estimation, the epistemic uncertainty on the theoretical variogram fit should not be neglected. Forgetting about epistemic uncertainty in the variogram parameters, as propagated to the kriging estimate, may result in underestimated risks and a false confidence in the results.
4.2 Kriging in the Bayesian Framework

The Bayesian kriging approach is supposed to handle this subjective uncertainty about features of the theoretical variogram, as known by experts. In practice, the structural (random function) model is not exactly known beforehand and is usually estimated from the very same data from which the predictions are made. The aim of Bayesian kriging is to incorporate epistemic uncertainty in the model estimation, and thus in the associated prediction. In Omre [48], the user has a guess on the non-stationary random function Z. This guess is given by a random function Y on the domain D whose moments are known and given by, ∀x, x + h ∈ D,

E[Y(x)] = mY,   Cov[Y(x), Y(x + h)] = CY(h).   (9)

From the knowledge of CY(h), the variogram can also be used thanks to the relation γY(h) = CY(0) − CY(h). The random function Y, and more precisely mY and the functions CY and γY, form the available prior subjective information about the random function Z whose value must be predicted at location x0. In the Bayesian updating procedure, how
uncertainty about Y is transferred to Z is modelled by the law that handles the uncertainty on Z conditionally to Y, i.e. the law of Z|Y. In this context, the covariance function or the variogram of the updating law have to be estimated. From standard works on linear Bayesian statistics [31], Omre extracts the Bayes updating rules for the bivariate characteristic functions of random functions, namely the variogram and the covariance function. Bayes linear methods [27] are based on expectation and covariance structures, rather than on distributional assumptions. The Bayesian updating rules that enable the computation of the posterior uncertainty on Z from the prior uncertainty on Y are given by:

mZ = a0 + mY,
CZ(h) = CZ|Y(h) + CY(h),
γZ(h) = γZ|Y(h) + γY(h),

where a0 is an unknown constant, which is (according to Omre) introduced to make the guess less sensitive to the actual level specified, i.e. less sensitive to the assessment of mY. From this updating procedure of the moments, one can retrieve the moments of Z needed for kriging, as in the usual kriging approach. What is missing in this procedure is the covariance function (or the variogram) of Z|Y. Omre proposes a usual fitting procedure to estimate these functions. Eventually, the Bayesian kriging system is given by

CZ(x0 − xi) = ∑_{j=1}^n λj(x0) CZ(xi − xj), ∀i = 1, . . . , n.
Another approach, more in the Bayesian tradition of a distributional guess on the model parameters (mean and variogram parameters), is proposed in the paper of Handcock and Stein [29]. It shows that ordinary kriging with a Gaussian stationary random function and unknown mean m can be interpreted in terms of a Bayesian analysis with a prior distribution locally uniform on the mean parameter m. More generally, they propose a systematic Bayesian analysis of the kriging methodology for different mean and variogram parametric models. Several authors worked on this approach [14, 26, 8]. Any Bayesian approach is supposed to take into account epistemic uncertainty in the sense that it handles the lack of knowledge on the model parameters by assigning a subjective prior probability distribution to these parameters. This subjective prior is allowed to be incorrect: it is a guess which is then corrected by objective information by means of Bayes' theorem. The term “guess” and the admission of a possibly erroneous prior are supposed to be the way epistemic uncertainty is handled in the Bayesian framework. However, this procedure seems to be more of an objective method that needs to be initialised by a judicious starting guess in order to converge, when the number of objective observations is high, to the correct variability model.
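Omre's moment-updating rules are simple enough to sketch directly. The two covariance models below (the prior C_Y and the updating covariance C_{Z|Y}), their parameters, and the prior mean and offset are all hypothetical placeholders for quantities that would in practice come from expert elicitation and fitting.

```python
import math

def c_y(h):
    """Prior guess covariance C_Y (hypothetical exponential model)."""
    return 1.0 * math.exp(-abs(h) / 5.0)

def c_z_given_y(h):
    """Fitted updating covariance C_{Z|Y} (hypothetical)."""
    return 0.5 * math.exp(-abs(h) / 2.0)

def c_z(h):
    """Posterior covariance: C_Z(h) = C_{Z|Y}(h) + C_Y(h)."""
    return c_z_given_y(h) + c_y(h)

def gamma_z(h):
    """Corresponding posterior variogram via gamma(h) = C(0) - C(h)."""
    return c_z(0.0) - c_z(h)

m_y, a0 = 4.0, 0.3   # hypothetical prior mean guess and offset constant
m_z = a0 + m_y       # posterior mean: m_Z = a0 + m_Y
```

The updated C_Z then replaces C in the usual kriging system, so the Bayesian step changes only the dependence model, not the kriging machinery itself.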
In our view, a unique prior distribution, even if claimed to be non-informative in the case of plain ignorance, is not the proper representation to capture epistemic uncertainty on the model. A unique prior models the supposedly known variability of the considered parameter, not ignorance about it. In fact, it is not even clear that such parameters are subject to variability. A more consistent approach would be a robust Bayesian analysis of kriging. Robust Bayesian analysis consists of working with a family of priors in order to lay bare the sensitivity of estimators to epistemic uncertainty on the model's parameters [7, 50].
4.3 Imprecision in the Data

Because the available information can be of various types and qualities, ranging from measurement data to human geological experience, the treatment of uncertainty in data should reflect this diversity of origins. Moreover, there is only one observation made at each location, and this value is in essence deterministic. However, one may challenge the precision or accuracy of such measurements. In particular, geological measurements are often highly imprecise. Let us take a simple example: the measurement of permeability in an aquifer. It results from the interpretation of a pumping test: when pumping water from a well, the water level will decrease in that well and also in neighboring wells. The local permeability is obtained by fitting theoretical draw-down curves to the experimental ones. There is obviously some imprecision in such a fitting, which is based on approximations to reality (e.g., a homogeneous medium). Epistemic uncertainty due to measurement imperfections should thus pervade the measured permeability data. For the inexact (imprecise) information resulting from unique assessments of deterministic values, a non-frequentist or subjective approach reflecting imprecision could be used. Epistemic uncertainty about such deterministic numerical values naturally takes the form of intervals. Asserting z(x) ∈ [a, b] comes down to claiming that the actual value of the quantity z(x) lies between a and b. Note that while z(x) is an objective quantity, the nature of the interval [a, b] is epistemic: it represents expert knowledge about z(x) and has no existence per se. The interval [a, b] is a set of mutually exclusive values, one of which is the right one: the natural interpretation is that any value outside [a, b] is considered impossible.
A fuzzy subset F [21, 58] is a richer representation of the available knowledge, in the sense that the membership degree F(r) is a gradual estimation of the conformity of the value z(x) = r to the expert knowledge. In most approaches, fuzzy sets are representations of knowledge about underlying precise data. The membership grade F(r) is interpreted as a degree of possibility of z(x) = r according to the expert [59]. In this setting, membership functions are interpreted as possibility distributions that handle epistemic uncertainty due to imprecision in the data. Possibility distributions can often be viewed as nested sets of confidence intervals [18]. Let Fα = {r ∈ R : F(r) ≥ α} be called an α-cut. F is called a fuzzy interval if and only if Fα is an interval for all 0 < α ≤ 1. The core F1, when reduced to a singleton, is called the mode of F. If the membership function is continuous, the degree of certainty of z(x) ∈ Fα is equal to 1 − α, in the sense that any value outside
K. Loquin and D. Dubois
Fα has possibility degree at most α. So it is certain that z(x) ∈ S(F) = lim_{α→0} Fα (the support of F), while there is no certainty that the most plausible values in the core F1 contain the actual value. Note that the membership function can be retrieved from its α-cuts, by means of the relation

F(r) = sup {α : r ∈ Fα}.
Now suppose that the available knowledge supplied by an expert comes in the form of nested confidence intervals {Ik, k = 1, . . . , K} such that I1 ⊂ I2 ⊂ · · · ⊂ IK, with increasing confidence levels ck (i.e. ck > ck′ if k > k′). The possibility distribution defined by

F(r) = min_{k=1,...,K} max(1 − ck, Ik(r)),

where Ik(r) denotes the characteristic function of Ik,
is a faithful representation of the supplied information. Viewing a possibility degree as an upper probability bound [55], F is an encoding of the probability family {P : P(Ik ) ≥ ck }. If cK = 1 then the support of this fuzzy interval is IK . If an expert only provides a mode c and a support [a, b], it makes sense to represent this information as the triangular fuzzy interval with mode c and support [a, b] [19]. Indeed F then encodes a family of (subjective) probability distributions containing all the unimodal ones with mode c and support included in [a, b].
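The construction above can be sketched numerically. The following is a minimal illustration (ours, not from the chapter) of turning nested confidence intervals {Ik} with confidence levels ck into the possibility distribution F(r) = min_k max(1 − ck, Ik(r)); the intervals and levels are invented.

```python
# Possibility distribution from nested confidence intervals (toy sketch).
# I_k(r) is the characteristic function of the interval I_k.

def possibility(r, intervals, confidences):
    """F(r) for nested intervals I_1 ⊂ ... ⊂ I_K with levels c_1 < ... < c_K."""
    degrees = []
    for (a, b), c in zip(intervals, confidences):
        indicator = 1.0 if a <= r <= b else 0.0
        degrees.append(max(1.0 - c, indicator))
    return min(degrees)

intervals = [(4.0, 6.0), (3.0, 7.0), (1.0, 9.0)]   # I_1 ⊂ I_2 ⊂ I_3
confidences = [0.2, 0.6, 1.0]                      # c_K = 1: I_3 is the support

print(possibility(5.0, intervals, confidences))    # 1.0: inside every interval
print(possibility(8.0, intervals, confidences))    # 0.4: only inside I_3
print(possibility(10.0, intervals, confidences))   # 0.0: outside the support
```

Values inside all intervals get possibility 1; the further a value falls outside the inner intervals, the lower its possibility, and values outside the support IK are impossible.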
5 Intervallist Kriging Approaches
This section and the next one refer to works done in the 1980s. Even if some of them may now be considered obsolete, their interest lies in their being early attempts to handle some form of epistemic uncertainty in geostatistics. While some of the proposed procedures look questionable, it is useful to understand their merits and limitations in order to avoid pitfalls and to propose a well-founded methodology to that effect. Since then, virtually no new approaches seem to have been proposed, even if some of the problems posed more than 20 years ago have since received more efficient solutions, for instance the solution of interval problems via Gibbs sampling [24].
5.1 The Quadratic Programming Approach
In [23, 38], the authors propose to estimate z∗(x0) from imprecise information available as a set of constraints on the observations. Such constraints can also be seen as inequality-type data, i.e. the observation located at position xi is of the form z(xi) ≥ a(xi) and/or z(xi) ≤ b(xi). This approach also assumes a global constraint: whatever the position x0 ∈ D, the kriging estimate z∗(x0) is bounded, which translates into

∀x0 ∈ D, z∗(x0) ∈ [a, b].     (10)

For instance, any ore mineral grade is necessarily a value within [0, 100%].
Kriging and Epistemic Uncertainty: A Critical Discussion
Any kind of data, precise or inequality-type, can always be expressed in terms of an interval constraint:

z(xi) ∈ [a(xi), b(xi)], ∀i = 1, . . . , n.     (11)
Indeed, precise data can be modelled as constrained data (11) with equal upper and lower bounds, and an inequality-type datum z(xi) ≥ a(xi) (resp. z(xi) ≤ b(xi)) can be expressed as [a(xi), b] (resp. [a, b(xi)]). Thus the data set is now given by Z̄n = {z̄(xi) = [a(xi), b(xi)], i = 1, . . . , n}. As mentioned by A. Journel [37], this formulation of the problem makes it possible to cope with the recurring question of the positiveness of the kriging weights, which the basic kriging approaches cannot ensure. Negative weights are generally seen as being “evil” because the measured spatial quantity is positive, so that its linear combination (4) with some negative weights could lead to a negative kriging estimate. More generally, nothing prevents the kriged values from violating range constraints induced by practical considerations on the studied quantity. Hence one may be tempted by the incorrect conclusion that all kriging weights should be positive. Actually, having some negative kriging weights is quite useful, since it allows a kriging estimate to fall outside the range [mini z(xi), maxi z(xi)]. Instead of forcing the weights to be positive, the constraint-based approach forces the estimate to be positive by adding a constraint on the estimate to the least squares optimization problem. More generally, the global constraint (10) solves the problem of getting meaningful kriging estimates. In [41], J.L. Mallet proposes a particular solution to this constrained optimization problem by means of quadratic programming, i.e. minimizing a quadratic form (the error variance) under the constraint that the solution lies inside the range [a, b]. The dual expression [9] of the kriging estimate (4) is of the form:

z∗(x0) = ∑_{i=1}^{n} νi C(xi − x0).     (12)
This expression is obtained by incorporating into the linear combination of the observations (4) the kriging weights that solve the kriging system (6). The dual kriging weights {νi, i = 1, . . . , n} thus reflect the dependence between the covariances {C(xi − xj), i, j = 1, . . . , n} and the observations {z(xi), i = 1, . . . , n}.¹
¹ Note that, in the precise framework, the dual formalism of kriging is computationally interesting: the kriging system to be solved is obtained by minimization of (13) whatever the estimation position x0, which means it has to be solved only once to provide an interpolation over the whole domain. However, this system is difficult to solve and badly conditioned, whereas the non-dual systems, whose matrix coefficients are generally sparse, are more tractable. Therefore, the dual kriging system should be preferred when there are many estimation points but a small dataset, and the usual kriging system when there are few estimation points but a large dataset.
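As a toy numerical illustration (ours, not the authors’ code) of the dual form (12): the dual weights ν solve K ν = z once, with K[i, j] = C(xi − xj), and the estimate at any x0 is then z∗(x0) = Σi νi C(xi − x0), so a single system solve serves every estimation point. The Gaussian covariance model and the data values are made up.

```python
import numpy as np

def gaussian_cov(h, sill=1.0, rng=2.0):
    # Hypothetical stationary covariance model C(h).
    return sill * np.exp(-(h / rng) ** 2)

x = np.array([0.0, 1.0, 2.5, 4.0])   # observation locations (invented)
z = np.array([1.2, 0.7, 1.5, 0.9])   # observations (zero-mean simple kriging)

K = gaussian_cov(np.abs(x[:, None] - x[None, :]))  # K[i, j] = C(x_i - x_j)
nu = np.linalg.solve(K, z)           # dual kriging weights ν, computed once

def krige(x0):
    return float(nu @ gaussian_cov(np.abs(x - x0)))  # dual estimate (12)

# The dual interpolator is exact at the data points:
print(round(krige(1.0), 6))          # 0.7
```

Exactness at the data points follows from K ν = z: evaluating (12) at xj returns (K ν)j = z(xj).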
Building on Mallet’s approach [41], Dubrule and Kostov [23, 38] proposed a solution to this interpolation problem that takes the form (12), where the dual kriging weights {νi, i = 1, . . . , n} are obtained by means of the quadratic program minimizing

∑_{i=1}^{n} ∑_{j=1}^{n} νi νj C(xi − xj),     (13)

subject to the n constraints

a(xi) ≤ ∑_{j=1}^{n} νj C(xj − xi) ≤ b(xi),
induced by the dataset Z̄n = {z̄(xi) = [a(xi), b(xi)], i = 1, . . . , n}. When only precise observations are present (i.e. when there are no inequality-type constraints), the system reduces to a standard simple kriging system. However, the ensuing treatment of these constraints is ad hoc: the authors propose to select one bound among a(xi), b(xi) for each constraint, namely the one supposed to affect the kriging estimate. They thus select a precise data set made of the selected bounds. The choice of this data set is guided only by the geostatistician's judgment of the raw data and by some preliminary kriging steps performed from the available precise data (if any).
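A numerical sketch (ours, not from [23, 38]) of this constrained formulation: writing y = K ν with K[i, j] = C(xi − xj), the objective (13) becomes yᵀK⁻¹y and the constraints become the box a ≤ y ≤ b, which a plain projected-gradient loop can handle. The covariance model and interval data are invented.

```python
import numpy as np

def gaussian_cov(h, sill=1.0, rng=2.0):
    # Hypothetical stationary covariance model C(h).
    return sill * np.exp(-(h / rng) ** 2)

x = np.array([0.0, 1.0, 2.5, 4.0])   # observation locations (invented)
a = np.array([0.8, 0.5, 1.2, 0.6])   # lower bounds of the interval data
b = np.array([1.2, 0.9, 1.6, 1.0])   # upper bounds of the interval data

K = gaussian_cov(np.abs(x[:, None] - x[None, :]))
K_inv = np.linalg.inv(K)

# Projected gradient descent on g(y) = y^T K^{-1} y over the box [a, b];
# the step 1/L uses the Lipschitz constant L = 2/λ_min(K) of the gradient.
step = np.linalg.eigvalsh(K)[0] / 2.0
y = (a + b) / 2.0                    # feasible starting point
for _ in range(5000):
    y = np.clip(y - step * (2.0 * K_inv @ y), a, b)

nu = K_inv @ y                       # dual weights ν realizing K ν = y

def krige(x0):
    return float(nu @ gaussian_cov(np.abs(x - x0)))  # dual estimate (12)

print(bool(np.all((K @ nu >= a - 1e-9) & (K @ nu <= b + 1e-9))))  # True
```

By construction the fitted values K ν stay inside the interval data, which is exactly the role of the constraints in (13); a production implementation would use a proper quadratic programming solver instead of this illustrative loop.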
5.2 The Soft Kriging Approach
Methodology
In 1986, A. Journel [37] studied the same problem of adapting the kriging methodology in order to deal with what he called “soft” information. According to him, “soft” information consists of imprecise data z̃(xi), especially intervals, encoded by cumulative distribution functions (cdf) Fxi. The cdf Fxi attached to a precise value z(xi) = ai = bi can be modelled by a step-function cdf with jump at ai = bi, i.e.:

Fxi(s) = 1, if s ≥ a(xi) = b(xi); 0, otherwise

(cf. Figure 5.(a)). At each location xi where a constraint interval z̄(xi) of the form (11) is present, the associated cdf Fxi is only known outside the constraint interval, where it is either 0 or 1, i.e.:

Fxi(s) = 1, if s ≥ b(xi);
         0, if s ≤ a(xi);
         ?, otherwise.     (14)
Fig. 5 Prior information on the observations
(cf. Figure 5.(c)). If the expert is unable to decide where, within an interval z̄(xi) = [a(xi), b(xi)], the value z(xi) may lie, a non-informative prior cdf (14) should be used. It should not be the uniform cdf within that interval, as the principle of maximum entropy would suggest, since uniformity is not equivalent to a lack of information. In addition to the constraint interval z̄(xi) of Dubrule and Kostov [23, 38], some prior information may allow quantifying the likelihood of the value z(xi) within that interval. The corresponding cumulative distribution function Fxi (cf. Figure 5.(b)) is then completed with prior subjective probabilities. At any other location, a minimal interval constraint exists (cf. (10) and Figure 5.(d)): z∗(x) ∈ [a, b]. This constraint, as in the quadratic programming approach of Dubrule and Kostov, enables the problem of negative weights to be addressed. From this set of heterogeneous prior pieces of information, which we will denote by Z̃n = {z̃(xi) = Fxi, i = 1, . . . , n}, Journel [37] proposes to construct a “posterior” cdf at the kriging estimation location x0, denoted by Fx0|Z̃n(s) = P(Z(x0) ≥ s | Z̃n). In its simplest version, the so-called “soft” kriging estimate of the “posterior” cdf Fx0|Z̃n is defined as a linear combination of the prior cdf data, for a given threshold value s ∈ [a, b]:

Fx0|Z̃n(s) = ∑_{i=1}^{n} λi(x0, s) Fxi(s),     (15)
where the kriging weights, for a given threshold s ∈ [a, b], are obtained by means of usual kriging based on the random function Y (x) = Fx (s) at location x. Despite its interest, there are some aspects of this approach that are debatable: 1. The use of Bayesian semantics. Journel proposes to use the terminology of Bayesian statistics, by means of the term prior for qualifying the probabilistic
information attached to each piece of data, and the term posterior for qualifying the probabilistic information on the estimation point. However, in his approach the computation of the posterior cdf is not made by means of the Bayesian updating procedure. He probably made this terminological choice because of the subjectivist nature of the information; the choice is nonetheless not consistent with Bayesian terminology. 2. The choice of a linear combination of the cdfs to compute the uncertain estimate. A more technical criticism of his approach concerns the definition of the kriged “posterior” cdf (15). The appropriateness of this definition supposes that the cdf of a linear combination of random variables is the linear combination of the cdfs of these random variables. However, this is not correct: propagating uncertainty bearing on the arguments of an operation is not as simple as replacing those arguments by their cdfs in the operation. Indeed, the cdf of Z∗(x0) in (4), when {Z(xi), i = 1, . . . , n} are random variables with cdfs {Fxi, i = 1, . . . , n}, is not given by (15), but by a convolution that could be approximated by means of a Monte Carlo method. If we assume complete dependence between the measurements of the Z(xi), one may also construct the cdf of Z∗(x0) as a weighted sum of their quantile functions (the inverses of the cdfs). These defects make the approach theoretically unclear, with an interpretation neither in the Bayesian framework nor in the frequentist one. Note that the author [37] himself noted a strong inconsistency of his method, namely that the “posterior” cdf (15) may fail to respect the monotonicity property inherent to the definition of a cumulative distribution function: when some kriging weights are negative, it is not warranted that Fx0|Z̃n(s) ≥ Fx0|Z̃n(s′) for s > s′. He proposes an ad hoc correction of the kriging estimates, replacing the decreasing parts of Fx0|Z̃n by flat parts.
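The two computations just discussed can be sketched numerically (our own construction, not Journel’s code): the “soft” kriged cdf (15) as a weighted sum of prior cdfs over a grid of thresholds s, and the ad hoc monotonicity repair as a running maximum. The prior cdfs of interval data are filled in linearly here purely for illustration, whereas (14) leaves them unknown inside the interval; the weights and intervals are invented.

```python
import numpy as np

s_grid = np.linspace(0.0, 10.0, 101)

def interval_cdf(a, b):
    """A cdf compatible with an interval datum [a, b] (linear fill, assumed)."""
    return np.clip((s_grid - a) / (b - a), 0.0, 1.0)

priors = [interval_cdf(1.0, 3.0), interval_cdf(2.0, 6.0), interval_cdf(0.0, 2.0)]
weights = [0.6, 0.7, -0.3]       # kriging weights summing to 1; one is negative

F_post = sum(w * F for w, F in zip(weights, priors))   # eq. (15): may decrease

F_repaired = np.maximum.accumulate(F_post)             # flatten decreasing parts

print(bool(np.any(np.diff(F_post) < 0)))       # True: (15) is not monotone here
print(bool(np.all(np.diff(F_repaired) >= 0)))  # True: repaired curve is monotone
```

With a negative weight, the combined curve decreases wherever the corresponding prior cdf rises alone; the running maximum is exactly the “replace decreasing parts by flat parts” correction.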
In spite of these criticisms of the well-foundedness of Journel’s approach, a basic idea for handling epistemic uncertainty in the data appears in his paper. The way Journel encodes the dataset is, to our knowledge, the first attempt by geostatisticians to handle incomplete information (or epistemic uncertainty) in kriging; the question mark in the encoding (14) of an interval datum is the first modelling of ignorance in geostatistics. The method, however, tends to confuse subjective, Bayesian, and epistemic uncertainty. This confusion can now be removed in the light of recent epistemic uncertainty theories. Interestingly, their emergence [21, 55] occurred when the confusion between subjectivism (de Finetti’s school of probability [13]) and Bayesianism began to be clarified.
6 Fuzzy Kriging
There are two main fuzzy set counterparts of statistical methods. The first one extends statistical principles such as error minimisation, unbiasedness or stationarity to fuzzy set-valued realisations; such an adaptation of prediction by kriging to triangular fuzzy data was suggested by Diamond [17]. The second one applies the extension principle to the kriging estimate [4, 5, 6], in the spirit of sensitivity analysis.
6.1 Diamond’s Fuzzy Kriging
In the late 1980s, Phil Diamond was the first to extend Matheronian statistics to the fuzzy set setting, with a view to handling imprecise data. The idea was to exploit the notion of fuzzy random variables, which had emerged a few years earlier in works by several authors (see [10] for a bibliography). Diamond’s approach relies on the Puri and Ralescu version of fuzzy random variables [49], which is influenced by the theory of random sets developed in the seventies by Matheron himself [44]. Diamond also proposed an approach to fuzzy least squares in the same spirit [15].
6.1.1 Methodology
The data used by Diamond [17] are modelled by triangular fuzzy numbers, because of both their convenience and their applicability in most practical cases. A triangular fuzzy number T̂ is defined by its mode T^m and the left and right bounds T^− and T^+ of its support, and is denoted T̂ = (T^m; T^−, T^+). The set of all triangular fuzzy numbers is denoted by T. Diamond works with a distance D2 on T that makes the metric space (T, D2) complete [17]:

∀Â, B̂ ∈ T, D2(Â, B̂)² = (A^m − B^m)² + (A^− − B^−)² + (A^+ − B^+)².

A Borel σ-algebra B can be constructed on this complete metric space. This allows the definition of fuzzy random variables [49], viewed as mappings from a probability space to a specific set of functions, namely the set (T, B) of triangular fuzzy random numbers. The expectation of a triangular fuzzy random number X̂ is obtained by extending the concept of Aumann integral [3], defined for random sets, to all α-cuts of X̂.

Definition 3. Let X̂ be a triangular fuzzy random number, i.e. a T-valued random variable. The α-cuts of its expectation, denoted by Ê[X̂], are given by:

∀α ∈ [0, 1], Ê[X̂]α = E_Aumann[X̂α].

It can be shown that the expected value of a triangular fuzzy random number X̂ is a triangular fuzzy number, denoted Ê[X̂] = (E[X]^m; E[X]^−, E[X]^+).

From those definitions, Diamond extends the concept of random function to T-valued (triangular fuzzy) random functions. He works with second-order stationary ones Ẑ that verify, ∀x, x + h ∈ D,

Ê[Ẑ(x)] = (M^m; M^−, M^+) = M̂,
Ĉov(Ẑ(x), Ẑ(x + h)) = (C^m(h); C^−(h), C^+(h)) = Ĉ(h),

where the triangular fuzzy expected value is constant on D and the triangular fuzzy covariance function is defined by:
C^m(h) = E[Z^m(x) Z^m(x + h)] − (M^m)²,
C^−(h) = E[Z^−(x) Z^−(x + h)] − (M^−)²,
C^+(h) = E[Z^+(x) Z^+(x + h)] − (M^+)².     (16)
Now, from this definition of the fuzzy covariance function, the problem is to predict the value of the regionalized triangular fuzzy random variable Ẑ(x0) at x0. For this prediction the following linear estimator is used:

ẑ∗(x0) = ⊕_{i=1}^{n} λi(x0) ẑ(xi),

where {ẑ(xi), i = 1, . . . , n} are fuzzy data located at precise locations {xi, i = 1, . . . , n} and ⊕ is the extension to fuzzy triangular numbers of the Minkowski addition of intervals. A set of precise kriging weights {λi(x0), i = 1, . . . , n} is obtained by minimization of the precise mean squared error D = E[D2(Ẑ∗(x0), Ẑ(x0))²]. The unbiasedness condition is extended to fuzzy quantities and induces the usual condition ∑_{i=1}^{n} λi(x0) = 1 on the kriging weights. Due to the form of the distance D2, the expression to be minimized can, along the same lines as simple kriging, be expressed as:

D = ∑_{i=1}^{n} ∑_{j=1}^{n} λi(x0) λj(x0) C(xi − xj) − 2 ∑_{j=1}^{n} λj(x0) C(x0 − xj) + C(x0 − x0),     (17)
with C(xi − xj) = C^m(xi − xj) + C^−(xi − xj) + C^+(xi − xj), ∀i, j = 0, . . . , n. The minimization of the error (17) leads to the following kriging system:

∑_{j=1}^{n} λj(x0) C(xi − xj) − C(x0 − xi) − θ − Li = 0, ∀i = 1, . . . , n,
∑_{i=1}^{n} λi(x0) = 1,
∑_{i=1}^{n} Li λi(x0) = 0,
Li, λi(x0) ≥ 0, ∀i = 1, . . . , n,

where L1, L2, . . . , Ln and θ are Lagrange multipliers which allow, under the Kuhn-Tucker conditions, solving the optimization program for the set of kriging weights {λi(x0), i = 1, . . . , n} minimizing the error D. It should be noted that in 1988, i.e. one year before the publication of his fuzzy kriging article, Philip Diamond published the same approach restricted to interval data [16].
6.1.2 Discussion
Despite its mathematical rigor, several aspects of this approach are debatable:
1. the shift from a random function to a fuzzy-valued random function,
2. the choice of a scalar distance D2 between fuzzy quantities,
3. the use of a Hukuhara difference in the computation of the fuzzy covariance (16).
1. The first point presupposes a strict adherence to the Matheron school of geostatistics. However, it makes the framework (both at the conceptual and practical level) even more difficult to grasp. The metaphor of a fuzzy random field looks like an elusive artefact: the fuzzy random function is a mere substitute for a random function, and leads to a mathematical model with more parameters than the standard kriging technique. The key question is then: does it properly handle epistemic uncertainty?
2. The choice of a precise distance between fuzzy intervals is in agreement with the use of a precise variogram, and it leads to a questionable way of posing the least squares problem. First, a precise distance is used to measure the variance of the difference between the triangular fuzzy random variables Ẑ(x0) and Ẑ∗(x0). This is in contradiction with using a fuzzy-valued covariance when defining the stationarity of the triangular fuzzy random function Ẑ(x). Why not then define the covariance between the fuzzy random variables Ẑ(x) and Ẑ(x + h) as E[D2(Ẑ(x), M̂) D2(Ẑ(x + h), M̂)], i.e. like the variance D2(Ẑ(x0), Ẑ∗(x0))? Stationarity would then be expressed as C(h) = E[D2(Ẑ(x), M̂) D2(Ẑ(x + h), M̂)]. However, insofar as fuzzy sets represent epistemic uncertainty, the fuzzy random function might represent a fuzzy set of possible standard random functions, one of which is the right one. Then the scalar variance of a fuzzy random variable based on the distance D2 evaluates the precise variability of the (fuzzy) membership functions representing the knowledge about ill-known crisp realizations.
However, it does not evaluate the imprecise knowledge about the variability of the underlying precise realizations [10]. The meaning of extracting a precise variogram from fuzzy data and minimizing the scalar variance (17) of the membership functions remains unclear: the variability of the knowledge about the quantity z(x) across the domain D has hardly any relationship with the actual variability of the actual values z(x). In our opinion, the approach of Diamond is not cogent for handling epistemic uncertainty. In [10], a survey of possible notions of variance of fuzzy random variables is proposed, with discussions of their significance in the scope of epistemic uncertainty; it is argued that if a fuzzy random variable represents epistemic uncertainty, its variance should be imprecise or fuzzy as well.
3. The definition of second-order stationarity for triangular fuzzy random functions is highly questionable. The fuzzy covariance function Ĉ(h) (16) proposed by Diamond is supposed to reflect the epistemic uncertainty on the covariance between Ẑ(x) and Ẑ(x + h), which finds its source in the epistemic uncertainty conveyed by Ẑ. In his definition (16) of Ĉ(h), Diamond uses the Hukuhara difference [33] between the supports of the triangular fuzzy numbers (E[Z^m(x)Z^m(x + h)];
E[Z^−(x)Z^−(x + h)], E[Z^+(x)Z^+(x + h)]) and M̂². The Hukuhara difference between two intervals is of the form [a, b] ⊖ [c, d] = [a − c, b − d]. Note that the result may be such that a − c > b − d, i.e. not an interval. So it is not clear that the inequalities C^−(h) ≤ C^m(h) ≤ C^+(h) always hold when computing E[Ẑ(x)Ẑ(x + h)] ⊖ M̂². The Hukuhara difference [33] between intervals is actually defined such that

[a, b] ⊖ [c, d] = [u, v] ⟺ [a, b] = [c, d] ⊕ [u, v] = [c + u, d + v],

where ⊕ is the usual Minkowski addition of intervals. This property of the Hukuhara difference allows interpreting the epistemic transfer induced by this difference in Diamond’s covariance definition. In the standard case, the identity E[(Z(x) − m)(Z(x + h) − m)] = E[Z(x)Z(x + h)] − m² = C(h) holds. When extending it to the fuzzy case in Diamond’s method, it is assumed that:
• Ẑ(x)Ẑ(x + h) and M̂² are triangular fuzzy intervals when Ẑ(x) and M̂ are such. This is only a coarse approximation.
• [E[Z^−(x)Z^−(x + h)], E[Z^+(x)Z^+(x + h)]] = [C^−(h), C^+(h)] ⊕ [M^−, M^+], so that the imperfect knowledge about Ĉ(h) ⊕ M̂² is identified with the imperfect knowledge about Ê[Ẑ(x)Ẑ(x + h)]. An alternative definition is to let

[E[Z^−(x)Z^−(x + h)], E[Z^+(x)Z^+(x + h)]] ⊖ [M^−, M^+] = [C^−(h), C^+(h)],

using the Minkowski difference of fuzzy intervals instead of the Hukuhara difference in equation (16). It would ensure that the resulting fuzzy covariance is always a fuzzy interval, but it would be more imprecise. Choosing between the two expressions requires some assumption about the origin of the epistemic uncertainty in this calculation.
• Besides, stating the fuzzy set equality Ĉ(h) = Ê[(Ẑ(x) − Ê(Ẑ(x)))(Ẑ(x + h) − Ê(Ẑ(x + h)))] does not enforce the equality of the underlying quantities on each side.
Finally, the Diamond approach precisely interpolates between fuzzy observations at various locations.
Hence the method does not propagate the epistemic uncertainty bearing on the variogram. Although fuzzy kriging provides a fuzzy interval estimate ẑ∗(x0), it is difficult to interpret this fuzzy estimate as picturing our knowledge about the actual z∗(x0) that would have been obtained by kriging had the data been precise. Indeed, the scalar influence coefficients in Diamond’s method reflect both the spatial variability of Z and the variability of the epistemic uncertainty of the observations. This way of handling intervals or fuzzy intervals as “real” data is in fact much influenced by Matheron’s random sets, where set realizations are understood as real objects (geographical areas), not as imprecise information about precise locations. The latter view of sets as epistemic constructs is more in line with Shafer’s theory of evidence [51], which also uses the formalism of random sets, albeit with the purpose of grasping incomplete information.
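The two interval differences contrasted in the discussion above can be checked with a few lines of toy code (ours): the Hukuhara difference may fail to yield a valid interval, while the Minkowski difference always does, at the price of extra imprecision.

```python
def hukuhara_diff(I, J):
    (a, b), (c, d) = I, J
    return (a - c, b - d)        # [a,b] ⊖_H [c,d]; lower bound may exceed upper

def minkowski_diff(I, J):
    (a, b), (c, d) = I, J
    return (a - d, b - c)        # [a,b] ⊕ (-[c,d]): always a valid interval

I, J = (0.0, 1.0), (-2.0, 4.0)   # J is wider than I
print(hukuhara_diff(I, J))       # (2.0, -3.0): not an interval
print(minkowski_diff(I, J))      # (-4.0, 3.0): an interval, but wider

# The defining property [c,d] ⊕ ([a,b] ⊖_H [c,d]) = [a,b] holds formally:
u, v = hukuhara_diff((0.0, 10.0), (2.0, 5.0))
print((2.0 + u, 5.0 + v))        # (0.0, 10.0)
```

This makes the trade-off discussed above concrete: the Hukuhara difference preserves the additive identity behind (16) but can break the interval structure, whereas the Minkowski difference preserves the structure but enlarges the result.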
Overall, from the point of view of epistemic uncertainty, this approach to kriging looks questionable at both the philosophical and computational levels. Nevertheless, the technique has been used in practical applications by Taboada et al. [54], in the context of evaluating the reserves of an ornamental granite deposit in Galicia, Spain.
6.2 Bardossy’s Fuzzy Kriging
Not only may the epistemic uncertainty about the data Zn be modelled by intervals or fuzzy intervals, but one may also argue that the variogram itself, in its mathematical version, should be a parametric function with interval-valued or fuzzy set-valued parameters. While Diamond was proposing a highly mathematical approach to fuzzy kriging, Bardossy et al. [4, 5, 6], between 1988 and 1990, also worked on this issue of extending kriging to the epistemic uncertainty caused by fuzzy data. Beyond this adaptation of the kriging methodology to fuzzy data, they also propose to handle epistemic uncertainty on the theoretical variogram model: in their approach, the variogram is tainted with epistemic uncertainty because the parameters of the theoretical variogram model are supposed to be fuzzy subsets. The epistemic uncertainty of geostatisticians regarding these parameters is then propagated to the variogram by means of the extension principle. Introduced by Lotfi Zadeh [58], the extension principle provides a general method for extending non-fuzzy models or functions in order to deal with fuzzy parameters. For instance, fuzzy set arithmetic [21], which generalizes interval arithmetic, has been developed by applying the extension principle to the classical arithmetic operations (addition, subtraction, etc.).
Definition 4. Let U, V and W be sets, and f a mapping from U × V to W. Let A be a fuzzy subset of U with membership function μA, and likewise B a fuzzy subset of V. The image of (A, B) under the mapping f is a fuzzy subset C of W whose membership function is obtained by:
μC(w) = sup_{(u,v) ∈ U×V : w = f(u,v)} min(μA(u), μB(v)).
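On finite supports, Definition 4 can be computed directly; here is a discretized sketch (our own) for f(u, v) = u + v, with invented membership values.

```python
def extend(f, mu_A, mu_B):
    """μ_C(w) = max over {(u,v) : f(u,v) = w} of min(μ_A(u), μ_B(v))."""
    mu_C = {}
    for u, a in mu_A.items():
        for v, b in mu_B.items():
            w = f(u, v)
            mu_C[w] = max(mu_C.get(w, 0.0), min(a, b))
    return mu_C

mu_A = {1: 0.5, 2: 1.0, 3: 0.5}      # fuzzy number "about 2"
mu_B = {10: 0.5, 20: 1.0, 30: 0.5}   # fuzzy number "about 20"

C = extend(lambda u, v: u + v, mu_A, mu_B)
print(C[22])   # 1.0, attained by the pair (2, 20)
print(C[21])   # 0.5, the best pair is (1, 20)
```

Each output value inherits the best (least constrained) way of producing it, which is exactly the sup-min rule; applying the same idea to the full kriging map f(z(x1), ..., z(xn), a1, ..., ap, x0) is what makes the Bardossy approach computationally costly.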
In terms of possibility theory, this comes down to computing the degree of possibility Π(f⁻¹(w)), w ∈ W. Actually, in their approach, Bardossy et al. do not directly use such a fuzzy variogram model in the kriging process. Their approach is, in a sense, more global, since they propose to apply the extension principle not only to the variogram model, but to the entire inverted kriging system and to the obtained kriging estimate z∗(x0), because it is a function of the observations {z(xi), i = 1, . . . , n}, of the parameters of the variogram model {aj, j = 1, . . . , p} and of the estimation position x0. In other words, they express the kriging estimate as z∗(x0) = f(z(x1), . . . , z(xn), a1, . . . , ap, x0), and they apply the extension principle to propagate the epistemic uncertainty of the fuzzy observations {ẑ(xi), i = 1, . . . , n} and of the fuzzy parameters of the variogram
model {âj, j = 1, . . . , p} to the kriging estimate ẑ∗(x0). They propose to solve numerically the optimisation problem induced by their approach, without providing details. This approach is more consistent with the epistemic uncertainty involved in the kriging methodology than Diamond’s method. However, there does not seem to be a tractable solution applicable to large datasets, because of the costly optimisation involving fuzzy data. The question of whether the epistemic uncertainty conveyed by an imprecise variogram is connected to the epistemic uncertainty about the data is worth considering. However, even in the presence of a precise dataset, one may argue that the chosen variogram is tainted with epistemic uncertainty that only the expert who chooses it could estimate.
7 Uncertainty in Kriging: A Prospective Discussion
The extensions of kriging studied above naturally lead to questioning the nature of the uncertainty that pervades this interpolation method. Indeed, taking this kind of imperfect knowledge into account suggests, at first sight, that the usual approach does not properly handle the available information. Being aware that information is partially lacking is in itself a piece of (meta-)information. Questioning the proper handling of uncertainty in kriging leads to examining two issues:
• Is the random function model proposed by Matheron and his followers cogent in spatial prediction?
• How can the kriging method be adapted to epistemic uncertainty without making the problem intractable?
These questions seem to require a reassessment of the role of probabilistic modeling in the kriging task, which is supposed to be of an interpolative nature, while it heavily relies on least squares methods that are more central to regression techniques than to interpolation per se.
7.1 Spatial vs. Fictitious Variability It is commonly mentioned that probabilistic models are natural representations of phenomena displaying some form of variability. Repeatability is the central feature of the idea of probability as pointed out by Shafer and Vovk [52]. This is embodied by the use of probability trees, Markov chains and the notion of sample space. A random variable V (ω ) is a mapping from a sample space Ω to the real line, and variability is captured by binding the value of V to the repeated choices of ω ∈ Ω . The probability measure that equips Ω summarizes the repeatability pattern. In the case of the random function approach to geostatistics, the role of this scenario is not quite clear. Geostatistics is supposed to handle spatial variability of a numerical quantity z(x) over some geographical area D. Taken at face value, spatial variability means that when the location x ∈ D changes, so does z(x). However, when x is fixed z(x) is a precise deterministic value. Strictly speaking, these
considerations would lead us to identify the sample space with D, equipped with the Lebesgue measure. However, the classical geostatistics approach after Matheron is at odds with this simple intuition. It postulates the presence of a probability space Ω such that the quantity z depends on both x and ω ∈ Ω: z is taken as a random function, and for each x the actual value z(x) is substituted with a random variable Z(x) from a sample space Ω to the real line. The probability distribution of Z(x) is thus attached to a random quantity of interest z(x, ω) at each location x. It implicitly means that this quantity of interest is variable (across ω) and that its variability can be quantified. In the spatial interpolation problem solved by kriging, this kind of postulated variability at each location x of a spatial domain D corresponds to no actual phenomenon; it is a mathematical artefact. As Chilès and Delfiner [9] (p. 24) acknowledge, the statement “z(x) is a realization of a random function Z(x),” or even “of a stationary random function,” has no objective meaning.
Indeed, the quantity of interest at an estimation site x is deterministic, and only a single observation z(xi) at each of a finite set of locations xi is available. This does not look sufficient to determine a probability distribution at each location x, even if each Z(x) were actually tainted with variability. In fact, geostatisticians consider random functions not as reflecting randomness or variability actually present in natural phenomena, but as a pure mathematical model whose interest lies in the quality of the predictions it can deliver. As Matheron said, “Il n’y a pas de probabilité en soi, il y a des modèles probabilistes”² (“there is no probability in itself, there are only probabilistic models”).
The great generality of the framework, whereby a deterministic spatial phenomenon is viewed as a (unique) realisation of a random function, is considered non-constraining because it cannot be refuted by reality, and it is not directly viewed as an assumption about the phenomenon under study. The spatial ergodicity assumption on the random function Z(x) is instrumental in relating its fictitious variability at each location of the domain to the spatial variability of the deterministic quantity z(x). While this assumption is easy to interpret in the temporal domain, it is less obvious in the spatial domain. The role of the spatial ergodicity and stationarity assumptions is mainly to offer theoretical underpinnings to the least squares technique used in practice. In other words, the random function approach is to be taken as a formal black-box model for data-based interpolation, and has no pretence to represent any real or epistemic phenomenon (beyond the observed data z(xi)). Probability in geostatistics is neither objective nor subjective: it is mathematical.
7.2 A Deterministic Justification of Simple Kriging

One way of interpreting random functions in terms of actual (spatial) randomness is to replace pointwise locations by subareas ("blocks") over which average
² Cited by J.-P. Chilès.
K. Loquin and D. Dubois
estimations can be computed. Such blocks must be small enough to ensure a meaningful spatial resolution, but large enough to contain a statistically significant number of measurements. This is called the trade-off between objectivity and spatial resolution. At the limit, using a single huge block, the random function is the same at each point and reflects the variability of the whole domain. On the contrary, if the block is very small, only a single observation is available, and an ill-known deterministic function is obtained. Some authors claim that the deterministic nature of the kriging problem should be acknowledged. Journel [35] explains how to retrieve all the equations of kriging without resorting to the concept of a random function. This view is close to what Matheron calls the transitive model. The first step is to define the experimental mean $\hat{m}$, standard deviation $\hat{\sigma}$ and variogram $\hat{\gamma}$ from the set of observation points $\{(x_i, z(x_i)),\ i = 1, \dots, n\}$ in a block $A$. The first two quantities are supposed to be good enough approximations of the actual mean $m_A$ and standard deviation $\sigma_A$ of $z(x)$ in block $A$, viewed as a random variable with sample space $A$ (and no longer a fictitious random variable with an elusive sample space Ω). The sample variogram value $\hat{\gamma}(h)$ approximates the quantity:
$$\gamma_A(h) = \frac{\int_{A_h} \big(z(x+h) - z(x)\big)^2\, dx}{2\,|A_h|},$$
taken over the set $A_h$ formed by intersecting $A$ and its translate by $-h$. In fact $\gamma_A(h) = \gamma_A(-h)$, and the variogram value $\gamma_A(h)$ applies to the domain $A_h \cup A_{-h}$. For $h$ small enough it is representative of $A$ itself. Journel [35] shows that there exists a stationary random function $Z_A(x)$ having such empirical characteristics: $m_A$, $\sigma_A$ and $\gamma_A$. Thus, if we define $z^*(x) = \sum_{i=1}^n \lambda_i(x)\, z(x_i)$, the estimation variance (under the unbiasedness condition), defined by $V[Z_A(x) - Z^*(x)] = E[(Z_A(x) - Z^*(x))^2]$ (where $Z^*(x)$ is the "randomized" kriging estimate of $Z_A(x)$), coincides with the spatial integral $\int_A (z(x) - z^*(x))^2\, dx \,/\, |A|$. Hence, ordinary kriging is basically the process of minimizing a spatially averaged squared error over the domain $A$ on the basis of the available observations. The following assumption is made:

$$A \approx \{x : x \in A,\ x + h_i \in A,\ i = 1, \dots, n\},$$
where $h_i = x_0 - x_i$. It means that we restrict the kriging to the vicinity of the sample points $x_i$ and that this estimation area lies well within $A$. This leads to recovering the kriging equations. The unbiasedness assumption of stochastic kriging is replaced by requiring a zero average error over $A$, which no longer depends on $x_0$:

$$e_A(x_0) = \frac{\int_A \big(z(x) - z^*(x)\big)\, dx}{|A|} = 0.$$
Note that $\int_A z^*(x)\, dx = \sum_{i=1}^n \lambda_i(x_0) \int_A z(x+h_i)\, dx$, and that, due to the above assumption, $\int_A z(x+h_i)\, dx = \int_A z(x)\, dx$. So $\int_A z(x)\, dx = \sum_{i=1}^n \lambda_i(x_0) \int_A z(x)\, dx$, and therefore $\sum_{i=1}^n \lambda_i(x_0) = 1$. Then the squared error can be developed as

$$\big(z(x)-z^*(x)\big)^2 = z(x)^2 - 2\sum_{i=1}^n \lambda_i(x)\, z(x)\, z(x+h_i) + \sum_{i=1}^n \sum_{j=1}^n \lambda_i(x)\,\lambda_j(x)\, z(x+h_i)\, z(x+h_j).$$
The spatially averaged squared error is obtained by integrating this expression over $A$. If we introduce the counterpart of a covariance in the form

$$C_A(h) = \frac{\int_A z(x)\, z(x+h)\, dx}{|A|} - m_A^2 = \sigma_A^2 - \gamma_A(h),$$

it can be shown that we recognize, in the above mean squared error, the expression (5) of the simple kriging variance based on stationary random functions. Of course, the resulting linear system of equations is also the same and requires positive definiteness of the covariance matrix, hence the use of a proper variogram model fitted from the sample variogram. However, under the purely deterministic spatial approach, this positive definiteness condition appears as a property needed to properly solve the least squares equations; it is no longer related to the covariance of a random function. Failure of this condition on the sample variogram may indicate an ill-conditioning of the measured data that precludes the possibility of a sensible least squares interpolation. In summary, the whole kriging method can be explained without postulating a random function over $D$. There is no random function $Z$ a realization of which is the phenomenon under study, but rather a random variable on each block $A$, the sample space of which is the block itself, and which we can bind to a stationary random function $Z_A$ on the block. While this remark does not affect kriging practice (since the deterministic and the stochastic settings lead to the same equations in the end), it becomes important when epistemic uncertainty enters the picture, as it is more direct to introduce such uncertainty in the concrete deterministic approach than in the abstract stochastic setting. It also suggests that teaching the kriging method may obviate the need for deep, but non-refutable, stochastic concepts like ergodicity and stationarity.
7.3 Towards Integrating Epistemic Uncertainty in Spatial Interpolative Prediction

The above considerations leave us with a difficult task if epistemic uncertainty is to be inserted into the kriging method. Generalizing the random function framework to fuzzy random functions, whose mathematical framework is now well developed, looks hopeless. Indeed, it would certainly not help provide a tractable approach, since the simplest form of kriging already requires a serious computational effort.
Adding interval uncertainty to simple kriging would also be mathematically tricky. It has been shown above that the method proposed by Diamond is not quite cogent, as it handles intervals or fuzzy intervals as objective values to which a scalar distance can be applied. The approach of Bardossy looks more convincing, even if the use of interval arithmetic is questionable. Computing an interval-valued sample variogram via optimisation is a very difficult task: indeed, the computation of an interval-valued sample variance is an NP-hard problem [25]. The extension of the least squares method to interval-valued functions, if done properly, is also a challenging task, as it comes down to inverting a matrix having interval-valued coefficients. In this respect, the fuzzy least squares approach of Diamond [15], based on a scalar distance between fuzzy intervals, is also problematic: it is not clear what its result tells us about the uncertainty concerning all the least squares estimates that can be obtained by choosing precise original data inside the input intervals. Diamond's kriging approach derives a scalar variogram, hence scalar influence coefficients, which does not sound natural, as one may on the contrary expect that the more uncertain the data, the more uncertain the ensuing variogram. On the other hand, extending the least squares method to ill-known data modeled by fuzzy intervals in a meaningful way, that is, by letting the imprecision of the variogram impact the influence coefficients, looks computationally challenging. One may think of a method dual to Diamond's approach, based on precise data plus an imprecise variogram, thus leading to imprecise interpolation between precise data. Such an imprecise variogram would be seen as a family of theoretical variograms induced by the sample variogram.
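The hardness result of Ferson et al. [25] cited above can be made tangible. The sample variance is a convex function of the data vector, so its maximum over a box of intervals is attained at a vertex, and the obvious exact method enumerates all 2^n endpoint configurations. The sketch below (function names are mine) does exactly that for the upper bound, and handles only the easy special case for the lower bound.

```python
from itertools import product
from statistics import pvariance

def variance_upper_bound(intervals):
    """Exact max of the population variance over the box of intervals:
    variance is convex in the data vector, so the maximum sits at one of
    the 2^n vertices -- hence the exponential enumeration."""
    return max(pvariance(v) for v in product(*intervals))

def variance_lower_bound_if_overlap(intervals):
    """If all intervals share a common value, every datum can be made
    equal and the minimum variance is 0.  The general lower bound is the
    hard part (see Ferson et al. [25]); here we return None for it."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return 0.0 if lo <= hi else None
```

Even for a handful of data points the upper bound already costs 2^n variance evaluations, which illustrates why interval-valued sample variograms are expensive to compute exactly.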
Even if we could compute fuzzy influence coefficients in an efficient way from such imprecise or fuzzy variograms, it is not correct to apply interval or fuzzy interval arithmetic to the linear combination of fuzzy data when the influence coefficients are fuzzy, even if their uncertainty were independent of the uncertainty pervading the data, due to the normalisation constraint [20]. But the epistemic uncertainty of the influence coefficients partially depends on the quality of the data (especially if an automatic fitting procedure is used for choosing the variogram). So it is very difficult to handle data uncertainty in a non-redundant way in the resulting fuzzy kriging estimates. As far as epistemic uncertainty is concerned, there is a paradox in kriging, also present in interpolation techniques when they are considered as prediction tools: the kriging result is precise. However, intuitively, the farther x0 is from the known points xi, the less we know about z(x0). A cogent approach to estimating the loss of information when moving away from the known locations is needed. Of course, within the kriging approach, one can resort to using the kriging variance as an uncertainty indicator, but it is known not to depend on the data values z(xi), and it again relies on assumptions about the underlying fictitious random function that is the theoretical underpinning of kriging. It is acknowledged [53] that the kriging variance is not an estimation variance but rather some index of data configuration. Thus, techniques more advanced than the usual kriging variance are required for producing a useful estimate of the kriging error or imprecision.
So, a rigorous handling of epistemic uncertainty in kriging looks like a non-trivial task. Is it worth the effort? In fact, kriging is a global interpolation method that does not take into account local specificities of the terrain, since the variogram relies on averages of differences of measured values at pairs of points located at a given distance from each other. Indeed, the parameters of the variogram are estimated globally. This critique can be found repeatedly in the literature. It emphasizes the need to use other kinds of possibly imprecise knowledge about the terrain than the measured points. Overall, the handling of epistemic uncertainty in spatial prediction (independently of the problem of the local validity of the kriging estimates) could be carried out using one of the following methodologies:

1. Replace the kriging approach by techniques that would be mathematically simpler, more local, and where the relationship between interpolation coefficients and local dependence information would be more direct. For instance, we could consider interpolation techniques taking into account local gradient estimates from neighboring points (even interpolating between locally computed slopes). This would express a more explicit impact of epistemic uncertainty, present in the measured data and in the knowledge of local variations of the ill-known spatial function, on the interpolation expression, obviating the need to reconsider a more genuine fuzzy least squares method from scratch. This move requires further investigation of the state of the art in the interpolation area so as to find a suitable spatial prediction technique.

2. Use probabilistic methods (such as Monte-Carlo or Gibbs sampling) to propagate uncertainty taking the form of epistemic possibility distributions (intervals or fuzzy intervals) on variogram parameters and/or observed data.
Such an idea is at work, for instance, in the transformation method of Michael Hanss [30] for mechanical engineering computations under uncertainty modelled by fuzzy sets. The idea is to sample a probability distribution so as to explore the values of a complex function over an uncertainty domain. In such a method, the probability distribution is just a tool for guiding the computation process. The set of obtained results (scenarios) should not be turned into a histogram but into a range of possible outputs. The use of fuzzy sets would come down to exploring a family of nested confidence domains with various confidence levels, thus yielding a fuzzy set of possible outputs (e.g. a kriged value). The merit of this approach, recently developed by the authors [40], is to encapsulate already existing kriging methods within a stochastic simulation scheme, the only difference with other similar stochastic methods being the non-probabilistic exploitation of the results.
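A minimal sketch of this sampling scheme, in the spirit of the transformation method and of [40]: triangular fuzzy parameters are cut at a few α-levels, crisp parameter vectors are sampled inside each cut and pushed through a crisp predictor, and only the min/max outputs per level are kept. The predictor is a stand-in (any existing kriging routine could be plugged in), and all names are mine.

```python
import random

def alpha_cut(fuzzy_triple, alpha):
    """Alpha-cut of a triangular fuzzy number given as (a, mode, b)."""
    a, m, b = fuzzy_triple
    return (a + alpha * (m - a), b - alpha * (b - m))

def fuzzy_propagate(predict, fuzzy_params,
                    levels=(0.0, 0.5, 1.0), n_samples=200, seed=0):
    """For each alpha level, sample crisp parameter vectors inside the
    alpha-cuts, run the crisp predictor, and keep the min/max output.
    The result is a nested family of intervals, i.e. a fuzzy output."""
    rng = random.Random(seed)
    out = {}
    for alpha in levels:
        cuts = [alpha_cut(p, alpha) for p in fuzzy_params]
        lo, hi = float("inf"), float("-inf")
        for _ in range(n_samples):
            theta = [rng.uniform(a, b) for a, b in cuts]
            y = predict(theta)
            lo, hi = min(lo, y), max(hi, y)
        out[alpha] = (lo, hi)
    return out
```

The α = 1 interval nests inside the α = 0 one (up to sampling error), and the family of intervals is read levelwise as a fuzzy kriged value rather than pooled into a histogram, as stressed above.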
8 Conclusion

The stochastic framework of geostatistics and the ensuing kriging methodology are criticized in the literature for three reasons:
• the purely mathematical nature of the random function setting and the attached assumptions of stationarity and ergodicity, which are acknowledged to be non-refutable;
• the questionable legitimacy, for local predictions, of a global index of spatial dependence such as the variogram, which averages out local trends; of course, the use of selected neighborhoods of measured values that change with each kriged location can address this issue, albeit at the expense of a loss of continuity of the kriged surface;
• the computational burden of the kriging interpolation method and the poor interpretability of its influence coefficients.

On the first point, it seems that the choice of modeling a deterministic quantity by a random variable does not respect the principle of parsimony. If a deterministic model yields the same equations as the stochastic one, and moreover seems to coincide with our perception of the underlying phenomenon, the simpler model should be preferred (this is the case with simple kriging, as shown above). And the practical test of best prediction should be weighed against the complexity of the modeling framework used. On the second point, a variogram represents global information about a domain. Here we face a major difficulty common to all statistical approaches: even if the set of observations is large over the whole domain, local predictions will have very poor validity if the number of observations in the vicinity of the predicted value's location is too small. This conflict between the requested precision of predicted values and the necessity of large observation samples is pointed out by the advocates of kriging too. The computational burden of kriging, even if not actually so high in the simpler versions, may pose a difficulty if epistemic uncertainty must be taken into account.
As shown in Section 4, available methods that try to introduce epistemic uncertainty into this technique seem to make it even more complex, and sometimes mathematically debatable, while by construction they are supposed to provide imprecise outputs. Besides, it is not so easy to relate the form of the variogram to the expressions of the kriging coefficients, or to figure out how they affect the derivatives of the interpolated function, while one may have some prior information on such derivatives from geological knowledge of a prescribed terrain. Devising a spatial prediction method that is simple enough to remain tractable under epistemic uncertainty, and realistic enough to provide faithful information about a given terrain where some measurements are available, remains a challenging task and an open research problem. Three lines of research have been explored so far:

• Treating fuzzy observations like complex crisp observations in a suitable metric space: this approach does not really treat epistemic uncertainty, as discussed in Section 6.1.
• Applying fuzzy arithmetic. This is also used by Diamond when computing the interpolation step. However, it cannot be used throughout the whole kriging method, because there is no explicit expression of the influence weights in terms
of the variogram parameters. And even if there were one, replacing scalar arithmetic operations by fuzzy ones would lead to a considerable loss of precision.
• Using optimisation techniques, as is popular in the interval analysis area. This was suggested very early by Bardossy in the fuzzy case, and by Dubrule and Kostov in the interval case. But it already looks computationally intractable to study, via optimisation, the sensitivity of the kriging estimates to variogram parameters lying in intervals.

The most promising line of research is to adapt stochastic simulation methods to the handling of fuzzy interval analysis [40]. Indeed, it would enable existing kriging methods and stochastic exploration techniques to be exploited as such. The only difference is that the input data would be specified as representing epistemic uncertainty by nested sets of confidence intervals, and that the results of the computation would not be interpreted as a probability distribution, but exploited levelwise to form the fuzzy kriged values.

Acknowledgements. This work is supported by the French National Research Agency (ANR) through the CO2 program (project CRISCO2 ANR-06-CO2-003). The issue of handling epistemic uncertainty in geostatistics was raised by Dominique Guyonnet. The authors also wish to thank Jean-Paul Chilès and Nicolas Desassis for their comments on a first draft of this paper and their support during the project.
References

1. Armstrong, M.: Positive Definiteness is Not Enough. Math. Geol. 24, 135–143 (1990)
2. Armstrong, M., Jabin, R.: Variogram Models Must Be Positive Definite. Math. Geol. 13, 455–459 (1981)
3. Aumann, R.J.: Integrals of set-valued functions. J. Math. Anal. Appl. 12, 1–12 (1965)
4. Bardossy, A., Bogardi, I., Kelly, W.E.: Imprecise (fuzzy) information in geostatistics. Math. Geol. 20, 287–311 (1988)
5. Bardossy, A., Bogardi, I., Kelly, W.E.: Kriging with imprecise (fuzzy) variograms. I: Theory. Math. Geol. 22, 63–79 (1990)
6. Bardossy, A., Bogardi, I., Kelly, W.E.: Kriging with imprecise (fuzzy) variograms. II: Application. Math. Geol. 22, 81–94 (1990)
7. Berger, J.: An overview of robust Bayesian analysis [with Discussion]. Test 3, 5–124 (1994)
8. Berger, J.O., de Oliveira, V., Sanso, B.: Objective Bayesian analysis of spatially correlated data. Journal of the American Statistical Association 96, 1361–1374 (2001)
9. Chilès, J.P., Delfiner, P.: Geostatistics: Modeling Spatial Uncertainty. Wiley, New York (1999)
10. Couso, I., Dubois, D.: On the variability of the concept of variance for fuzzy random variables. IEEE Transactions on Fuzzy Systems 17(5), 1070–1080 (2009)
11. Cressie, N.A.C.: The origins of kriging. Math. Geol. 22, 239–252 (1990)
12. Cressie, N.A.C.: Statistics for Spatial Data, Revised edn. John Wiley & Sons, New York (1993)
13. de Finetti, B.: Theory of Probability: A Critical Introductory Treatment. John Wiley & Sons, New York (1974)
14. de Oliveira, V., Kedem, B., Short, D.A.: Bayesian prediction of transformed Gaussian random fields. Journal of the American Statistical Association 92, 1422–1433 (1997)
15. Diamond, P.: Fuzzy least squares. Information Sciences 46, 141–157 (1988)
16. Diamond, P.: Interval-valued random functions and the kriging of intervals. Math. Geol. 20, 145–165 (1988)
17. Diamond, P.: Fuzzy kriging. Fuzzy Sets and Systems 33, 315–332 (1989)
18. Dubois, D.: Possibility theory and statistical reasoning. Computational Statistics & Data Analysis 51, 47–69 (2006)
19. Dubois, D., Foulloy, L., Mauris, G., Prade, H.: Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities. Reliable Computing 10, 273–297 (2004)
20. Dubois, D., Prade, H.: Additions of interactive fuzzy numbers. IEEE Trans. on Automatic Control 26(4), 926–936 (1981)
21. Dubois, D., Prade, H.: Possibility Theory. Plenum Press, New York (1988)
22. Dubrule, O.: Comparing splines and kriging. Comp. Geosci. 10, 327–338 (1984)
23. Dubrule, O., Kostov, C.: An interpolation method taking into account inequality constraints: I. Methodology. Math. Geol. 18, 33–51 (1986)
24. Emery, X.: Disjunctive kriging with hard and imprecise data. Math. Geol. 35, 699–718 (2003)
25. Ferson, S., Ginzburg, L., Kreinovich, V., Longpré, L., Aviles, M.: Computing variance for interval data is NP-hard. SIGACT News 33, 108–118 (2002)
26. Gaudard, M., Karson, M., Linder, E., Sinha, D.: Bayesian spatial prediction. Environmental and Ecological Statistics 6, 147–171 (1999)
27. Goldstein, M., Wooff, D.: Bayes Linear Statistics: Theory and Methods. Wiley, Chichester (2007)
28. Goovaerts, P.: Geostatistics for Natural Resources Evaluation. Oxford Univ. Press, New York (1997)
29. Handcock, M.S., Stein, M.L.: A Bayesian analysis of kriging. Technometrics 35, 403–410 (1993)
30. Hanss, M.: The transformation method for the simulation and analysis of systems with uncertain parameters. Fuzzy Sets and Systems 130, 277–289 (2002)
31. Hartigan, J.A.: Linear Bayesian methods. J. Royal Stat. Soc. Ser. B 31, 446–454 (1969)
32. Helton, J.C., Oberkampf, W.L.: Alternative representations of epistemic uncertainty. Reliability Engineering and System Safety 85, 1–10 (2004)
33. Hukuhara, M.: Intégration des applications mesurables dont la valeur est un compact convexe. Funkcialaj Ekvacioj 10, 205–223 (1967)
34. Journel, A.G., Huijbregts, C.J.: Mining Geostatistics. Academic Press, New York (1978)
35. Journel, A.G.: The deterministic side of geostatistics. Math. Geol. 17, 1–15 (1985)
36. Journel, A.G.: Geostatistics: Models and Tools for the Earth Sciences. Math. Geol. 18, 119–140 (1986)
37. Journel, A.G.: Constrained interpolation and qualitative information - The soft kriging approach. Math. Geol. 18, 269–286 (1986)
38. Kostov, C., Dubrule, O.: An interpolation method taking into account inequality constraints: II. Practical approach. Math. Geol. 18, 53–73 (1986)
39. Krige, D.G.: A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa 52, 119–139 (1951)
40. Loquin, K., Dubois, D.: Kriging with ill-known variogram and data. In: Deshpande, A., Hunter, A. (eds.) SUM 2010. LNCS (LNAI), vol. 6379, pp. 219–232. Springer, Heidelberg (2010)
41. Mallet, J.L.: Régression sous contraintes linéaires: application au codage des variables aléatoires. Revue de Statistique Appliquée 28, 57–68 (1980)
42. Matheron, G., Blondel, F.: Traité de géostatistique appliquée. Editions Technip, Paris (1962)
43. Matheron, G.: Le krigeage universel. Cahiers du Centre de Morphologie Mathématique de Fontainebleau, Fasc. 1, École des Mines de Paris (1969)
44. Matheron, G.: Random Sets and Integral Geometry. John Wiley & Sons, New York (1975)
45. Matheron, G.: Estimer et choisir: essai sur la pratique des probabilités. École des Mines de Paris (1978)
46. Matheron, G.: Suffit-il pour une covariance d'être de type positif? Études Géostatistiques V, Séminaire CFSG sur la Géostatistique, Fontainebleau, Sciences de la Terre Informatiques (1987)
47. Matheron, G.: The Internal Consistency of Models in Geostatistics. In: Armstrong, M. (ed.) Geostatistics, Proceedings of the Third International Geostatistics, Avignon, pp. 21–38. Kluwer Academic Publishers, Dordrecht (1989)
48. Omre, H.: Bayesian Kriging - merging observations and qualified guesses in kriging. Math. Geol. 19, 25–39 (1987)
49. Puri, M.L., Ralescu, D.A.: Fuzzy random variables. J. Math. Anal. Appl. 114, 409–422 (1986)
50. Rios Insua, D., Ruggieri, F.: Robust Bayesian Analysis. Springer, Berlin (2000)
51. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
52. Shafer, G., Vovk, V.: Probability and Finance: It's Only a Game! Wiley, New York (2001)
53. Srivastava, R.M.: Philip and Watson - Quo vadunt? Math. Geol. 18, 141–146 (1986)
54. Taboada, J., Rivas, T., Saavedra, A., Ordóñez, C., Bastante, F., Giráldez, E.: Evaluation of the reserve of a granite deposit by fuzzy kriging. Engineering Geol. 99, 23–30 (2008)
55. Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London (1991)
56. Watson, G.S.: Smoothing and interpolation by kriging and with splines. Math. Geol. 16, 601–615 (1984)
57. Yaglom, A.M.: An Introduction to the Theory of Stationary Random Functions. Courier Dover Publications (2004)
58. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
59. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3–28 (1978)
Scaling Cautious Selection in Spatial Probabilistic Temporal Databases

Francesco Parisi, Austin Parker, John Grant, and V.S. Subrahmanian
Abstract. SPOT databases have been proposed as a paradigm for efficiently reasoning about probabilistic spatio-temporal data. A selection query asks for all pairs of objects and times such that the object is within a query region with a probability within a stated probability interval. Two alternative semantics have been introduced for selection queries: optimistic and cautious selection. It has been shown in past work that selection is characterized by a linear program whose solutions correspond to certain kinds of probability density functions (pdfs). In this chapter, we define a space called the SPOT PDF Space (SPS for short) and show that the space of solutions to a cautious selection query is a convex polytope in this space. This convex polytope can be approximated both by an interior region and a containing region. We show that both notions can be jointly used to prune the search space when answering a query. We report on experiments showing that cautious selection can be executed in about 4 seconds on databases containing 3 million SPOT atoms.
Francesco Parisi
Università della Calabria, Rende (CS), Italy
e-mail: [email protected]
Austin Parker · V.S. Subrahmanian
University of Maryland, College Park, Maryland, USA
e-mail: [email protected], [email protected]
John Grant
Towson University, Towson, Maryland, USA, and University of Maryland, College Park, Maryland, USA
e-mail: [email protected]
R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 307–340.
© Springer-Verlag Berlin Heidelberg 2010, springerlink.com

1 Introduction

The rapid recent growth in GPS, RFID, and cell phone technology has significantly increased the importance of applications where we need to reason about probabilistic spatio-temporal data. For instance, a military agency reasoning about enemy vehicles might predict when and where those vehicles might be in the future, and with what probability [26, 16]. A cell phone provider might want to predict
load on its cell towers by determining what cell phones will be in range of a tower, when they might be in range, and with what probability. Likewise, a transportation company such as FedEx might predict when and where packages will be (and with what probability) in order to allocate scarce resources (trucks, drivers) to deliver packages on time. [30, 29] developed a framework called SPOT databases for such reasoning. Informally speaking, a SPOT atom is a sentence of the form "object id is/was/will be in region r at time t with a probability in the interval [ℓ, u]." A SPOT database is a finite set of such sentences. [30] develops a logical semantics for SPOT databases with no independence assumptions and describes how various relational-algebra-style operations can be handled. Informally, a selection query says "Find all pairs ⟨id, t⟩ such that the object id is in a given region q at time t with probability in the interval [ℓ, u]." We distinguish between the optimistic semantics (all the ⟨id, t⟩ pairs for which id might possibly be in query region q at time t with probability between ℓ and u) and the cautious semantics (all the ⟨id, t⟩ pairs for which id is guaranteed to be in q at time t with probability between ℓ and u). [30] defined cautious selection; in their initial experiments [30, Figure 4], it took nine minutes to answer cautious selection queries on SPOT databases consisting of just 5000 atoms. When SPOT databases were restricted to the so-called "disjoint" SPOT databases (i.e. all atoms' regions are disjoint), however, they were able to answer the same queries in under ten seconds. Later, [29] developed algorithms and a data structure called a SPOT-tree to efficiently process optimistic (but not cautious) selection queries.
The principal goal of this chapter is to efficiently answer cautious selection queries (without the disjointness restriction) on SPOT databases consisting of as many as 3 million SPOT atoms (in just over 4 seconds) with both a real-world US Navy database on a 180 × 360 region and synthetic data sets on 1000 × 1000 regions. This substantial speedup is achieved through a suite of pruning strategies in conjunction with data structures from [29]. The pruning strategies use the concept of a SPOT PDF Space (SPS for short), which is the set of all probability density functions (pdfs) that correspond to solutions of a certain linear program that can be associated with any ⟨id, t⟩ pair and SPOT database S. We show that this set forms a convex polytope that is often very complex and has many vertices; however, it can often be approximated by an interior region and a containing region. The containing regions do for the SPOT PDF Space what minimal-bounding-rectangle-like constructs do for regions in a spatial database system. However, we have not seen much use of interior regions in spatial databases; it turns out that they are very useful in scaling cautious selection. In this chapter, we propose efficient ways of finding interior and containing regions and show how these two concepts can be used to efficiently check if the convex region determined by the query intersects the containing region or contains the interior region. Either of these two conditions yields a quick improvement in the efficiency of cautious selection. We further show how multiple interior regions can be used and combined to improve the effectiveness
of our pruning technique. We conduct detailed experiments on both synthetic and real-world data sets showing that our algorithms for cautious selection are far superior to those in [30].
2 SPOT Databases: Background

This section reviews past work on the syntax and semantics of SPOT databases given in [30]. A summary of the most important notations used throughout the chapter can be found in Appendix A.
2.1 SPOT Syntax

We assume the existence of a set ID of vehicle ids, a set T of time points ranging over the integers, and a finite set Space of points. Unless stated otherwise, we assume that Space is a grid of size N × N where we only consider integer coordinates.¹ We assume that an object can be in only one location at a time, but that a single location may contain more than one object. A rectangle is any region that can be described by constraints of the form left ≤ x ≤ right and bottom ≤ y ≤ top, where left, right, bottom, top are all integers. Thus, all rectangles are forced to have edges parallel to the x and y axes.

Definition 1 (SPOT atom/database). A SPOT atom is a tuple (id, r, t, [ℓ, u]), where id ∈ ID is an object id, r ⊆ Space is a rectangular region in space, t is a time point, and ℓ, u ∈ [0, 1] are probability bounds with ℓ ≤ u. A SPOT database is a finite set of SPOT atoms.

Example 1. Fig. 1 shows an example SPOT database. The first row in this table specifies that Phone1 is in region R1 at time 0 with probability between 0.7 and 0.75.

Given a SPOT database S, a fixed vehicle id and a fixed time t, we use the notation $S^{id,t}$ to refer to the set $S^{id,t} = \{(id', r', t', [\ell', u']) \in S \mid id' = id \wedge t' = t\}$.

Definition 2 (Selection query). A selection query is an expression of the form (?id, q, ?t, [ℓ, u]), where q is a region of Space, not necessarily rectangular, [ℓ, u] is a probability interval, ?id is a variable ranging over ids in ID, and ?t is a variable ranging over time points in T. Intuitively, a selection query says: "Find all objects id and times t such that the object id is inside the specified region q at time t with a probability in the [ℓ, u] interval." We will show later that there are two semantics for interpreting this statement, leading to two types of selection queries.
The framework is easily extensible to higher dimensions.
310
F. Parisi et al.
ID      Region  Time  Lower Bound  Upper Bound
Phone1  R1      0     0.7          0.75
Phone1  R2      1     0.6          0.9
Phone2  R3      0     0.9          1
Phone2  R3      1     0.95         1
Phone3  R4      0     0.8          0.9
Phone3  R5      0     0.7          0.9
Fig. 1 An example SPOT database with three cell phone carriers and two time points
Example 2. We might ask which phones are in range of the tower (and when) with at least 75% probability in the SPOT database from Fig. 1. The region q represents the tower’s range, making the selection query: (?id, q, ?t, [0.75, 1]).
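To make the syntax concrete, the sketch below models SPOT atoms and selection queries as Python records. The class and field names, and the extent chosen for R1, are our own illustrations rather than anything prescribed in [30]:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Point = Tuple[int, int]  # integer grid coordinates in Space

@dataclass(frozen=True)
class SpotAtom:
    """(id, r, t, [l, u]): object id is in region r at time t
    with probability between l and u (Definition 1)."""
    obj_id: str
    region: FrozenSet[Point]  # a rectangular region of Space
    time: int
    lower: float
    upper: float

    def __post_init__(self):
        # enforce 0 <= l <= u <= 1 as required by Definition 1
        assert 0.0 <= self.lower <= self.upper <= 1.0

@dataclass(frozen=True)
class SelectionQuery:
    """(?id, q, ?t, [l, u]): q need not be rectangular (Definition 2)."""
    region: FrozenSet[Point]
    lower: float
    upper: float

# First row of Fig. 1: Phone1 in R1 at time 0 with probability in
# [0.7, 0.75]. R1's extent below is invented purely for illustration.
R1 = frozenset((x, y) for x in range(18, 27) for y in range(8, 17))
atom = SpotAtom("Phone1", R1, 0, 0.7, 0.75)
tower_query = SelectionQuery(R1, 0.75, 1.0)
```

The frozen dataclasses make atoms hashable, so a SPOT database can simply be a Python set of SpotAtom values.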
2.2 SPOT Semantics

We now review the semantics of SPOT databases introduced in [30].

Definition 3 (SPOT interpretation). A SPOT interpretation is a function I : ID × Space × T → [0, 1] such that for each id ∈ ID and t ∈ T,

    ∑_{p∈Space} I(id, p, t) = 1.

Given an interpretation I, we sometimes abuse notation and write I^{id,t}(p) = I(id, p, t). In this case, I^{id,t} is a pdf. We will switch between these two notations throughout the chapter.
Scaling Cautious Selection in Spatial Probabilistic Temporal Databases
Example 3. An interpretation I1 for the SPOT database in Fig. 1 might assign probability 1 to each of {Phone1, Phone2, Phone3} being at location (0, 0) at both times 0 and 1. This interpretation does not reflect the information in the database. One that does might be I2, defined as follows:

I2(Phone1, (20, 15), 0) = 0.75    I2(Phone1, (25, 10), 0) = 0.25
I2(Phone1, (5, 25), 1) = 0.7      I2(Phone1, (15, 20), 1) = 0.3
I2(Phone2, (10, 15), 0) = 1       I2(Phone2, (10, 15), 1) = 1
I2(Phone3, (10, 5), 0) = 0.9      I2(Phone3, (20, 5), 0) = 0.1
I2(Phone3, (15, 15), 1) = 0.5     I2(Phone3, (15, 16), 1) = 0.5

For all unmentioned parameters, I2(·, ·, ·) = 0.

Given an interpretation I and region r, the probability that object id is in r at time t according to I is ∑_{p∈r} I(id, p, t). We now define satisfaction by an interpretation.

Definition 4 (Satisfaction). Let sa = (id, r, t, [ℓ, u]) be a SPOT atom and let I be a SPOT interpretation. We say that I satisfies sa (denoted I |= sa) iff ∑_{p∈r} I(id, p, t) ∈ [ℓ, u]. I satisfies SPOT database S (denoted I |= S) iff I satisfies every atom in S.

Example 4. For I1 and I2 in Example 3, I2 satisfies the SPOT database in Fig. 1, while I1 does not.

We use I(S) to denote the set of interpretations that satisfy a SPOT database S, that is, I(S) = {I | I |= S}.

Definition 5 (Consistency). SPOT database S is consistent iff I(S) ≠ ∅.

Definition 6 (Compatibility). We say that SPOT atom sa is compatible with SPOT database S (denoted sa ⋄ S) iff S ∪ {sa} is consistent.

Definition 7 (Entailment). A SPOT database S entails a SPOT atom sa iff every I ∈ I(S) satisfies sa. We denote this as S |= sa.

The notions of consistency and entailment given above yield two alternative semantics for selection queries.

Definition 8 (Optimistic/Cautious selection). Suppose S is a SPOT database and (?id, q, ?t, [ℓ, u]) is a selection query. The optimistic answer to (?id, q, ?t, [ℓ, u]) is the set {id,t | id ∈ ID ∧ t ∈ T ∧ (id, q, t, [ℓ, u]) ⋄ S}. The cautious answer to (?id, q, ?t, [ℓ, u]) is the set {id,t | id ∈ ID ∧ t ∈ T ∧ S |= (id, q, t, [ℓ, u])}.

We now present a simple example.
Example 5. The cell phone company is interested in knowing who will be using the cell tower and when. Since the tower only services phones in region q, this question can be answered via the selection query (?id, q, ?t, [0.75, 1]), which asks for id,t pairs such that id is in the region served by the cell tower with a probability of at least 75%. The optimistic answer tells us at which times what objects may possibly be in q with at least a 75% probability. The optimistic answer is: Phone1 at times 0 and 1, Phone2 at times 0 and 1, and Phone3 at time 1. Phone3 cannot be in the query region at time 0 because the probabilities of it being in regions R4 and R5 make this impossible. If the cell phone company wants to know at which times what objects can be guaranteed to be in the query region with a probability of at least 75%, the query can be posed as a cautious query. The cautious answer is: Phone2 at times 0 and 1. Neither Phone1 nor Phone3 can be guaranteed to be in q with at least 75% probability at any time.
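Definition 4 reduces satisfaction to summing an interpretation's mass over each atom's region and checking the bounds. A minimal sketch under an assumed representation of ours (interpretations as dicts keyed by (id, point, t), atoms as plain tuples):

```python
def prob_in_region(interp, obj_id, region, t):
    """P[obj_id is in region at time t] under interpretation interp,
    where interp maps (id, point, t) -> probability."""
    return sum(interp.get((obj_id, p, t), 0.0) for p in region)

def satisfies(interp, db, eps=1e-9):
    """I |= S iff for every atom (id, r, t, l, u) in S the mass of r
    lies within [l, u] (Definition 4)."""
    return all(l - eps <= prob_in_region(interp, i, r, t) <= u + eps
               for (i, r, t, l, u) in db)

# Toy database over Space = {p1, p2, p3} and a single time point 0.
db = [("id1", {"p1"}, 0, 0.2, 0.5),
      ("id1", {"p2"}, 0, 0.1, 0.6),
      ("id1", {"p1", "p2"}, 0, 0.2, 0.8)]
good = {("id1", "p1", 0): 0.3, ("id1", "p2", 0): 0.4, ("id1", "p3", 0): 0.3}
bad = {("id1", "p1", 0): 1.0}   # all mass on p1 violates the first atom
```

Here satisfies(good, db) holds while satisfies(bad, db) does not, mirroring the contrast between I2 and I1 in Example 4.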
2.3 Linear Programming and SPOT DBs

Given a SPOT database S, a vehicle id, and a time point t, [30] defined a set LC(S, id, t) of linear constraints. LC(S, id, t) uses variables v_p to denote the probability that vehicle id will be at point p ∈ Space at time t.

Definition 9 (LC(·)). For SPOT database S, id ∈ ID, and t ∈ T, LC(S, id, t) contains:
- for each (id, r, t, [ℓ, u]) ∈ S^{id,t}, both ∑_{p∈r} v_p ≥ ℓ ∈ LC(S, id, t) and ∑_{p∈r} v_p ≤ u ∈ LC(S, id, t),
- ∑_{p∈Space} v_p = 1 ∈ LC(S, id, t),
- ∀p ∈ Space, (v_p ≥ 0) ∈ LC(S, id, t),
- no other constraints are in LC(S, id, t).
[30] shows that a SPOT database S is consistent iff LC(S, id, t) is solvable for all id,t pairs. This yields an immediate linear programming algorithm to check the consistency of a SPOT database. The complexity of the algorithm was shown to be O(|ID| · |T| · (|Space| · |S|)³). Throughout the rest of this chapter, we assume that S is consistent. The compatibility and entailment of a SPOT atom can be checked via the following result shown in [30].

Theorem 1. Given a SPOT database S and a SPOT atom (id, r, t, [ℓ, u]),
i) (id, r, t, [ℓ, u]) ⋄ S iff LC(S ∪ {(id, r, t, [ℓ, u])}, id, t) has a solution.
ii) S |= (id, r, t, [ℓ, u]) iff [ℓ′, u′] ⊆ [ℓ, u], where
- ℓ′ = minimize ∑_{p∈r} v_p subject to LC(S, id, t), and
- u′ = maximize ∑_{p∈r} v_p subject to LC(S, id, t).

Theorem 1 gives us a method to compute optimistic and cautious selection. Given the selection query Q = (?id, q, ?t, [ℓ, u]) over the SPOT database S, the optimistic
answer to Q can be computed by solving, for each pair id,t in S, the linear program LC(S ∪ {(id, q, t, [ℓ, u])}, id, t): if it has a solution, then id,t is in the optimistic answer to Q. The cautious answer to Q can be computed by solving, for each pair id,t in S, the two optimization problems which return the interval [ℓ′, u′]. Then, checking whether [ℓ′, u′] ⊆ [ℓ, u] suffices to decide whether the pair id,t is in the cautious answer to Q. We refer to these approaches as the naive algorithms for optimistic and cautious selection.
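The two optimization problems in Theorem 1.ii) are ordinary linear programs over the variables v_p. Purely for illustration, the sketch below approximates ℓ′ and u′ by brute-force enumeration over a discretized simplex for a tiny three-point Space; a real implementation would hand LC(S, id, t) to an LP solver, and all names here are ours:

```python
from itertools import product

def entailed_bounds(atoms, query_region, step=20):
    """Approximate l' = min and u' = max of sum_{p in q} v_p over all
    pdfs v satisfying every (region, l, u) constraint in `atoms`, by
    enumerating v on a grid of resolution 1/step. Any leftover mass
    implicitly sits on unconstrained points of Space."""
    points = sorted({p for r, _, _ in atoms for p in r} | set(query_region))
    lo, hi = 1.0, 0.0
    for masses in product(range(step + 1), repeat=len(points)):
        if sum(masses) > step:          # total probability cannot exceed 1
            continue
        v = dict(zip(points, (m / step for m in masses)))
        if all(l - 1e-9 <= sum(v[p] for p in r) <= u + 1e-9
               for r, l, u in atoms):
            q_mass = sum(v[p] for p in query_region)
            lo, hi = min(lo, q_mass), max(hi, q_mass)
    return round(lo, 6), round(hi, 6)

# Toy database: v_p1 in [0.2, 0.5], v_p2 in [0.1, 0.6],
# and v_p1 + v_p2 in [0.2, 0.8].
toy_db = [({"p1"}, 0.2, 0.5), ({"p2"}, 0.1, 0.6), ({"p1", "p2"}, 0.2, 0.8)]
lo, hi = entailed_bounds(toy_db, {"p1", "p2"})
# [l', u'] = [0.3, 0.8] is not contained in [0.4, 0.8], so this id,t
# would not be a cautious answer to a [0.4, 0.8] query.
```

The grid search is exponential in the number of constrained points, which is exactly why the chapter pursues pruning strategies that avoid solving these problems at all.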
3 Interior and Containing Regions in the Space of Interpretations

Recall that I^{id,t} (defined just after Definition 3) assigns a probability value to every point in Space. Hence, we can represent I^{id,t} as a vector v̄ whose length is proportional to the number of points in Space. We write v̄_p for I^{id,t}(p). Clearly, v̄ is also a pdf. As each component of v̄ is a probability value for a point in Space, v̄ ∈ [0, 1]^Space. We call [0, 1]^Space the SPOT PDF Space (SPS space).²

Let P(S, id, t) = { v ∈ [0, 1]^Space | v is a solution to LC(S, id, t) }. The following example shows that P(S, id, t) is a polytope in SPS space.

Example 6. Let Space = {p1, p2, p3} and Sexm = {(id, {p1}, t, [0.2, 0.5]), (id, {p2}, t, [0.1, 0.6]), (id, {p1, p2}, t, [0.2, 0.8])}. P(Sexm, id, t) is the 3-dimensional polytope in [0, 1]^{p1,p2,p3} whose projection on the SPS space [0, 1]^{p1,p2} is the one depicted in Fig. 2.

Next, for a query Q = (?id, q, ?t, [ℓ, u]), we define

    Q(q, ℓ, u) = { v ∈ [0, 1]^Space | ∑_{p∈q} v_p ≥ ℓ and ∑_{p∈q} v_p ≤ u }.

Thus, a query Q implicitly defines a convex polytope Q(q, ℓ, u) in SPS space.

Example 7. For Space = {p1, p2, p3}, the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]) produces the 3-dimensional polytope Q({p1, p2}, 0.4, 0.8) whose projection on the SPS space [0, 1]^{p1,p2} is represented by the trapezoid shown in Fig. 3. In the polytope Q({p1, p2}, 0.4, 0.8), the coordinate v_{p3} can take any value in [0, 1].
² Clearly, as v̄ is a vector whose dimensionality is N², it can be represented in a standard spatial data structure. However, in our experiments, N can be 1000 or more, leading to vectors in a million-dimensional space. In short, even if we are looking at a 2-dimensional spatial region of size 1000 × 1000 (which spatial databases can handle very well), the SPS space generated by uncertainty over such a region has a million dimensions. It is well known that techniques such as R-trees and their variants cannot handle such high-dimensional spaces well. We do not claim we can handle arbitrary high-dimensional spaces, just that we can efficiently handle the specific type of SPS space used in cautious selection in SPOT databases.
Fig. 2 Projection on the SPS space [0, 1]^{p1,p2} of the polytope specified by the SPOT database Sexm in Example 6
Fig. 3 Projection on the SPS space [0, 1]^{p1,p2} of the polytope representing an example query region
If Space were {p1, p2}, then the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]) would produce the 2-dimensional polytope depicted in Fig. 3. The following corollary to Theorem 1 provides a method to check whether an id,t pair is an answer (optimistic or cautious) to a query using polytopes from SPS space (as opposed to using the linear program in Theorem 1).

Corollary 1. Given a SPOT database S and query Q = (?id, q, ?t, [ℓ, u]), for each id,t pair,
i) (id, q, t, [ℓ, u]) ⋄ S iff P(S, id, t) ∩ Q(q, ℓ, u) ≠ ∅.
ii) S |= (id, q, t, [ℓ, u]) iff P(S, id, t) ⊆ Q(q, ℓ, u).

Example 8. Consider the SPOT database Sexm of Example 6 and the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]) of Example 7. Checking whether id,t is an optimistic answer to query Q means checking whether the atom (id, {p1, p2}, t, [0.4, 0.8]) is compatible with Sexm. Every point of the polytope P(Sexm, id, t) ∩ Q({p1, p2}, 0.4, 0.8) corresponds to a satisfying interpretation of the SPOT database Sexm ∪ {(id, {p1, p2}, t, [0.4, 0.8])}. If this intersection is empty, then there are no satisfying interpretations. Thus, the existence of the point (v_{p1}, v_{p2}, v_{p3}) = (0.3, 0.4, 0.3) (shown as point (0.3, 0.4) in the projections on the SPS space [0, 1]^{p1,p2} in Figures 2 and 3), contained in both Q({p1, p2}, 0.4, 0.8) and P(Sexm, id, t), implies the existence of a v̄ satisfying both Sexm and the query conditions, and so id,t is an optimistic answer. On the other hand, checking whether id,t is a cautious answer to query Q means checking whether (id, {p1, p2}, t, [0.4, 0.8]) is entailed by Sexm, which is equivalent to checking whether the polytope P(Sexm, id, t) is contained in Q({p1, p2}, 0.4, 0.8). This containment relationship does not hold in our running example, as
the point (v_{p1}, v_{p2}, v_{p3}) = (0.2, 0.1, 0.7) (shown as point (0.2, 0.1) in the projections in Figures 2 and 3) belongs to P(Sexm, id, t) but not to Q({p1, p2}, 0.4, 0.8), meaning that id,t is not a cautious answer.

For an arbitrary region r ⊆ Space and a point v in SPS space, we define the probability mass of v for r as pr(v, r) = ∑_{p∈r} v_p. As an example, for the region r = {p1, p2}, the probability mass of the point v̄ with v̄_{p1} = 0.3, v̄_{p2} = 0.4, and v̄_{p3} = 0.3, belonging to the polytope P(Sexm, id, t) of Fig. 2, is pr(v̄, r) = 0.3 + 0.4 = 0.7.

For a convex region R in SPS space and a region r in Space, we define inf(R, r) = {v ∈ R | pr(v, r) is minimum} and sup(R, r) = {v ∈ R | pr(v, r) is maximum}. In the following, we write pr(inf(R, r)) (resp. pr(sup(R, r))) to denote the minimum (resp. maximum) value of the probability mass in R for r.

Example 9. For the polytope P(Sexm, id, t), inf(P(Sexm, id, t), {p1, p2}) consists of the single point (v_{p1}, v_{p2}, v_{p3}) = (0.2, 0.1, 0.7), whereas sup(P(Sexm, id, t), {p1, p2}) consists of the set of points forming the face of the polytope that contains the slanting line segment depicted in Fig. 2. Moreover, pr(inf(P(Sexm, id, t), {p1, p2})) = 0.3 and pr(sup(P(Sexm, id, t), {p1, p2})) = 0.8.

A set of points in n-dimensional space defines a convex region called its convex envelope. In our case, we have a set V = {v^1, . . . , v^k} of k points in [0, 1]^Space (SPS space). The convex envelope of V is the polytope obtained by taking all convex combinations of the members of V. Formally:

    convEnv(V) = { v̄ ∈ [0, 1]^Space | ∃ α_1, . . . , α_k ∈ [0, 1] with ∑_{i=1}^{k} α_i = 1 such that ∀p ∈ Space, v̄_p = ∑_{i=1}^{k} α_i v^i_p }.

This is a convex region in SPS space. If V = {(0.2, 0.1), (0.5, 0.1), (0.5, 0.3), (0.2, 0.6)} is a set of points in [0, 1]^{p1,p2}, then convEnv(V) is the polytope shown in Fig. 2.

The following theorem shows how to check whether a convex region intersects with (or is contained in) a query's polytope in SPS space. It states that these two relationships can be checked by considering only the convex envelope of an appropriate set of points in the region. Furthermore, it suffices to check only the relationship between two probability intervals.

Theorem 2. For a convex region R and query region Q(q, ℓ, u), both in [0, 1]^Space,
i) the following are equivalent:
1. R ∩ Q(q, ℓ, u) ≠ ∅
2. convEnv(inf(R, q) ∪ sup(R, q)) ∩ Q(q, ℓ, u) ≠ ∅
3. [pr(inf(R, q)), pr(sup(R, q))] ∩ [ℓ, u] ≠ ∅.
ii) the following are equivalent:
1. R ⊆ Q(q, ℓ, u)
2. convEnv(inf(R, q) ∪ sup(R, q)) ⊆ Q(q, ℓ, u)
3. [pr(inf(R, q)), pr(sup(R, q))] ⊆ [ℓ, u].
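Conditions i)3 and ii)3 of Theorem 2 turn the geometric tests into one-dimensional interval arithmetic. A sketch (function names are ours):

```python
def intervals_overlap(a, b):
    """[a0, a1] and [b0, b1] have a non-empty intersection
    (Theorem 2, condition i)3)."""
    return max(a[0], b[0]) <= min(a[1], b[1])

def interval_contained(a, b):
    """[a0, a1] is contained in [b0, b1] (Theorem 2, condition ii)3)."""
    return b[0] <= a[0] and a[1] <= b[1]

# Running example: the probability mass interval of P(Sexm, id, t) for
# q = {p1, p2} is [0.3, 0.8], while the query interval is [0.4, 0.8].
mass, query = (0.3, 0.8), (0.4, 0.8)
overlaps = intervals_overlap(mass, query)     # True: optimistic answer
contained = interval_contained(mass, query)   # False: not a cautious answer
```

These two one-line checks replace the polytope intersection and containment tests whenever the endpoints of the probability mass interval are known.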
Example 10. Consider the convex region P(Sexm, id, t) of Example 6 and the query region Q({p1, p2}, 0.4, 0.8) of Example 7, whose projections on the SPS space [0, 1]^{p1,p2} are shown in Fig. 2 and Fig. 3, respectively. We have already shown in Example 8 that the intersection between P(Sexm, id, t) and Q({p1, p2}, 0.4, 0.8) is not empty. In Example 9, we defined the sets inf(P(Sexm, id, t), {p1, p2}) and sup(P(Sexm, id, t), {p1, p2}). The projection on the SPS space [0, 1]^{p1,p2} of inf(P(Sexm, id, t), {p1, p2}) is the point (0.2, 0.1), whereas that of sup(P(Sexm, id, t), {p1, p2}) consists of the set of points belonging to the slanting segment depicted in Fig. 2. The region convEnv(inf(P(Sexm, id, t), {p1, p2}) ∪ sup(P(Sexm, id, t), {p1, p2})) is a polytope in the SPS space [0, 1]^{p1,p2,p3} whose vertices are the points (0.2, 0.1, 0.7), (0.5, 0.3, 0.2), (0.2, 0.6, 0.2) (shown as points (0.2, 0.1), (0.5, 0.3), (0.2, 0.6) in the projection in Fig. 2). It is easy to see that this polytope and Q({p1, p2}, 0.4, 0.8) have a non-empty intersection (for instance, the point (0.3, 0.4, 0.3) belongs to both polytopes). Finally, as [ℓ, u] = [0.4, 0.8] and, as shown in Example 9, pr(inf(P(Sexm, id, t), {p1, p2})) = 0.3 and pr(sup(P(Sexm, id, t), {p1, p2})) = 0.8, condition i)3 is satisfied.

As an example of the equivalence of the items in ii), consider the following. As shown in Example 8, the polytope P(Sexm, id, t) is not contained in Q({p1, p2}, 0.4, 0.8). Clearly, the polytope convEnv(inf(P(Sexm, id, t), {p1, p2}) ∪ sup(P(Sexm, id, t), {p1, p2})), described above, is not contained in the polytope Q({p1, p2}, 0.4, 0.8), since, for instance, the point (v_{p1}, v_{p2}, v_{p3}) = (0.2, 0.1, 0.7) (shown as point (0.2, 0.1) in the projections in Figures 2 and 3) belongs to P(Sexm, id, t) but not to Q({p1, p2}, 0.4, 0.8). Finally, the largest probability mass interval [pr(inf(P(Sexm, id, t), {p1, p2})), pr(sup(P(Sexm, id, t), {p1, p2}))] = [0.3, 0.8] is not contained in [ℓ, u] = [0.4, 0.8].

We will use interior and containing regions for the polytope P(S, id, t) to answer cautious selection queries.
3.1 Cautious Semantics The following theorem provides two sufficient conditions that will be exploited for answering cautious queries. Specifically, for a given candidate answer id,t, when one of these two conditions is satisfied, no linear programs like those in Theorem 1
Fig. 4 An example of an internal region
Fig. 5 An example of a containing region
have to be solved. In that case, we say that id,t is pruned via these sufficient conditions. The first condition in the theorem ensures that id,t belongs to the answer, whereas the second condition ensures that it does not.

Theorem 3. Let S be a SPOT database and Q = (?id, q, ?t, [ℓ, u]) a query. For each id,t in S, let Rint(S, id, t) and Rcon(S, id, t) be two convex regions in SPS space such that Rint(S, id, t) ⊆ P(S, id, t) ⊆ Rcon(S, id, t).
i) If Rcon(S, id, t) ⊆ Q(q, ℓ, u), then id,t is a cautious answer to Q.
ii) If Rint(S, id, t) ⊈ Q(q, ℓ, u), then id,t is not a cautious answer to Q.

By Theorem 2, the two conditions in the above theorem can be checked by using the intervals I_R = [pr(inf(R, q)), pr(sup(R, q))], where R ∈ {Rint, Rcon}.

Example 11. Consider the SPOT database Sexm of Example 6, the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]) of Example 7, and the corresponding polytopes P(Sexm, id, t) and Q({p1, p2}, 0.4, 0.8), whose projections on the SPS space [0, 1]^{p1,p2} are shown in Fig. 4. Consider the interior region Rint(S, id, t) = convEnv({(0.22, 0.12, 0.66), (0.38, 0.11, 0.51), (0.27, 0.28, 0.45)}), whose projection on the SPS space [0, 1]^{p1,p2} is represented by the triangle depicted in Fig. 4. As Rint(S, id, t) is not fully contained in Q({p1, p2}, 0.4, 0.8), id,t is not a cautious answer to Q. This containment relationship can also be checked by considering the convex envelope of inf(Rint(S, id, t), {p1, p2}) and sup(Rint(S, id, t), {p1, p2}), which is the segment between the two points indicated in Fig. 4. Also in this case, convEnv(inf(Rint(S, id, t), {p1, p2}) ∪ sup(Rint(S, id, t), {p1, p2})) is not contained in Q(q, ℓ, u). Finally, this relationship could also be verified by checking that [pr(inf(Rint(S, id, t), {p1, p2})), pr(sup(Rint(S, id, t), {p1, p2}))] = [0.34, 0.55] is not contained in [ℓ, u] = [0.4, 0.8].
Example 12. Let Q be the query (?id, {p1, p2}, ?t, [0.2, 0.8]). Fig. 5 shows the projection on the SPS space [0, 1]^{p1,p2} of both Q({p1, p2}, 0.2, 0.8) and P(Sexm, id, t). Moreover, Fig. 5 shows the projection on the SPS space [0, 1]^{p1,p2} of the region Rcon(S, id, t) defined by convEnv({(0.18, 0.09, 0.0), (0.71, 0.09, 0.0), (0.18, 0.62, 0.0), (0.18, 0.09, 1.0), (0.71, 0.09, 1.0), (0.18, 0.62, 1.0)}). As Rcon(S, id, t) is contained in Q({p1, p2}, 0.2, 0.8) and contains P(Sexm, id, t), id,t is a cautious answer to Q. This containment relationship can also be checked by considering the convex envelope of inf(Rcon(S, id, t), {p1, p2}) and sup(Rcon(S, id, t), {p1, p2}), which, in this specific case, equals Rcon. Finally, this relationship can also be verified by checking that [pr(inf(Rcon(S, id, t), {p1, p2})), pr(sup(Rcon(S, id, t), {p1, p2}))] = [0.27, 0.8] is contained in [ℓ, u] = [0.2, 0.8].

We now consider some specific kinds of regions, and give conditions equivalent to those of Theorem 3, obtained by rewriting the numeric interval I_R in an appropriate way. Hyper-rectangles, also called boxes, are probably the most common objects used to bound regions. For example, minimum bounding rectangles (MBRs) are used in R-trees for spatial indexing [5, 34]. When we consider box regions, the interval I_R can easily be related to the box sides. Given SPS space [0, 1]^Space, a box B is a (convex) region defined by the Cartesian product ∏_{p∈Space} I_p, where for each p ∈ Space, I_p ⊆ [0, 1]. We use ℓ(B, p) and u(B, p) to denote the lower and upper bounds of a bounding box B on the dimension corresponding to the point p.

Corollary 2. Let S be a SPOT database and Q = (?id, q, ?t, [ℓ, u]) a query. Let Bint and Bcon be two boxes such that Bint ⊆ P(S, id, t) ⊆ Bcon. Also define
- lower(B) = ∑_{p∈q} ℓ(B, p), and
- upper(B) = ∑_{p∈q} u(B, p).
Then,
i) if [lower(Bcon), upper(Bcon)] ⊆ [ℓ, u], then id,t is a cautious answer to Q, and
ii) if [lower(Bint), upper(Bint)] ⊈ [ℓ, u], then id,t is not a cautious answer to Q.

Example 13. Let Bint be defined by I_{p1} × I_{p2} × I_{p3}, where I_{p1} = [0.2, 0.4], I_{p2} = [0.1, 0.4] and I_{p3} = [0.0, 0.2], and let Q = (?id, {p1, p2}, ?t, [0.4, 0.8]). It is easy to see that Bint ⊆ P(Sexm, id, t). We have lower(Bint) = ℓ(Bint, p1) + ℓ(Bint, p2) = 0.2 + 0.1 = 0.3 and upper(Bint) = u(Bint, p1) + u(Bint, p2) = 0.4 + 0.4 = 0.8. As [lower(Bint), upper(Bint)] = [0.3, 0.8] is not contained in [ℓ, u] = [0.4, 0.8], id,t is not a cautious answer to Q.

Theorem 1 provides an exact method for computing cautious selection by solving two optimization problems for each candidate answer id,t. On the other hand, Theorem 3 gives us a strategy that can be exploited when the database is not updated frequently. Assume that for each candidate id,t and p ∈ Space, the lower
and upper bounds ℓ(Bint, p), u(Bint, p), ℓ(Bcon, p) and u(Bcon, p) for the boxes Bint and Bcon are known. Then Corollary 2 implies that cautious selection can be computed in constant time when one of the two conditions of the corollary applies.

Box regions belong to the class of regular polytopes. A non-regular polytope can easily be described by specifying a set of points whose convex envelope yields the polytope. We use interior regions specified by the convex envelope of a set of points in our experiments. The following corollary identifies pruning conditions for such regions.

Corollary 3. Let S be a SPOT database and Q = (?id, q, ?t, [ℓ, u]) a query. Consider two sets of points V1, V2 in SPS space such that convEnv(V1) ⊆ P(S, id, t) ⊆ convEnv(V2). Let:
- smallest(V) = min_{v∈V} pr(v, q), and
- largest(V) = max_{v∈V} pr(v, q).
i) If [smallest(V2), largest(V2)] ⊆ [ℓ, u], then id,t is a cautious answer to Q, and
ii) if [smallest(V1), largest(V1)] ⊈ [ℓ, u], then id,t is not a cautious answer to Q.

Example 14. Let V1 be the set of points {v^1 = (0.22, 0.12, 0.66), v^2 = (0.38, 0.11, 0.51), v^3 = (0.27, 0.28, 0.45)} in the SPS space [0, 1]^{p1,p2,p3}. The projection on the SPS space [0, 1]^{p1,p2} of the polytope convEnv(V1) is represented by the triangle depicted in Fig. 4. It is easy to see that convEnv(V1) is an internal region of the polytope P(Sexm, id, t). Consider the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]), whose query region q is {p1, p2}. Then pr(v^1, q) = ∑_{p∈q} v^1_p = v^1_{p1} + v^1_{p2} = 0.22 + 0.12 = 0.34, pr(v^2, q) = 0.38 + 0.11 = 0.49, and pr(v^3, q) = 0.27 + 0.28 = 0.55. Hence, [smallest(V1), largest(V1)] = [min(0.34, 0.49, 0.55), max(0.34, 0.49, 0.55)] = [0.34, 0.55], and since [0.34, 0.55] ⊈ [0.4, 0.8], id,t is not a cautious answer to Q (as we already concluded in Example 11, where we considered the triangle depicted in Fig. 4 as a general internal region, not one defined by the convex envelope of a set of points).
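Both corollaries give constant-time pruning tests once the bounding boxes or point sets are known. The sketch below (representation and names are ours) returns True when id,t is proved to be a cautious answer, False when it is proved not to be, and None when neither sufficient condition fires and the linear programs must still be solved:

```python
def contained(a, b):
    """[a0, a1] is contained in [b0, b1]."""
    return b[0] <= a[0] and a[1] <= b[1]

def prune_with_boxes(box_int, box_con, q, l, u):
    """Corollary 2: boxes map each point of Space to its [l, u] side."""
    con = (sum(box_con[p][0] for p in q), sum(box_con[p][1] for p in q))
    if contained(con, (l, u)):
        return True                  # Bcon lies inside the query polytope
    int_ = (sum(box_int[p][0] for p in q), sum(box_int[p][1] for p in q))
    if not contained(int_, (l, u)):
        return False                 # Bint sticks out of the query polytope
    return None                      # undecided: fall back to the LPs

def prune_with_points(v_inner, v_outer, q, l, u):
    """Corollary 3: point sets whose envelopes bound P(S, id, t)."""
    def pr_interval(points):
        masses = [sum(v[p] for p in q) for v in points]
        return min(masses), max(masses)
    if contained(pr_interval(v_outer), (l, u)):
        return True
    if not contained(pr_interval(v_inner), (l, u)):
        return False
    return None

# Example 13's interior box; the containing box here is the trivial one.
b_int = {"p1": (0.2, 0.4), "p2": (0.1, 0.4), "p3": (0.0, 0.2)}
b_con = {"p1": (0.0, 1.0), "p2": (0.0, 1.0), "p3": (0.0, 1.0)}
# Example 14's interior point set V1.
V1 = [{"p1": 0.22, "p2": 0.12, "p3": 0.66},
      {"p1": 0.38, "p2": 0.11, "p3": 0.51},
      {"p1": 0.27, "p2": 0.28, "p3": 0.45}]
```

With the query region {p1, p2} and interval [0.4, 0.8], both checks prove that id,t is not a cautious answer, matching the conclusions of Examples 13 and 14.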
4 Multiple Interior Regions Interior regions are used in cautious selection in order to prune those id,t pairs which do not belong to the answer of a given query. Ideally, we would like to use maximal interior, i.e. inscribed, regions in this process, and that is our approach in this section. We consider ellipsoids as the appropriate geometry for inscribing regions because there is always guaranteed to be a unique maximum volume ellipsoid inscribed in a convex polytope [2]. This is not the case for other geometric objects. For instance, there may be more than one maximum volume box inscribed in a convex region (as an example, consider maximum area rectangles in a regular pentagon). Although intuition says that increasing the volume of an inscribed region results in a more effective pruning strategy, the following example shows that this is not always true.
Fig. 6 Inscribed ellipsoids for cautious selection
Example 15. Recall Sexm from Example 6 and the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]) from Example 7. Let E1 and E2 be two ellipsoids inscribed in P(S, id, t) such that E1 has maximum volume. The projection into the subspace {p1, p2} of E1 and E2 is shown in Fig. 6. The projection of the query region Q({p1, p2}, 0.4, 0.8) is represented by the area between the two parallel oblique lines. Fig. 6 also shows the segments I1 and I2 representing, respectively, the projections on the SPS space [0, 1]^{p1,p2} of the polytopes convEnv(inf(E1, {p1, p2}) ∪ sup(E1, {p1, p2})) and convEnv(inf(E2, {p1, p2}) ∪ sup(E2, {p1, p2})) (see Theorems 2 and 3). It is easy to see that although ellipsoid E2 has a smaller volume than E1, it is associated with a pruning interval [ℓ2, u2] = [0.32, 0.8] whose width is greater than that of E1, i.e. [ℓ1, u1] = [0.47, 0.8]. Thus, using E2 results in a more effective pruning strategy for the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]). On the other hand, ellipsoid E1 is better than E2 when the query (?id, {p1}, ?t, [0.4, 0.8]) (or (?id, {p2}, ?t, [0.4, 0.8])) is considered, since the projection of E1 on axis p1 (or p2) is greater than the projection of E2 on the same axis.

The above example suggests that the optimality of an inscribed region for pruning depends on the query: an optimal inscribed region may become sub-optimal when the locations in the query change. Moreover, in order to obtain more effective pruning, the quantity to be maximized is the length of the pruning interval, not the volume of the inscribed region. Several inscribed regions can be used together to obtain more efficient pruning strategies. It is worth noting that this holds for general inscribed convex regions, not only ellipsoidal ones. The following theorem shows how multiple interior regions can be used and combined to improve pruning.

Theorem 4. Let S be a SPOT database and Q = (?id, q, ?t, [ℓ, u]) a query. For each id,t pair, let {R1, . . . , Rk} be a set of convex regions in SPS space such that Ri ⊆ P(S, id, t) for i ∈ [1..k], and let E be the convex envelope covering R1, . . . , Rk. Then E ⊆ P(S, id, t), and the maximum pruning interval for pruning id,t pairs which do not belong to the cautious answer to Q, using {R1, . . . , Rk}, is
[pr(inf(E, q)), pr(sup(E, q))] = [min{pr(inf(Ri, q)) | i ∈ [1..k]}, max{pr(sup(Ri, q)) | i ∈ [1..k]}].

Example 16. Consider Sexm from Example 6, the query Q = (?id, {p1, p2}, ?t, [0.4, 0.8]) from Example 7, and the convex regions E1 and E2 inscribed in P(S, id, t), whose projections into the subspace [0, 1]^{p1,p2} are shown in Fig. 6. The projections into the subspace [0, 1]^{p1,p2} of inf(E1, {p1, p2}) and sup(E1, {p1, p2}) are the points (0.32, 0.15) and (0.38, 0.42), respectively, whereas the projections of inf(E2, {p1, p2}) and sup(E2, {p1, p2}) are the points (0.2, 0.12) and (0.48, 0.32), respectively. Then pr(inf(E1, {p1, p2})) = 0.32 + 0.15 = 0.47, pr(sup(E1, {p1, p2})) = 0.38 + 0.42 = 0.8, pr(inf(E2, {p1, p2})) = 0.2 + 0.12 = 0.32 and pr(sup(E2, {p1, p2})) = 0.48 + 0.32 = 0.8, and the pruning interval of Theorem 4 is [0.32, 0.8], which ensures a pruning strategy more effective than that using only one of these convex regions.
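Under Theorem 4, the pruning intervals of several interior regions combine by taking the least lower endpoint and the greatest upper endpoint. A one-line sketch (names ours):

```python
def combined_pruning_interval(intervals):
    """Pruning interval of the convex envelope of interior regions
    R1..Rk, given each region's [pr(inf(Ri, q)), pr(sup(Ri, q))]."""
    lows, highs = zip(*intervals)
    return min(lows), max(highs)

# Example 16: E1 contributes [0.47, 0.8] and E2 contributes [0.32, 0.8].
combined = combined_pruning_interval([(0.47, 0.8), (0.32, 0.8)])
```

The combined interval (0.32, 0.8) is at least as wide as any single region's interval, so the condition of Theorem 3.ii) can only become easier to trigger.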
5 Computing Interior Regions

In this section, we introduce some strategies for incrementally computing interior regions for cautious selection queries performed on a given SPOT database.³ We use two approaches: an inline computation approach, which works naturally with the notion of a convex envelope, and a precomputation approach.

Inline Approach. Here, we assume no precomputation of interior polytopes in SPS space. We can iteratively construct and grow such polytopes on successive query operations for a fixed database S. Each time we answer a cautious query via the naive algorithm, we must solve the linear programs in Theorem 1.ii). The solutions thus obtained are points in SPS space contained in the polytope P(S, id, t). We save each solution s̄ in a set Sol[id, t] (storing separate sets for each id,t pair). Each set Sol[id, t] contains only points within the polytope P(S, id, t). Therefore, the convex envelope convEnv(Sol[id, t]) is contained in P(S, id, t), and we can directly apply Corollary 3 to prune future cautious queries. We use the term inline envelope to refer to this technique in implementation and experimentation.

Precomputation Approach. Precomputation is preferable to inlining when spare resources are available for precomputation (e.g., off-peak cycles on a cluster), or when one needs the first several queries to run as fast as possible. In precomputation strategies, we compute a convex envelope with k points. To compute the convex envelope with k points for a SPOT database S, we construct k random optimization functions, each consisting of the sum of a pair of variables in LC(S, id, t) (Definition 9), and then minimize each function subject to LC(S, id, t). This results in k (likely distinct) solutions per id,t, which are points in the polytope P(S, id, t) from SPS space. As such, those k solutions can be placed in a set Sol[id, t] and used to produce a convex envelope for pruning via Corollary 3.
We use the term precomputed envelope to refer to this technique in implementation and experimentation.

³ Variants of these techniques are also suitable for answering optimistic selection queries.
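The inline-envelope bookkeeping of Section 5 can be sketched as follows: LP solutions found while answering earlier queries are memoized per id,t pair, and later cautious queries are first tested against the growing envelope via Corollary 3 (class and method names are ours; the LP solving itself is outside the sketch):

```python
class InlineEnvelope:
    """Accumulates LP solution points per (id, t) and prunes future
    cautious queries via Corollary 3 before any LP is re-solved."""

    def __init__(self):
        self.sol = {}                     # (id, t) -> list of pdf points

    def record(self, key, point):
        """Save a solution point found while answering a naive query."""
        self.sol.setdefault(key, []).append(point)

    def try_prune(self, key, q, l, u):
        """Return False if the stored envelope proves 'not a cautious
        answer'; return None when undecided (solve the LPs then)."""
        pts = self.sol.get(key)
        if not pts:
            return None
        masses = [sum(v.get(p, 0.0) for p in q) for v in pts]
        if not (l <= min(masses) and max(masses) <= u):
            return False
        return None

env = InlineEnvelope()
# Two solutions for the pair (id1, 0) saved from earlier naive queries:
env.record(("id1", 0), {"p1": 0.2, "p2": 0.1, "p3": 0.7})
env.record(("id1", 0), {"p1": 0.2, "p2": 0.6, "p3": 0.2})
```

A query with interval [0.4, 0.8] on q = {p1, p2} is immediately rejected for (id1, 0), because the stored points witness probability masses 0.3 and 0.8; a wider interval such as [0.2, 0.9] remains undecided and falls through to the linear programs.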
6 Bounding via Composite Atoms

To create regions in SPS space containing P(S, id, t), we use composite atoms [29] that bound the set of interpretations satisfying a given set of SPOT atoms in SPS space, behaving much like an MBR in classical spatial DBs.

Definition 10 (Composite Atom). For a set of pairs of ids and time points δ = {⟨id1, t1⟩, . . . , ⟨idn, tn⟩}, a subset R of Space (R ⊂ Space), and a probability interval [ℓ, u], (δ, R, [ℓ, u]) is a composite atom.

Example 17. Recalling the example database in Fig. 1, the atom ({⟨Phone1, 1⟩, ⟨Phone2, 0⟩, ⟨Phone2, 1⟩}, R2 ∪ R3, [0.6, 1]) is an example composite atom.

A composite atom ca = (δ, R, [ℓ, u]) represents a set of SPOT atoms, which we denote repr(ca) = {(idi, R, ti, [ℓ, u]) | ⟨idi, ti⟩ ∈ δ}. Since repr(ca) is a set of SPOT atoms, it is a SPOT database, and we can talk of the polytope in SPS space containing interpretations satisfying ca: P(repr(ca), id, t). In many cases, we will abuse notation and say P(ca, id, t) instead of P(repr(ca), id, t). Given a SPOT database S, a composite atom ca is said to be entailed by that database if P(repr(ca), id, t) ⊇ P(S, id, t) for all id and t. Consider the following example.

Example 18. Recall Sexm from Example 6. The composite atom caex = ({⟨id, t⟩}, {p1, p2}, [0.2, 1.0]) is entailed by Sexm because P(caex, id, t) ⊇ P(Sexm, id, t). The projection of this relationship to {v_{p1}, v_{p2}} is shown in Fig. 7.
Fig. 7 P(caex , id,t) from Example 18
We now show how to construct a composite atom entailed by a database S . First we show how to modify the bounds of a composite atom to guarantee entailment.
Algorithm 1. This function returns a composite SPOT atom ca_r such that every SPOT database that entails both a and ca also entails ca_r.

UpdateCSA(ca, a)
  Let a = (id, r, t, [ℓa, ua])
  Let ca = (δ, R, [ℓc, uc])
  if ⟨id, t⟩ ∈ δ then
    Let [ℓ′, u′] = combine(ca, a)
    return (δ, R ∪ r, [min(ℓ′, ℓc), max(u′, uc)])
  else
    return (δ ∪ {⟨id, t⟩}, R ∪ r, [min(ℓa, ℓc), 1])
  end if
Definition 11 (region-merge). Let a = (id, r1, t, [ℓ1, u1]) be a SPOT atom and ca = (δ, r2, [ℓ2, u2]) a composite atom. Let combine(ca, a) =
- [max(ℓ1, ℓ2), min(u1, u2)] if r1 = r2;
- [min(1, ℓ1 + ℓ2), min(1, u1 + u2)] if r1 ∩ r2 = ∅;
- [max(ℓ1, ℓ2), u2] if r1 ⊂ r2;
- [max(ℓ1, ℓ2), u1] if r2 ⊂ r1;
- [max(ℓ1, ℓ2), min(1, u1 + u2)] if r1 ∩ r2 ≠ ∅ ∧ (r1 \ r2 ≠ ∅) ∧ (r2 \ r1 ≠ ∅).
As shown in [29], combine(·, ·) ensures that the composite atom's SPOT interpretation region contains the SPOT atom's region whenever δ = {⟨id, t⟩}.

Proposition 1 ([29]). For a given id,t, composite atom ca = (δ, R, [ℓc, uc]) and SPOT atom a = (id, r, t, [ℓa, ua]), if [ℓ′, u′] = combine(ca, a), then
- if ⟨id, t⟩ ∈ δ, then the SPOT database {a} ∪ repr(ca) entails the composite atom (δ, R ∪ r, [min(ℓ′, ℓc), max(u′, uc)]);
- if ⟨id, t⟩ ∉ δ, then the SPOT database {a} ∪ repr(ca) entails the composite atom (δ ∪ {⟨id, t⟩}, R ∪ r, [min(ℓa, ℓc), 1]).
A straightforward application of the above proposition shows how one may construct bounding regions; this is shown as Algorithm 1. We now give an example showing how a composite atom can be created.

Example 19. Consider the SPOT database Sexm from Example 6. We construct a composite atom for that database by iterating through all of its atoms.
- a1 = (id, {p1}, t, [0.2, 0.5]) trivially produces the composite atom ({⟨id, t⟩}, {p1}, [0.2, 0.5]), which we call c1.
- Adding a2 = (id, {p2}, t, [0.1, 0.6]) to c1, we notice that c1 only references ⟨id, t⟩, so we may apply combine(c1, a2) to get [0.3, 1] (since {p1} and {p2}
324
F. Parisi et al.
are disjoint), giving the composite atom ({⟨id, t⟩}, {p1, p2}, [0.3, 1]), which we call c2.
- Finally, adding a3 = (id, {p1, p2}, t, [0.2, 0.8]), we again notice that c2 only references ⟨id, t⟩, and apply combine(c2, a3) to get ({⟨id, t⟩}, {p1, p2}, [0.2, 1]), which we call c3.

Notice that c3 is the same composite atom as caex from Example 18, and clearly contains P(Sexm, id, t), as shown in Fig. 7. Since a composite atom ca's polytope P(ca, id, t) is known to contain a SPOT database S's polytope P(S, id, t), we can apply Theorem 3 to achieve some pruning with ca. We discuss the results of applying this technique in the experimental section (Section 7).
7 Experiments We experimentally validated our pruning technique for answering cautious queries on both real and synthetic datasets.
7.1 Algorithms Used

Given a SPOT database S and a query Q, our basic pruning technique for deciding whether an ⟨id, t⟩ pair is in the cautious answer to Q works as follows. First, we try to prune the ⟨id, t⟩ pair by testing whether one of the (sufficient) conditions in Theorem 3 holds; in this case, appropriate interior and containing regions (namely Rint(S, id, t) and Rcon(S, id, t)) are used. Otherwise, we answer the cautious query with the naive algorithm, that is, by solving linear programs like those in Theorem 1.

In our experiments, we used interior regions determined by convex envelopes of sets of points. We call this approach convEnv(k), where k is the number of points defining the convex envelope. The elements in convEnv(k) can be produced via the precomputed envelope strategy or the inline approach described in Section 5. When the computation time needed to create the solutions is not included in query time, we refer to the algorithm as convEnv(k). With the alternative, inline approach, there is no precomputation; instead, computation saved from previous queries provides pruning for future queries.

For the containing region, we use the composite atoms described in Section 6, with exactly one composite atom per ⟨id, t⟩ pair. When adding a new atom to the database, we use UpdateCSA (Algorithm 1) to recompute the probability bounds for only the composite atom associated with the new atom's ⟨id, t⟩. Datapoints generated using composite atoms are labeled "CA".

Composite atoms and the convex envelope employ compatible pruning operations that can be stacked: all ⟨id, t⟩ pairs not pruned by one method can be checked via the other before any linear programming. When we do this, we try both orderings, pruning via the composite atoms first and via the convex envelope first. When pruning with composite atoms first, these datapoints are labeled "CA and
convEnv(k)". When we check the convex envelope first and then the composite atoms, the datapoints are labeled "convEnv(k) and CA".

For comparison, we also ran experiments with the SPOT tree from [29]. SPOT trees use composite atoms to construct an R-tree-like index of a SPOT database, which can be used to speed up cautious selection (though we will see that the SPOT tree sometimes incurs a performance hit). As we know of no other work pertaining directly to SPOT atoms, our final algorithm is the one implementing the linear programs described in Theorem 1; these datapoints are labeled "Naive". Our implementations (including the naive algorithm) use an optimized version, PLC (introduced in [29]), of the linear program LC.
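The decision procedure just described, trying cheap sufficient tests before falling back to linear programming, can be expressed as a chain of pruners. Everything below is an illustrative stand-in: `cautious_member`, the pruner functions, and the LP fallback are our own names, not the chapter's routines.

```python
def cautious_member(pair, query, pruners, solve_naive_lp):
    """Decide whether an (id, t) pair is in the cautious answer.

    Each pruner returns True (definitely in), False (definitely out),
    or None (cannot decide); the order of pruners affects only speed,
    not the answer. Only undecided pairs reach the expensive LP."""
    for prune in pruners:           # e.g. composite atom, then envelope
        verdict = prune(pair, query)
        if verdict is not None:
            return verdict
    return solve_naive_lp(pair, query)  # fall back to the LP of Theorem 1

# Toy stand-ins: a pruner that decides pairs with even ids, and an
# "LP" fallback that always answers False.
ca_pruner = lambda pair, q: True if pair[0] % 2 == 0 else None
lp = lambda pair, q: False
print(cautious_member((2, 0), None, [ca_pruner], lp))  # True
print(cautious_member((3, 0), None, [ca_pruner], lp))  # False
```

Stacking "CA and convEnv(k)" versus "convEnv(k) and CA" then amounts to reordering the `pruners` list.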
7.2 Real World Ship Data

We ran a batch of experiments using real US Navy ship location data. The database tuples contain lat-long coordinates of various ships at various times in a space of size 180 × 360. To create SPOT atoms from these tuples, we created a 10 by 10 bounding box around each reported latitude-longitude location. A lower bound of 0.3 was chosen to allow at least 3 disjoint atoms to be true at a given time point, while an upper bound of 0.8 was chosen because we suspect that only 80% of the tuples in the database are actually accurate. The database contains 401,986 atoms for 3249 ships at 740 time points, providing 131,045 ⟨id, t⟩ pairs present in at least one atom. Because most subsets of the database with more than 200,000 atoms are inconsistent, we ran our tests with up to 200,000 atoms. When testing with subsets of the database, we chose a random subset of the appropriate size for each trial. We tested with both small query regions of size 3 by 3 and large query regions of size 100 by 100. All data points are averages over at least 30 trials.

Experiment 1: In the first experiment, we examined the inline approach described in Section 5. Using this algorithm, we asked several consecutive queries of the same database. The queries used either a small query region (3 × 3) or a large one (100 × 100), and intervals of either [0.9, 1] or [0, 0.1]. The results in Fig. 8 show a substantial difference between the first and second consecutive queries, with very little change in running time afterwards. This suggests that the solution computed to answer the first query is in fact the most important solution for pruning. This raises a question for precomputation approaches: how many solutions should we store when using the convex envelope? To determine a number, we first looked at the fraction of pairs pruned by each of the techniques in Experiment 2.

Experiment 2: In this experiment, we measured the fraction of pairs pruned by each technique.
The results are shown in Figures 9 and 10, where we record the average fraction of pairs pruned by various techniques for queries with regions of size either 100 × 100 or 3 × 3 and with varying probability bounds, on size-10,000 subsets of the real database. The most important point shown by these figures is that if any pruning at all occurs via a convex envelope, then the majority of it occurs when the envelope contains exactly one point: convEnv(1) prunes marginally less than
Fig. 8 Experiment 1: Performance of inlined convex envelope on 10,000 atom subsets of ship-data database
convEnv(2), convEnv(4), or convEnv(8). The intuition behind this fact is as follows. The effectiveness of our pruning algorithm is proportional to the length of the pruning interval [smallest(Vi), largest(Vi)], where Vi is the set of points in convEnv(i) (see Corollary 3), and this interval in turn depends on the SPS dimensions occurring in the query region. Extending the interval by adding new points to Vi is therefore not easy: one would have to add to Vi 'optimal points' that increase the length of the [smallest(Vi), largest(Vi)] interval for whatever the SPS dimensions of the query region turn out to be. This is unlikely, since the query region changes virtually every time a new query is issued and there are many SPS dimensions from which those occurring in the query region can be selected. Therefore, if convEnv(1) provides a good pruning interval, then so do convEnv(2), convEnv(4), and convEnv(8) (as shown in Figure 9); otherwise, adding a few new points to convEnv(1) is not likely to result in better pruning (as shown in Figure 10).

Other interesting aspects of the data include the fact that while the SPOT tree provides some pruning, in general it prunes less than any other technique, and the fact that composite atoms alone appear quite competitive with any convEnv(k) technique.

Experiment 3: We also examined the precomputation time necessary for each convEnv(k). These results are shown in Fig. 11 and are quite reasonable in comparison to previous work: SPOT trees take about 5 hours to create with 50,000 atoms, while the convEnv(1) approach proposed in this paper takes about 3 minutes to perform a (one-time) precomputation step on a 200,000-atom SPOT database.

Experiment 4: The notion that convEnv(1) is the most preferable of the convEnv(k) pruning techniques is further supported by Fig.
12, which shows results for cautious queries with a small (3 × 3) query region and bounds of [0.9, 1] for up to 200,000 atoms from the real-world data. Here convEnv(1) performs slightly better than convEnv(k > 1). This counterintuitive result can be explained by the fact that virtually all pruneable ⟨id, t⟩ pairs are quickly pruned by convEnv(1), and the overhead needed to store the extra solutions in convEnv(k > 1) actually results in a performance hit.
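To make the role of the pruning interval concrete, here is an illustrative sketch in which each stored solution assigns probability mass to SPS dimensions, and smallest/largest are taken over the mass each solution places on the query's dimensions. This interpretation is an assumption made for illustration, not the chapter's exact definition of smallest(Vi) and largest(Vi).

```python
def pruning_interval(V, query_dims):
    """Illustrative version of [smallest(V), largest(V)]: each stored
    solution v maps SPS dimensions (points) to probabilities, and we
    take the extremes of the mass each v puts on the query region."""
    masses = [sum(v.get(d, 0.0) for d in query_dims) for v in V]
    return (min(masses), max(masses))

# Two stored solutions that agree on the query's dimensions (p1, p2)
# and differ only on dimensions the query never touches.
v1 = {"p1": 0.25, "p2": 0.25, "p3": 0.5}
v2 = {"p1": 0.25, "p2": 0.25, "p4": 0.5}
print(pruning_interval([v1], ["p1", "p2"]))      # (0.5, 0.5)
print(pruning_interval([v1, v2], ["p1", "p2"]))  # (0.5, 0.5)
```

Adding v2 does not widen the interval, mirroring the observation that extra envelope points help only if they differ on exactly the dimensions a future query happens to touch.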
Query Region  Bounds   convEnv(1)  convEnv(2)  convEnv(4)  convEnv(8)  CA        SPOT Tree
100x100       [0.1,1]  0.925582    0.925722    0.925821    0.925854    0.989897  0.064390
100x100       [0.2,1]  0.892893    0.893508    0.893921    0.894037    0.979608  0.086843
100x100       [0.3,1]  0.932069    0.932467    0.932739    0.932846    0.989130  0.057126
100x100       [0.4,1]  0.914696    0.915659    0.916322    0.916587    0.914702  0.000000
100x100       [0.5,1]  0.904393    0.905546    0.906469    0.906829    0.904395  0.000000
100x100       [0.6,1]  0.898242    0.899104    0.899739    0.899981    0.898289  0.000000
100x100       [0.7,1]  0.931035    0.931808    0.932486    0.932720    0.931069  0.000000
100x100       [0.8,1]  0.902663    0.903359    0.903962    0.904227    0.902634  0.885971
100x100       [0.9,1]  0.919258    0.919368    0.919459    0.919495    0.919218  0.903102
3x3           [0.1,1]  0.999878    0.999880    0.999882    0.999882    0.999878  0.000000
3x3           [0.2,1]  0.999939    0.999944    0.999949    0.999950    0.999938  0.000000
3x3           [0.3,1]  0.999861    0.999867    0.999878    0.999880    0.999858  0.000000
3x3           [0.4,1]  0.999961    0.999971    0.999979    0.999983    0.999961  0.000000
3x3           [0.5,1]  0.999830    0.999861    0.999891    0.999902    0.999821  0.000000
3x3           [0.6,1]  0.999894    0.999933    0.999968    0.999979    0.999894  0.000000
3x3           [0.7,1]  0.999922    0.999951    0.999968    0.999976    0.999932  0.000000
3x3           [0.8,1]  0.999965    0.999972    0.999974    0.999977    0.999965  0.999117
3x3           [0.9,1]  0.999992    0.999992    0.999992    0.999992    0.999992  0.999556

Fig. 9 Experiment 2: The fraction of potential pairs pruned from queries with the given sized region (randomly chosen) and probability bounds asked of the ship dataset by the various techniques
In the results thus far, convEnv(1) has provided nearly the best pruning, has run the fastest, and has been the easiest to generate, so in the following we run our tests with convEnv(1) rather than convEnv(k > 1).

Experiment 5: Figures 13 and 14 show the speedup obtained by pruning with precomputed data. Testing with a small 3 × 3 region and bounds of [0.9, 1] shows that using any sort of pruning runs nearly instantaneously compared to the naive algorithm, which always takes over 50 seconds with 100 thousand atoms. These graphs also show that for such queries, using the composite atoms (CA) is superfluous: any algorithm using CA takes more time than convEnv(1) alone. This is consistent with the data presented in Figures 9 and 10: in those results, the CA atom rarely provided more pruning than the convex envelope. However, as shown in the next section, convex envelopes and composite atoms perform nearly identically on a different, artificial data set (see Figures 21 and 22).

Experiment 6: Small query regions involve very few points in Space (9 in the 3 × 3 case just considered) and therefore access very few of the dimensions in SPS space. If the points stored in convEnv(k) tend to maximize on only a couple of those dimensions of SPS space, then there is little chance for overlap between the query and the convex envelope. So it is not surprising that convEnv(k) performs best in these experiments, and it raises the question of what performance will be like with a larger query region referencing many dimensions in SPS space. We therefore
Query Region  Bounds   convEnv(1)  convEnv(2)  convEnv(4)  convEnv(8)  CA        SPOT Tree
100x100       [0,0.1]  0.074945    0.075127    0.075258    0.075304    0.074917  0.062848
100x100       [0,0.2]  0.077555    0.078040    0.078432    0.078648    0.077590  0.065049
100x100       [0,0.3]  0.077350    0.077934    0.078430    0.078654    0.077330  0.000000
100x100       [0,0.4]  0.112798    0.114142    0.115144    0.115521    0.112836  0.000000
100x100       [0,0.5]  0.079047    0.079989    0.080688    0.080960    0.079070  0.000000
100x100       [0,0.6]  0.064181    0.065149    0.065846    0.066087    0.064135  0.000000
100x100       [0,0.7]  0.073459    0.073881    0.074202    0.074330    0.073459  0.000000
100x100       [0,0.8]  0.070142    0.070660    0.070980    0.071089    0.070130  0.000000
100x100       [0,0.9]  0.077447    0.077580    0.077685    0.077711    0.077431  0.000000
3x3           [0,0.1]  0.000116    0.000116    0.000118    0.000118    0.000116  0.000000
3x3           [0,0.2]  0.000067    0.000078    0.000082    0.000083    0.000069  0.000000
3x3           [0,0.3]  0.000065    0.000075    0.000079    0.000085    0.000070  0.000000
3x3           [0,0.4]  0.000027    0.000034    0.000043    0.000044    0.000023  0.000000
3x3           [0,0.5]  0.000121    0.000168    0.000218    0.000224    0.000118  0.000000
3x3           [0,0.6]  0.000087    0.000128    0.000142    0.000152    0.000101  0.000000
3x3           [0,0.7]  0.000056    0.000068    0.000077    0.000081    0.000054  0.000000
3x3           [0,0.8]  0.000029    0.000040    0.000045    0.000044    0.000468  0.000452
3x3           [0,0.9]  0.000015    0.000015    0.000015    0.000014    0.000469  0.000465

Fig. 10 Experiment 2: The fraction of potential pairs pruned from queries with the given sized region (randomly chosen) and probability bounds asked of the ship dataset by the various techniques
Fig. 11 Experiment 3: The precomputation time needed for convEnv(k)
Fig. 12 Experiment 4: The running time of cautious queries using pruning via convEnv(k) for various values of k. Queries asked in these experiments had a small (3x3) region and bounds of [0.9, 1]
also experimented with large query regions sized 100 × 100. These results appear in Figures 15 and 16. While all our algorithms perform worse in this case than when the query region is small and the interval is constant, it still turns out that convEnv(1) performs best.
Fig. 13 Experiment 5: Naive algorithm compared against other algorithms for cautious queries with real ship data, small query region, and interval [0.9, 1]
Fig. 14 Experiment 5: Zoomed view of Fig. 13
Fig. 15 Experiment 6: Cautious query times with real ship data, large query region, and interval [0.9, 1]
Fig. 16 Experiment 6: Zoomed view of Fig. 15
Fig. 17 Experiment 7: Cautious query times with real ship data, large query region, and random interval
Experiment 7: In another experiment, we tested queries with a large 100 × 100 region and a randomly generated interval (an upper bound chosen uniformly from [0, 1], and a lower bound chosen uniformly between zero and the upper bound). The results shown in Fig. 17 corroborate previous results showing that the convex envelope technique works well in a wide variety of situations even with only one precomputed solution.
Fig. 18 Experiment 8: Timing runs with artificial data of density 3, cautious queries with small regions and an interval of [0.9, 1]
Fig. 19 Experiment 8: Zoomed version of Fig. 18
Fig. 20 Experiment 9: Cautious selection times with artificially generated data of given density. Queries have a small region and an interval of [0, 0.1]
Fig. 21 Experiment 9: Cautious selection times with artificially generated data of given density. Queries have a large region and an interval of [0, 0.1]
7.3 Artificial Data Queries

To test the scalability of these techniques, we implemented a method for producing artificial datasets, allowing us to generate databases with more than 200,000 atoms. To generate a random SPOT database, we generated a certain number, d, of SPOT atoms for each of a certain number of ⟨id, t⟩ pairs. We call d the density of the generated database; for comparison, the ship database had a density of 3. Each randomly generated atom's rectangle has width and height chosen as random integers between 1 and 10, while the rectangle's upper left corner is chosen uniformly at random from a 1000 × 1000 space. For the probability interval, a random draw is taken from [0, 1] for u and another random draw is taken from [0, u] for ℓ. If a database generated in this way is inconsistent, then we iteratively multiply the lower bounds of its atoms by 0.75 until consistency is achieved.

Experiment 8: To check the scalability of our techniques, we generated databases of density 3 out to 3 million atoms and queried them with a 3 × 3 query region and a probability bound of [0.9, 1]. The results are shown in Figures 18 and 19. As before, convEnv(1) performs best.
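The generation procedure above can be sketched as follows. Here `is_consistent` is a placeholder for the chapter's consistency check, and clamping rectangles to the grid is our simplification; all names are illustrative.

```python
import random

def random_spot_db(num_pairs, density, is_consistent, space=1000, seed=0):
    """Generate an artificial SPOT database: `density` atoms per (id, t)
    pair, random rectangles with sides 1..10 inside a space x space grid,
    and random intervals [l, u] with u drawn from [0, 1] and l from [0, u].
    `is_consistent` stands in for the chapter's consistency check."""
    rng = random.Random(seed)
    db = []
    for pair in range(num_pairs):
        for _ in range(density):
            w, h = rng.randint(1, 10), rng.randint(1, 10)
            x, y = rng.randint(0, space - w), rng.randint(0, space - h)
            u = rng.random()
            l = rng.uniform(0, u)
            db.append([pair, (x, y, w, h), l, u])
    while not is_consistent(db):          # scale lower bounds until consistent
        for atom in db:
            atom[2] *= 0.75
    return db

# With a stand-in consistency test, a density-3 database for 2 pairs has 6 atoms.
db = random_spot_db(2, 3, is_consistent=lambda db: True)
print(len(db))  # 6
```

The 0.75 scaling loop is what produces the dip in the Naive running time discussed in Experiment 9: repeated scaling lowers the bounds and loosens the linear constraints.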
Fig. 22 Experiment 9: Cautious selection times with artificially generated data of given density. Queries have a small region and an interval of [0.9, 1]
Fig. 23 Experiment 9: Cautious selection times with artificially generated data of given density. Queries have a large region and an interval of [0.9, 1]
Experiment 9: Figures 20, 21, 22, and 23 show the results of testing on databases with exactly one ⟨id, t⟩ pair and a given density. These experiments show the running time needed to execute queries over randomly placed small (3 × 3) and large (100 × 100) regions with intervals of [0, 0.1] and [0.9, 1].

A reader will notice that in these graphs, the running time of the Naive method decreases initially and then increases as one might expect. This is an artifact of the generation procedure. When the density is low, the database is likely to be consistent initially, and the generation procedure will not need to multiply the lower bounds by 0.75. As density increases, however, it becomes more likely that the randomly generated atoms are inconsistent and require their lower bounds to be multiplied by 0.75 to achieve consistency. The lower the lower bounds on the atoms, the looser the linear constraints, so initially an increase in the number of atoms decreases the running time.

In Figures 20 through 23, the SPOT tree runs slowly, slower even than the naive method. This happens because in these databases there is only one ⟨id, t⟩ pair with varying density. Much of the pruning power of the SPOT tree lies in its ability to prune multiple ⟨id, t⟩ pairs with one pass through the tree, and this power is lost in these databases, making the SPOT tree perform worse than even the Naive method.

In Figures 20 and 21, all pruning techniques except the SPOT tree perform equivalently; that is, in these cases, no pruning is achieved by the CA or the convEnv(1) approaches. In Figures 22 and 23, all pruning approaches apart from SPOT trees provide similar speedup over the Naive approach as the density of the SPOT database increases, though none of them is noticeably faster than the others. Figures 21 and 22 show convex envelopes and composite atoms performing nearly identically.
Overall, these experiments suggest that convex envelopes with one solution yield the best overall behavior.
8 Related Work

There are numerous applications emerging today where companies and other organizations need to predict where a large number of moving objects (cell phones, objects tagged with RFID tags, etc.) will be in the future, when they will be there, and with what probability. Cell phone companies, military organizations, virtually any company with supply chain logistics, and transportation companies all need to make such predictions. The ability to store and query these predictions is significant for various kinds of decision making and planning tasks. Several researchers have attempted to store such predictions.

Tao et al. [35] develop an indexing structure for spatial probabilistic data designed to solve a more specific problem. They assume that there is a single probability distribution function detailing where an object might be at a given point in time in the entire space, and their focus is on optimizing access to that probability density function. We do not assume that such a pdf is available. Their technique specifies regions with particular probabilities at each time point and uses hyperplanes to approximate the evolution of these probabilistically constrained regions between time points. They then store these hyperplanes in an R-tree-inspired U-tree indexing structure. By assuming that there is exactly one pdf that satisfies the database, they are able to efficiently prune with these hyperplanes. Key differences between their work and ours are that they assume the existence of a single pdf, while our model allows multiple pdfs and thus yields both an optimistic and a cautious semantics. Moreover, because of their assumptions, they are able to use spatial data structures directly, while we need to adapt them to address logical conditions in a space containing pdfs.

In [28], methods for dealing with positionally uncertain spatial data are considered.
The authors introduce a data model motivated by survey data, in which surveyors discover clusters of locations that have similar error properties: if one point measured in a survey is off by five meters, then other points from the same survey also tend to be off by five meters. Accordingly, the data model associates each point with a cluster, where points in the same cluster have the same error. The model allows only one pdf, and the cluster error model does not extend well beyond standard survey data. The authors describe a PrR-tree for storing and querying such data, which uses a rectangular bounding region whose corners are defined via Gaussian distributions. Like the above-mentioned work, this method does not allow more than one distribution, nor is it designed to handle non-Gaussian probability distributions.

In [23], Lian et al. examine group nearest neighbor queries in probabilistic databases. Like the above work, their data model provides one pdf over each object's locations. They introduce probabilistic group nearest neighbor queries, wherein one is given a set of points and a probability threshold and returns the set of objects that have minimal aggregate distance to the set of points with a probability under the threshold. They introduce a query answering procedure that indexes the "uncertainty regions" (regions containing a given object with a given probability) in an R-tree and uses that data to quickly prune potential answers.

Dai et al. [8] focus on probabilities for the existence of a given object at a given point, without worrying about the possibility of the object being at another point.
They show how to build an augmented R-tree and use that tree to answer selection queries more effectively than treating probability as an extra dimension of the R-tree. Their work differs from ours most significantly in that they do not allow the possibility of an object being in several different locations with individual probabilities; the other differences are similar to those for [35]. [6] uses a paradigm called "line simplification" to approximate trajectories of moving objects, though their framework does not involve uncertainty. [4] develops a framework to track uncertainty, time, and the pedigree of data, but does not handle spatial information. SPOT databases were developed by the authors in past work [30, 29] to store such predictions without making the assumptions of prior work [35, 8, 6, 4].

While there is substantial work on indexing spatio-temporal data without probabilities [31, 36, 20, 1, 32, 15], none of these works address a data model compatible with ours: they suppose no probabilities and model object movement as linear. Further, while these works each approach indexing by making clever and domain-appropriate modifications to the R-tree data structure, exploiting the notion of a bounding box, none of them address any possible use for contained interior regions such as the convex envelope technique introduced in this paper.

This work builds on a history of excellent work by many in the probabilistic database community. Kiessling and his group [14] develop the DUCK framework for reasoning with uncertainty. They provide an elegant, logical, axiomatic theory for uncertain reasoning in the presence of rules. In the same spirit as Kiessling et al., Ng and Subrahmanian [27] provide a probabilistic semantics for deductive databases; they assume absolute ignorance and, furthermore, assume that rules are present in the system.
In an important paper, Lakshmanan and Sadri [21] show how selected probabilistic strategies can be used to extend the previous probabilistic models. Lakshmanan and Shiri [22] show how deductive databases may be parameterized through the use of conjunction and disjunction strategies. Barbara et al. [3] develop a probabilistic data model and propose probabilistic operators. Their work is based on the assumption that probabilities of compound events can always be precisely determined, an assumption valid for only a few combination strategies. They also assume that all events are independent, and, as they point out, their definition unfortunately leads to a "lossy" join.

Cavallo and Pittarelli [7, 33] propose a model for probabilistic relational databases. In their model, tuples in a probabilistic relation are interpreted using an exclusive or, meaning that at most one of the data-tuples is assumed to be present in the underlying classical relation. This is a rather restrictive assumption, and we make no such assumption. Furthermore, due to the above assumptions, they propose only probabilistic projection and join operations; the other relational algebra operations are not specified. Dey and Sarkar [11] propose an elegant 1NF approach to handling probabilistic databases. Kifer and Li [18] examine quantitative logic programming and introduce formal semantics for such systems. Other systems from the probabilistic database community also provide insight into reasoning with and storing probabilistic information [12, 13, 9, 19, 17]. Lukasiewicz and his colleagues [24, 25] study probabilistic reasoning in logic programming, as does Dekhtyar [10]. However, none of these works explicitly handle space or time.
9 Conclusion In this chapter, we have developed efficient algorithms to execute cautious selection queries against SPOT databases. We have developed the novel concept of an SPS space within which we apply both interior and containing regions. We have shown how finding convex envelopes that are interior regions can lead to significant improvements in performance. When taken in conjunction with containing regions that fully enclose a convex polytope, two kinds of pruning operators are possible in the computation required to answer cautious selection queries. We have conducted detailed experiments on both a synthetic and a real data set showing that the proposed methods significantly outperform previous methods and showing that we can answer cautious queries on SPOT databases containing millions of SPOT atoms in a matter of seconds.
References

1. Agarwal, P.K., Arge, L., Erickson, J.: Indexing moving points. Journal of Computer and System Sciences 66(1), 207–243 (2003)
2. Ball, K.: Ellipsoids of maximal volume in convex bodies. Geometriae Dedicata 41, 241–250 (1992)
3. Barbará, D., Garcia-Molina, H., Porter, D.: The management of probabilistic data. IEEE TKDE 4(5), 487–502 (1992)
4. Benjelloun, O., Sarma, A.D., Halevy, A.Y., Widom, J.: ULDBs: Databases with uncertainty and lineage. In: VLDB, pp. 953–964 (2006)
5. Berchtold, S., Keim, D.A., Kriegel, H.P.: The X-tree: An index structure for high-dimensional data. In: VLDB (1996)
6. Cao, H., Wolfson, O., Trajcevski, G.: Spatio-temporal data reduction with deterministic error bounds. VLDB Journal 15, 211–228 (2006)
7. Cavallo, R., Pittarelli, M.: The theory of probabilistic databases. In: VLDB, pp. 71–81 (1987)
8. Dai, X., Yiu, M.L., Mamoulis, N., Tao, Y., Vaitis, M.: Probabilistic spatial queries on existentially uncertain data. In: Bauzer Medeiros, C., Egenhofer, M.J., Bertino, E. (eds.) SSTD 2005. LNCS, vol. 3633, pp. 400–417. Springer, Heidelberg (2005)
9. Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB J. 16(4), 523–544 (2007)
10. Dekhtyar, A., Dekhtyar, M.I.: Possible worlds semantics for probabilistic logic programs. In: Demoen, B., Lifschitz, V. (eds.) ICLP 2004. LNCS, vol. 3132, pp. 137–148. Springer, Heidelberg (2004)
11. Dey, D., Sarkar, S.: A probabilistic relational model and algebra. ACM Trans. Database Syst. 21(3), 339–369 (1996)
12. Eiter, T., Lukasiewicz, T., Walter, M.: A data model and algebra for probabilistic complex values. Ann. Math. Artif. Intell. 33(2-4), 205–252 (2001)
13. Fagin, R., Halpern, J.Y., Megiddo, N.: A logic for reasoning about probabilities. Inf. Comput. 87(1/2), 78–128 (1990)
14. Güntzer, U., Kiessling, W., Thöne, H.: New directions for uncertainty reasoning in deductive databases. In: SIGMOD 1991: Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, pp. 178–187. ACM, New York (1991), http://doi.acm.org/10.1145/115790.115815
15. Hadjieleftheriou, M., Kollios, G., Tsotras, V.J., Gunopulos, D.: Efficient indexing of spatiotemporal objects. LNCS, pp. 251–268. Springer, Heidelberg (2002)
16. Hammel, T., Rogers, T.J., Yetso, B.: Fusing live sensor data into situational multimedia views. In: Multimedia Information Systems, pp. 145–156 (2003)
17. Jampani, R., Xu, F., Wu, M., Perez, L.L., Jermaine, C., Haas, P.J.: MCDB: A Monte Carlo approach to managing uncertain data. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 687–700. ACM, New York (2008)
18. Kifer, M., Li, A.: On the semantics of rule-based expert systems with uncertainty. In: Gyssens, M., Van Gucht, D., Paredaens, J. (eds.) ICDT 1988. LNCS, vol. 326, pp. 102–117. Springer, Heidelberg (1988)
19. Koch, C., Olteanu, D.: Conditioning probabilistic databases. In: Proceedings of the VLDB Endowment, vol. 1(1), pp. 313–325 (2008)
20. Kollios, G., Gunopulos, D., Tsotras, V.J.: On indexing mobile objects. In: Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 261–272. ACM, New York (1999)
21. Lakshmanan, L.V., Sadri, F.: Modeling uncertainty in deductive databases. In: Karagiannis, D. (ed.) DEXA 1994. LNCS, vol. 856, pp. 724–733. Springer, Heidelberg (1994)
22. Lakshmanan, L.V.S., Shiri, N.: A parametric approach to deductive databases with uncertainty. IEEE Trans. on Knowl. and Data Eng. 13(4), 554–570 (2001)
23. Lian, X., Chen, L.: Probabilistic group nearest neighbor queries in uncertain databases. IEEE Transactions on Knowledge and Data Engineering 20(6), 809–824 (2008), http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.41
24. Lukasiewicz, T.: Probabilistic logic programming. In: ECAI, pp. 388–392 (1998)
25. Lukasiewicz, T., Kern-Isberner, G.: Probabilistic logic programming under maximum entropy. In: Hunter, A., Parsons, S. (eds.) ECSQARU 1999. LNCS (LNAI), vol. 1638, p. 279. Springer, Heidelberg (1999)
26. Mittu, R., Ross, R.: Building upon the Coalitions Agent Experiment (CoAX) - Integration of multimedia information in GCCS-M using IMPACT. In: Multimedia Information Systems, pp. 35–44 (2003)
27. Ng, R.T., Subrahmanian, V.S.: Probabilistic logic programming. Information and Computation 101(2), 150–201 (1992), citeseer.csail.mit.edu/ng92probabilistic.html
28. Ni, J., Ravishankar, C.V., Bhanu, B.: Probabilistic spatial database operations. In: Advances in Spatial and Temporal Databases: 8th International Symposium, Santorini Island, Greece (2003)
29. Parker, A., Infantes, G., Grant, J., Subrahmanian, V.S.: SPOT databases: Efficient consistency checking and optimistic selection in probabilistic spatial databases. IEEE TKDE 21(1), 92–107 (2009)
30. Parker, A., Subrahmanian, V.S., Grant, J.: A logical formulation of probabilistic spatial databases. IEEE TKDE, 1541–1556 (2007)
31. Pelanis, M., Saltenis, S., Jensen, C.S.: Indexing the past, present, and anticipated future positions of moving objects. ACM Trans. Database Syst. 31(1), 255–298 (2006)
32. Pfoser, D., Jensen, C.S., Theodoridis, Y.: Novel approaches to the indexing of moving object trajectories. In: Proceedings of VLDB (2000)
33. Pittarelli, M.: An algebra for probabilistic databases. IEEE TKDE 6(2), 293–303 (1994)
34. Sellis, T., Roussopoulos, N., Faloutsos, C.: The R+-tree: A dynamic index for multi-dimensional objects. In: VLDB (1987)
35. Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., Prabhakar, S.: Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: VLDB, pp. 922–933 (2005)
336
F. Parisi et al.
Appendix A

A summary of the most important notations used throughout the chapter is reported in the following table.

Table 1 Summary of notations used throughout the chapter

Name | Description
ID | set of object identifiers
id | object identifier (in ID)
T | set of time points
t | time point (in T)
Space | set of points in the space
p | point in Space
r | rectangular region (in Space)
ℓ, u | probability bounds
S | SPOT database
sa, a | SPOT atom (id, r, t, [ℓ, u])
S_{id,t} | portion of the SPOT database S for the (id, t) pair
q | query region (in Space)
I | SPOT interpretation
I_{id,t}, v̄ | restriction of interpretation I to the (id, t) pair
v̄_p | value of I_{id,t} for point p
I (calligraphic) | set of SPOT interpretations
Q | selection query (?id, q, ?t, [ℓ, u])
LC(S, id, t) | set of linear constraints for S_{id,t}
P(S, id, t) | polytope corresponding to LC(S, id, t)
Q(q, ℓ, u) | polytope corresponding to the query (?id, q, ?t, [ℓ, u])
R, B | convex region, box
pr(v, r) | probability mass of v for r
pr(inf(R, r)), pr(sup(R, r)) | minimum, maximum value of the probability mass in R for r
convEnv(V) | convex envelope of a set of points V
Appendix B

This appendix contains proofs of the results stated in the chapter.
Proof of Corollary 1

Proof. We first prove item i), by contraposition. (⇒) Assume that (id, q, t, [ℓ, u]) is compatible with S and P(S, id, t) ∩ Q(q, ℓ, u) = ∅. Then, by definition of P(S, id, t) and Q(q, ℓ, u), there is no point v
in [0, 1]^Space such that v is a solution to LC(S, id, t) and both ∑_{p∈q} v_p ≥ ℓ and ∑_{p∈q} v_p ≤ u hold. Thus, the system of linear inequalities obtained by assembling LC(S, id, t) with the two inequalities ∑_{p∈q} v_p ≥ ℓ and ∑_{p∈q} v_p ≤ u has no solution. It is easy to see that such a system is equivalent to LC(S ∪ {(id, q, t, [ℓ, u])}, id, t). Hence, by applying Theorem 1, it follows that the atom (id, q, t, [ℓ, u]) is not compatible with S. (⇐) Assume that P(S, id, t) ∩ Q(q, ℓ, u) ≠ ∅ and (id, q, t, [ℓ, u]) is not compatible with S. By Theorem 1, LC(S ∪ {(id, q, t, [ℓ, u])}, id, t) has no solution. Hence there is no solution to the system of linear inequalities obtained by assembling LC(S, id, t) with the two inequalities ∑_{p∈q} v_p ≥ ℓ and ∑_{p∈q} v_p ≤ u, which, in turn, means that there is no point v in [0, 1]^Space such that v is a solution to LC(S, id, t) and ∑_{p∈q} v_p ≥ ℓ and ∑_{p∈q} v_p ≤ u, i.e., P(S, id, t) ∩ Q(q, ℓ, u) = ∅. We now prove item ii), by contraposition. (⇒) Assume that S |= (id, q, t, [ℓ, u]) and P(S, id, t) ⊈ Q(q, ℓ, u). Thus, by definition of P(S, id, t) and Q(q, ℓ, u), there is a point v in [0, 1]^Space such that v is a solution to LC(S, id, t) and either ℓ* = ∑_{p∈q} v_p < ℓ or u* = ∑_{p∈q} v_p > u. As ℓ* ≥ ℓ′ = minimize ∑_{p∈q} v_p subject to LC(S, id, t) and u* ≤ u′ = maximize ∑_{p∈q} v_p subject to LC(S, id, t), either ℓ′ < ℓ or u′ > u, that is, [ℓ′, u′] ⊈ [ℓ, u]. Hence, by Theorem 1, it follows that S ⊭ (id, q, t, [ℓ, u]). (⇐) Assume that P(S, id, t) ⊆ Q(q, ℓ, u) and S ⊭ (id, q, t, [ℓ, u]). By Theorem 1, [ℓ′, u′] ⊈ [ℓ, u], where ℓ′ = minimize ∑_{p∈q} v_p subject to LC(S, id, t) and u′ = maximize ∑_{p∈q} v_p subject to LC(S, id, t). Thus, there is a solution v of LC(S, id, t) such that ∑_{p∈q} v_p is equal to either ℓ′ or u′. As [ℓ′, u′] ⊈ [ℓ, u], v can be chosen so that it does not satisfy one of the constraints ∑_{p∈q} v_p ≥ ℓ and ∑_{p∈q} v_p ≤ u, which define Q(q, ℓ, u); that is, v is in P(S, id, t) but not in Q(q, ℓ, u).
Proof of Theorem 2

Proof. We first prove that the subitems of item i) are equivalent, by showing that 1) =⇒ 2) =⇒ 3) =⇒ 1). (i.1 =⇒ i.2). By contraposition, assume that R ∩ Q(q, ℓ, u) ≠ ∅ and convEnv(inf(R, q) ∪ sup(R, q)) ∩ Q(q, ℓ, u) = ∅. Since we are assuming that the intersection between convEnv(inf(R, q) ∪ sup(R, q)) and Q(q, ℓ, u) is empty, for each point v ∈ convEnv(inf(R, q) ∪ sup(R, q)) it holds that either ∑_{p∈q} v_p < ℓ or ∑_{p∈q} v_p > u. We consider these two cases separately. a) Assume that ∀ v ∈ convEnv(inf(R, q) ∪ sup(R, q)), ∑_{p∈q} v_p < ℓ. Let v_i with i ∈ {1, . . . , k} be the points in inf(R, q) ∪ sup(R, q). The definition of convEnv(inf(R, q) ∪ sup(R, q)) entails that every point v ∈ convEnv(inf(R, q) ∪ sup(R, q)) is such that v_p = ∑_{i=1}^{k} α_i (v_i)_p, where α_i ∈ [0, 1] and ∑_{i=1}^{k} α_i = 1. By taking α_i = 1 and α_j = 0 for j ≠ i, we have that each v_i with i ∈ {1, . . . , k} is in convEnv(inf(R, q) ∪ sup(R, q)). Hence, each v_i is such that ∑_{p∈q} (v_i)_p < ℓ. By definition of inf(R, q) ∪ sup(R, q), there is no point w ∈ R such that ∑_{p∈q} w_p > ∑_{p∈q} (v_i)_p for all i ∈ {1, . . . , k}; then for each point w ∈ R it holds that ∑_{p∈q} w_p < ℓ, which entails that R ∩ Q(q, ℓ, u) = ∅.
b) Assume that ∀ v ∈ convEnv(inf(R, q) ∪ sup(R, q)), ∑_{p∈q} v_p > u. Reasoning analogously to the previous case, it is easy to see that the definition of convEnv(inf(R, q) ∪ sup(R, q)) entails that ∀ v_i ∈ inf(R, q) ∪ sup(R, q), ∑_{p∈q} (v_i)_p > u. Moreover, since the definition of inf(R, q) ∪ sup(R, q) implies that there is no point w ∈ R such that ∑_{p∈q} w_p < ∑_{p∈q} (v_i)_p for all i ∈ {1, . . . , k}, for each point w ∈ R it holds that ∑_{p∈q} w_p > u, which entails that R ∩ Q(q, ℓ, u) = ∅. (i.2 =⇒ i.3). By contraposition, assume that convEnv(inf(R, q) ∪ sup(R, q)) ∩ Q(q, ℓ, u) ≠ ∅ and [pr(inf(R, q)), pr(sup(R, q))] ∩ [ℓ, u] = ∅. From the definition of the probability mass in R for q, it follows that for each point w ∈ R, either ∑_{p∈q} w_p < ℓ or ∑_{p∈q} w_p > u. As the points v_i ∈ inf(R, q) ∪ sup(R, q) (with i ∈ {1, . . . , k}) are in R, it is the case that either ∑_{p∈q} (v_i)_p < ℓ or ∑_{p∈q} (v_i)_p > u. Hence, applying the definition of convEnv(inf(R, q) ∪ sup(R, q)), it is easy to see that for each point v ∈ convEnv(inf(R, q) ∪ sup(R, q)), either ∑_{p∈q} v_p < ℓ or ∑_{p∈q} v_p > u, that is, convEnv(inf(R, q) ∪ sup(R, q)) ∩ Q(q, ℓ, u) = ∅. (i.3 =⇒ i.1). By contraposition, assume that [pr(inf(R, q)), pr(sup(R, q))] ∩ [ℓ, u] ≠ ∅ and R ∩ Q(q, ℓ, u) = ∅. Then there is no point v in [0, 1]^Space such that v is in R and ℓ ≤ ∑_{p∈q} v_p ≤ u. Thus, for each point v ∈ R, either ∑_{p∈q} v_p < ℓ or ∑_{p∈q} v_p > u. Hence, either pr(inf(R, q)) > u or pr(sup(R, q)) < ℓ holds, which means that [pr(inf(R, q)), pr(sup(R, q))] ∩ [ℓ, u] = ∅. We now prove that the subitems of item ii) are equivalent, by showing that 1) =⇒ 2) =⇒ 3) =⇒ 1). (ii.1 =⇒ ii.2). By contraposition, assume that R ⊆ Q(q, ℓ, u) and convEnv(inf(R, q) ∪ sup(R, q)) ⊈ Q(q, ℓ, u). Then there is at least a point v in [0, 1]^Space such that v is in convEnv(inf(R, q) ∪ sup(R, q)) and either ∑_{p∈q} v_p < ℓ or ∑_{p∈q} v_p > u. It is easy to see that convEnv(inf(R, q) ∪ sup(R, q)) ⊆ R.
Hence, there is at least a point v in R such that either ∑_{p∈q} v_p < ℓ or ∑_{p∈q} v_p > u, which entails that R ⊈ Q(q, ℓ, u). (ii.2 =⇒ ii.3). By contraposition, assume that convEnv(inf(R, q) ∪ sup(R, q)) ⊆ Q(q, ℓ, u) and [pr(inf(R, q)), pr(sup(R, q))] ⊈ [ℓ, u]. Thus, either pr(inf(R, q)) < ℓ or pr(sup(R, q)) > u. If pr(inf(R, q)) < ℓ, then, from the definition of the probability mass in R for q, it follows that for each v ∈ inf(R, q), ∑_{p∈q} v_p < ℓ, that is, v is not in Q(q, ℓ, u). However, as inf(R, q) ⊆ convEnv(inf(R, q) ∪ sup(R, q)), v is in convEnv(inf(R, q) ∪ sup(R, q)). Analogously, if pr(sup(R, q)) > u, then for each v ∈ sup(R, q), ∑_{p∈q} v_p > u, that is, v is not in Q(q, ℓ, u). However, since v is in sup(R, q), it is in convEnv(inf(R, q) ∪ sup(R, q)). (ii.3 =⇒ ii.1). By contraposition, assume that [pr(inf(R, q)), pr(sup(R, q))] ⊆ [ℓ, u] and R ⊈ Q(q, ℓ, u). Then there is a point v in [0, 1]^Space such that v is in R and either ∑_{p∈q} v_p < ℓ or ∑_{p∈q} v_p > u. If v is such that ∑_{p∈q} v_p < ℓ, then pr(inf(R, q)) < ℓ. If v is such that ∑_{p∈q} v_p > u, then pr(sup(R, q)) > u. Hence, [pr(inf(R, q)), pr(sup(R, q))] ⊈ [ℓ, u].
Proof of Theorem 3

Proof. We first prove item i). By contraposition, assume that Rcon(S, id, t) ⊆ Q(q, ℓ, u) and (id, t) is not a cautious answer to Q. By applying Corollary 1, since S ⊭ (id, q, t, [ℓ, u]), we have P(S, id, t) ⊈ Q(q, ℓ, u). As P(S, id, t) ⊆ Rcon(S, id, t), it follows that Rcon(S, id, t) ⊈ Q(q, ℓ, u). We now prove item ii). By contraposition, assume that Rint(S, id, t) ⊈ Q(q, ℓ, u) and (id, t) is a cautious answer to Q. By Corollary 1, since S |= (id, q, t, [ℓ, u]), we have P(S, id, t) ⊆ Q(q, ℓ, u). As Rint(S, id, t) ⊆ P(S, id, t), it follows that Rint(S, id, t) ⊆ Q(q, ℓ, u).
Proof of Corollary 2

Proof. We first note that ∑_{p∈q} ℓ(B, p) = pr(inf(B, q)) and ∑_{p∈q} u(B, p) = pr(sup(B, q)), since these are the minimum and maximum values that can be assigned to the region q in the bounding box B. Part (i) then follows from part (ii) of Theorem 2, where we have that B ⊆ Q(q, ℓ, u) is implied by the assumed condition [pr(inf(B, q)), pr(sup(B, q))] ⊆ [ℓ, u]. We then apply part (i) of Theorem 3 to get that (id, t) is in the cautious answer to Q. Part (ii) follows similarly: apply Theorem 2 to get that B ⊈ Q(q, ℓ, u) is implied by the assumed condition [pr(inf(B, q)), pr(sup(B, q))] ⊈ [ℓ, u]. We then apply part (ii) of Theorem 3 to get that (id, t) is not a cautious answer to Q.
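The interval check behind Corollary 2 is cheap to implement: summing the per-point lower and upper bounds of a bounding box B over the query region q gives pr(inf(B, q)) and pr(sup(B, q)), and comparing the resulting interval with [ℓ, u] classifies the candidate. A minimal sketch in Python, assuming B is an outer approximation of the polytope P(S, id, t); the function and variable names are illustrative, not from the chapter:

```python
def classify_candidate(lower, upper, q, l, u):
    """Classify an (id, t) candidate from a bounding box B of its polytope.

    lower/upper map each point p of the space to the lower/upper bound
    that B assigns to p; q is the set of points in the query region.
    """
    pr_inf = sum(lower[p] for p in q)   # pr(inf(B, q)): least mass assignable to q
    pr_sup = sum(upper[p] for p in q)   # pr(sup(B, q)): greatest mass assignable to q
    if l <= pr_inf and pr_sup <= u:
        return "cautious answer"        # [pr_inf, pr_sup] contained in [l, u]
    if pr_sup < l or u < pr_inf:
        return "pruned"                 # [pr_inf, pr_sup] disjoint from [l, u]
    return "undecided"                  # fall back to the exact linear-programming check

# toy space of four points, query region q = {p0, p1}
lower = {"p0": 0.2, "p1": 0.3, "p2": 0.0, "p3": 0.0}
upper = {"p0": 0.4, "p1": 0.5, "p2": 0.1, "p3": 0.1}
print(classify_candidate(lower, upper, {"p0", "p1"}, 0.4, 0.95))  # pr_inf=0.5, pr_sup=0.9
```

Only the "undecided" outcome requires solving the underlying system of linear constraints, which is what makes this test useful for scaling.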
Proof of Corollary 3

Proof. We first note that smallest(V) = pr(inf(convEnv(V), q)) and largest(V) = pr(sup(convEnv(V), q)), since all points in convEnv(V) are linear combinations of members of V. Part (i) then follows from part (ii) of Theorem 2, where we have that convEnv(V) ⊆ Q(q, ℓ, u) is implied by the assumed condition [pr(inf(convEnv(V), q)), pr(sup(convEnv(V), q))] ⊆ [ℓ, u]. We then apply part (i) of Theorem 3 to get that (id, t) is in the cautious answer to Q. Part (ii) follows similarly: apply Theorem 2 to get that convEnv(V) ⊈ Q(q, ℓ, u) is implied by the assumed condition [pr(inf(convEnv(V), q)), pr(sup(convEnv(V), q))] ⊈ [ℓ, u]. We then apply part (ii) of Theorem 3 to get that (id, t) is not a cautious answer to Q.
Proof of Theorem 4 Proof. The only points which may be used for pruning given only the regions R1 , . . . , Rk are those from the convex envelope covering R1 , . . . , Rk . Name that convex envelope E . As E is the convex hull of R1 , . . . , Rk that are contained in P(S , id,t), it is the case that E ⊆ P(S , id,t). The greatest interval for pruning candidate answers will be the interval with the lowest lower bound possible in E
and the greatest upper bound possible in E. We now show that min_i pr(inf(R_i, q)) is the lowest lower bound possible in E by showing that min_i pr(inf(R_i, q)) = pr(inf(E, q)). Since E is convex, we know that inf(E, q) is a linear combination of the points inf(R_i, q), i.e. inf(E, q) = ∑_i a_i inf(R_i, q) for some a_i ∈ [0, 1] s.t. ∑_i a_i = 1. Let j be such that pr(inf(R_j, q)) is minimal among all pr(inf(R_i, q)). Since pr(inf(R_j, q)) is minimal, we know that pr(∑_i a_i inf(R_i, q)) = pr(inf(R_j, q)), and therefore that pr(inf(E, q)) = pr(inf(R_j, q)). Similar reasoning shows that max_i pr(sup(R_i, q)) = pr(sup(E, q)).
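The key step of Theorem 4, that the linear functional pr attains its minimum over a convex envelope at one of the generating points, can be checked numerically. A small sketch under invented values (the three inf-points and the query region are illustrative):

```python
import random

def pr(v, q):
    """Probability mass of interpretation point v for query region q."""
    return sum(v[p] for p in q)

random.seed(0)
q = [0, 1]   # query region: the first two points of a three-point space
# inf-points of three regions R1, R2, R3 (illustrative values)
inf_points = [(0.1, 0.2, 0.7), (0.3, 0.1, 0.6), (0.05, 0.4, 0.55)]

vertex_min = min(pr(v, q) for v in inf_points)   # min_i pr(inf(R_i, q))

# sample the convex envelope E as convex combinations of the generating points
for _ in range(1000):
    w = [random.random() for _ in inf_points]
    s = sum(w)
    w = [x / s for x in w]                        # weights in [0, 1] summing to 1
    v = tuple(sum(wi * vi[k] for wi, vi in zip(w, inf_points)) for k in range(3))
    # pr is linear, so no point of E can fall below the best generating point
    assert pr(v, q) >= vertex_min - 1e-12

print("minimum of pr over E:", vertex_min)
```

Because pr(∑_i a_i v_i, q) = ∑_i a_i pr(v_i, q), the sampled values can never drop below the vertex minimum, which is exactly what the proof exploits.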
Imperfect Spatiotemporal Information Analysis in a GIS: Application to Archæological Information Completion Hypothesis Cyril de Runz and Eric Desjardin
Abstract. While the Geographical Information System (GIS) is a classic tool in geography, there is a growing interest in its use in archæology. This science, dealing with the past, partial discoveries and hypotheses, has to handle spatiotemporal information which is often incomplete, imprecise or uncertain. One therefore needs to focus on the management of imperfection. The aim of this chapter is to present a way to integrate the imperfection of archæological knowledge, from the modeling of data to its graphical visualization, in a spatiotemporal analysis process. The first goal of our approach is to propose valued completion hypotheses over time. To obtain them, we use a pattern recognition method derived from the Hough transform, in accordance with the chosen data modeling. We apply our method in an archæological GIS devoted to Roman street excavations in Reims.
1 Introduction

Archaeology is the science which studies human culture through the recovery, analysis and interpretation of material remains and environmental data such as stamps, buildings, etc. Archæologists construct the major part of their information through excavation, drilling and prospecting. Archæological data is obtained at the excavation site scale. Objects such as wall fragments are produced by composition of stratigraphic units. Their thematic, spatial and temporal aspects may be linked together. For instance, in order to determine the activity periods of objects, experts can use either absolute dating methods such as radiocarbon dating [1] or relative dating approaches such as the Harris Matrix [21]
Cyril de Runz · Eric Desjardin, CReSTIC-SIC, IUT de Reims Châlons Charleville, Rue des Crayères, BP 1035, 51687 Reims Cedex 2, France, e-mail: {cyril.de-runz,eric.desjardin}@univ-reims.fr
R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 341–356. © Springer-Verlag Berlin Heidelberg 2010, springerlink.com
C. de Runz and E. Desjardin
which studies the overlapping, neighborhood and juxtaposition relations between objects or stratigraphic units. Thus, archæological information is defined by ground information and by the interpretations involved. To exploit their information on the road to reconstructing the past, archæologists need spatiotemporal analyses at local and larger scales. The triplet “function — space — time” [28] is very close to the pair “description — space”, the essential components of geographical information, because the descriptive aspect supports semantic information and could also support temporal information. Exploiting this closeness in order to carry out such analyses, it is common practice in archæology to use a Geographical Information System (GIS) [5], which is “a powerful set of tools for collecting, storing, retrieving at will, transforming and displaying spatial data from the real world for a particular set of purposes” [4]. Experts use those systems to organize and handle archæological information and to make spatial analyses. A particularity of archæological information is that it generally does not cover the whole space and, especially, it does not define the past reality with respect to the evolutions over time. A lot of information is lost. Excavation information only represents fragments or extracts of a scene in time and space. Moreover, archæological data results from interpretation. Therefore, archæological data are in essence imperfect. The imperfection may be multiple: imprecision due to the subjectivity of interpretation processes, uncertainty, incompleteness, ambiguity, etc. It is important to take imperfection into account for storage as well as for spatiotemporal analyses. In this framework, as remarked by [12, 18] for geographical information, the quality of information, and thus the imperfection of data, should be studied during storage and analyses. In the spatial context, the literature presents many modeling approaches [2, 6, 10, 16, 27, 29, 30].
Those approaches use classical uncertainty theories. Archæologists aspire to reconstruct past scenes and their dynamics. In fact, as a GIS is “a database system in which most of the data are spatially indexed, and upon which a set of procedures operated in order to answer queries about spatial entities in the database” [31], archæologists use it to store excavation data acquired at a local scale and to make analyses at a larger scale. In this context, it seems pertinent to propose treatments, handling the spatiotemporal incompleteness of information, which present reconstruction scenarios depending on the studied period and on a hypothesis about past structures. We present in this chapter a study of archæological data coming from the SIGRem project [7] and of their imperfection. The project goal is data storage at the excavation site scale and analysis at the city scale of excavation data about Durocortorum (Reims during the Roman period). In this chapter, the database used is BDRues, which contains excavation information on Roman streets in Reims; data are defined by orientation, location and activity period. The information in this database is subject to problems of precision, of temporal semantic codification, etc. Following [17, 24], data is modeled using fuzzy set theory. Moreover, only street sections are excavated, so the information does not represent the complete past situation. The usual shape hypothesis for Roman streets is that
they are linear. In this direction, we propose to exploit the principle of a well-known pattern recognition method for images, the Hough transform (HT) [14, 20, 22], in accordance with fuzzy set theory. The HT is based on the use of a parametric space represented by an accumulator (defined by the shape equation) in which data are projected using a vote procedure. Maxima of this accumulator indicate the recognized shape instances. This chapter applies the HT principle to the set of archæological objects. Data are projected into an accumulator for each data aspect (orientation, location and activity period). The accumulators are merged in order to obtain a final 2D fuzzy set that allows us to compute valued completion hypotheses. The following section introduces the SIGRem project and the fuzzy data modeling. Section 3 presents the problem of pattern recognition in images and explains the Hough transform. Section 4 proposes an HT adaptation in our context. Section 5 discusses the results.
2 The SIGRem Project

Since Durocortorum1 became the capital of Gallia Belgica2 during the Roman Empire period, knowledge about this past is essential for the development and the identity of Reims.
2.1 Introduction to the SIGRem Project

With a view to the promotion and management of the archæological patrimony of Reims, the initiative carried out by the University of Reims Champagne-Ardenne, INRAP (National Institute for Preventive Archaeological Research) and the French Culture Ministry integrates geo-informatics tools and takes archæological information into consideration in urban and regional analysis. Beyond the archæological mapping of Reims during the Roman Empire, the first goal of the SIGRem project is to develop a geographical information system to manage archæological knowledge. The SIGRem project stores archæological information collected over more than 20 years. Figure 1 shows the map of the excavation sites exploited in the SIGRem project. This system should propose and present ad-hoc spatiotemporal analysis tools for the study of excavation data according to time. The data is stored at the excavation site level but the analysis should be done at the city level.
2.2 Description of BDRues

We illustrate this chapter using an archæological database, called BDRues, storing information about the streets of Reims in the Roman period. This database is offered to
1. Durocortorum was the name of Reims during the Roman Empire period.
2. Gallia Belgica was a Roman province composed of the current southern part of the Netherlands, Belgium, Luxembourg, northeastern France and western Germany.
Fig. 1 Map of archaeological excavation sites referenced in the SIGRem project
us by François Berthelot (Champagne-Ardenne Department of Archæology of the French Culture Ministry). BDRues data describes the street sections of Durocortorum. Sections are stored as objects characterized by location (georeferenced 2D points), by orientation (an angle) and by activity period (see Table 1). We can note that data is stored as points, but the Roman streets are linear at the city scale and come close to constituting a regular grid at the ancient urban scale.
Table 1 Example of archæological data in BDRues

ID | Location X^a | Location Y^a | Orientation^b | Activity Period
12 | 723325 | 174361 | 30 | Gallo-Roman
14 | 723240 | 174780 | 120 | 3rd, 4th Centuries AD
15 | 723070 | 174730 | 120 | High Empire

^a Lambert II extended. ^b Degree.
Moreover, we observe that temporal information (activity periods) comes, for the most part, from (relative) interpretation and that location is defined by either absolute or relative methods. Due to tool resolutions and flexible temporal definitions, the BDRues data is imprecise. Moreover, since we only have information on fragments of streets, the BDRues data is also spatially incomplete.
2.3 BDRues and Imperfection: Construction of BDFRues

In fact, the fieldwork is an interpretation step. During this step, orientation and position (street centers in BDRues) are estimated according to materials (gutters, orientation of stones, house streetside foundations, etc.). Furthermore, activity periods are often obtained using a relative chronology in which the beginning and the end are not clearly defined. The dating of an object could be specified in accordance with its materials or with neighbourhood artifacts, such as coins found in stratigraphic units. Thus, all the components of BDRues data are subject to imprecision and incompleteness. We want to model and analyze the BDRues information according to its imperfection (imprecision). Following [17, 24, 37], convex and normalized fuzzy models are used for locations, activity periods and orientations. Those models are presented in Figure 2; they are inspired by [15] for fuzzy geometric models and by [13, 26] for temporal fuzzy representation.
Fig. 2 Fuzzy models for (a) the location fLoc_ep, (b) the orientation fOrien_ep, and (c) the date fDate_ep of an excavation data ep
Using those three fuzzy models, we store data in the BDFRues database. In archæology, excavation objects usually are parts of bigger objects (for example BDFRues objects are fragments of streets). The bigger object shape can be simple (line, circle, ellipse, etc.), such as the linear aspect of Roman streets, or more complex as river shapes. In the field of image processing, some methods allow us to recognize shapes in images.
3 Pattern Recognition in Images Using the Hough Transform

Some pattern recognition methods allow the recognition of simple shapes or natural shapes in images. These methods generally use images resulting from an edge
Fig. 3 Hough transform principle
detection step (gradient, Canny-Deriche, etc.). Thus they work with binary images and really only deal with the extracted points. A point of interest (POI) can be viewed as 2D data, and the points not in the background form a 2D dataset. In a GIS, a geographical database stores only important data with their coordinates, building a 2D dataset for the spatial situation. The use of a pattern recognition method therefore becomes natural, but not all such methods are efficient with imperfect information. In the pattern recognition field, one of the most powerful methods to detect geometric shapes is the Hough transform (HT). This method, proposed by Hough in 1962 [22] for straight line detection, was later adapted to imperfect images, for example by the fuzzy Hough transform proposed in [20].
3.1 Hough Transform

The Hough transform, introduced in [22], is a classic for shape recognition in images. Although it was initially used to detect straight lines, it was generalized to geometrical shapes [14] and extended to complex shapes [3]. Illingworth and Kittler in 1988 [23] and Leavers in 1993 [25] propose surveys of the HT. Bonnet in 2002 [3] proposed an unsupervised generalization of the HT. The principle of the method is to map the POIs of an image into a parametric space defined by the equation of the wanted shape. POIs are projected into an accumulator that is defined on a discretization of the parametric space. An accumulator cell value represents the credit given, in the image, to the shape associated with the cell. In order to calculate this credit, each POI votes in all cells corresponding to an instance of the wanted shape going through the POI. The vote consists in an increment of cell values. The greater the number of POIs voting in a cell, the greater the credit of the shape's presence in the image. Let us take, for instance, the case where the wanted shape is a straight line in an image. The polar equation of a line L is: L : ρ = x cos θ + y sin θ, where (x, y) are the coordinates of a point of L.
The parametric space is a 2D space in which each straight line of the image is represented by its polar coordinates (ρ, θ). Each POI p votes in the accumulator Acc for each cell (ρ, θ) associated with a straight line going through p. Thus, the credit of all lines potentially going through p increases (Figure 3). If lines are present in the image, we can detect them by finding the cells with the greatest scores. In our context, the desired recognition methods must deal with incomplete and imprecise data. With the fuzzy Hough transform (FHT), Han et al. [20] propose to detect lines in the presence of noise or quantization errors. The FHT takes points and their neighborhood into consideration.
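The voting scheme just described can be sketched in a few lines: each POI votes, for every discretized θ, in the cell of the line ρ = x cos θ + y sin θ passing through it, and the cells with the highest scores correspond to detected lines. The discretization steps below are implementation choices, not from the chapter:

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in the discretized (rho, theta) space for straight lines."""
    acc = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = k * math.pi / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)  # L: rho = x cos(theta) + y sin(theta)
            acc[(round(rho / rho_step), k)] += 1             # vote for this candidate line
    return acc

# ten collinear POIs on the line y = x, plus one outlier
pois = [(i, i) for i in range(10)] + [(3, 7)]
acc = hough_lines(pois)
best_cell, votes = acc.most_common(1)[0]
print(votes)   # the ten collinear points all vote in the same (rho, theta) cell
```

The winning cell corresponds to the line through the origin at θ = 3π/4, i.e. y = x, while the outlier never gathers comparable support.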
3.2 Fuzzy Hough Transform

In order to take the image imperfection into consideration, Han et al. introduced the fuzzy Hough transform (FHT) in [19, 20]. This method considers POIs and their neighborhood using a distributed voting principle. A neighborhood matrix is defined as an n × m matrix K where K(0, 0) corresponds to the weight at the center of K. Usually n and m are equal and small (e.g. equal to 3) in order to reduce the computation time. A point (i′, j′) is in the neighborhood of a POI (i, j) iff i′ ∈ [i − n/2; i + n/2] and j′ ∈ [j − m/2; j + m/2]. For each POI (i, j), each point (i′, j′) in its neighborhood will vote in the accumulator. The vote value of (i′, j′) is computed using the convolution principle, i.e. it corresponds to K(i′ − i, j′ − j). K could be defined using a fuzzy set. In this case, the membership function defined on the neighborhood can be computed using a non-negative function w(r) of the distance r between the neighborhood point and the studied POI. Han et al. propose to set w(0) = 1 and to make w(r) decrease as r increases. Thereby, each POI becomes a fuzzy point with characteristics depending on its quality information (see Section 2.3). Figure 4 illustrates the interest of the FHT for line recognition. Since [20], the fuzzy Hough transform has been adapted and optimized. For instance, in order to detect shape instances in gray-scale images, [33] proposes that the vote value of each image point be obtained using a fuzzy set defined on the gray scale. The analysis proposed in this chapter adapts the FHT (as presented in [20]) in order to obtain valued spatiotemporal completion hypotheses of the Reims street structure.
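Han et al.'s distributed voting can be sketched on top of a line accumulator: every neighbor of a POI casts a vote weighted by a kernel w(r) that is 1 at the POI and decreases with the distance r. The 3×3 kernel and its weight function below are illustrative choices, not taken from [20]:

```python
import math
from collections import defaultdict

# 3x3 neighborhood kernel K: w(0) = 1 at the center, decreasing with distance
K = {(di, dj): 1.0 / (1.0 + math.hypot(di, dj))
     for di in (-1, 0, 1) for dj in (-1, 0, 1)}

def fuzzy_hough_lines(points, n_theta=180, rho_step=1.0):
    """Fuzzy Hough transform: every neighbor of a POI casts a weighted vote."""
    acc = defaultdict(float)
    for x, y in points:
        for (di, dj), weight in K.items():
            nx, ny = x + di, y + dj                  # neighborhood point of the POI
            for k in range(n_theta):
                theta = k * math.pi / n_theta
                rho = nx * math.cos(theta) + ny * math.sin(theta)
                acc[(round(rho / rho_step), k)] += weight
    return acc

# noisy, roughly collinear POIs
pois = [(0, 0), (2, 2), (4, 5), (6, 6), (8, 8)]
acc = fuzzy_hough_lines(pois)
best = max(acc, key=acc.get)   # cell with the greatest accumulated credit
```

Because off-line points still contribute partial credit through their neighborhoods, the maximum is more robust to noise than with the crisp increment of the classical HT.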
Fig. 4 Illustration of straight line detection using HT and FHT (the legend distinguishes the POIs, the line recognized by the HT and the line recognized by the FHT)
4 On the Valued Hypothesis Building

The function of the searched objects defines the spatial shape to recognize. If the query must run on data representing a Roman amphitheatre, the shape to recognize using the FHT is an ellipse or a circle. For Roman streets, the shape is a line. The fuzzy Hough transform allows us to use fuzzy data, which is essential to analyze vague spatial objects, but it cannot be directly used in spatiotemporal analysis processes. The temporal aspect gives us information on the evolution of the spatial configuration. Indeed, an object should be considered during its presence period, such as the activity period of an archæological object. In view of those considerations, and with respect to a reference date (represented by a fuzzy set), we propose to use the Hough accumulator principle on spatiotemporal information as follows:
1. we build an accumulator for each information component using its fuzzy representation, at least for the spatial and the temporal components. The value of the vote in each accumulator depends on the fuzzy set value of the studied component and, for the temporal aspect, on the reference date;
2. we normalize the accumulators and merge them into a fusion set using a weighted mean, in order to weigh the influence of each component;
3. we select values in the fusion fuzzy set with a criterion, for instance using an α-cut, and we visualize this selection through a shape in a GIS.
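The three steps above, per-component accumulators, normalization plus weighted-mean fusion, and α-cut selection, can be summarized in a short sketch; the weights and the α threshold are user choices, as the chapter notes, and the toy accumulator values are invented:

```python
def normalize(acc):
    """Turn an accumulator into a fuzzy set by dividing by its maximum value."""
    m = max(acc.values())
    return {cell: v / m for cell, v in acc.items()}

def fuse(accs, weights):
    """Weighted mean of normalized accumulators over their common cells."""
    cells = set().union(*accs)
    return {c: sum(w * a.get(c, 0.0) for a, w in zip(accs, weights)) / sum(weights)
            for c in cells}

def alpha_cut(fuzzy_set, alpha):
    """Keep the cells whose membership reaches the threshold alpha."""
    return {c for c, v in fuzzy_set.items() if v >= alpha}

# toy accumulators over line cells (rho, theta): location / orientation / date
acc_loc   = {(0, 135): 10.0, (3, 90): 4.0}
acc_orien = {(0, 135): 6.0,  (3, 90): 6.0}
acc_date  = {(0, 135): 2.0,  (3, 90): 1.0}
final = fuse([normalize(a) for a in (acc_loc, acc_orien, acc_date)], [1.0, 1.0, 1.0])
print(alpha_cut(final, 0.8))   # only the strongly supported line remains
```

Tuning the weights lets an expert favor, for instance, the temporal agreement over the spatial one, which is the flexibility the chapter asks of the fusion step.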
4.1 Accumulators' Building

In a spatiotemporal context, the analysis allows us to estimate the possible presence of an object and thus to propose reconstruction hypotheses. We have to reduce the vote possibilities to the objects resulting from the combination of the fuzzy components of each object. Thus, the neighborhood of an object is defined by the support of its fuzzy localization. The only possible objects are the ones going through a point in the neighborhood and for which the shape is plausible. The idea is to build an accumulator for each component according to the fuzzy Hough transform principle. The spatial fuzzy representations of the database objects determine the set of voting points, the set of potential shapes and the value of the votes for the spatial aspect. The values of the votes in the accumulator corresponding to the temporal component are obtained using a temporal relation (such as those seen in [26] or the anteriority index [11]) between the temporal fuzzy model and the given date.
4.2 Accumulator Aggregation The information carried by each of the accumulators generated by the FHT step is different. To allow us to visualize data, a fusion of the accumulators is necessary. This step is devoted to the choice of the merging function and its application.
First of all, classical fusion functions merge homogeneous quantities. Normalized by their maximum, the accumulators can be seen as fuzzy sets. Thus, these fuzzy sets can be merged. [32] and [34] present reviews of traditional aggregation and fusion operators. Classical operators such as t-norms and t-conorms in a fuzzy context (like Zadeh's Min and Max) consider that the order of the data is not important (symmetric functions). Furthermore, t-norms and t-conorms admit a neutral value and an absorbent element. This implies that the value of at least one of the fuzzy set memberships could be lost during the fusion process. For example, if we have N accumulators A1, . . . , AN and, for a pair (ρ, θ), Ai(ρ, θ) = 1 for some i, then with the t-conorm the value of the obtained fuzzy set for (ρ, θ) is 1. If the feature roles are different, a fitness function defined to obtain a weighted mean (also called WAA) of the normalized accumulators could be a good choice. [35] and [36] present studies on mean-type aggregation. We choose this approach in our application in order to allow users to tune the process in accordance with their goals. Thus, the choice of the fusion function depends on the application goals. The aggregation result is a fuzzy set called Final.
4.3 Visualization The spatiotemporal context and its imperfection lead us to propose both the classical selection process used in the HT and ad hoc ones for particular applications. In the classical HT, shape extraction consists in selecting the maxima; using this method in queries visualizes only the most pertinent results. If the goal is to obtain areas of high potentiality, an α-cut can be used instead. If desired by experts, we reduce shape instances to segments whose length depends on the associated excavation points; this is especially meaningful in the case of Roman streets, which are linear segments. The goal of this analysis is to display the potential shapes according to a fuzzy temporal relation. As explained in the methodology description, applying an α-cut of Final to select the lines yields areas of potential streets: the higher the possibility of presence, the darker the display of the line. The generated map can be used as a new layer in a GIS. The tool proposed in this application was applied to estimate maps for periods defined by users. The query results, for either the third century AD or the fourth and fifth centuries AD in Reims, estimate the potential street areas for the corresponding period. In conclusion, the choices of the temporal relation, the merging function and the selection function depend on the application goals, and the results are visualized through a GIS layer. The next section is devoted to an application of our methodology to Roman street estimation according to a reference date.
C. de Runz and E. Desjardin
[Fig. 5 flowchart: Roman street excavation data (localisation x, y; orientation α; date begin/end) feed the fuzzy models fLoc_ep, fOrien_ep and fDate_ep, which drive the accumulators FHTLoc, FHTOrien and FHTDate; these are merged into the Final fuzzy set and visualized in a GIS]
Fig. 5 Process of the spatiotemporal analysis based on shape criterion on BDRues objects
4.4 Application to BDRues Data Roman streets have the particularity of being close to straight lines. The exploited excavation data contain at least the location, and usually the date and the orientation, of each street trace. Excavation data thus give us information on the local positions of Roman street sections. We analyze these data to propose fuzzy potential line segments using the process shown in Fig. 5. The first FHT accumulator (FHTLoc) is devoted to the location, the second to the orientation (FHTOrien) and the last to the correspondence with a given date (FHTDate). Only the vote values change between the building of the three accumulators. The value of the vote in the accumulator FHTLoc is fLoc_ep(x, y). For FHTOrien, the increment is equal to fOrien_ep(θ). To obtain the streets for a given period gp, the cell of FHTDate corresponding to the line being voted on is increased by max(min(gp, fDate_ep)), i.e. the possibility degree of matching between the period and the fuzzy date. To generate maps based on fuzzy multi-modal data, we need to aggregate these three FHT accumulators. To become fuzzy sets, the three accumulators (FHTLoc, FHTOrien and FHTDate) are normalized by their maximum value. In the tool, each feature corresponds to a different potential and importance. In classical merging, the weighted mean, which extends the arithmetic mean, can handle this. With this function, the merging can be written as follows: final = λ · FHTLoc + μ · FHTOrien + ν · FHTDate,
Fig. 6 Pre-maps of Reims defined with (λ, μ, ν) = (5/6, 1/12, 1/12) during: a - the third century, b - the fourth and the fifth centuries
Fig. 7 Pre-maps of Reims defined with (λ, μ, ν) = (1/12, 5/6, 1/12) during: a - the third century, b - the fourth and the fifth centuries
where the weights λ, μ and ν are non-negative and λ + μ + ν = 1. The membership function of the fuzzy segments decreases with the distance to the excavation points which validate the street.
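Assuming the three accumulators are stored as NumPy arrays, the normalization, the weighted merge above and the α-cut selection of Sect. 4.3 can be sketched as follows; the array values and shapes are invented:

```python
import numpy as np

def merge_accumulators(fht_loc, fht_orien, fht_date, lam, mu, nu):
    """Weighted-mean merge of the three FHT accumulators (lam + mu + nu = 1)."""
    assert abs(lam + mu + nu - 1.0) < 1e-9
    # normalize each accumulator by its maximum so it can be read as a fuzzy set
    norm = lambda a: a / a.max() if a.max() > 0 else a
    return lam * norm(fht_loc) + mu * norm(fht_orien) + nu * norm(fht_date)

def alpha_cut(final, alpha=0.95):
    """Indices of the lines retained by the alpha-cut of the Final fuzzy set."""
    return np.argwhere(final >= alpha)

# Tiny example with made-up 2x2 accumulators and the weights of Fig. 6
loc   = np.array([[4.0, 1.0], [0.0, 2.0]])
orien = np.array([[2.0, 2.0], [1.0, 0.0]])
date  = np.array([[3.0, 0.0], [0.0, 3.0]])
final = merge_accumulators(loc, orien, date, 5/6, 1/12, 1/12)
lines = alpha_cut(final, 0.95)   # only the cell maximal in all three survives
```

With an α-cut of 0.95, only cells that score highly in every feature (here the single cell that is maximal in all three accumulators) are kept, which matches the behavior discussed in the next section.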
5 Discussion on Results As stated in the previous section, the choice of the weights in the aggregation defines different kinds of results. In the following, the impact of this parameterization of the query is studied with an α-cut value of 0.95.
5.1 On the Influence of Weights on Results For example, in Fig. 6a and Fig. 6b, the maps are the same and present only one street, because the influence of the dates and orientations is minimized in comparison with that of the locations. Thus, only the street with the highest value is visualized. These weights, associated with this α-cut, display the street defined by the highest number of excavation points.
Fig. 8 Pre-maps of Reims defined with (λ, μ, ν) = (1/12, 1/12, 5/6) during: a - the third century, b - the fourth and the fifth centuries
Fig. 9 Pre-maps of Reims defined with (λ, μ, ν) = (1/3, 1/3, 1/3) during: a - the third century, b - the fourth and the fifth centuries
When the locations and the dates are minimized compared to the orientation (Fig. 7a and 7b), the maps are also close for any date. In this case, the object of the tool is to display the main direction of the streets. If the importance of the dates is maximized (Fig. 8a and Fig. 8b), then the maps change according to the reference date but all the street evaluations are the same; here, the goal is to visualize the influence of the dates in the process. In Fig. 9a and Fig. 9b, a classical mean is applied. The tool then visualizes the streets, but we cannot determine which feature has the highest value. With the α-cut, using our GDB, the displayed streets are only those where the possibilities of the location, the date correspondence and the orientation are all maximal. The next section is devoted to the results of the queries defined using our methodology and programmed to estimate the potential presence of streets during the third century and during the fourth and fifth centuries.
5.2 Valued Hypotheses In our application examples, the weights (λ, μ, ν) of final are empirically set to (3/13, 1/13, 9/13). This assignment gives three
Fig. 10 a: A simulated map of Reims during the third century; b: A map defined by experts of Reims during the third century; c: A simulated map of Reims during the fourth and the fifth centuries; d: A map defined by experts of Reims during the fourth and the fifth centuries
times more importance to date correspondence than to location, and three times more importance to location than to orientation. This defines areas of streets (because the orientation and the location are minimized), but these areas are more restricted than in Fig. 8 because the location weighs more than the orientation. The weight for the dates, higher than the weight for the locations, ensures that the displayed streets correspond to the reference date. The comparisons between the simulated maps (Fig. 10a and 10c) and the maps defined by experts (Fig. 10b and 10d) show that the queries are of interest to archaeologists. The simulated streets and the experts' streets are similar most of the time, although some differences are visible. These differences can be explained by the fact that the maps given by experts are old: it is expensive to build a new map each time a new piece of information is added to the GDB. Moreover, when experts draw a map, they evaluate the relevant aspects of each element and discard the less interesting ones; in our tool, the relevant aspects are defined by the accumulator values. The main interest of the tool is to define areas of possible presence of Roman streets according to a reference date and to the stored data. These areas are pertinent to help archaeologists in their diagnosis. The pre-map proposes street configuration hypotheses for the third and fifth centuries. [9, 8] present the specific
construction of the process in the case of archaeological knowledge. This kind of map offers an instantaneous visualization of the spatial configuration for the considered period, and this analysis process is thus a new tool for the study of spatiotemporal phenomena and evolutions.
6 Conclusion The aim of this chapter was to expose an analysis method that takes imprecision into consideration. This method builds valued hypotheses of past reconstructions that are visualized in a GIS layer. Because our data are imprecise, we model them using fuzzy sets. On these data, we use the FHT principle in order to visualize valued spatial configurations according to time. The results of this process are pre-maps which suggest spatial composition and dynamic hypotheses. This approach manages multi-component data in an archaeological GIS. The method can be viewed as a complex query where the searched information is defined by a shape; the parameters of the shape model are used in the fuzzy Hough Transform (FHT). The main benefit of this approach is to consider uncertainty, imprecision and incompleteness when querying a GIS. If the data are defined through fuzzy sets, then we can use the FHT, which leads to defining the accumulator as a fuzzy set. The selected values thus become fuzzy values, bringing about a visualization of fuzzy objects. In the proposed application, the selections correspond to the use of classical α-cuts, and the accumulators are merged by aggregation of fuzzy sets. This kind of query, based on the shape modeled through the FHT, was applied to the geographical database “BDRues” to estimate the potential presence of Roman streets in the city of Reims (France). In the context of BDRues, the data are structured by three features: the location, the orientation and the date. Each feature is defined by a fuzzy set, leading to three fuzzy accumulators when querying the GIS. This allows historians and archaeologists to evaluate and confirm their map hypotheses, and helps them diagnose the urban management of a period using uncertain data issued from excavations. A generalization to other shapes will be studied in future work.
These visualization techniques may help GIS users to explore large amounts of data by displaying fuzzy objects.
References
1. Arnold, J., Libby, W.: Age determinations by radiocarbon content: Checks with samples of known age. Science 110, 678–680 (1949)
2. Bejaoui, L., Bédard, Y., Pinet, F., Salehi, M., Schneider, M.: Logical consistency for vague spatiotemporal objects and relations. In: International Symposium on Spatial Data Quality - ISSDQ 2007 (June 2007)
3. Bonnet, N.: An unsupervised generalized Hough transform for natural shapes. Pattern Recognition 35, 1192–1196 (2002)
4. Burrough, P.A., McDonnell, R.: Principles of Geographical Information Systems. Oxford University Press, Oxford (1998)
5. Conolly, J., Lake, M.: Geographical Information Systems in Archaeology. Cambridge University Press, Cambridge (2006)
6. De Ruffray, S.: The application of fuzzy logic to school districting: the example of Moselle. In: Applied Geography Conference, France (2007)
7. De Runz, C., Desjardin, E., Herbin, M., Piantoni, F.: A new method for the comparison of two fuzzy numbers extending fuzzy max order. In: Information Processing and Management of Uncertainty in Knowledge-Based Systems - IPMU 2006, Editions EDK, Paris, France, pp. 127–133 (2006)
8. De Runz, C., Desjardin, E., Piantoni, F., Herbin, M.: Management of multi-modal data using the fuzzy Hough transform: Application to archaeological simulation. In: Rolland, C., Pastor, O., Cavarero, J.L. (eds.) First International Conference on Research Challenges in Information Science, pp. 351–356 (2007)
9. De Runz, C., Desjardin, E., Piantoni, F., Herbin, M.: Using fuzzy logic to manage uncertain multi-modal data in an archaeological GIS. In: International Symposium on Spatial Data Quality - ISSDQ 2007 (2007)
10. De Runz, C., Desjardin, E., Piantoni, F., Herbin, M.: Toward handling uncertainty of excavation data into a GIS. In: 36th Annual Conference on Computer Applications and Quantitative Methods in Archaeology, Budapest, Hungary (2008)
11. De Runz, C., Desjardin, E., Piantoni, F., Herbin, M.: Anteriority index for managing fuzzy dates in archaeological GIS. Soft Computing - A Fusion of Foundations, Methodologies and Applications 14(4), 339–344 (2010)
12. Devillers, R., Jeansoulin, R. (eds.): Fundamentals of Spatial Data Quality. ISTE Publishing Company (2006)
13. Dubois, D., Hadj Ali, A., Prade, H.: Fuzziness and uncertainty in temporal reasoning. Journal of Universal Computer Science 9(9), 1168–1194 (2003)
14. Duda, R.O., Hart, P.E.: Use of the Hough transform to detect lines and curves in pictures. Comm. ACM 15(1), 11–15 (1972)
15. Dutta, S.: Approximate spatial reasoning: integrating qualitative and quantitative constraints. International Journal of Approximate Reasoning 5(3), 307–330 (1991)
16. Fisher, P.: First experiments in viewshed uncertainty: the accuracy of the viewable area. Photogrammetric Engineering and Remote Sensing 58, 345–352 (1991)
17. Fisher, P., Comber, A., Wadsworth, R.: Approaches to uncertainty in spatial data. In: Devillers, R., Jeansoulin, R. (eds.) Fundamentals of Spatial Data Quality, pp. 43–60. ISTE (2006)
18. Goodchild, M., Jeansoulin, R. (eds.): Data Quality in Geographic Information: From Error to Uncertainty. Hermès (1997)
19. Han, J.H., Koczy, L.T., Poston, T.: Fuzzy Hough transform. In: Second IEEE International Conference on Fuzzy Systems, vol. 2, pp. 803–808 (1993)
20. Han, J.H., Koczy, L.T., Poston, T.: Fuzzy Hough transform. Pattern Recognition Letters 15, 648–649 (1994)
21. Harris, E.: Principles of Archaeological Stratigraphy, 2nd edn. Academic Press, London (1989)
22. Hough, P.V.C.: Method and means for recognizing complex patterns. US Patent 3,069,654 (1962)
23. Illingworth, J., Kittler, J.: A survey of the Hough transform. Computer Vision, Graphics, and Image Processing 44, 87–116 (1988)
24. Klir, G.J., Yuan, B.: Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall PTR, Englewood Cliffs (1995)
25. Leavers, V.F.: Which Hough transform? CVGIP: Image Understanding 58, 250–264 (1993)
26. Nagypál, G., Motik, B.: A fuzzy model for representing uncertain, subjective and vague temporal knowledge in ontologies. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds.) CoopIS 2003, DOA 2003, and ODBASE 2003. LNCS, vol. 2888, pp. 906–923. Springer, Heidelberg (2003)
27. Navratil, G.: Modeling data quality with possibility distributions. In: International Symposium on Spatial Data Quality - ISSDQ 2007 (2007)
28. Rodier, X., Saligny, L.: Modélisation des objets urbains pour l'étude des dynamiques urbaines dans la longue durée. In: SAGEO 2007 (2007)
29. Rolland-May, C.: Évaluation des territoires. Hermès (2000)
30. Shi, W.: Four advances in handling uncertainties in spatial data and analysis. In: International Symposium on Spatial Data Quality - ISSDQ 2007 (2007)
31. Smith, T.R., Menon, S., Starr, J.L., Estes, J.E.: Requirements and principles for the implementation and construction of large-scale geographic information systems. International Journal of Geographical Information Systems 1, 13–31 (1987)
32. Smolíková, R., Wachowiak, M.P.: Aggregation operators for selection problems. Fuzzy Sets and Systems 131(1), 23–34 (2002)
33. Strauss, O.: Use the fuzzy Hough transform towards reduction of the precision/uncertainty duality. Pattern Recognition 32, 1911–1922 (1999)
34. Xu, Z.S., Da, Q.L.: An overview of operators for aggregating information. International Journal of Intelligent Systems 18(9), 953–969 (2003)
35. Yager, R.: On mean type aggregation. IEEE Trans. Syst. Man Cybern. 26, 209–220 (1996)
36. Yager, R., Filev, D.: Induced ordered weighted averaging operators. IEEE Trans. Syst. Man Cybern. 29, 141–150 (1999)
37. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)
Uncertainty in Interaction Modelling: Prospecting the Evolution of Urban Networks in South-Eastern France* Giovanni Fusco
Abstract. The applications presented in this chapter represent a complete modelling chain, integrating interaction modelling and uncertainty issues. New protocols to extract urban networks from spatial interaction data within a regional space are proposed. Bayesian Networks are later used to produce a model of the evolution of these networks. The results of prospective simulations of the future evolution of regional urban networks are subsequently integrated in a GIS platform to obtain an appropriate cartographic representation. GIS modelling and mapping integrate the probabilistic content of the model results, representing the degree of uncertainty in the knowledge of the future state of every component of the regional system.
1 Introduction Spatial prospective modelling and GIS applications have already become common decision support tools in planning. Modelling applications are particularly needed when decision makers have to deal with the future evolution of their territory. GIS applications are used to pre-process and post-process modelling data, to store and manage these data, to combine them with other spatial data and to propose convenient cartographic outputs. Real-world data and modelled data can thus be integrated within the same GIS application, allowing comparisons between past and future (simulated) states of a geographic system. The ontological difference between the two kinds of data is nevertheless often forgotten. Meta-data specify the quality of real world data (completeness, degree of uncertainty, etc.). Modelled data often lack any appreciation of their quality. More particularly, deterministic models are unable to take data uncertainty into account. New data-driven probabilistic approaches are presently being developed in spatial modelling: neural networks (Pijanowski et al. 2002, 2005), fuzzy methods (Openshaw and Turner 2000), Bayesian networks (Torres and Huber 2002, Fusco 2004, 2008, Jansenns et al. 2006), methods based Giovanni Fusco UMR 6012 ESPACE Université de Nice-Sophia Antipolis 98 Bd Herriot – BP 3209 06204 NICE cedex 3, France e-mail : [email protected] R. Jeansoulin et al. (Eds.): Methods for Handling Imperfect Spatial Info., STUDFUZZ 256, pp. 357–378. springerlink.com © Springer-Verlag Berlin Heidelberg 2010
on evidence theory (Corgne 2004). Modellers can thus develop more sophisticated data structures for model inputs and outputs, integrating uncertainty levels in GIS semantic tables. This will eventually allow decision makers to fully appreciate the contribution of modelling as a tool for exploring the future, making a clear distinction among what is certain, what is probable and what is just possible in the evolution of a geographic system. This chapter is a contribution in this direction, dealing with probabilistic modelling of the evolution of a regional urban system through Bayesian Networks and its integration in a GIS platform. More particularly, the applications of this chapter are derived from a research project which pursued several scientific objectives. The first objective was to define regional urban networks from spatial interaction data. The second was to propose a model of the development of these networks over time. Proposing a trend scenario for the evolution of regional urban networks was the third objective, which had to be achieved by integrating uncertainty in the model and in the results of the application. Finally, we wanted to propose an appropriate cartographic representation of the model results and their uncertainty. These results could only be obtained through a modelling sequence integrating specific modelling modules and GIS. The elaboration of alternative scenarios to the trend scenario, and its assessment in terms of uncertainty, goes beyond the objectives of the research, but should be considered as an important future development. The methodological choices for the different modelling phases will be dealt with in section 2. The case study presented in section 3 will materialize the modelling results and allow a closer look at cartographic representation issues. Conclusions and future developments will be discussed in the last section.
2 Methodological Choices 2.1 Modelling Regional Urban Networks from Spatial Interaction Data The objective of the modelling chain used in this chapter is the determination of urban networks within a regional space and the prospection of their most probable evolution over time. The methodological choices derive from these objectives, as well as from the constraints of the data available for the study area, the Provence-Alpes-Côte d’Azur region in South-Eastern France. The most complete and reliable source of data for the 964 municipalities of the study area is the population census. French census data include flows exchanged among the municipalities in terms of daily commuter trips, daily trips to education facilities and inter-census residential movements (i.e. the number of people having changed address between the two census years). Despite the increasing importance of leisure and commercial trips (not included in the census), daily commuter trips still constitute the most important flows of personal mobility in defining and structuring urban systems (Berroir et al. 2006). They convey information on the daily functioning of the urban systems and are not surprisingly the basis of urban functional area
definitions in many countries (notably in France). In our approach, urban systems will thus be defined as networks of cities strongly connected by commuters’ flows. The search for main urban centres, hierarchically dominating a whole network or part of it, will also be a key element of our approach. By studying the different kinds of relationships linking every municipality to these centres, we will try to detect regularities in the functioning of regional networks. Finally, urban networks will be studied in their evolution over time. No trend scenario could be inferred without a precise knowledge of past evolutions within the regional urban networks. The starting point of our modelling chain is the definition of regional urban networks from spatial interaction data through an extension of the dominant flows analysis. First proposed by Nystuen and Dacey (1961) and later developed by several authors (Kipnis 1985, Rabino and Occelli 1997, Berroir et al. 2006), the dominant flows approach aims at extracting the skeleton of an urban network (the dominant flows) from a spatial interaction matrix. Spatial interaction is a key concept in geographical analysis. Flows between spatial units separated by physical distance (possibly time- or cost-distance) can be viewed as a kind of spatial interaction, linked to the activities taking place in the spatial units. Regional space is concerned with different kinds of spatial interaction flows among its centres: commuter flows, tourists’ flows, migration flows, information flows, economic flows … They can be taken into account by a spatial interaction matrix, reporting in quantified measures (e.g. volumes, money, weights) the flows taking place between origins (the rows) and destinations (the columns). The dominant flows approach needs an a priori rank among spatial units reflecting their absolute dimension in terms of mass (population, jobs and total exchanged flows are the most common mass criteria).
Exchanged flows are then used to produce a hierarchical network among the spatial units, using the notion of “largest flow”. This notion can have various definitions (Nystuen and Dacey 1961), such as the largest out-flow, in-flow or total flow. Out-flows are the most commonly used in defining urban networks through commuter flows (Rabino and Occelli 1997, Berroir et al. 2006). According to this approach, a spatial unit is dominated if it sends its largest flow towards a centre of higher rank. Our approach also integrates the significance of the flow (Kipnis 1985, Rabino and Occelli 1997). The maximal flow towards a higher-rank unit is dominant only if it exceeds both an absolute threshold and a relative threshold (e.g. a certain percentage of emitted flows or of the unit’s population). It is thus possible to extract a primary graph from the spatial interaction matrix, which is made of the dominant flows within the study area. The central hypothesis of the dominant flows approach is that this primary graph extracted from a spatial interaction matrix can be considered as a good representation of a multidimensional hierarchical urban network within a regional space (Nystuen and Dacey 1961, Rabino and Occelli 1997). From an operational point of view, urban networks are defined through the primary graph. The graph is rarely totally connected. Several spatial units with extremely small populations may be considered as independent by the algorithm as they don’t send
360
G. Fusco
significant flows towards other units. The remaining units can be linked in a tree-like network which has its origin in one dominant centre only and several branches departing from it. Alternatively, they can be split into several hierarchical sub-networks dependent on different dominant centres. The networks emerging from the dominant flows are hierarchical ones. We can attribute hierarchical levels to non-dominant spatial units as well. Two criteria can be used. The superiors’ criterion attributes a level to every spatial unit according to the level of the unit which is immediately superior in the network. Second-level centres are thus those depending on first-level centres (the dominant centres), and so forth (Figure 1a). In the analysis of urban networks this criterion highlights the network proximity to hyper-centres (a suburban municipality depending directly on the metropolitan centre is considered of second order).
Fig. 1 Two Level Assignment Criteria within Urban Networks
The subordinates’ criterion takes into account the ability of every spatial unit to dominate subordinate units. The lowest hierarchical level is assigned to spatial units unable to dominate other municipalities. The directly superior level is attributed to the units dominating those of the lowest level, and so forth, until the last dominant centre is reached (Figure 1 b). The subordinates’ criterion allows different levels among dominant centres, as their level depends on the complexity of the arborescent network that they command. Dominant centres can thus be of first, second, third level, etc. The dominant flows detection and the assignment of hierarchical levels can be implemented through appropriate computer algorithms. Once they are projected in geographical space, the network topologies produced by these algorithms can be used to visualise the spatial extent and the configuration of the urban networks in a regional space. Specific software tools can be conceived in order to obtain appropriate statistical and cartographic outputs for the analysis of the urban networks.
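The dominant-flow extraction and the superiors' level assignment described above can be sketched in a few lines. This is a hedged illustration, not the ART implementation: the flow matrix, ranks and thresholds are invented.

```python
# Hedged sketch of dominant-flow extraction: a unit is dominated if its
# largest out-flow goes to a higher-ranked unit and exceeds both an
# absolute and a relative threshold. Ranks, flows and thresholds are invented.

def dominant_flows(flows, rank, abs_min, rel_min):
    """flows[i][j]: commuters from i to j; rank[i]: a priori mass rank (higher = bigger)."""
    dominated_by = {}
    for i, row in enumerate(flows):
        out_total = sum(row)
        if out_total == 0:
            continue                       # independent unit: no significant emission
        j = max(range(len(row)), key=lambda k: row[k])   # largest out-flow
        if rank[j] > rank[i] and row[j] >= abs_min and row[j] / out_total >= rel_min:
            dominated_by[i] = j
    return dominated_by

def superiors_levels(dominated_by, n):
    """Superiors' criterion: dominant centres get level 1, their dependants level 2, etc."""
    levels = {i: 1 for i in range(n) if i not in dominated_by}
    def level(i):
        if i not in levels:
            levels[i] = level(dominated_by[i]) + 1
        return levels[i]
    for i in range(n):
        level(i)
    return levels

# 4 toy units: 0 is the metropolitan centre, 1 and 2 commute to it, 3 commutes to 1
flows = [[0, 10, 5, 0],
         [300, 0, 20, 10],
         [250, 30, 0, 0],
         [20, 80, 0, 0]]
rank = [4, 3, 2, 1]   # e.g. ranked by population
dom = dominant_flows(flows, rank, abs_min=50, rel_min=0.3)
lev = superiors_levels(dom, 4)
```

On this toy matrix the primary graph is 1 → 0, 2 → 0, 3 → 1, so unit 0 is the dominant centre (level 1), units 1 and 2 are second-level, and unit 3 is third-level under the superiors' criterion.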
2.2 Modelling the Development of Urban Networks through Bayesian Networks The second phase of our analysis is the search for statistical regularities in the evolution of urban networks. This will eventually lead to building a probabilistic model for the evolution of the urban networks defined by dominant flows. Bayesian Networks (BN) are well suited to detect and organise statistical regularities in
Uncertainty in Interaction Modelling: Prospecting the Evolution
361
a probabilistic model of the study area. Advantages and disadvantages of using other machine-learning techniques will be dealt with in section 3.2. BN are graphical models of causal knowledge in an uncertain context (Pearl 2000, Jensen 2001, Korb and Nicholson 2004). They are a relatively new and powerful tool of knowledge representation, manipulation and discovery, whose capabilities are just beginning to be applied in urban studies (Torres and Huber 2002, Jansenns et al. 2006, Fusco 2004, 2008). A BN couples a graphical formalism with a mathematical one. Graphically, it is made up of circles (representing variables) and directed arcs (representing probabilistic causal links among them). The resulting directed graph (Figure 2) constitutes the network structure and represents qualitative information on causality among variables. The structure of a BN cannot contain loops: only directed acyclic graphs (DAG) are possible network structures.
Fig. 2 The Structure of a Simple BN of Five Variables
The graph is a representation of causal knowledge within a probabilistic framework. Every circle represents a stochastic variable, which can be described through a conditional probability function (a conditional probability table in the case of discrete variables). The numeric values of the probability functions (or tables) constitute the network parameters. The parameters and the structure are tightly linked. The joint probability distribution of the network factorizes according to the structure of the underlying graph. The conditional probability function (or table) of every node depends only on the values of its parent nodes and the total probability of the network is thus obtained as the product of the conditional probabilities of each node given its parents. For the network shown in Figure 2, we have: P(A, B, C, D, E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|D) where P(B|A) indicates the conditional probability of B, given A. This probabilistic approach to causality is particularly suited for representing the uncertainty affecting relationships among territorial variables. BN modelling allows two kinds of applications. The first is causal knowledge discovery. Knowing observed values of the model variables over a large number of cases, learning algorithms can infer the most probable set of probabilistic links among variables. Learning algorithms can mix data-acquired knowledge and modeller’s hypotheses in order to obtain a model which is both statistically robust and plausible for the domain of investigation.
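The factorization above can be checked numerically on a toy parameterization. All conditional probability values below are invented, the five variables are taken as binary, and the inference P(A=1 | E=1) is computed by brute-force enumeration rather than by a proper BN inference algorithm:

```python
# Numeric sketch of the factorization P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|D)
# for binary variables (0/1). All conditional probability values are invented.
import itertools

P_A = {1: 0.3, 0: 0.7}
P_B_given_A = {(1, 1): 0.9, (1, 0): 0.2}              # P(B=1 | A)
P_C_given_B = {(1, 1): 0.6, (1, 0): 0.1}              # P(C=1 | B)
P_D_given_AB = {(1, (1, 1)): 0.8, (1, (1, 0)): 0.5,
                (1, (0, 1)): 0.4, (1, (0, 0)): 0.05}  # P(D=1 | A, B)
P_E_given_D = {(1, 1): 0.7, (1, 0): 0.2}              # P(E=1 | D)

def p(table, value, cond):
    """Probability of `value` given `cond`, from a table storing only the value-1 case."""
    p1 = table[(1, cond)]
    return p1 if value == 1 else 1 - p1

def joint(a, b, c, d, e):
    return (P_A[a] * p(P_B_given_A, b, a) * p(P_C_given_B, c, b)
            * p(P_D_given_AB, d, (a, b)) * p(P_E_given_D, e, d))

# The joint distribution must sum to 1 over all 2^5 configurations
total = sum(joint(*v) for v in itertools.product([0, 1], repeat=5))

# A simple Bayesian update: observing E = 1 raises the belief in A = 1
p_e1 = sum(joint(a, b, c, d, 1) for a, b, c, d in itertools.product([0, 1], repeat=4))
p_a1_given_e1 = sum(joint(1, b, c, d, 1)
                    for b, c, d in itertools.product([0, 1], repeat=3)) / p_e1
```

With these invented parameters the posterior P(A=1 | E=1) rises above the prior P(A=1) = 0.3, since E depends on D and D is much more likely when A holds; this is the probability propagation the chapter relies on for scenario building.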
362
G. Fusco
A second kind of application is system simulation through Bayesian inference. After having defined the causal structure and the probabilistic parameters of the model, the Bayesian Network can produce probabilistic simulations of the model with complete freedom in the choice of input and output variables. By updating a priori probabilities whenever new elements of knowledge are entered in the model, BN allow a rigorous treatment of uncertainty and of its propagation in the model’s results (Jensen 2001, Pearl 2000, Korb and Nicholson 2004). This feature is particularly interesting in scenario building for planning, where modellers have to make careful distinctions between what is probable, likely or just possible.
2.3 Integrating Spatial Models and GIS Tools The dominant flow theory and the BN technique for modelling probabilistic causal relationships are the methodological framework for a new analysis of urban networks within GIS. In order to implement the new analysis protocols, we developed ad hoc models externally linked to a GIS platform (Figure 3). The ART model is itself a modelling platform made up of interlinked modules (Fusco 2009). ART extracts dominant flows from the spatial interaction matrix and calculates statistics for spatial units and dependent basins within the urban networks defined by dominant flows. It also proposes data mapping tools outside the GIS environment. The FRED model (Decoupigny 2007) is a road network model calculating time-distances among spatial units using the Floyd minimum path algorithm. The BN model formalizes the probabilistic relationships existing among descriptive variables of the urban networks; it can later be used to infer future states for spatial units within the urban networks. A GIS platform is finally used for data storage, maintenance, pre- and post-processing and for more general data mapping. The GIS platform is an essential element for managing the complex spatial databases necessary to data-driven models such as the ones composing our modelling chain.
Fig. 3 The General Architecture for Modelling Urban Networks. ART model: dominant flows extraction, statistics for spatial units and dependent basins. FRED model: time-distance calculation on the road network. BN model: probabilistic relationships among system variables, inference of future states for spatial units. GIS: data storage, maintenance, pre- and postprocessing and mapping.
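The time-distance computation attributed to the FRED model can be sketched with the Floyd(-Warshall) all-pairs shortest path algorithm; the toy road network and travel times below are illustrative assumptions, not actual regional data:

```python
# Sketch of time-distance calculation with the Floyd(-Warshall)
# all-pairs shortest path algorithm on a small road graph
# (travel times in minutes are invented for illustration).
INF = float("inf")

def floyd_warshall(n, edges):
    """edges: {(i, j): travel_time}; returns an n x n time-distance matrix."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (i, j), t in edges.items():
        d[i][j] = min(d[i][j], t)
        d[j][i] = min(d[j][i], t)  # road links assumed two-way
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# 0 = Nice, 1 = Antibes, 2 = Cannes (toy network)
dist = floyd_warshall(3, {(0, 1): 20, (1, 2): 15, (0, 2): 40})
# the shortest Nice-Cannes time goes through Antibes
```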
Uncertainty in Interaction Modelling: Prospecting the Evolution
363
3 The Case Study of Urban Networks in the PACA Region The study area of our research is the Provence-Alpes-Côte d'Azur (PACA) region in South-Eastern France. The third French region by population and economic activity, PACA has been affected over the past few decades by the emergence of two distinct metropolitan systems in the coastal area: the metropolitan area of Provence, including Marseilles, Aix-en-Provence and Toulon, and the metropolitan area of the French Riviera around Nice, Monaco and Cannes (Decoupigny and Fusco 2008). The emergence of metropolitan systems is reshaping the flows exchanged between cities, suburban areas, retail and office concentrations, and rural villages. The northern, alpine part of the region seems less affected by the emergence of metropolitan systems. The empirical data for the study area are provided by the French Statistical Institute (INSEE) through its population census. Commuter flows among the 964 municipalities of the study area (including the Principality of Monaco) are known for 1982, 1990 and 1999. The data also include flows between the study area and the rest of France.
3.1 The Urban Networks Defined by Commuter Trips in 1990 and 1999 Dominant flows of commuter trips define extremely different network structures within the PACA region (Figure 4). The cartographic outputs used in Figure 4 are network representations of the regional space. Municipalities are thus the spatial units of the analysis. They are conceived as nodes of a network and represented by circles, whose size is proportional to the population, while the colour symbolizes the hierarchical level within the regional networks. Dominant flows among units are the links of a relational network, represented by line segments whose width is proportional to the flow. Dominant flows define several hierarchical clusters of municipalities, depending directly or indirectly (through secondary centres) on a single dominant centre. These clusters can be filtered by different criteria of size and network complexity (Figure 5) to determine urban networks and urban centres. Different sets of parameters have been used. The set of parameters B defines an urban network as a basin of at least 20 000 inhabitants and 10 spatial units. Urban centres must have a population of at least 5 000 and a basin of at least 5 units, 10 000 inhabitants and 2 hierarchical levels. The parameter values were chosen in order to filter both major and minor urban structures within the study area. The more stringent set of parameters A filters only the main urban networks and centres within the PACA region. Polycentric urban networks have several urban centres (a dominant urban centre and one or more secondary centres, as is the case for the Marseilles network in the western part of the study area). Among the dozen urban networks defined by the set of parameters A are Aix-Marseilles, Toulon and Avignon in the West, Nice, Cannes and Monaco in the East, and Gap in the North. The two main metropolitan networks, around Marseilles and Nice, show important morphological differences.
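Dominant-flow extraction of this kind can be sketched as follows, using the usual Nystuen-Dacey convention (a unit's dominant flow is its largest outflow, and the unit is dominated when that flow targets a more populous unit); the ART model's exact criteria may differ, and the data are invented:

```python
# Sketch of dominant-flow extraction from a spatial interaction matrix,
# following the common Nystuen-Dacey convention. A hedged illustration:
# the actual ART model may apply different rules.

def dominant_flows(flows, weight):
    """flows: {(origin, dest): commuters}; weight: {unit: population}."""
    links, independent = {}, set()
    for o in weight:
        out = {d: f for (oo, d), f in flows.items() if oo == o and d != o}
        if not out:
            independent.add(o)
            continue
        dest = max(out, key=out.get)          # largest outflow
        if weight[dest] > weight[o]:
            links[o] = dest                   # o is dominated by dest
        else:
            independent.add(o)                # o remains independent
    return links, independent

pop = {"Nice": 340000, "Levens": 4500, "Contes": 7000}
flows = {("Levens", "Nice"): 900, ("Contes", "Nice"): 1500,
         ("Nice", "Contes"): 300}
links, indep = dominant_flows(flows, pop)
# Levens and Contes depend on Nice; Nice stays independent
```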
G. Fusco
Fig. 4 Hierarchical Networks Defined by Dominant Flows in the PACA Region in 1990 and 1999
The Aix-Marseilles metropolitan area has a complex network structure, favouring the emergence of secondary centres in the hinterland. The network is less articulated along the coast. On the other hand, the metropolitan network around Nice is weakly structured in its hinterland, as small villages directly depend on the main metropolitan centre. The network is better relayed along the coast, with the secondary centre of Antibes. Networks can be analyzed comparatively between 1990 and 1999 (Figure 6). The main indication of such analysis is the extension over time of the main urban networks. Municipalities still independent in 1990 are progressively linked to other units, creating new emerging clusters or, more often, reinforcing existing
ones. Between 1990 and 1999, more than 60% of dominant flows remained unchanged. On the other hand, the new dominant flows are specifically located. North-East of Marseilles, the municipalities of the Durance valley are absorbed in the urban network, often through the attraction of secondary centres (Aix-en-Provence, Pertuis). The networks on the French Riviera absorb previously independent municipalities in the alpine valleys as well. Finally, the urban centres of the alpine hinterland gradually absorb new spatial units.
Fig. 5 Urban Networks and Urban Centres in the PACA Region in 1990
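The stability figure quoted above (more than 60% of dominant flows unchanged) can be computed directly from two dependency tables; the toy tables below are illustrative only:

```python
# Sketch of the 1990/1999 comparison: share of dominant flows that
# remain unchanged between two census years (invented dependency tables).

def unchanged_share(links_t1, links_t2):
    """links: {dominated_unit: dominant_unit}; share of stable links."""
    stable = sum(1 for u, d in links_t1.items() if links_t2.get(u) == d)
    return stable / len(links_t1)

links_1990 = {"A": "Marseille", "B": "Aix", "C": "Nice", "D": "Gap"}
links_1999 = {"A": "Marseille", "B": "Aix", "C": "Cannes", "D": "Gap"}
share = unchanged_share(links_1990, links_1999)  # 3 of 4 links kept
```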
Fig. 6 Urban Networks’ Evolution in the PACA Region 1990-1999
As far as network structures are concerned, hierarchical polycentrism is strengthened in the West around Marseilles and Avignon, each with several satellite centres. In the North, polycentrism is less structured, as new networks emerge without necessarily being dominated by the urban centre of Gap. The eastern part of the study area is marked by the weakness of its polycentrism. New units are added to the urban network of Nice in the hinterland, where no secondary centre emerges. Even the independent networks around Cannes and Monaco, on the coast, lose one hierarchical level, making them less of a counterbalance to the Nice network. The comparative map in Figure 6 shows the permanence of commuters' dominant flows between the main regional urban centres and their closer influence areas. These relationships are strongly anchored in the functioning of the regional system and are kept even as new emerging phenomena reshape other parts of the system. Among these is the increased spatial range of the direct domination of Nice and of the indirect domination of Marseilles and Avignon. A handful of metropolitan centres thus emerge as the pivots of the regional system. For most municipalities, the passage from domination to independence is extremely rare, above all in the proximity of the main urban centres. Indeed, once a unit falls under the domination of a main urban centre, it will likely still depend on it in the future. This contributes to the increased domination of the main centres. This observation should be nuanced for Marseilles. Its direct domination is relatively stable and even diminishes at its northernmost margins. Its urban network is nevertheless strengthened by the attractiveness of its secondary urban centres. Globally, the changes observed at the municipal level and making up the evolution of the urban networks in the study area are not randomly distributed.
They show a certain coherence of behaviour, favouring the same outcomes in relatively similar situations. On these bases, trend scenarios for the regional system could be established. The spatial information produced by the ART model and stored in the GIS nevertheless allows quantitative analyses going well beyond the qualitative evaluation of regional trends proposed in this section. What we need is a systematic search for the spatial rules that governed the evolution of the regional urban networks in the last decade, in order to build a model of the evolution of these networks. The model could then be used to extrapolate past trends for the following decade, inferring the most probable evolutions for the regional urban networks.
3.2 Modelling the Evolution of the Network between 1990 and 1999
The objective of this section is to develop a model of the evolution of urban networks in the study area in the 1990s, in order to produce a trend scenario for the following decade. This will be done through a new application of BN, enabling the integration of uncertainty in the forecasted evolutions of spatial interaction among the municipalities of the study area. From an operational point of view, we first have to extract the rules describing the evolution of the regional urban networks and organise them into a system. This will be done through a causal knowledge discovery application of the BN technique. The production of a trend scenario will be a later phase of the modelling application (system simulation through Bayesian inference). Bayesian probabilistic rules will then allow for integrating uncertainty explicitly, both in the inputs and in the outputs of model simulations. Developing a BN model from real-world data for the evolution of regional urban networks is a challenging task, particularly as regards the integration of time and space. We tried to take up this challenge through careful variable selection. Variable selection is indeed a crucial phase in BN model building. The starting point is the analysis of the evolution over time of parameters describing the situation of every spatial unit within the regional urban networks defined by commuter flows. In this regard, understanding when a spatial unit changes its dependency status (loss of independence, gain of independence, change of dominant unit) is of primary importance. The change of dependency status has to be appreciated as an element within a more complex network of relations among geographic variables. Different hypotheses have been made on the variables likely to play a role in this network. The selected variables (listed in Table 1) can be grouped in several classes:
1. Socio-economic variables of the spatial unit (population, jobs/population ratio) in the first census year (1990).
2. Variables linked to the flows produced by the spatial unit and characterizing locally its spatial interaction within the regional urban networks (hierarchical level, belonging to an urban or a metropolitan network, share of internal flows, of dominant flows, of hierarchical, para-hierarchical and inter-hierarchical flows¹, etc.) in the first census year (1990).
¹ Hierarchical flows link spatial units along the same branch of an urban network, para-hierarchical flows link units belonging to different branches of the same urban network, and inter-hierarchical flows link spatial units belonging to different urban networks (Fusco 2009).
3. Variables characterizing the relationships that the unit establishes with a number of remarkable centres in its network neighbourhood (the dominant centre SUP, the nearest potentially superior neighbour NPSN, the nearest potentially superior urban centre NPSUC defined by the parameter set A, the local potential attractor PLA). A synthetic variable characterizes the dependency status of the spatial unit (dependency on NPSN, NPSUC, PLA, on an Other spatial unit, or independence). Other variables quantify the distances of the spatial unit to these remarkable centres (using the time-distances calculated by the road network model FRED), as well as their populations, hierarchical levels and direct domination basins in the first census year (1990).
4. Dynamic variables characterising the evolution of the socio-economic factors of the spatial unit between the two census years (1990 and 1999 for our data).
5. Dynamic variables characterising the variation between the two census years of the insertion of the spatial unit in the urban networks, as well as of its relationships with the remarkable centres. Two key variables in this group are the binary dependency variation variable and the synthetic variable characterizing the dependency status of the spatial unit in the second census year.
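The combination of static and dynamic variables also matters structurally: since BN graphs must be acyclic, a feedback between two variables can only be represented by unrolling it over time. A minimal sketch (with hypothetical variable names) of the unrolled structure, together with a standard acyclicity check:

```python
# Sketch of how a feedback B -> A -> B can be represented in an acyclic
# BN structure by adding a dynamic "variation of B" variable, with a
# simple topological check (Kahn's algorithm) that the graph is a DAG.

def is_acyclic(edges):
    """True if the directed graph given as an edge list has no cycle."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

cyclic = [("B", "A"), ("A", "B")]        # direct feedback: forbidden in a BN
unrolled = [("B", "A"), ("A", "VAR_B")]  # B -> A -> variation of B: allowed
```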
With the exception of a few socio-economic variables, which are external variables for the model, spatial interaction is thus at the heart of variable selection. By using both static variables (situation at a given census year) and dynamic variables (variation between two successive census years), it is also possible to integrate feedbacks in modelling spatial dynamics. BN structures are directed acyclic graphs, in which variable feedbacks are forbidden, whereas spatial systems are characterised by several feedbacks, which makes their modelling through BN questionable. When the time dimension is added, it nevertheless becomes possible to model situations where variable A depends on variable B at a given time, while the time variation of variable B depends on variable A. The feedback B → A → B is thus modelled by the acyclic structure B → A → Variation of B. The variables of Table 1, calculated by the ART model and stored in an appropriate geographical information system, constitute a spatial database which can be used by BN learning algorithms. A JDBC/ODBC link between the GIS and the BN software² allows for an immediate update of variable values and/or selection. The BN model shown in Figure 7 was generated through different search algorithms, using a Minimum Description Length (MDL) score in order to balance data-driven structure learning and structural complexity of the resulting network. The algorithms also integrated a certain number of hypotheses from the modeller in the form of a search order among groups of variables. Under the constraints of these hypotheses and of score penalization for complex structures, the produced BN model constitutes the most probable set of rules governing the evolution of the urban networks in the PACA region between 1990 and 1999. Different machine-learning techniques could have been used to extract the most probable set of rules describing the evolution of the study area. Neural Networks (Pijanowski
² The BN model was developed using the software BayesiaLab © 2001-2009 (Bayesia 2009).
Table 1 The 29 Variables of the Bayesian Network Model

Code          | Description | Class
POP           | Resident population in census year t1 (1990) | 1
TAUX EMPLOI   | Jobs / Active Population ratio in census year t1 (1990) | 1
H0            | Percentage of internal flows in total emitted flows for the spatial unit | 2
HD            | Percentage of hierarchical dominant flows in total outflows for the spatial unit | 2
I             | Percentage of inter-hierarchical flows in total outflows for the spatial unit | 2
METRONETWORK  | Belonging to a metropolitan network (5 possible values: Marseille, Nice, Toulon, Avignon and none) | 2
NIVSUP        | Hierarchical level of the spatial unit (defined by superiors' criterion) | 2
P             | Percentage of para-hierarchical flows in total outflows for the spatial unit | 2
PROF BASSIN   | Depth of the basin dominated by the spatial unit | 2
URBAN NET_A   | Belonging to an urban network (defined by parameters set A) | 2
DNPSUC        | Time-distance to the nearest potentially superior urban centre | 3
DSUP          | Time-distance to the directly dominant unit | 3
NPSUCSUP      | The nearest potentially superior urban centre is the dominant centre of the spatial unit (Boolean variable) | 3
NPSUC_B_SUP   | The nearest potentially superior urban centre (defined by parameters set B) is the dominant centre of the spatial unit (Boolean variable) | 3
POPSUP        | Resident population of the directly dominant unit | 3
SUBLVNPSUC    | Hierarchical level (defined by the subordinates' criterion) of the nearest potentially superior urban centre | 3
SUBLVSUP      | Hierarchical level (defined by the subordinates' criterion) of the dominant unit | 3
SUP           | Dependency status of the spatial unit (5 possible values: NPSN, NPSUC, PLA, Other and * = independent) | 3
SUP UC        | The dominant unit is an urban centre (Boolean variable) | 3
UNLM1NPSUC    | Directly dominated units of the nearest potentially superior urban centre | 3
UNLM1SUP      | Directly dominated units of the directly dominant unit | 3
VAR POP       | Average population growth rate between census years t1 and t2 | 4
VAR TAUX EMP  | Variation of the Jobs / Active Population ratio between census years t1 and t2 | 4
SUPt2         | Dependency status of the spatial unit in census year t2 (5 possible values: NPSN, NPSUC, PLA, Other and * = independent) | 5
VAR DEP       | Variation of the dependency status between t1 and t2 (Boolean variable) | 5
VAR H0        | Variation of the percentage of internal flows between t1 and t2 | 5
VAR HD        | Variation of the percentage of hierarchical dominant flows between t1 and t2 | 5
VAR I         | Variation of the percentage of inter-hierarchical flows between t1 and t2 | 5
VAR P         | Variation of the percentage of para-hierarchical flows between t1 and t2 | 5
et al. 2002, 2005) or any combination of decision trees and rule-based classifiers (Janssens et al. 2006, Bauer and Steinnocher 2001, Manoj Kumar et al. 2002) appear at first sight to be suitable alternatives. The characteristics of the problem at hand and considerations on uncertainty propagation made BN the most appropriate modelling technique. What we were looking for was a model representing the main probabilistic links among the 29 variables describing the spatial units and their evolution within the regional urban networks. The modelling goal was to discover these links with as few assumptions as possible. Even if the present text focuses on the prediction of two key variables (see below), no pre-established input and output variable was
Fig. 7 The BN Model for the Evolution of Urban Networks between 1990 and 1999: the complete model (29 variables), and the fragment concerning the explanation of the binary variable of dependency variation (VAR DEP) and the synthetic variable of dependency status at the second census year (SUPt2)
determined for the model. Probabilistic inference was to be possible in any direction among the model variables. Several applications could then follow: the inference of a trend scenario, the retrodiction of conditions increasing the probability of preferred alternative scenarios, as well as the assessment of possible causes and consequences of any intermediate variable. Moreover, the model should be able to propagate uncertainty possibly affecting any of its variables. Uncertainty propagation should allow scenario building in several ways. An overall assessment of scenario uncertainty should be possible, as well as information on the second or third most probable outcomes for every element of the scenario. Although extremely different in their theoretical foundations, as well as in their advantages and drawbacks, neural networks (NN), decision trees (DT) and rule-based classifiers (RBC) could not satisfy our modelling needs. The main constraint imposed by these techniques on the model would have been its "directionality". Input and output variables would have had to be pre-established, reducing the future use of the model in scenario building and exploration. Even if fuzzy approaches can
integrate uncertainty in NN, DT and RBC, limits in uncertainty propagation through many variables and association rules are another serious drawback of these modelling alternatives. In this respect, Bayesian Networks are the most appropriate technique for global uncertainty propagation within the model (Pearl 2000, Jensen 2001, Korb and Nicholson 2004). Moreover, neural network modelling would be of no use in understanding variable interaction within regional urban networks. The black-box functioning of NN models makes them suitable for short-term model simulation, but much less useful for scenario building. In this respect, DT and RBC produce more easily interpreted segmentation rules. Comparing decision trees, rule-based classifiers and Bayesian Networks, Janssens et al. (2004) nevertheless suggest that BN are better suited to capture the complexity inherent in the interaction of several variables in real-world systems. BN learning, however, like other machine-learning techniques, depends heavily on the selected model variables. Using slightly different variable sets, several models have been developed from the same database. The model which was retained has the best predictive power on two key variables: the binary variable of dependency variation VAR DEP and the synthetic variable characterising the dependency status of the spatial unit at the second census date, SUPt2. They are the two most crucial variables for simulating the future evolution of the regional urban networks. Figure 7 shows the structure of the retained BN model, that is, the qualitative probabilistic links among variables in the form of directed arcs. Conditional probability tables (such as the one in Figure 8) quantify the probabilistic relationships between every variable and its parents. Figure 7 also shows the fragment of the model most directly concerning the explanation of these two variables.
This fragment constitutes the Markov blanket of the two variables, which will allow probabilistic simulations on the two key variables, independently from the rest of the network. VAR DEP is thus explained by the combination of the values of the population variable (POP), of the share of dominant out-flows for the spatial unit (HD) and of the values of the binary variable NPSUC_B_SUP (ascertaining whether the nearest potentially superior urban centre defined by the parameter set B is the dominant centre for the spatial unit). The highest dependency variation probabilities concern small municipalities (population less than 220) not depending on an urban centre and having less than 63% of dominant out-flows, or medium-sized municipalities (population between 220 and 754) not depending on an urban centre and having less than 40% of dominant out-flows. The evolution of the synthetic variable of dependency status quite logically depends on the VAR DEP variable and on the synthetic variable of dependency status at the previous census year (SUP). If no dependency variation takes place, the relationship is deterministic (the previous dependency status is always kept). If the dependency changes, an interesting transition matrix emerges (Figure 8). Globally, the most probable outcome for a spatial unit which did not depend on the nearest urban centre is to become its subordinate. On the contrary, if the dependency change concerns a unit which previously depended on the nearest urban centre, the most probable outcomes are either dependency on the nearest potentially superior neighbour or independence.
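The Markov blanket of the two target variables (their parents, children, and the children's other parents) can be extracted mechanically from the graph structure. The edge list below reflects the dependencies described in the text, but is a simplified sketch of the model fragment:

```python
# Sketch of Markov blanket extraction from a BN structure given as an
# edge list: parents, children, and the children's other parents.
# The edges follow the dependencies described in the text (simplified).

def markov_blanket(edges, targets):
    blanket = set()
    for t in targets:
        parents = {a for a, b in edges if b == t}
        children = {b for a, b in edges if a == t}
        spouses = {a for a, b in edges for c in children if b == c}
        blanket |= parents | children | spouses
    return blanket - set(targets)

edges = [("POP", "VAR_DEP"), ("HD", "VAR_DEP"), ("NPSUC_B_SUP", "VAR_DEP"),
         ("VAR_DEP", "SUPt2"), ("SUP", "SUPt2")]
mb = markov_blanket(edges, ["VAR_DEP", "SUPt2"])
# the four static variables named in the text form the blanket
```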
Fig. 8 Conditional Probability Table for Variable SUPt2
The most recent trend in the evolution of the urban networks of the study area is thus the steady growth of the influence areas of the 18 most important urban centres, even if the largest municipalities and those directly depending on a minor urban centre (one of the 14 centres retained only by the parameter set B like Menton, Antibes, Fréjus, Brignoles, Martigues, Arles, Orange or Sisteron) tend to resist the attraction of the largest centres.
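The deterministic/probabilistic split in the conditional probability table of SUPt2 can be sketched as follows; the transition probabilities are illustrative stand-ins, not the actual Figure 8 values:

```python
# Sketch of the conditional probability table P(SUPt2 | VAR_DEP, SUP):
# if the dependency status does not change, the previous status is kept
# deterministically; otherwise a transition row applies. The numbers
# below are invented placeholders for the Figure 8 values.

def p_sup_t2(var_dep, sup_t1, transition):
    """Return the distribution P(SUPt2 | VAR_DEP = var_dep, SUP = sup_t1)."""
    if var_dep == "no_change":
        return {sup_t1: 1.0}        # deterministic branch: status is kept
    return transition[sup_t1]       # probabilistic transition row

transition = {
    "*":     {"NPSUC": 0.6, "NPSN": 0.25, "PLA": 0.1, "Other": 0.05},
    "NPSUC": {"NPSN": 0.5, "*": 0.35, "PLA": 0.1, "Other": 0.05},
}

kept = p_sup_t2("no_change", "NPSUC", transition)  # previous status kept
dist = p_sup_t2("change", "*", transition)         # independent unit changing
```

With these stand-in rows, an independent unit that changes status most probably becomes a subordinate of the nearest potentially superior urban centre, matching the qualitative reading of the transition matrix given above.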
3.3 Proposing a Trend Scenario for 2008 The BN model can now be used for probabilistic simulation of the future evolution of the urban networks in the study area. We will assume that the spatial dynamics observed in the 1990s will carry on into the following decade. Under this assumption, a trend scenario for the evolution of the urban networks over an eight-to-ten year time span can be obtained from the BN. For every spatial unit, the values of the static variables for 1999 will be entered in the model as elements of certain knowledge³. The values of the dynamic variables between 1999 and 2007-2009 will then be obtained through Bayesian probabilistic inference from the BN. The BN will attribute a probability to every value of the future unknown variables. The most probable outcome for every spatial unit can thus be determined, as well as the second and third most probable outcomes, and so on. More precisely, when the probabilistic simulation concerns several variables, we have to look for the most probable configuration of the unknown variables. This configuration could be composed of variable values which are not always the most probable outcomes for every single variable, but which make up the most probable set of values in the joint probability distribution of the unknown variables. What these remarks imply in terms of propagated uncertainty in the simulation results will be seen through an example. Using the fragment of BN constituting their Markov blanket, we will simulate the future values of the two variables VAR DEP and SUPt2, knowing the 1999 values of the other four (POP, HD,
³ The BN model nevertheless allows for entering "soft evidence" in the simulations. We can thus attribute a probability or a likelihood value to the different values of an input variable. The uncertainty in the input variables would then be propagated to the output variables.
NPSUC_B_SUP and SUP). The knowledge of the future values of these two variables will result in a new dependency table for 2007-2009, similar to the one produced by the ART model for 1999. Indeed, the 2007-2009 simulated data will be richer in content than the 1999 real-world data. The most probable dependency status for every spatial unit will be characterized by a probability value. Even more crucially, we will know the second, third, fourth and fifth most probable dependency status, with their respective probability values. Before even starting the simulation of the 2007-2009 data, we want to determine the predictive power of the model for the two variables in question. This can be done through two targeted cross-validation analyses on the 1990 and 1990-1999 data. After having randomly extracted 10 samples, each containing 10% of the records in the database, through a k-fold procedure, we evaluate how well the model generated from the 90% remaining records can predict the values of the variables VAR DEP and SUPt2 for the sample data. If, within the fragment model, we could use the knowledge of the five remaining variables to predict VAR DEP and SUPt2, the overall precision⁴ of the model would be 99% and 82%, respectively. But in planning applications, the knowledge of VAR DEP cannot be used to predict SUPt2 and vice versa. We thus have to consider all dynamic variables as unobservable when predicting the value of another dynamic variable. The only certain elements of knowledge to be used in the probabilistic simulation will be those of the four static variables. As a result, more uncertainty is propagated and the predictive power for VAR DEP and SUPt2 falls to 75% and 72%, respectively. We consider these values sufficiently high for this pioneering application of regional modelling. The confusion matrices of these network-targeted performances nevertheless highlight the strengths and the weaknesses of the model.
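The targeted cross-validation procedure can be sketched as follows; the stand-in predictor here is a simple majority-vote classifier rather than the BN itself, and the records are invented:

```python
# Sketch of targeted k-fold cross-validation: split the records into k
# folds, train on the other folds, and measure precision (share of
# correct predictions of the target) on the held-out fold. The
# majority-vote predictor is a stand-in for the BN model.
import random

def kfold_precision(records, target, predict_factory, k=10, seed=1):
    rnd = random.Random(seed)
    recs = records[:]
    rnd.shuffle(recs)
    folds = [recs[i::k] for i in range(k)]
    correct = total = 0
    for i in range(k):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        predict = predict_factory(train)
        for r in folds[i]:
            correct += predict(r) == r[target]
            total += 1
    return correct / total

def majority_factory(train):
    """Predict the most frequent target value seen in the training set."""
    counts = {}
    for r in train:
        counts[r["VAR_DEP"]] = counts.get(r["VAR_DEP"], 0) + 1
    best = max(counts, key=counts.get)
    return lambda r: best

data = [{"VAR_DEP": "kept"}] * 80 + [{"VAR_DEP": "changed"}] * 20
prec = kfold_precision(data, "VAR_DEP", majority_factory)
```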
As far as variable SUPt2 is concerned, the precision is lower when future dependencies are inferred for units depending on dominant centres of the Other kind. As for the VAR DEP variable, the forecast precision is better for units keeping their old dependency than for those changing dependency. Nevertheless, one should remember that the main goal of the model was to extract association rules among variables in order to support scenario building. The predictive power for selected variables is thus an incomplete evaluation of the model. It only assesses the accuracy of the most probable outcome forecasted for every spatial unit. Model results, as shown in Figure 9, are more sophisticated: they include different prospected outcomes for every spatial unit; every outcome is characterised by a probability value, enriching scenario building with second or third most probable outcomes which are not evaluated in predictive power assessment. Conscious of the limits of the model in correctly predicting future values for the two key variables, we will now simulate the future evolution of the regional urban networks for 2007-2009. Figure 9 shows the dominant flows of daily mobility in the PACA region as forecasted by the model for 2007-2009 using 1999 data. Figure 9 is
⁴ The precision of the model for a target variable is defined as the ratio between the number of correct predictions of the target variable and the total number of cases in the cross-validation.
Fig. 9 Probabilistic Representation of Dominant Flows for 2007-2009
not a simple cartographic representation of a dependency table. Probabilistic aspects are fully integrated in the model results and in their mapping. A particularity of the simulation of the urban network evolution is that the future state of a spatial unit can be either independence or dependence on another unit. In terms of GIS data modelling, the model results thus have to be split into two different tables. A point table includes all the situations of independence (with their probability as attribute value) and a poly-line table contains the links between dependent units and their dominant centres (with the probability of the dependency link as attribute value). This structuring of spatial data allows easy data querying and mapping of the model results. When they constitute the most probable outcome of the model simulation, dependency relationships are represented by segments whose colour code (in the grey scale) is proportional to their certainty/uncertainty. Dependency relationships are also represented when they constitute the second most probable outcome (a different colour code is then used), as long as their probability exceeds the threshold level of 0.1. Similarly, situations of independence constituting the most
probable or the second most probable outcome of the model simulation are represented by circles whose colour code is proportional to their certainty/uncertainty. Looking closer at the model results, the trend scenario for the study area, made up of the most probable model outcomes, is the steady extension of the urban networks in the alpine hinterland, around Gap, Briançon, Digne and Manosque. The urban network of Nice continues its extension in the hinterland, too (although a few units are lost in favour of Cannes and Digne in the West). The networks around Avignon and Marseilles grow marginally through the expansion of the influence areas of their secondary centres. In 2007-2009, the number of hierarchical levels determined through the subordinates' criterion is reduced to four (there were five in 1999), as the network around Marseilles marginally reduces its depth. Only six centres of first level are left: Marseilles, Nice, Toulon, Avignon, Draguignan and Gap. In 1999 there was one centre of first level (Marseilles) and nine centres of second level. The network organisation of the regional space is thus simplified. The model results are nevertheless not limited to the most probable outcomes. More particularly, when the most probable outcome is affected by a high degree of uncertainty (outcome probability less than 0.5, for example), it becomes interesting to evaluate other, less probable outcomes (such as the second most probable outcomes represented in Figure 9). We can thus see how several units in the alpine hinterland hesitate between independence and dependency on the urban centres of Gap, Briançon, Digne and Manosque. The contrast with the units belonging to the urban network of Marseilles is striking. For these municipalities, the future dependency status is predicted by the BN model with a relatively low degree of uncertainty.
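The mapping rule described above (the most probable outcome is always shown, the second most probable only when it exceeds the 0.1 probability threshold) can be sketched as follows, with toy outcome distributions:

```python
# Sketch of the outcome-selection rule used for mapping: keep the most
# probable outcome, plus the second most probable one only when its
# probability exceeds the 0.1 threshold (distributions are invented).

def mapped_outcomes(dist, threshold=0.1):
    """dist: {outcome: probability}; returns the [(outcome, p), ...] to map."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    kept = [ranked[0]]
    if len(ranked) > 1 and ranked[1][1] > threshold:
        kept.append(ranked[1])
    return kept

certain = {"Marseille": 0.93, "*": 0.05, "Aix": 0.02}      # low uncertainty
hesitant = {"*": 0.45, "Gap": 0.40, "Briançon": 0.15}      # hesitating unit
# a certain unit maps a single link; a hesitating unit maps two outcomes
```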
As for the urban network around Nice, relatively low uncertainty levels affect the coastal area, as well as the eastern part of the hinterland. Higher uncertainty levels characterise the future dependency status of the alpine villages of the western part of the hinterland. Higher degrees of uncertainty often (but not always) affect units whose dependency status is predicted to change and units situated further away from the main urban centres of the region. Highlighting sub-spaces presenting several possible (and comparably probable) futures in terms of insertion in the regional urban networks is a way of determining priority areas where the stakes are higher in terms of regional planning. Globally, the proposed BN model is a first model of the evolution of hierarchical urban networks resulting from daily mobility flows. The main result of the prospective simulation of the future evolution of these networks is a reinforcement of the metropolitan networks in the coastal area, as well as the emergence of well-structured urban networks in the hinterland. We think that these results are plausible for the large alpine hinterland. The latter constitutes roughly two thirds of the regional area and spatial units, but is often forgotten in the analysis of metropolitan and urban development in the PACA region. The model results show that this area is progressively being integrated in the regional metropolitan structures, although in a peripheral position. At the same time, a higher level of uncertainty affects the simulation results for this area. On the contrary, the model tends to underestimate the possible dependency variations of the larger municipalities, above
376
G. Fusco
all in the coastal area and in the Rhone valley. This is a major limitation of the model, as a change in the dependency status of a few large municipalities can produce important modifications in the regional urban networks (as already observed between 1990 and 1999, when Cavaillon and Carpentras lost their independence and became secondary centres for Avignon).
4 Conclusions and Future Developments

The applications presented in this chapter represent a complete modelling chain, integrating interaction modelling and uncertainty issues. We first developed protocols to extract urban networks from spatial interaction data. BN were then used to produce a model of the evolution of these networks. The results of prospective simulations of the future evolution of regional urban networks were subsequently integrated in a GIS platform to obtain an appropriate cartographic representation. In particular, GIS modelling and mapping had to integrate the probabilistic content of the model results, representing the degree of certainty/uncertainty in the knowledge of the future state of the regional system.
The BN model proposed in this chapter probably oversimplifies the evolution of regional urban networks. It also shows weaknesses for particular sub-spaces within the study area. It nevertheless opens interesting perspectives for the development of prospective models for decision support in planning. The BN model produces a trend scenario for the evolution of the regional urban networks, integrating the uncertainty characterising the knowledge of the future state of the system. This combined knowledge of a geo-referenced trend scenario and of the uncertainty affecting every single element of the scenario constitutes a crucial element in decision support for regional planning. Several developments of the proposed model can be foreseen.
• A first direction of research would be to use the BN model to infer future values of other dynamic variables, beyond VAR DEP and SUPt2.
• Increasing the time span of the BN model is a second foreseeable development. A second BN model could then be produced for the 1982-1990 period. Its comparison with the 1990-1999 model would be extremely instructive in highlighting the persistence or evolution of the spatial rules producing hierarchical urban networks in the study area.
• Chaining prospective simulations over several time steps (of 8-10 years each) is also an interesting perspective. It would then be possible to propose trend scenarios over 20-25 years. This is probably the maximum time span for the model: beyond two or three time steps, uncertainty will seriously affect any model results, as the uncertainty of the results of the first simulation propagates to the results of the following ones.
• It would be interesting to integrate more exogenous variables in the model. Other spatial interaction data could also be used. Coupling the logics of daily mobility with those of residential mobility would result in more realistic models, as the different kinds of spatial mobility are closely interwoven (Kaufmann 2000). Integrating decision variables would eventually allow proposing
different scenarios besides the trend scenario. The technique of decision graphs (Jensen 2001), an extension of BN modelling, would then be used.
• The modelling chain proposed here is transferable to other regional contexts. Comparative analyses could then be carried out between different study areas, based on the same modelling protocol (as proposed, for example, by Berroir et al. 2006).
Finally, the applications presented in this chapter show that GIS are much more than data storage and mapping tools. GIS can increasingly be conceived as platforms integrating different modelling modules. GIS data maintenance, import/export and mapping functions are then valuable complements to the modelling modules. GIS table structures will nevertheless have to take the distinctive features of modelling data appropriately into account, notably in terms of uncertainty.
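The concern raised above about chaining simulations, that uncertainty accumulates from one time step to the next, can be illustrated with a minimal sketch. The two-state transition probabilities below are invented for the example and are not calibrated values from the chapter's BN model.

```python
# Hedged sketch: chaining prospective simulations over several time steps.
# The posterior of one step becomes the prior of the next, so the
# distribution flattens (uncertainty grows) as steps are chained.

def propagate(dist, transition):
    """One simulation step: P(s') = sum over s of P(s) * P(s' | s)."""
    out = {s2: 0.0 for s2 in transition[next(iter(transition))]}
    for s, p in dist.items():
        for s2, t in transition[s].items():
            out[s2] += p * t
    return out

# Illustrative two-state model: a unit is 'independent' or 'dependent'.
transition = {
    "independent": {"independent": 0.8, "dependent": 0.2},
    "dependent": {"independent": 0.1, "dependent": 0.9},
}

dist = {"independent": 1.0, "dependent": 0.0}  # known initial state
for step in range(3):  # three 8-10 year steps, roughly 25 years
    dist = propagate(dist, transition)
    print(step + 1, {s: round(p, 3) for s, p in dist.items()})
```

Even starting from a certain initial state, the distribution drifts towards its stationary values after a few steps, which is why scenarios beyond two or three chained steps carry little predictive content.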
References

Bauer, T., Steinnocher, K.: Per-parcel land use classification in urban areas applying a rule-based technique. GeoBIT/GIS 6, 24–27 (2001)
Bayesia: BayesiaLab Documentation, version 4.6. Bayesia, Laval (2009)
Berroir, S., Mathian, H., Saint-Julien, T., Sanders, L.: Mobilités et polarisations: vers des métropoles polycentriques. Les cas des métropoles francilienne et méditerranéenne. In: Bonnet, M., Aubertel, P. (eds.) La ville aux limites de la mobilité, pp. 71–82. PUF, Paris (2006)
Corgne, S.: Modélisation prédictive de l'occupation des sols en contexte agricole intensif: application à la couverture hivernale des sols en Bretagne. PhD dissertation, Université de Rennes 2 – ENST Brest (2004)
Decoupigny, F.: Journey Simulation of a Movement on a Double Scale. In: Mathis, P. (ed.) Graphs and Networks. Multilevel Modelling, pp. 31–46. ISTE, London (2007)
Fusco, G.: Looking for Sustainable Urban Mobility through Bayesian Networks. Scienze Regionali / Italian Journal of Regional Sciences (3), 87–106 (2004); also in Cybergeo 292 (2004), http://www.cybergeo.eu/index2777.html
Fusco, G.: Spatial Dynamics in France. In: Pourret, O., Naïm, P., Marcot, B.G. (eds.) Bayesian Networks: A Practical Guide to Applications, pp. 87–112. John Wiley & Sons, New York (2008)
Fusco, G.: Modelling Urban Networks from Spatial Interaction Data: the ART Platform. In: Rabino, G., Scarlatti, F. (eds.) Advances in Models and Methods for Planning, pp. 63–72. Pitagora, Bologna (2009)
Fusco, G., Decoupigny, F.: Logiques réticulaires dans l'organisation métropolitaine en région PACA. XLVe colloque de l'ASRDLF, Rimouski, 25-27 août, 18 pages (2008), http://asrdlf2008.uqar.qc.ca/Papiers%20en%20ligne/FUSCO%20G.%20et%20DECOUPIGNY%20F._texte%20ASRDLF%202008.pdf
Janssens, D., Wets, G., Brijs, T., Vanhoof, K., Arentze, T., Timmermans, H.: Integrating Bayesian networks and decision trees in a sequential rule-based transportation model. European Journal of Operational Research 175, 16–34 (2006)
Janssens, D., Wets, G., Brijs, T., Vanhoof, K., Timmermans, H.J.P., Arentze, T.: Improving the performance of a multi-agent rule-based model for activity pattern decisions using Bayesian networks. In: Conference Proceedings of the 83rd Annual Meeting of the Transportation Research Board, CD-ROM (2004)
Jensen, F.V.: Bayesian Networks and Decision Graphs. Springer, New York (2001)
Kaufmann, V.: Mobilité Quotidienne et Dynamiques Urbaines – La question du report modal. Presses Polytechniques et Universitaires Romandes, Lausanne (2000)
Kipnis, B.A.: Graph Analysis of Metropolitan Residential Mobility: Methodology and Theoretical Implications. Urban Studies 22, 179–187 (1985)
Korb, K.B., Nicholson, A.E.: Bayesian Artificial Intelligence. Chapman & Hall / CRC, Boca Raton (2004)
Manoj Kumar, P., Sugumaran, R., Zerr, D.: A rule-based classifier using Classification and Regression Tree (CART) approach for urban landscape dynamics. In: IEEE International Geoscience and Remote Sensing Symposium 2002, vol. 2, pp. 1193–1194 (2002)
Nystuen, J.D., Dacey, M.F.: Graph Theory Interpretation of Nodal Regions. Papers and Proceedings of the Regional Science Association 7, 29–42 (1961)
Openshaw, S., Turner, A.: Forecasting global climatic change impacts on Mediterranean agricultural land use in the 21st Century. Cybergeo 120 (2000), http://www.cybergeo.eu/index2255.html
Pearl, J.: Causality – Models, Reasoning and Inference. Cambridge University Press, Cambridge (2000)
Pijanowski, B., Brown, D., Shellito, B., Manik, G.: Using neural networks and GIS to forecast land use changes: a Land Transformation Model. Computers, Environment and Urban Systems 26, 553–575 (2002)
Pijanowski, B., Pithadia, S., Shellito, B., Alexandridis, K.: Calibrating a neural network-based urban change model for two metropolitan areas of the Upper Midwest of the United States. International Journal of Geographical Information Science 19(2), 197–215 (2005)
Rabino, G., Occelli, S.: Understanding spatial structure from network data: theoretical considerations and applications. Cybergeo 29 (1997), http://www.cybergeo.eu/index2199.html
Torres, J.F., Huber, M.: Learning a Causal Model from Household Survey Data Using a Bayesian Belief Network. In: Proceedings of the 82nd Annual Meeting of the Transportation Research Board, Washington D.C., CD-ROM (2002)
Author Index
Beaubouef, Theresa 103
Bennett, Brandon 15
Bloch, Isabelle 75
Curé, Olivier 189
Desjardin, Eric 341
Doukari, Omar 165
Dubois, Didier 269
Dupin de Saint-Cyr, Florence 133
Fusco, Giovanni 357
Grant, John 307
Jeansoulin, Robert 1, 165
Loquin, Kevin 269
Matsakis, Pascal 49
Ni, Jing Bo 49
Papini, Odile 1, 133
Parisi, Francesco 307
Parker, Austin 307
Petry, Frederick E. 103
Prade, Henri 1, 133
Runz, Cyril de 341
Schockaert, Steven 1, 211
Smart, Philip D. 211
Stein, Alfred 243
Subrahmanian, V.S. 307
Wendling, Laurent 49
Würbel, Eric 165