Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany
6899
Max Egenhofer Nicholas Giudice Reinhard Moratz Michael Worboys (Eds.)
Spatial Information Theory
10th International Conference, COSIT 2011
Belfast, ME, USA, September 12-16, 2011
Proceedings
Volume Editors
Max Egenhofer
Nicholas Giudice
Reinhard Moratz
Michael Worboys
University of Maine, Orono, ME 04469, USA
E-mail: {max, giudice, moratz, worboys}@spatial.maine.edu
ISSN 0302-9743 e-ISSN 1611-3349 e-ISBN 978-3-642-23196-4 ISBN 978-3-642-23195-7 DOI 10.1007/978-3-642-23196-4 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011934621 CR Subject Classification (1998): E.1, H.2.8, J.2, I.5.3, I.2, F.1 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The Conference on Spatial Information Theory—COSIT—was established in 1993 as a biennial interdisciplinary conference. The COSIT conference series focuses on innovation in spatial information theory across the disciplines. It caters to researchers in such fields as anthropology, artificial intelligence, cognitive neuroscience, computer science, geography, linguistics, mathematics, psychology, and spatial cognition who are concerned with models of space and time. Of particular interest are perspectives that cut across multiple domains or those that leverage established methods common in one field so that new insights about space are gained in another field or discipline.

COSIT 2011 marked the 10th time that COSIT convened. The 2011 conference was held September 12-16, 2011 at the Hutchinson Center in Belfast, Maine, on Penobscot Bay. Mid-September was an excellent time to enjoy the stunning beauty of Maine's coast, since fall, with its crisp chill and vibrant leaf colorations, is New England's most picturesque time of year.

All COSIT submissions were fully refereed by three or four members of the Program Committee, who were asked to write substantial appraisals analyzing the submissions' relevance to the conference, their intellectual merit, their scientific significance, novelty, relation to previously published literature, and clarity of presentation. Out of the 55 submissions, the Program Committee selected 23 papers for oral presentation.

The three keynote speakers for COSIT 2011 were Ernest Davis, Department of Computer Science, New York University; Nora Newcombe, Department of Psychology at Temple University and Spatial Intelligence and Learning Center (SILC); and Thomas Wolbers, Centre for Cognitive and Neural Systems, University of Edinburgh. In addition to the technical program, COSIT 2011 had a poster session, four workshops, two tutorials, and a doctoral colloquium.

We thank the many people who made COSIT 2011 such a success: all those who submitted work and participated in the meeting, the reviewers, the Program Committee, the local Organizing Committee, and the staff of the Hutchinson Center.

September 2011
Max Egenhofer Nicholas Giudice Reinhard Moratz Mike Worboys
Organization
General Chairs
Nicholas Giudice, University of Maine, USA
Mike Worboys, University of Maine, USA
Program Chairs
Max Egenhofer, University of Maine, USA
Reinhard Moratz, University of Maine, USA
Steering Committee
Christophe Claramunt, Naval Academy Research Institute, France
Anthony Cohn, University of Leeds, UK
Michel Denis, LIMSI-CNRS, Paris, France
Matt Duckham, University of Melbourne, Australia
Max Egenhofer, University of Maine, USA
Andrew Frank, Technical University Vienna, Austria
Christian Freksa, University of Bremen, Germany
Stephen Hirtle, University of Pittsburgh, USA
Werner Kuhn, University of Münster, Germany
Benjamin Kuipers, University of Michigan, USA
David Mark, SUNY Buffalo, USA
Dan Montello, UCSB, USA
Kathleen Stewart, University of Iowa, USA
Sabine Timpf, University of Augsburg, Germany
Barbara Tversky, Stanford University, USA
Stephan Winter, University of Melbourne, Australia
Michael Worboys, University of Maine, USA
Program Committee
Pragya Agarwal
Brandon Bennett
Moulin Bernard
Sven Bertel
Michela Bertolotto
Mehul Bhatt
Thomas Bittner
Gilberto Câmara
Christophe Claramunt
Eliseo Clementini
Helen Couclelis
Leila De Floriani
Matt Duckham
Geoffrey Edwards
Carola Eschenbach
Sara Fabrikant
Andrew Frank
Christian Freksa
Mark Gahegan
Antony Galton
Stephen Hirtle
Christopher Jones
Marinos Kavouras
Alexander Klippel
Christian Kray
Barry Kronenfeld
Werner Kuhn
Lars Kulik
Damir Medak
Daniel R. Montello
Nora Newcombe
Martin Raubal
Jochen Renz
Kai-Florian Richter
Andrea Rodriguez
Christoph Schlieder
Angela Schwering
John Stell
Kathleen Stewart
Sabine Timpf
Barbara Tversky
David Uttal
Nico Van De Weghe
Jan Oliver Wallgrün
Robert Weibel
Stephan Winter
Thomas Wolbers
Diedrich Wolter
May Yuan
Table of Contents

Maps and Navigation

How Do Decision Time and Realism Affect Map-Based Decision Making?
    Jan Wilkening and Sara Irina Fabrikant ..... 1

Towards Cognitively Plausible Spatial Representations for Sketch Map Alignment
    Malumbo Chipofya, Jia Wang, and Angela Schwering ..... 20

Scalable Navigation Support for Crowds: Personalized Guidance via Augmented Signage
    Fathi Hamhoum and Christian Kray ..... 40

Information on the Consequence of a Move and Its Use for Route Improvisation Support
    Takeshi Shirabe ..... 57

The Effect of Activity on Relevance and Granularity for Navigation
    Stephen C. Hirtle, Sabine Timpf, and Thora Tenbrink ..... 73

I Can Tell by the Way You Use Your Walk: Real-Time Classification of Wayfinding Performance
    Makoto Takemiya and Toru Ishikawa ..... 90

Spatial Change

From Video to RCC8: Exploiting a Distance Based Semantics to Stabilise the Interpretation of Mereotopological Relations
    Muralikrishna Sridhar, Anthony G. Cohn, and David C. Hogg ..... 110

Decentralized Reasoning about Gradual Changes of Topological Relationships between Continuously Evolving Regions
    Lin-Jie Guan and Matt Duckham ..... 126

Spatio-temporal Evolution as Bigraph Dynamics
    John Stell, Géraldine Del Mondo, Remy Thibaud, and Christophe Claramunt ..... 148

Spatial Reasoning

On Optimal Arrangements of Binary Sensors
    Parvin Asadzadeh, Lars Kulik, Egemen Tanin, and Anthony Wirth ..... 168

A Hybrid Geometric-Qualitative Spatial Reasoning System and Its Application in GIS
    Giorgio De Felice, Paolo Fogliaroni, and Jan Oliver Wallgrün ..... 188

CLP(QS): A Declarative Spatial Reasoning Framework
    Mehul Bhatt, Jae Hee Lee, and Carl Schultz ..... 210

Spatial Cognition and Social Aspects of Space

The Social Connection in Mental Representations of Space: Explicit and Implicit Evidence
    Holly A. Taylor, Qi Wang, Stephanie A. Gagnon, Keith B. Maddox, and Tad T. Brunyé ..... 231

Revisiting the Plasticity of Human Spatial Cognition
    Linda Abarbanell, Rachel Montana, and Peggy Li ..... 245

Linguistic and Cultural Universality of the Concept of Sense-of-Direction
    Daniel R. Montello and Danqing Xiao ..... 264

Towards a Formalization of Social Spaces for Socially Aware Robots
    Felix Lindner and Carola Eschenbach ..... 283

Perception and Spatial Semantics

Finite Relativist Geometry Grounded in Perceptual Operations
    Simon Scheider and Werner Kuhn ..... 304

Linking Spatial Haptic Perception to Linguistic Representations: Assisting Utterances for Tactile-Map Explorations
    Kris Lohmann, Carola Eschenbach, and Christopher Habel ..... 328

Analyzing the Spatial-Semantic Interaction of Points of Interest in Volunteered Geographic Information
    Christoph Mülligann, Krzysztof Janowicz, Mao Ye, and Wang-Chien Lee ..... 350

Space and Language

A Model of Spatial Reference Frames in Language
    Thora Tenbrink and Werner Kuhn ..... 371

Universality, Language-Variability and Individuality: Defining Linguistic Building Blocks for Spatial Relations
    Kristin Stock and Claudia Cialone ..... 391

The Semantics of Farsi be: Applying the Principled Polysemy Model
    Narges Mahpeykar and Andrea Tyler ..... 413

On the Explicit and Implicit Spatiotemporal Architecture of Narratives of Personal Experience
    Blake Stephen Howald and E. Graham Katz ..... 434

Author Index ..... 455
How Do Decision Time and Realism Affect Map-Based Decision Making?

Jan Wilkening and Sara Irina Fabrikant

Geographic Information Visualization & Analysis Group (GIVA), Department of Geography, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
{jan.wilkening,sara.fabrikant}@geo.uzh.ch
Abstract. We commonly make decisions based on different kinds of maps, and under varying time constraints. The accuracy of these decisions can often be a matter of life and death. In this study, we investigate how varying time constraints and different map types can influence people's visuo-spatial decision making, specifically for a complex slope detection task involving three spatial dimensions. We find that participants' response accuracy and response confidence do not decrease linearly, as hypothesized, when given less response time. Assessing collected responses within the signal detection theory framework, we find that different inference error types occur with different map types. Finally, we replicate previous findings suggesting that while people might prefer more realistic looking maps, they do not necessarily perform better with them.

Keywords: time pressure, slope maps, shaded relief maps, empirical map evaluation.
1 Introduction

Many people have used maps for spatio-temporal decision-making under varying time constraints. For instance, commuters must choose alternative driving routes with road maps, depending on rapidly changing traffic situations. Hikers might need to rapidly select a different trail using a topographic map, when the weather suddenly deteriorates in the mountains, or a sailor might have to quickly consult a nautical chart when navigating in an area with sudden wind and water level changes. Time available for such kinds of map-based decisions can vary enormously. Sometimes the decision time window might consist merely of a few seconds, and the decision may be a matter of life and death. With increasing human mobility, and respective increased availability of mobile map devices, it seems crucial to investigate how decision time constraints and display types might affect the quality of map-based decision making under rapidly changing conditions. We have been investigating this rather under-researched issue with a series of prior controlled experiments, which we review with other relevant research in the related work section. In this study, involving a map-based slope detection task under varying time pressure scenarios, we specifically investigate decision-making within a three-dimensional context
and different display types, and analyze collected responses (i.e., accuracy) within the signal detection theory framework.
2 Related Work

Our research program lies at the intersection of time pressure and decision-making research, mainly carried out in psychology and economics, and empirical map design research in cartography. In the following, we review related work from these cognate research fields.

2.1 Decision Making under Time Pressure

Many external factors and cognitive biases impair optimal human decision-making (Simon, 1959; Payne, 1982; Gigerenzer, 2002). The effect of time constraints on decision making has been evaluated systematically by cognitive, developmental, and personality psychologists, as well as by human resources researchers and economists (see Förster et al., 2003 for an extensive review). It is widely accepted that decision time influences the quality of decisions (Svenson et al., 1990), and that the negative effect of time pressure on decision making is robust and consistent (Pew, 1969; Ahituv et al., 1998).

Two concepts from the psychological literature on time pressure and decision-making seem relevant for our study: firstly, the speed-accuracy trade-off (Wickelgren, 1977) suggests that time pressure can reduce the overall quality of a decision, and, secondly, the speed-confidence trade-off (Smith et al., 1982) suggests that the confidence with which people make decisions might decrease with increasing time pressure. The characteristics of the speed-accuracy trade-off depend on task complexity: the more complex a task, the more likely the occurrence of the speed-accuracy trade-off (Johnson et al., 1993).

However, there are also instances when time pressure has a beneficial effect on decision-making. For example, in a long-term time pressure study with NASA scientists and engineers, Andrews and Farris (1972) found that decision performance actually increased with increased time pressure, but only up to a certain tipping point. Beyond that point, decision performance decreased again. Peters et al. (1984) replicated these findings in a related study involving commercial bankers as decision makers. Hwang (1994) argues that perhaps the best way to describe the interaction between decision performance and time pressure is not a linear relationship, but an inverted U-shaped curve: "Increasing time pressure leads to better performance up to a certain point, beyond that point more time pressure reduces, rather than increases, performance." (Hwang, 1994, p. 198).

A still open question is whether map-based decisions follow a linear speed-accuracy trade-off relation, or an inverted U-shaped curve as found in previous research outside of GIScience, which would imply that time pressure could also have a positive effect on map-based decisions. At this point, it is also unclear how map design and task complexity might interact with map-based decision making under time pressure. In empirical cartographic research, response time is typically employed as a dependent variable (i.e., efficiency measure) to evaluate cartographic design
principles (Lloyd and Bunch, 2003; Garlandini and Fabrikant, 2009; Dillemuth, 2009). However, little work has been done until now to study the effect of time pressure (i.e., as a controlled, independent variable or factor) on map-based decision-making. Baus, Krüger and Wahlster (2002) suggest considering time pressure when designing displays of mobile devices for pedestrian navigation. They argue that changing travelling speeds during navigation create varying time pressure situations, which in turn should lead to different user requirements for navigation displays. They contend that different content should be displayed on a map used in different time pressure conditions. In another study involving user motivation in navigation, Srinivas and Hirtle (2010) offered a reward to one participant group as an incentive for faster task completion, while the other "control" group was not given any incentive to reduce task completion time. Indeed, the "more motivated" participants completed the routes significantly faster than the participants in the "control" group.

2.2 Map Design Issues in Decision Making

Numerous prior empirical studies in cartography have investigated how map design might influence human visuo-spatial inference and decision making, typically depending on a specific map use task (Fabrikant and Lobben, 2009). For our study on slope detection, research comparing 2D and 3D-looking maps for a task involving three spatial dimensions seems particularly relevant. For example, studying aviator navigation performance, Smallman et al. (2001) have shown that users' search time for selecting aircraft which meet certain criteria was significantly faster with 2D maps than with 3D-looking map displays. In related work on the design of cockpit displays, Thomas and Wickens (2006) found no significant performance differences in participants' accuracy and response times between 2D co-planar and 3D perspective displays. Coors et al. (2005) evaluated small-screen 3D and 2D mobile navigation aids, and found that the majority of the participants had a positive attitude towards 3D. Participants found that 3D maps were generally a "good idea", but also that 2D was already "sufficient" for mobile navigation. However, participants' response times were significantly slower with 3D maps compared to 2D maps. This suggests that 3D displays in the context of navigation might be more suitable when more time is available for decision-making, but less useful under time pressure.

In this context, the potential discrepancy between user preferences and actual task performance is also relevant. For instance, Canham, Smallman and Hegarty (2007) and Hegarty et al. (2009) have shown that users tend to prefer more realistic, 3D-looking weather maps that on the surface seem to contain more information for the decision-making task at hand than more abstract 2D maps. However, while users prefer 3D, these displays do not necessarily seem to positively influence users' task performance. In fact, Hegarty and colleagues (2009) found that performance was generally better with the less realistic-looking maps, while users' preference ratings indicated just the opposite. They interpret these results as "another good, empirically validated illustration of the common-sense notion that what people think they want is not always what is best for them" (Fabrikant and Lobben, 2009).
According to Hegarty and colleagues, “naïve cartographers” seem to prefer 3D displays to 2D displays, and also seem to prefer more realistic depictions to simpler, more abstract ones. Cartographic design theories and principles, however, aim for reducing graphic
complexity (Bertin, 1967). Similarly, the claims by designers for maximizing the data-ink ratio and for minimizing chart junk (Tufte, 1983), or the empirically validated clutter principle by Rosenholtz and colleagues (2007) also call for more abstraction, and less gratuitous realism to facilitate visuo-spatial decision making. From these related studies, we can derive an initial research hypothesis that users prefer more realistic-looking maps (e.g., satellite image maps) and 3D maps (e.g., shaded relief maps), but might perform better with traditional 2D cartographic maps (i.e., topographic maps). While on the surface it might seem obvious that certain map types are suitable for certain kinds of tasks, it is less obvious how variations of map display designs might influence the quality of map-based decisions under varying temporal usage constraints.
3 Previous Own Work: Experiments and Expert Interviews

In order to fill the existing research gap between time pressure research and empirical map design and map use studies, we have been conducting a series of controlled experiments on map-based decision making under time pressure. We complemented these studies with expert interviews with professionals in the field of map-based decision-making under time pressure. In the following, we summarize the main findings of this work, which set the context for the slope detection experiment reported in Section 4.

In a first experiment on map use preferences for a road selection task under various time pressure conditions, we found that participants preferred realistic-looking orthographic satellite image maps and perspective views with hill shaded relief when they were not under time pressure (Wilkening, 2009). However, these preferred image maps were rated significantly less useful when under time pressure. In contrast, preference ratings for the more abstract looking topographic or road maps (i.e., without hill shading) were not affected by time pressure.

In a second experiment, we assessed users' road selection task performance in flat urban terrain. The roads were depicted either on a satellite image map or on a standard road map, under varying time pressure scenarios (Wilkening, 2010). The map display type did not affect participants' accuracy scores. However, participants reported significantly higher confidence in their performance with satellite images compared to the more abstract road maps. This over-confidence in realistic depictions has been discovered in prior work (Hegarty et al., 2009; Smallman and St. John, 2005; Fabrikant and Boughman, 2006). In our road selection experiment, shorter decision time limits resulted in a significant decrease in participants' confidence, but not in accuracy. In other words, while we did find a speed-confidence trade-off effect, we did not find strong evidence for a speed-accuracy trade-off.

After having obtained some first insights on map type preferences and task performance under time pressure by non-expert map users, we were interested in interviewing professionals who perform map-based decisions under time pressure on a daily basis, specifically within a more complex three-dimensional context. For this reason we interviewed, amongst others, search and rescue helicopter pilots and
professionally trained mountain guides. Both professional groups mentioned that they were generally satisfied with using the "classic" 2D topographic map for their routine work. In the age of 3D interactive globe viewers and location-aware mobile displays, we found that the static, two-dimensional topographic map on paper is still the state of the art for professionals dealing with real-world emergency situations under time pressure. One reason could be that the majority of search and rescue personnel have been specifically trained with these maps, can read them well, and thus are generally comfortable with using them. These interviews confirm findings from our first experiment that familiarity (and training) with a display can positively influence usage preference, especially when under time pressure (Wilkening, 2009).

For both helicopter flying and mountaineering activities, accurate slope identification is very important. For example, a helicopter pilot must assess the steepness of the terrain for landing (Bloom, 2007), and a mountain guide needs to regularly assess the steepness of a slope to determine the avalanche potential on a ski tour during the snow season (Suter, 2007). As the depiction style of the thematically relevant third dimension might be important for these kinds of tasks, we specifically chose a slope detection task for our next experiment, which is described in detail in the next sections.
4 Experiment

As mentioned earlier, in our own prior work we discovered a significant effect of time pressure on user preferences and response confidence for realistic 3D-looking maps in a 2D task context, while actual performance did not seem to be affected by the verisimilitude of the display. In this study, we are interested in how 3D realism might affect participants' response accuracy and confidence for a task under time pressure that specifically involves decision-making within a 3D context. We asked task domain novices, that is, people who might be familiar with maps, but have never used maps for landing a helicopter, to identify locations on various map stimuli where a helicopter could land. The previously interviewed professional helicopter pilots had mentioned inclines of less than 14% (or 8 degrees) for safe helicopter landing. This threshold seems to be a standard in the literature (e.g., Bloom, 2007). We again selected three time pressure scenarios with time limits that were identified through pilot testing. The experiment follows a within-subject design, where each participant was exposed to all time constraints and display types.

4.1 Participants

Fifty-five (32 male and 23 female) participants took part in this study. Participants were either students or staff at the Department of Geography at the University of Zurich and the Institute of Cartography at the Swiss Federal Institute of Technology in Zurich. The majority of participants stated that they were "rather familiar" with topographic maps (58.2%) and 3D displays (61.8%), while 32.7% reported being "very familiar" with topographic maps, and 14.5% very familiar with 3D depictions. While our sample represents the more experienced map designer and user, the participants are not experts in the slope detection task domain, and do not represent experts in map-based decision making under time pressure.
4.2 Materials

We created twelve map displays in total, depicting mountainous areas in Switzerland. All maps were of identical size (389x355 pixels), and included a scale bar on the upper right of the display (see Figure 1). The elevation data for the stimuli were derived from the SRTM3 Digital Elevation Model (Jarvis et al., 2008). Slope information could be identified with two pieces of information depicted in the stimulus: the scale bar next to the map and the contour lines in the map. The map scale was held constant at 1:20,000 (run), and the contour line interval was held constant at 100m (rise). The twelve maps represent the elevation data in four different ways:

1. Contour lines only (map a)
2. Contour lines plus light hill shading (map b)
3. Contour lines plus dark hill shading (map c)
4. Contour lines plus colored slope classes (map d)
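To make the rise-over-run inference demanded by the contour-only map concrete, a minimal sketch follows. This is our illustration, not part of the study materials: the function name and the example contour spacing are hypothetical, while the 1:20,000 scale, the 100 m contour interval, and the 14% landing threshold are taken from the text.

```python
import math

SCALE = 20000               # map scale 1:20,000 (run), as stated above
CONTOUR_INTERVAL_M = 100.0  # contour line interval in meters (rise)

def slope_percent(map_distance_mm: float) -> float:
    """Slope in percent from the on-map distance (in mm) between
    two adjacent contour lines."""
    ground_run_m = (map_distance_mm / 1000.0) * SCALE  # map mm -> ground meters
    return (CONTOUR_INTERVAL_M / ground_run_m) * 100.0

# The 14% landing threshold corresponds to roughly 8 degrees:
print(math.degrees(math.atan(0.14)))  # ~7.97 degrees

# Example: contours 40 mm apart on the map -> 800 m ground run -> 12.5% slope,
# i.e., below the 14% threshold, so flat enough for a helicopter to land.
print(slope_percent(40.0))  # 12.5
```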
While all map types are suitable for identifying slope, the depiction methods systematically vary in the degree of the depicted realism (i.e., shaded relief vs. contour line maps), in the apparent visual clutter (i.e., contour line vs. shaded relief maps), and in the information content for detecting slope suitability (slope vs. contour line maps). In other words, the maps are neither computationally nor informationally equivalent (Simon and Larkin, 1987; Fabrikant et al., 2008). Map (a) in Figure 1, containing only contour lines, represents the most abstract of the tested map types, and with the least amount of information (i.e., implicit slope information). Maps (b) and (c) additionally contain a shaded relief (i.e., explicit relative slope information), thus more information than map (a). Users can obtain slope information not only (implicitly) from the distance between the contour lines (a), but also from the relative darkness of the pixels (maps b+c). The steeper the slope, the darker is the appearance of the relief. To investigate the potential effect of the graphic quality of the hill shading, we created a lighter version (b) and a darker version (c) of the hill shaded relief maps. We employed the hill shade function available with the 3D Analyst Toolbox in ESRI’s ArcGIS. The light source for the hill shading was set to a 45° angle for the lighter relief (map b) and to a 22° angle for the dark relief (map c), respectively. Based on Simon and Larkin (1987), we hypothesize that the more implicit the depiction of the task relevant information (i.e., map a), and the higher the amount of task irrelevant information (i.e., maps b+c) the more reasoning effort is needed when making decisions with these maps. While our interviewed map-based decision-making professionals did not use slope maps (map d) for their daily work, they considered them as “nice-to-have”, so we included them in our study. The slope maps contain most task-relevant information in our tested maps. They show slope information explicitly in the map, and the respective information is explained in the accompanying legend, thus, should be easiest to use for task domain novices. Slope was calculated in ESRI’s ArcGIS and depicted in a diverging color scheme, employing the traffic light metaphor (green = go, red = stop). Slopes that are flat enough for a helicopter to land (i.e., below 14% steepness) are depicted with green shades, while slopes that are too steep for landing (i.e., above 14%) are shown in magenta shades (Figure 1d). We define amount of
realism as a degree of verisimilitude with the real world (Zanola et al., 2009). We thus contend that shaded relief maps (b+c in Figure 1) look more realistic than a contour map (Figure 1a), because contours cannot be seen in the real world.
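The two hill shaded relief variants described above differ only in the simulated light source angle (45° vs. 22°). The following numpy sketch of the standard Lambertian hillshade model illustrates the effect; it is the same family of computation as the ArcGIS hillshade function used for the stimuli, but exact azimuth and aspect conventions vary between implementations, and the DEM array here is assumed rather than given.

```python
import numpy as np

def hillshade(dem, cellsize, azimuth_deg=315.0, altitude_deg=45.0):
    """Lambertian hillshade (0-255) of a DEM array.
    A lower sun altitude (e.g., 22 degrees) yields a darker relief."""
    zenith = np.radians(90.0 - altitude_deg)
    azimuth = np.radians(360.0 - azimuth_deg + 90.0)  # to math convention
    dzdy, dzdx = np.gradient(dem.astype(float), cellsize)
    slope = np.arctan(np.hypot(dzdx, dzdy))
    aspect = np.arctan2(dzdy, -dzdx)
    shade = (np.cos(zenith) * np.cos(slope) +
             np.sin(zenith) * np.sin(slope) * np.cos(azimuth - aspect))
    return (np.clip(shade, 0.0, 1.0) * 255).astype(np.uint8)

# dem = ...  # e.g., elevations from the SRTM3 model used for the stimuli
# light_relief = hillshade(dem, cellsize=90.0, altitude_deg=45.0)  # map (b)
# dark_relief  = hillshade(dem, cellsize=90.0, altitude_deg=22.0)  # map (c)
```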
Fig. 1. Reduced examples of employed map stimuli: only contour line map (a), light hill shaded relief map (b), dark hill shaded relief map (c), and slope map (d)
Finally, the four tested map types also vary in graphic quality, or, negatively put, in their degree of visual clutter. In Rosenholtz et al.'s (2007) terminology, clutter relates to the degree of perceptual organization of information in a display. The more organized a display, the less visual clutter (detracting information) it contains for a given task. We quantitatively assessed this purely bottom-up vision concept in our test displays by means of the Subband Entropy clutter measure, proposed by Rosenholtz and colleagues. This measure, also empirically validated with map displays, is "based on the notion of clutter as related to the efficiency with which the image can be encoded and inversely related to the amount of redundancy and grouping in the image" (Rosenholtz et al., 2007, p. 18). Subband entropy seems to be a good predictor for human map-reading performance under time pressure. The higher the subband entropy measure for a display (i.e., the more clutter), the less computationally efficient the extraction of information encoded in the image (Simon and Larkin 1987). To exemplify this measure, we computed subband entropy for the four stimuli shown in Figure 1, and found most clutter in the slope map (3.75), followed by the dark hill shaded relief (3.31), the light hill shaded relief (3.27), and lastly, the contour map (3.25). While the slope map is graphically more cluttered than the others (e.g., it includes an additional visual variable, color), it shows the task relevant information
explicitly (i.e., in the legend), thus despite perceptual clutter one would expect this map to need less reasoning effort to extract the task relevant information. The investigated factors are summarized in Table 1 below.

Table 1. Comparison of the map types used in the slope detection experiment. The amount of information is indicated with + (low) to +++ (very high).

|                                        | contour map                         | shaded relief maps                                             | slope map                                                      |
| degree of realism                      | (+)                                 | (++)                                                           | (+)                                                            |
| depiction type (elevation information) | lines of equal elevation (absolute) | lines of equal elevation (absolute) & shaded relief (relative) | lines of equal elevation (absolute) & slope classes (absolute) |
| slope information type (amount)        | implicit (+)                        | implicit (++)                                                  | explicit (+++)                                                 |
| visual clutter (Subband Entropy)       | (+)                                 | (++)                                                           | (+++)                                                          |
| reasoning effort                       | (+++)                               | (++)                                                           | (+)                                                            |
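Rosenholtz et al.'s Subband Entropy measure decomposes an image into oriented spatial-frequency subbands (a steerable pyramid over luminance and chrominance channels) and averages the Shannon entropy of the coefficients in each band. The sketch below is a simplified, single-channel approximation using a plain wavelet decomposition; PyWavelets stands in for the authors' original implementation, so absolute values will not match the ones reported above.

```python
import numpy as np
import pywt  # PyWavelets

def band_entropy(coeffs, bins=256):
    """Shannon entropy (bits) of the histogram of one subband's coefficients."""
    hist, _ = np.histogram(coeffs, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def subband_entropy(gray_image, levels=3, wavelet="db4"):
    """Mean entropy over wavelet subbands; higher values indicate more clutter."""
    coeffs = pywt.wavedec2(np.asarray(gray_image, dtype=float), wavelet, level=levels)
    entropies = [band_entropy(coeffs[0].ravel())]     # approximation band
    for horizontal, vertical, diagonal in coeffs[1:]:  # detail bands per level
        entropies += [band_entropy(b.ravel())
                      for b in (horizontal, vertical, diagonal)]
    return float(np.mean(entropies))
```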
The locations participants had to assess for potential helicopter landing were represented with black labels (numbers) on a yellow background to maximize saliency. Each stimulus contained six such locations for assessment. No other pieces of information (such as labels of place names) were contained in the map. We ran a saliency model (Itti and Koch, 2001) on our test stimuli to make sure that the saliency of the decision points was not significantly influenced by the tested map types.

4.3 Procedure

The experiment took place in a lab equipped with standard personal computers connected to the Internet. The experiment was carried out digitally in a web browser displayed on a 17-inch computer screen set to 1280x768 pixel screen resolution. After filling out a background questionnaire, participants were asked to safely land a helicopter on slopes not steeper than 14%. To assure that all participants had the necessary background to complete the task, they were first introduced to the slope concept and how slope can be calculated. They were shown how slope can be identified in a contour line map using the elevation information displayed with labels on the contour lines, and the ground distance information contained in the map scale bar. No other task relevant information was given to the participants. Participants were then asked to solve two warm-up tasks in the same format as the actual experiment, which is described below. Then, they were shown the sequence of twelve maps described in the previous section. The order of the stimuli was systematically rotated to prevent learning biases due to potential ordering effects. For each map, participants had to select one or more locations that were flat enough for a helicopter to land, by clicking the respective checkbox below the map. For each map, six locations had to be assessed. The number of correct locations varied randomly from 1 to 5 per map. Overall, 50% of the labeled slopes were too steep to land a helicopter.
Subjects had to solve the slope detection task under all three time constraint conditions, including 20s (most severe), 40s (moderate), and 60s (least severe) time limits, and for all map display types described earlier. After completing each task, participants were asked to rate their confidence of response on a scale from "1 – not confident at all" to "4 – very confident". Participants were not under time pressure when asked to rate their response confidence. Responses were collected digitally and included participants' accuracy (percentage of correct answers) as well as (self-reported) confidence as success measures. After completing the digital portion of the experiment, participants were debriefed, and given a meal voucher for the university cafeteria in return for their participation. The experiment took approximately 15 minutes to complete.

4.4 Signal Detection Analysis

The conceptual framework of signal detection theory (SDT), which was originally developed for research on visual perception (Tanner and Swets, 1954), can generally be employed for decision-making under uncertainty, and especially when decisions have to be made based on two or more alternatives. The benefit of using this framework in our research context is that response accuracy can be assessed with more analytical depth than by just comparing correct and false answers. In SDT, correct answers are called "hits" or "correct rejects", and errors are called "misses" or "false alarms", respectively. This analysis framework can especially help us to identify which kinds of errors participants might make, due to varying time constraints and map display types, and thus whether errors might follow a particular pattern. Applying this concept to our slope detection experiment, correctly selected locations per question are classified as "hits" (<14% steepness), and those (correctly) not selected locations are classified as "correct rejects" (>14%, see Table 2 below). Participant answers that are incorrect are classified as either "misses" or "false alarms", respectively. A miss indicates a location that was not selected, even though it is correct (<14% steepness), and a false alarm occurs when participants incorrectly selected a location with a slope that is too steep (>14%). In other words, a miss is an overestimation of slope, while a false alarm represents an underestimation of slope. Table 2 illustrates how we classify the four possible types of responses within the data analysis framework of Signal Detection Theory.

Table 2. Classification of correct and false answers according to Signal Detection Theory

| Reality                  | Decision: slope too steep (>14%) | Decision: slope flat enough (<14%) |
| Slope too steep (>14%)   | correct reject (true)            | false alarm (false)                |
| Slope flat enough (<14%) | miss (false)                     | hit (true)                         |
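To make the scoring concrete, the sketch below classifies a single assessed location under this scheme and computes the sensitivity index d' from the resulting counts. This is our illustration of the framework, not the authors' analysis code; the helper names and the example data are hypothetical, and the log-linear correction for extreme rates is a standard SDT convention we assume rather than one the paper specifies.

```python
from scipy.stats import norm

def classify(selected: bool, slope_percent: float, threshold: float = 14.0) -> str:
    """Score one location; selecting it means judging it flat enough to land."""
    flat = slope_percent < threshold
    if selected:
        return "hit" if flat else "false alarm"
    return "miss" if flat else "correct reject"

def d_prime(hits, misses, false_alarms, correct_rejects):
    """Sensitivity: z(hit rate) - z(false-alarm rate), with a log-linear
    correction so rates of exactly 0 or 1 do not yield infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejects + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# One participant, one map with six labeled locations (selected?, true slope %):
responses = [(True, 10.0), (True, 18.0), (False, 12.0),
             (False, 20.0), (True, 9.0), (False, 30.0)]
outcomes = [classify(sel, slope) for sel, slope in responses]
# -> ['hit', 'false alarm', 'miss', 'correct reject', 'hit', 'correct reject']
print(d_prime(outcomes.count("hit"), outcomes.count("miss"),
              outcomes.count("false alarm"), outcomes.count("correct reject")))
```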
5 Results

We first present the results regarding time pressure, followed by the results for the different map types, and finally, we report on the interaction of time pressure with map types on participants' accuracy and confidence ratings.
5.1 Time Pressure Effect

Overall, participants' average response accuracy and confidence ratings shown in Figure 2 reveal a surprising and counterintuitive pattern. Participants are most accurate with the moderate time limit of 40s (M=82.8%, SD=11.4%), followed by the most generous time limit of 60s (M=72.2%, SD=15.2%), and lastly, as expected, the most severe time limit of 20s (M=66.7%, SD=20.4%). Similarly, participants' self-reported confidence is also highest for the moderate 40s time limit (M=2.86, SD=0.44), followed by the least severe 60s limit (M=2.73, SD=0.15) and, lowest, again as expected, for the most severe 20s limit (M=2.67, SD=0.20). The performance increase from the 20s time limit to the 40s limit, and the performance decrease from the 40s to the 60s time limit, are all significant for both accuracy and confidence (p< .001).
Fig. 2. Average accuracy and confidence per time pressure limit. Error bars: ±2 Standard Error (SE).
5.2 Map Type Effect

As expected, participants' accuracy was significantly better with the slope map (M=83.6%, SD=14.8%) compared to all other maps, as shown in Figure 3. However, contrary to our prediction, accuracy with the shaded relief maps was not better, but worse than with all other maps. Participants' mean accuracy for the light hill shaded relief map is 73.1% (SD=16.8%), and with 65.4% (SD=18.5%) it is lowest overall for the dark hill shaded relief map. Surprisingly, participants perform even worse with the hill shaded relief maps, which look more realistic and contain more information, than with the most abstract contour map (M=73.5%, SD=13%). The difference between the contour map and the light hill shaded relief map is significant (p< .01), as is the difference between the slope map and all other maps (p< .001). As can be seen in Figure 3, in congruence with the accuracy response pattern, participants' confidence ratings are also highest for the slope map (M=3.13,
Fig. 3. Accuracy and confidence ratings for tested map display types (± 2 SE)
SD=0.55), and higher for the contour maps (M=2.67, SD=0.47) compared to the lowest scoring hill shaded relief maps (dark: M=2.64, SD=0.47 and light: M=2.52, SD=0.47). Surprisingly again, participants have significantly higher confidence in their performance with the contour map compared to the light hill shaded relief map (p< .05), which contains more information.

The power of signal detection theory lets us analyze response accuracy in more detail. Overall, regardless of map type, misses (i.e., slope overestimation) occur more frequently than false alarms. Misses also occur more frequently than false alarms independent of the tested time limits. The "correct reject" is overall more frequent than the "hit" among correct answers, across all map types and all temporal conditions. As expected, the number of false alarms (i.e., slope underestimation) shown in Figure 4 is significantly higher for the light hill shaded relief map (M=2.00, SD=1.75) than for the dark hill shaded relief map (M=1.18, SD=1.23). In contrast, the number of misses (i.e., slope overestimation) is, again as expected, higher with the darker hill shaded relief map (M=2.67, SD=1.77) compared to the light shaded relief (M=2.05, SD=1.69). As shown in Figure 4, SDT provides additional insights on what kinds of decision errors might have specifically contributed to the unexpectedly low accuracy for the shaded relief maps. Participants seem to overestimate the steepness of the slopes more frequently with the dark hill shaded relief maps (i.e., higher number of misses) compared to the light shaded relief maps. Hence, a map with a lighter shaded relief might help reduce this potential source of error. However, one can also see in Figure 4 that one drawback of light hill shaded relief maps might be their relatively high rate of false alarms.
Fig. 4. Misses and false alarms with shaded relief maps (± 2 SE). The maximum number of possible errors is 9 per map type.
5.3 Interaction of Map Type and Time Pressure

We now turn to the research question of how map types might support participants in their decision-making under varying time pressure scenarios. Participants gave the most
accurate answers with the (explicit) slope map under all time constraint conditions (see Figure 5). In the most severe time limit condition (20s), participants scored better with the most abstract contour map (M=71.2%, SD=30.5%), containing the least amount of information, compared to the more realistic-looking shaded relief maps (dark: M=63.9%, SD=27.9% and light: M=53.3%, SD=40.1%). For this shortest time limit, the overall differences between tested map types are significant (p< .01 for both shaded relief maps). Accuracy scores generally increase from the most severe (20s) to the moderate (40s) time limit condition. The accuracy differences between maps are not significant in the moderate condition. Overall, accuracy scores drop again for the highest scoring slope and contour maps under the least severe time constraint condition (60s), while accuracy scores for the hill shaded relief maps do not change much between the 40s and 60s limit conditions. In other words, participants' accuracy with hill shaded relief maps only reaches the higher level of the other more abstract map types when participants are not under severe time pressure.
Fig. 5. Participant average accuracy per map display type and time limit
A very similar response pattern can be observed in Figure 6 when looking at participants' confidence ratings. Again, mirroring accuracy scores, participants are most confident in their responses with the slope map, regardless of the given time limit. Participants' confidence is also consistently high with the contour line map. The difference between the average confidence ratings for the slope map and the shaded relief maps is only significant at the 20s time limit. For this shortest time limit, the average confidence rating is 2.58 (SD=0.08) with the contour map and 2.30 (SD=0.07) with the shaded relief maps; the rating difference between the contour map and both hill shaded relief maps is significant (p< .001). Only in the moderate time limit condition (40s) are confidence ratings for the hill shaded relief maps higher than for the contour map. This is in contrast to participants' accuracy scores shown in Figure 5 earlier, where participant performance is better with the contour map than with the shaded relief maps.
Fig. 6. Participant average confidence per map display and time limit
6 Discussion

Summarizing our results, we find that, indeed, response accuracy and confidence ratings are worst under the highest time pressure, but best when participants are under a moderate response time limit. Both scores decrease significantly when participants have more decision time available. These results, which seem somewhat counterintuitive, do resemble the inverted U-shaped response curve previously discovered by Hwang (1994), albeit not in a map-based decision-making context. Based on Johnson et al.'s (1993) and Hwang's (1994) research results reviewed earlier, changes in speed-accuracy and speed-confidence trade-offs might be a consequence of task difficulty. As both response accuracy and confidence decreased with a time limit more severe than 40s, this slope detection task might have become significantly more difficult when participants had less than 40s to respond. As a result, we do find a clear speed-accuracy and speed-confidence trade-off effect. Participant performance did not further increase from the moderate to the least severe time limit, thus the slope detection task does not get easier with more available decision time beyond the 40s time limit. In this study, the tipping point up to which time pressure actually increases performance seems to be in the vicinity of 40s decision time. This pattern is in contrast to our previous road selection task study, where we did not find a time pressure effect on response accuracy, even with overall shorter decision time limits, down to 10s decision time. One could argue that the road selection task on flat terrain is significantly less complex than a 3D slope detection task, and thus speed-accuracy and speed-confidence trade-offs are generally harder to find. This difference in task complexity might be one of the main reasons for the non-repeatability of the results of Experiment I.

Regarding the decision performance differences due to the different map types, participants' accuracy and response confidence were unexpectedly low with the shaded relief maps. This result supports prior work by Hegarty and colleagues (Hegarty et al., 2008; Hegarty et al., 2009), who have shown that more
realistic, 3D-looking displays, while often preferred by "naïve users", do not necessarily increase performance. While three-dimensional shaded reliefs provide more task-relevant (but implicit) information compared to the more abstract contour map, this additional information does not lead to more effective (accurate) or efficient (faster) decision-making. One reason for this might, arguably, be that the implicit thematically relevant information is not presented in a cognitively and perceptually adequate way (Swienty et al., 2008; Fabrikant et al., 2010). While the hill shaded relief maps might contain more task-related information than the contour maps, they are also more cluttered (Rosenholtz et al., 2007), and thus might require more time for participants to visually parse. As Tufte (1983) would put it, the task-relevant data to graphic ink ratio in the visuo-spatial display is not optimized for the task at hand. On the other hand, while the slope maps exhibit the highest clutter values of the tested displays (see Table 1), their task-relevant data to graphic ink ratio is indeed optimized for the task at hand. In fact, running a saliency model (Itti and Koch, 2001) on the stimuli, we find only one significant difference between the slope map and the other three tested map types (see Figure 1): in the area along the bottom edge of the maps, where the density of the elevation contours is highest (i.e., the steepest area in the map), the slope map also shows the darkest magenta shades between the contour lines. Moreover, the visual variable color hue seems not to have much influence on this saliency map pattern, as running the saliency model on a gray scale version of the slope maps (i.e., removing color hue) yields an identical saliency map pattern. Another possible reason why the more abstract maps performed better under time pressure is that our 3D maps, with their high graphic density, might have a general relative disadvantage when shown at smaller screen sizes with lower spatial display resolution than the 2D maps.

However, participants do perform better with the shaded relief maps compared to the more abstract contour line map when they have more decision time available, and also seem to be more confident in their responses when under less time pressure. In this case, participant performance and confidence seem to reflect participant preferences, when we compare results from this study with the results from a prior map use preference experiment (see Section 3), in which more realistic 3D-looking satellite image maps obtained higher preference ratings when participants had more decision time available. In other words, we did not find strong evidence for a "naïve realism" effect (Smallman and St. John, 2005), or over-confidence in realistic-looking maps in this experiment, as low accuracy scores co-occurred with equally low confidence ratings for the tested shaded relief maps. This could be due to the fact that our participant sample consisted mainly of cartographic (design) professionals, and thus not "naïve" cartographers. Not surprisingly, the 2D slope maps, containing most of the thematically relevant information, outperformed all other map types with respect to effectiveness (i.e., accuracy) and efficiency (i.e., under all time limits), including participant confidence.
In this case, in contrast to the shaded relief maps, the information increase had a positive effect on response accuracy and confidence, even though perceptually these maps appeared to be the most cluttered (see Table 1). One reason for this could be that the slope map already explicitly contains an intrinsic reasoning step (i.e., slope computation). This additional thematically relevant information is communicated in a cognitively adequate (explicit) and perceptually salient way, using empirically
validated cartographic design principles (Fabrikant, Rebich-Hespanha and Hegarty 2010). In other words, participants can perform well and be confident in their decisions even with an abstract (but computationally efficient) depiction method, but only when thematically relevant information is communicated explicitly and rendered in a perceptually salient manner. It would thus be interesting to further investigate how different ways of representing slope information might affect the outcomes of map-based decision making tasks under time pressure. Although slope maps are not commonly known or used by map-based decision-making experts under time pressure, or by the general public, our expert interviewees did find them useful, and had no problem detecting the relevant information without any training.
7 Summary and Outlook

In this study, we investigated how display types might affect people's decision making when solving a complex slope detection task under varying time pressure conditions. Replicating previous work (Andrews and Farris, 1972; Hwang, 1994), we discovered an inverted U-shaped accuracy response curve, which implies that moderate time pressure can have a positive effect on map-based decision-making, but only up to a certain tipping point, which seems to be around 40s in our study. Moreover, confirming long-standing (but rarely empirically validated) cartographic design theory (Bertin, 1967), we found that more abstract, but well designed, contour and slope maps outperform the more preferred realistic, 3D-looking hill shaded relief maps for the 3D slope detection task in our study. This might suggest that the benefit of explicitly communicating thematically relevant information, even in a graphically abstract way (i.e., higher cognitive cost), is greater for efficient and effective map-based decision-making than adding preferred and attractive, but visually more cluttered, realism (i.e., higher perceptual cost). Low participant performance with shaded relief maps, even lower than with the more abstract contour maps that contain less information, suggests that visual realism might negatively influence decision-making, especially under time pressure.

Future experiments in varying map-based decision making contexts with different task complexity levels should be conducted to further investigate the generalizability of these somewhat counterintuitive findings, involving 1) performance decreases with more available decision time, and 2) surprisingly poor performance with shaded relief maps. For example, one could vary display sizes and the ways of representing slope information, in order to investigate the robustness of our findings. It is unclear at this point how performance is affected by user background and training. In future related studies, participants with less cartography training could be tested in similar time pressure contexts, in order to compare with previous results by Hegarty and colleagues (2009), who found higher preferences for 3D maps among "naïve cartographers". Finally, we also encourage like-minded researchers in GIScience and cartography to more often analyze response accuracy with the signal detection approach, and to explore in which contexts misses or false alarms are the dominant types of errors, and how individual and group differences might influence hit and false alarm rates. For example, we found a higher number of misses compared to the number of false
alarms in our experiment. One explanation for this play-safe strategy in this safety-critical task context could be that our participants, not trained in helicopter landing, might have preferred to miss a suitable landing spot rather than land on unsuitable terrain, with potentially life-threatening consequences (e.g., see the work of Hofer and Schwaninger (2005) relating to baggage screening tasks). Future empirical map design and map use studies could thus focus more on the question of what kinds of errors might result in low accuracy rates, and this might in turn lead to more focused map design guidelines.

Acknowledgements. We would especially like to thank all our participants who were willing to take part in this study. We are also indebted to Christian Häberling from the Institute of Cartography at the Swiss Federal Institute of Technology Zurich, who gave us the opportunity to conduct our experiments in his cartography classes. Finally, we are grateful to our interviewees at Protection & Rescue (Schutz & Rettung) and Swiss Air Rescue (Rega) in Zurich, and at the Institute for Snow and Avalanche Research (SLF) in Davos, Switzerland, for sharing their expert insights on map-based decision making under time pressure.
References Ahituv, N., Igbaria, M., Sella, A.: The Effects of Time Pressure and Completeness of Information on Decision Making. Journal of Management Information Systems 15(2), 153–172 (1998) Andrews, F.M., Farris, G.F.: Time Pressure and Performance of Scientists and Engineers: A Five-Year Panel Study. Organizational Behavior and Human Performance 8(2), 185–200 (1972) Baus, J., Krüger, A., Wahlster, W.: A resource-adaptive mobile navigation system. In: Proceedings of the 7th International Conference on Intelligent User Interfaces, pp. 15–22. ACM, New York (2002) Bertin, J.: Semiologie graphique, Mouton, Paris (1967) Bloom, G.S.: Helicopters: How They Work (2007), http://www.helicopterpage.com/html/forces.html (last accessed June 16, 2011) Canham, M., Smallman, H., Hegarty, M.: Using Complex Visual Displays: When Users Want More than is Good for Them. In: Mosier, K., Fischer, U. (eds.) Proceedings of the Eighth International NDM Conference, Pacific Grove, CA (2007) Coors, V., Elting, C., Kray, C., Laakso, K.: Presenting Route Instructions on Mobile Devices: From Textual Directions to 3D Visualization. In: Dykes, J., Kraak, M.J., MacEachren, A.M. (eds.) Exploring Visualization, pp. 529–550. Elsevier, Oxford (2005) Dillemuth, J.: Navigation Tasks with Small-Display Maps: The Sum of the Parts Does Not Equal the Whole. Cartographica 44(3), 187–200 (2009) Fabrikant, S.I., Boughman, T.: Communicating Data Quality through Realism. In: Proceedings of GIScience 2006, pp. 59–60. Münster, Germany (2006) Fabrikant, S.I., Lobben, A.: Introduction: Cognitive Issues in Geographic Information Visualization. Cartographica 44(3), 139–143 (2009) Fabrikant, S.I., Rebich-Hespanha, H., Hegarty, M.: Cognitively Inspired and Perceptually Salient Graphic Displays for Efficient Spatial Inference Making. Annals of the Association of American Geographers 100(1), 13–29 (2010)
Fabrikant, S.I., Rebich-Hespanha, H., Andrienko, N., Andrienko, G., Montello, D.R.: Novel Method to Measure Inference Affordance in Static Small-Multiple Map Displays Representing Dynamic Processes. The Cartographic Journal 45(3), 201–215 (2008)
Förster, J., Higgins, E.T., Taylor Bianco, A.: Speed/accuracy decisions in task performance: Built-in trade-off or separate strategic concerns? Organizational Behavior and Human Decision Processes 90, 148–164 (2003)
Garlandini, S., Fabrikant, S.I.: Evaluating the Effectiveness and Efficiency of Visual Variables for Geographic Information Visualization. In: Hornsby, K.S., Claramunt, C., Denis, M., Ligozat, G. (eds.) COSIT 2009. LNCS, vol. 5756, pp. 195–211. Springer, Heidelberg (2009)
Gigerenzer, G.: The Adaptive Toolbox. In: Gigerenzer, G., Selten, R. (eds.) Bounded Rationality: The Adaptive Toolbox, Dahlem Workshop Reports, pp. 37–50 (2002)
Hegarty, M., Smallman, H.S., Stull, A.T.: Decoupling of Intuitions and Performance in the Use of Complex Visual Displays. In: Love, B., McRae, K., Sloutsky, V. (eds.) Proceedings of the 30th Annual Conference of the Cognitive Science Society, Washington, DC, pp. 881–886 (2008)
Hegarty, M., Smallman, H.S., Stull, A.T., Canham, M.: Naïve Cartography: How Intuitions about Display Configuration Can Hurt Performance. Cartographica 44(3), 171–186 (2009)
Hofer, F., Schwaninger, A.: Using threat image projection data for assessing individual screener performance. In: Brebbia, C.A., Bucciarelli, T., Garzia, F., Guarascio, M. (eds.) WIT Transactions on the Built Environment, Safety and Security Engineering, vol. 82, pp. 417–426. WIT Press, Wessex (2005)
Hwang, M.I.: Decision making under time pressure: a model for information systems research. Information and Management 27, 197–203 (1994)
Itti, L., Koch, C.: Computational Modelling of Visual Attention. Nature Reviews Neuroscience 2(3), 194–203 (2001)
Jarvis, A., Reuter, H.I., Nelson, A., Guevara, E.: Hole-filled seamless SRTM data V4. International Centre for Tropical Agriculture (CIAT) (2008)
Johnson, E.J., Payne, J.W., Bettman, J.R.: Adapting to Time Constraints. In: Svenson, O., Maule, J. (eds.) Time Pressure and Stress in Human Judgment and Decision Making, pp. 103–116. Plenum Press, New York (1993)
Lloyd, R.E., Bunch, R.L.: Technology and Map-Learning: Users, Methods and Symbols. Annals of the Association of American Geographers 93(4), 828–850 (2003)
Payne, J.: Contingent decision behavior. Psychological Bulletin 93, 382–402 (1982)
Peters, L.H., O'Connor, E.J., Pooyan, A., Quick, J.C.: The Relationship between Time Pressure and Performance: A Field Test of Parkinson's Law. Journal of Occupational Behaviour 5, 293–299 (1984)
Pew, R.W.: The speed-accuracy operating characteristic. In: Koster, W.G. (ed.) Acta Psychologica 30: Attention and Performance, vol. II, pp. 16–26. North-Holland Publishing Company, Amsterdam (1969)
Rosenholtz, R., Li, Y., Nakano, L.: Measuring Visual Clutter. Journal of Vision 7(2), 1–22 (2007)
Simon, H.: Theories of decision making in economics and behavioural science. American Economic Review 49(3), 253–283 (1959)
Simon, H.A., Larkin, J.H.: Why a diagram is (sometimes) worth ten thousand words. Cognitive Science 11, 65–100 (1987)
Smallman, H.S., St. John, M.: Naive Realism: Misplaced Faith in Realistic Displays. Ergonomics in Design 13(3), 6–13 (2005)
Smallman, H.S., St. John, M., Oonk, H.M., Cowen, M.B.: Information Availability in 2D and 3D Displays. Applied Perception 21(5), 51–57 (2001)
Smith, J.F., Mitchell, T.R., Beach, L.R.: A cost-benefit mechanism for selecting problem-solving strategies: Some extensions and empirical tests. Organizational Behavior and Human Performance 29(3), 370–396 (1982)
Srinivas, S., Hirtle, S.C.: The Role of Motivation and Complexity on Wayfinding Performance. In: Purves, R., Weibel, R. (eds.) GIScience 2010 Extended Abstracts Volume, Zurich, Switzerland (2010)
Suter, C.: Lawinenausbildung mit mobilen Systemen. Unpublished Diploma Thesis, Department of Geography, University of Zurich, Switzerland (2007)
Svenson, O., Edland, A., Slovic, P.: Choices and judgments of incompletely described decision alternatives under time pressure. Acta Psychologica 75, 153–169 (1990)
Swienty, O., Reichenbacher, T., Reppermund, S., Zihl, J.: The Role of Relevance and Cognition in Attention-guiding Geovisualization. The Cartographic Journal 45(3), 227–238 (2008)
Tanner, W.P., Swets, J.A.: A Decision-Making Theory of Visual Detection. Psychological Review 61, 401–409 (1954)
Thomas, L.C., Wickens, C.D.: Display Dimensionality, Conflict Geometry, and Time Pressure Effects on Conflict Detection and Resolution Performance Using Cockpit Displays of Traffic Information. The International Journal of Aviation Psychology 16(3), 321–342 (2006)
Tufte, E.R.: The Visual Display of Quantitative Information. Graphics Press, Cheshire (1983)
Wickelgren, W.A.: Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica 41(1), 67–85 (1977)
Wilkening, J.: User Preferences for Map-Based Decision Making Under Time Pressure. In: COSIT 2009 Doctoral Colloquium Proceedings, Aber Wrac'h, France, pp. 91–98 (2009)
Wilkening, J.: Map Users' Preferences and Performance under Time Pressure. In: Purves, R., Weibel, R. (eds.) GIScience 2010 Extended Abstracts Volume, Zurich, Switzerland (2010)
Zanola, S., Fabrikant, S.I., Çöltekin, A.: The Effect of Realism on the Confidence in Spatial Data Quality in Stereoscopic 3D Displays. In: Proceedings of the 24th International Cartographic Conference (ICC 2009), Santiago, Chile, November 15-21 (2009) (on CD-ROM)
Towards Cognitively Plausible Spatial Representations for Sketch Map Alignment Malumbo Chipofya, Jia Wang, and Angela Schwering Institute for Geoinformatics, University of Muenster, Germany {mchipofya,jia.wang,schwering}@uni-muenster.de
Abstract. In recent years, user-generated content has gained increasing importance in the area of geographic information science. Private citizens collect environmental data about their neighborhoods and publish them on the web. The wide success of volunteered geographic information relies on the simplicity of such systems. We propose to use sketch maps as a visual user interface, because sketch maps are intuitive, easy for humans to produce, and commonly used in human-to-human communication. Sketch maps reflect users' spatial knowledge that is based on observations rather than on measurements. However, sketch maps, often considered externalizations of cognitive maps, are distorted, schematized, incomplete, and generalized. Processing spatial information from sketch maps must therefore account for these cognitive aspects. In this paper, we suggest a set of qualitative spatial aspects that should be captured in representations of sketch maps and give empirical evidence that these spatial aspects are robust against typical schematizations and distortions in human spatial knowledge. We propose several existing qualitative spatial calculi to formally represent the spatial aspects, suggest appropriate methods for applying them, and evaluate the proposed representations for the alignment of sketch maps and metric maps. Keywords: cognitive qualitative representation, sketch map, qualitative spatial reasoning, sketch alignment.
1 Introduction

Sketch maps are an intuitive way to express human spatial knowledge about the environment. They contain objects which represent real world geographic features, relations between these objects, and oftentimes symbolic and textual annotations [4]. These elements enable us to use sketch maps to communicate about our environments and to reason about our actions in those environments. In this way, sketch maps provide an intuitive user interaction modality for some geospatial computer applications [9]. Especially with the advent of Volunteered Geographic Information (VGI) [13], sketch maps may be the key to removing some of the barriers imposed by the technical requirements of traditional Geographic Information Systems (GIS) as noted by [27]. Sketch maps, however, do not have a georeferenced coordinate system. Therefore, in order to allow users to contribute and query geographic information using sketch maps, an automated system must be able to analyze them and establish correct
correspondences between elements of a sketch map and elements of other spatial data sources [35], be they sketch maps or metric maps. The analysis involves extracting and characterizing useful information, such as depicted objects, from a sketch map; establishing correspondences involves describing the relationship between elements of the sketch map and elements of the other data source. The latter is also known as alignment if the spatial relations among the elements are of primary interest.

For a system to perform the tasks described above, it must have models that support cognitively plausible sketch map representations. Because human spatial thinking is inherently qualitative [12], such representations may also be expected to be qualitative in nature. Indeed, many approaches to sketch map representation [9, 11, 16, 28] attempt to capture some qualitative aspects of the sketch maps by abstracting away from the geometric information. Qualitative representation of spatial knowledge involves representing only the relevant distinctions in a spatial configuration. For example, orientations with a predominantly northerly heading can all be regarded as belonging to the qualitative orientation "North". Qualitative representations, together with logical and algebraic mechanisms for performing useful computations on them, form what are known as qualitative calculi [8], and their study is known as Qualitative Spatial Reasoning (QSR). It has been noted that the most useful aspects of space from a QSR perspective are topology, orientation, and distance [25]. However, the cognitive reality of sketch maps requires consideration of the reliability of every aspect of space used. For instance, it is known that sketch maps have inconsistent scale and perspective [4]. This is in part due to omissions, simplifications, exaggerations, and other types of distortions introduced at the different stages of observation, perception, and memorization of spatial information [33, 34]. These factors must therefore be taken into account.

This paper proposes a set of formal qualitative spatial representations for sketch maps that minimizes the effects of cognitive distortions during alignment. Each representation captures an aspect of sketch maps that is likely to be represented correctly with respect to a metric city map. Only sketch maps of urban areas were considered in the studies reported in this paper. The next section briefly reviews sketch map representation and alignment methods. In Section 3, the results of an empirical study into criteria for obtaining a cognitively plausible sketch map alignment are presented. Based on the identified criteria, five qualitative calculi have been used to formalize the spatial configuration information of objects in sketch maps. The resulting representations are discussed in Section 4, and an evaluation of their application on three sample sketch maps is discussed in Section 5. Section 6 concludes the paper with a summary and an outlook on future work.
2 Background

2.1 Alignment of Spatial Information from Sketch Maps

Alignment of spatial information requires identifying two spatial configurations, or so-called scenes [23], that are similar. The central question in spatial scene similarity is how to establish the associations between the elements of one scene and those of another scene. Here, we describe two approaches developed recently. Both were proposed for the alignment of spatial sketches or sketch maps with metric maps.
The spatial scene similarity approach [22, 23] applies spatial alignment as part of a query procedure: it seeks to align the structure of a query scene with that of another scene. A spatial scene query comprises a set of spatial objects and relations between the objects. A query is formulated as a spatial constraint satisfaction problem (CSP). The evaluation of the query then involves finding configurations in the database that satisfy all the constraints of the query. This is achieved by constructing an association graph from pairs of variables (objects in the query and database). The pairs are the nodes of the association graph, while the combined constraints become the edges of the graph. The final solutions to the query comprise all maximal complete subgraphs (maximal cliques) of the association graph.

Qualitative matching is a similar approach suggested by Wallgrün et al. [35]. It represents a sketch map as a set of qualitative constraint networks (QCNs), aspect by aspect. Each QCN is based on a specific qualitative spatial calculus. A matching problem is then defined as follows: for each possible pairing of nodes from one QCN with those from the other, find all consistent combined QCNs that satisfy the constraints from both original QCNs.

2.2 Formal Representation of Sketch Maps

Both of the above methods can be used to align sketch maps over several aspects of space, and suitable representations of the sketches are required for the alignments to be performed. In [9], Egenhofer used topological and directional relations in sketches to formulate geospatial queries for real spatial databases. Additional semantics of the spatial relations in the sketch can be obtained by quantifying the extents to which pairs of objects interact with respect to each given relation [10]. Forbus et al. [11] consider a sketch as being composed of logical units called glyphs. A glyph has two components: the geometry, which is what the user draws, and a conceptual entity, which refers to the concept implied by the geometry. The model considers three main types of spatial relations: positional relations given by cardinal directions (South, East, North, and West), adjacency relations captured by the Voronoi diagram of the outer contours of the glyphs, and topological relations computed between bounding boxes of the glyphs. This information can then be used, for a specific domain, to infer other information such as visibility (camouflaged or contrasted against the background). In Kopczynski and Sester [16], concepts such as "street" or "park" represent objects in the sketch map. Objects and spatial relations are embedded in a conceptual graph structure and used to generate spatial queries.

2.3 Cognitive Aspects of Spatial Knowledge

All the methods cited above attempt to capture some abstract, qualitative aspects of sketch maps by abstracting away from the geometric information. However, they ignore the influences of human spatial cognition on sketch maps and, as a result, fail to explicitly account for schematizations and distortions resulting from cognitive errors [33, 34]. In general, sketch maps are characterized by inconsistent spatial scale and perspective and contain schematizations and distortions rooted in human spatial cognition. The following gives an overview of important findings about how humans observe and perceive their environments:

- people make systematic errors in judging the orientation of spatial objects that are located in different geographical or political units [29];
Towards Cognitively Plausible Spatial Representations for Sketch Map Alignment
23
- distortions due to perspective and due to landmarks are common in spatial judgments, e.g., distances between near spatial objects are considered relatively longer than distances between far away objects [14];
- ordinary buildings are judged closer to landmarks than the other way around [18, 26];
- routes are judged longer when they have more turns, more landmarks [31], or more intersections;
- spatial information is simplified in the cognition process: angles tend to be perceived as more rectangular [15] and curved features are perceived as straighter [6, 19].

Because sketch maps represent spatial information from human memory, such cognitive errors appear very often and negatively influence sketch map accuracy [36], making direct sketch map alignment unreliable. In addition, omission of information and inclusion of extra information in the form of symbols and annotations occur quite often [32] and contribute to the difficulty of automating sketch map alignment. Thus, a cognitively plausible representation for sketch maps is necessary. The abovementioned representations of sketch maps do not provide empirical evidence for the qualitative or quantitative aspects reflected in the representations, and they also do not evaluate the cognitive adequacy of the chosen representational formalisms. In contrast, the representation proposed in this paper is motivated by insights into human spatial cognition and thinking obtained in the empirical study described below.
3 Empirical Study: Criteria for Sketch Map Alignment

To succeed in aligning a sketch map and its corresponding metric map, relevant sketch aspects for alignment are required. A list of such sketch aspects might constitute sufficient criteria for performing sketch map alignment with some success. To this end, an empirical study1 was conducted to investigate relevant aspects of sketch maps. The study was divided into two phases: first, during the experiment, participants were asked to draw three locations from memory on paper; second, the sketch maps were compared with metric maps while six sketch aspects were analyzed. In the end, a list of sketch aspects was identified as the criteria for sketch map alignment and used to develop suitable formal representations for the task.

3.1 Experiment

Participants. In total, 25 university students took part in the experiment, with ages ranging from 19 to 29 years (average age 23 years, standard deviation 2.4 years). Of these 25 participants, 14 were male and 11 were female. All participants took part voluntarily and reported having no specific knowledge of cartography or geography and no particular advanced skills in art. Though none of the participants were residents of the locations, all of them were familiar with the sketched areas through frequent visits by foot or vehicle. During the experiment, participants were asked to produce sketch maps with as much detail as possible, but only from memory. There was no time limit for the sketching task, since a limit might have pressured the participants and could have influenced the final sketch map quality.
1 This experiment was conducted in the Spatial Intelligence Lab at the Institute for Geoinformatics, University of Muenster. For a detailed description of the experiment we refer the reader to [5].
Materials. Each participant was given a DIN-A4 sized sheet of paper and a black pen. Rulers and other assisting drawing and measuring tools were not allowed. Before the participants produced their sketch maps, a sample sketch map was shown as an example of what a sketch map could look like.

Locations. Participants were asked to sketch three locations. All locations were urban areas with paved and built-up regions. Besides straight and curved main streets, side streets, and various types of buildings such as shops and restaurants, the locations also contain natural areas such as lakes or grasslands. The area of location I is a part of the inner city of Brueggen, with landmarks such as the lake "Laarer See" and the "steep hill" as the sketching boundary; the area of location II is along the route that crosses the pedestrian zone of Brueggen from south to north; the area of location III is in the city of Muenster, with the "Ludgeri-Kreisel" as its center and a radius of approximately 1 km. Locations I and III are of similar size, about 1.5 km², and were sketched as survey maps. Location II comprises a route with an overall length of 700 m and was sketched as a route map. The time that participants spent and the number of sketch maps we received are shown in Table 1.

Table 1. Time that participants spent and the sketch maps received in the experiment

                              Location I      Location II     Location III
Time Ave (StdDev)             21.7min (11.4)  16.3min (9.9)   14.8min (4.6)
Total sketch maps analysed    12              12              5
3.2 Methodology of Sketch Map Comparison

Sketch maps of each location were analyzed and compared with the corresponding metric map2. Six sketch aspects were analyzed during the comparison procedure. These sketch aspects were derived from earlier studies of sketch map analysis and comparison [36], and they relate to sketched objects, such as landmarks and street segments, as well as to sketched binary relations, such as the topological, directional, and order relations of sketched objects. In the context of sketch maps, sketched objects and sketched relations refer to:

Objects Sketched. Landmarks are defined as subjective points of interest, which are the most memorable spatial elements to the participants [5]. Landmarks can thus be any spatial objects people draw except for streets. A street network refers to a collection of street segments that connect pairs of junctions. A street segment is a piece of a street, which is a linear feature for travelling by foot, bike, or vehicle, whereas a junction is a specific location with which one or more street segments connect. Furthermore, the city block is defined separately from the street network. A city block is a part of a street network, and it provides additional possibilities during sketch map alignment. It can be defined as either an open or a closed area that is surrounded by connected street segments.

Relations Sketched. During sketch map comparison, topological relations were calculated between a landmark and a city block. The street network orientation refers
2 The Deutsche Grundkarte 1:5000 (DGK 5) was used as the metric map source.
to directional relations between pairs of adjacent street segments. Order relations refer to two kinds of relations: one is the order of landmarks along a linear reference frame, which can be a street or a path-extent like a water body; the other is the cyclic order of the adjacent landmarks around a junction. In detail, the six sketch aspects are: topology of the street network, street network orientation, order relations of landmarks along a street, order relations of landmarks around a junction, topology of city blocks, and containment of landmarks in a city block.

3.3 Resulting Alignment Criteria

The valid sketch maps of the three locations were analyzed manually. All calculations were conducted between adjacent spatial objects. We obtained a larger set of sketch aspects that might serve as criteria for alignment; among them, the six sketch aspects that we defined have the highest accuracy rates (see Table 2).

Topology of the street network represents the pattern of interconnection of street segments and junctions of a street network. The sketched street network is usually simplified and incomplete, while in reality it is much more complicated. This sketch aspect was calculated on the extracted street network graph, with nodes representing junctions and edges representing street segments. Despite missing and extra junctions and street segments, the connectivity of the sketched street segments and junctions was reproduced with 100% accuracy for all three locations.

Street network orientation was calculated between a reference street segment and its adjacent street segments, i.e., the adjacent street segments and the reference street segment share the same junctions. A qualitative orientation model that divides the space into two regions, right and left, was applied in the calculation. This directional model has its reference orientation line formed by a pair of two points, the start point and the end point3, derived from the pair of junctions of the reference street segment. The oriented reference line always points from the start point to the end point. In our case, the accuracy rate of direction relations was calculated separately per participant, depending on which street segments they drew.

Table 2. Results of sketch map comparison

Accuracy rate                Location I   Location II   Location III
Street network topology      100%         100%          100%
Street network orientation   100%         100%          100%
Order along a route          100%         94%           100%
Order around a junction      100%         –             –
City block topology          100%         100%          100%
City block containment       100%         99%           100%

To calculate the order of landmarks around a junction, at least two adjacent landmarks around a junction are required; locations II and III do not meet this basic requirement.
3 The definition of the start and end points is not arbitrary. In a route map, for each street segment, the start point is close to the origin while the end point is close to the destination. In a survey map, the definition varies depending on the calculation.
The order of landmarks along a street shows high similarity with the metric maps. For both locations I and III, 100% of all the landmarks sketched along the selected streets were placed in the correct order. For location II, 94% of all the landmarks along the sketched route were correctly placed. Missing landmarks were excluded from the calculation of order relations. During the experiment, we found that most landmarks were sketched either along the main streets or around junctions. Order relations of landmarks around junctions can therefore also support sketch map alignment. Location I was analyzed for this order relation and showed a 100% match with its corresponding metric map.

The topology of city blocks is also represented with high accuracy for all three locations. City blocks appear quite often in sketching. Although no participant sketched the complete street network, within the sketched street networks nearly 100% of the city blocks were represented by correctly connected junctions and street segments.

Finally, the containment of landmarks in city blocks also proved reliable for sketch map alignment. For the containment analysis, the experiment locations were split into city blocks whose scales varied among participants. Though small-scale city blocks formed by side streets were not sketched by all participants, landmarks were still correctly placed in the relatively large-scale city blocks formed by aggregated street segments. For locations I and III, 100% of the landmarks were placed correctly in the city blocks; for location II, this accuracy rate is 99%.

The results show that all six sketch aspects exhibit high similarity (>90%) with the corresponding aspects in metric maps, and they appear quite often in sketch maps. This empirical study showed that participants seldom made mistakes while sketching these aspects, so they can serve as reliable criteria for sketch map alignment.
4 Formal Representations of Sketch Maps for Alignment

In order to reason about sketch maps, spatial information corresponding to the six criteria described above is represented using formal qualitative spatial calculi developed by the QSR community. A quick look at the six criteria indicates that the street network is of primary importance because all the other criteria depend on it. So the first step is to characterize the street network topology and the relative orientations of street segments. Formal representations for the remaining four criteria are based on this characterization of the street network as a collection of street segments with some of their end-points coinciding at junctions to form a network topology. Wherever necessary, the end-points of a street segment will be distinguished as the start-point and end-point with regard to one of its two orientations.

4.1 Street Network Topology

The topology of a street network can be captured using the connectivity information of the corresponding graph, as is usually done in GIS [17]. However, a more explicit structure, DRA7, which is a coarsened version of the dipole relation algebra (DRA) of Moratz et al. [20], was introduced in Wallgrün et al. [35] and captures the topology of sets of oriented line segments. The oriented line segments are also known as dipoles
[20]. A dipole is an ordered pair of points in ℝ² which can be written as a = (a_s, a_e), where a_s and a_e are the start- and end-point of a, respectively. A basic DRA relation between two dipoles A and B is represented by a 4-tuple of facts s_B e_B s_A e_A, where s_B is the position of the start-point of dipole B with respect to dipole A. The other three elements of the relation, e_B, s_A, and e_A, are defined analogously. For DRA7 the possible positions of the start-/end-point of one dipole with respect to another dipole are s (coincides with the start-point), e (coincides with the end-point), and x (coincides with neither start-point nor end-point). In the proposed representations, the start- and end-points of street segments are used to define the dipoles. For example, in Figure 1 the junctions B, C, D, E, and G define the dipoles BC, CB, GC, CG, DC, CD, DE, and ED. The relations for (BC, CG), (CG, BC), (CB, CG), and (BC, DE) are exxs, xsex, sxsx, and xxxx, respectively. Note that in this example the dipole relations can be derived directly from the labeling of junctions. The seven basic relations of DRA7 comprise the four listed above together with sese, eses, and xexe. Together, the DRA7 basic relations capture the connectivity information of street segments and therefore the topology of the street network in sketch maps.
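As an illustration of this encoding (our sketch, not code from the paper), the following Python fragment derives the basic DRA7 relation of two dipoles directly from coinciding end-points; the coordinates are made-up placeholders for the junctions in Figure 1.

```python
from typing import Tuple

Point = Tuple[float, float]
Dipole = Tuple[Point, Point]  # (start point, end point) of a street segment

def position(p: Point, dipole: Dipole) -> str:
    """Position of point p relative to a dipole in DRA7 terms:
    's' = coincides with the start point, 'e' = with the end point,
    'x' = with neither."""
    start, end = dipole
    if p == start:
        return "s"
    if p == end:
        return "e"
    return "x"

def dra7_relation(a: Dipole, b: Dipole) -> str:
    """Basic DRA7 relation s_B e_B s_A e_A between dipoles A and B."""
    return (position(b[0], a) + position(b[1], a)
            + position(a[0], b) + position(a[1], b))

# Junctions B, C, G as in Figure 1 (illustrative coordinates)
B, C, G = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
BC, CG = (B, C), (C, G)
print(dra7_relation(BC, CG))  # -> 'exxs': CG starts at BC's end point
```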
Fig. 1. Dipole representation of the street network: (a) identified junctions in the sketch map are connected with straight lines, and (b) the result is a graph that partitions the space into city blocks – see Section 4.5
4.2 Street Network Orientation (Relative Orientation of Street Segments)

In sketch maps, the relative orientations of street segments with respect to each other are locally more accurate, i.e., between adjacent street segments. For a formal representation of this information, the Oriented Point Relation Algebra (OPRAm) introduced by Moratz et al. [21] is used. OPRAm is an orientation calculus in which information can be represented at variable granularities determined by the parameter m; there are 4m possible distinctions between any two orientations. The primitive entities of OPRA are oriented points (o-points) in the plane, defined as a pair (P, φ) where P is a point given by its Cartesian coordinates (x, y) and φ is an orientation in ℝ². For two o-points A and B, an OPRAm relation is distinguished by the granularity of representation m,
the position j of A with respect to B, and the position i of B with respect to A, written as A m∠_i^j B (with j as superscript and i as subscript). The points A and B have orientations given by φ(A) and φ(B), respectively, such that the values of j and i encode both the relative positions and the point orientations of A and B with respect to each other. Like other formal representations of relative point orientations, OPRAm is based on a partition of the full circular rotation in ℝ². The plane is partitioned into 2m planar sectors and 2m linear sectors. The main advantages of the OPRA calculi are that they take into account the orientations of the concerned points, they allow the distinction of relative orientations of collocated points, and they exist at different granularities. At the coarsest level, OPRA1, the relative positions of junctions and end-points of streets are encoded for each orientation along the street segments. That is, for every street segment end-point, the orientation towards the opposite end-point of the street segment defines an o-point. At junctions, the o-points corresponding to the street segments incident on it are collocated o-points. OPRA1 has 20 basic relations and four possible relative positions: front (0), left (1), back (2), and right (3) – see Figure 2c. The label of an o-point oriented along a street segment is given a superscript with the label of the street segment. For example, at junction C in Figures 2a, 2b, and 2c, the o-point oriented towards junction D is given the label C^CD. A relation between a pair of o-points (e.g., B^BC and C^CD in Figure 2c) gives their relative locations based on their intrinsic orientations, particularly distinguishing whether one is on the left or on the right side of the other. But capturing the relative orientation of an outgoing street segment requires the relative orientations with respect to both the preceding and succeeding segments.
Fig. 2. OPRA1 relations in sketch maps. The o-points at B and C in (a) and (b) have the OPRA1 relations given in (c). OPRA1 is used to determine the relative orientations of outgoing street segments at junctions (a), (b), and (d).
Given four junctions A, B, C, and D in general position, consider the three o-points A^AB, B^BC, and D^BD, and let i, j, k be integers with A^AB ∠_i^j D^BD and B^BC ∠k B^BD (the o-points B^BC and B^BD are collocated at B, so a single sector value k suffices). We restrict the values of i to {0, 1, 3}. Then any assignment of values to the variables i, k describes the configuration of a junction with three incident street segments AB, BC, BD. To determine whether BD is oriented towards the left or the right of ABC, the four possible relations between A^AB and B^BC are considered separately. If A^AB ∠_0^0 B^BC, then it is impossible to decide whether BD is located to the right or left of ABC. For the other three relations, the orientation o of BD is given by:

A^AB ∠_0^1 B^BC: o = left if i = k = 1, and o = right otherwise    (1)
A^AB ∠_0^2 B^BC: o = left if i = k = 1, and o = right if i = k = 3    (2)
A^AB ∠_0^3 B^BC: o = right if i = k = 3, and o = left otherwise    (3)
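Rules (1)–(3) amount to a small lookup on the sector values. The following sketch (our illustration, not code from the paper; the treatment of value combinations the rules leave unspecified is an assumption) makes the decision procedure explicit:

```python
def outgoing_side(j, i, k):
    """Side of path ABC on which segment BD leaves junction B.

    j is the superscript of the OPRA_1 relation A^AB angle_0^j B^BC;
    i and k are as in the text. Sector values: 0 = front, 1 = left,
    2 = back, 3 = right.
    """
    if j == 0:                       # A^AB angle_0^0 B^BC: cannot decide
        return "undetermined"
    if j == 1:                       # rule (1)
        return "left" if i == k == 1 else "right"
    if j == 2:                       # rule (2); other combinations are
        if i == k == 1:              # left open by the text (assumption)
            return "left"
        return "right" if i == k == 3 else "undetermined"
    if j == 3:                       # rule (3)
        return "right" if i == k == 3 else "left"
    raise ValueError("j must be one of 0, 1, 2, 3")
```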
This information corresponds to how correctly people place junctions along a street. The devised representation is consistent with the results in Section 3, which showed that, for a given heading, people place outgoing street segments at junctions along a street on the correct side of the street.

4.3 Order of Landmarks Along Streets

Landmarks in the sketch map are vectorized and approximated by polygons. The Interval Algebra (IA) of Allen [1] is used to represent ordering information between landmarks along a specific path in the street network. The term path is used here analogously to the term simple path in graph theory, i.e., a sequence of junctions such that each junction in the sequence is connected by a street segment to the next junction in the sequence.
Fig. 3. Landmarks adjacent to street segments can be determined using the Voronoi diagram of the street segments. Here, dashed lines represent some edges of the Voronoi diagram of some of the street segments in Figure 1.
Only landmarks adjacent to the path are considered. For this, it is only required that for a given path, landmarks proximal to the path include landmarks closer (or equally
close) to street segments of the path than to segments that are not part of that path. In particular, suppose S is the sketch map, N(S) a representation of the street network in sketch map S, and GV(N(S)) the Generalized Voronoi Diagram [2] generated by the street segments of N(S) and the border of S. Then, if a landmark is adjacent to a path P, some of the landmark's points must lie in the Voronoi region r(t) ⊆ GV(N(S)) of some street segment t of P (Figure 3). As such, only landmarks that have some part of them close enough to the path are taken into account.

There is a function l(x) which returns the distance of a point x on a path P from the path's start point s_P. So l(s_P) = 0 and l(e_P) = L_P, where e_P and L_P are the end-point and length of P, respectively. For each point x of P (0 < l(x) < L_P), if the perpendicular at x intersects a point belonging to the geometry of an adjacent landmark, say A, for the first time (no point of A was encountered by perpendiculars of P's points lying before x), then mark l(x) as the start of the interval I_A along P corresponding to A (l(x) = s_A). If y is the last point on P whose perpendicular encountered a point of A, mark l(y) as the end point e_A of I_A. A simplifying condition employed for now is that whenever a start-point of a landmark is encountered, the most recent point of that landmark previously encountered, if any, must have been encountered in the immediately preceding street segment. In that case the previous interval corresponding to the landmark is extended to continue from the recently encountered point. The relations between the intervals corresponding to landmarks, together with the path, become the ordering information, as shown in Figure 4.
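Once the interval bounds s_A and e_A have been recorded as l(x) values along the path, the Allen relation between two landmark intervals follows from simple comparisons. The sketch below (our illustration, not code from the paper) enumerates the 13 basic IA relations for closed intervals:

```python
def allen_relation(a, b):
    """Allen (IA) basic relation between intervals a = (a_s, a_e) and
    b = (b_s, b_e), each with start < end."""
    a_s, a_e = a
    b_s, b_e = b
    if a_e < b_s: return "before"
    if b_e < a_s: return "after"
    if a_e == b_s: return "meets"
    if b_e == a_s: return "met-by"
    if a_s == b_s and a_e == b_e: return "equal"
    if a_s == b_s: return "starts" if a_e < b_e else "started-by"
    if a_e == b_e: return "finishes" if a_s > b_s else "finished-by"
    if b_s < a_s and a_e < b_e: return "during"
    if a_s < b_s and b_e < a_e: return "contains"
    return "overlaps" if a_s < b_s else "overlapped-by"

# Landmark intervals given as l(x) values along a path (illustrative numbers)
print(allen_relation((0.0, 3.5), (2.0, 6.0)))  # -> 'overlaps'
```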
Fig. 4. Landmarks are ordered as intervals defined by their perpendicular projection onto street segments. Disjoint intervals corresponding to a landmark at a junction are combined into a single interval.
When traversing a path, the order of landmarks is established by observing the intersections between the line perpendicular to each point of the path and the adjacent landmarks, so that the ordering information is defined by the projection of adjacent landmarks onto the path (Figure 4). At the beginning of the path there are two choices: either the perpendicular at the start-point intersects a landmark or it does not. If it does, then it has to be checked whether the landmark starts before or at the start of the path. The path end-point is handled analogously.

4.4 Order of Landmarks around Junctions

Three possibilities were investigated for representing the order of landmarks around junctions. The first, based on point ordering information, was rejected because it does not account for the spatial extension of sketched objects. The second, based on representing the portion of an object lying in a certain orientation from an observation point, takes into account more aspects than required. The representation selected explicitly accounts for the spatial and angular extensions of landmarks. It is based on the algebra of cyclic intervals [24]. The Cyclic Interval Algebra (CIA) expresses relations of intervals (called c-intervals) on a circle based on Allen's IA relations. In addition to 12 of the 13 basic IA relations (before and after are merged into a single relation that is analogous to DC in RCC), CIA includes four more relations accounting for the cyclic nature of the embedding space. We use CIA to formally represent the order of landmarks around junctions.
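As a computational illustration of the projection described in the following paragraphs (our sketch, not code from the paper), a landmark's c-interval around a junction can be approximated from the angles of its polygon vertices as seen from the junction; the interval is taken as the complement of the largest angular gap between consecutive vertices:

```python
import math

def c_interval(junction, polygon_points):
    """Approximate a landmark's c-interval around a junction: the
    counter-clockwise angular range (radians in [0, 2*pi)) spanned by the
    landmark's vertices as seen from the junction."""
    jx, jy = junction
    angles = sorted(math.atan2(y - jy, x - jx) % (2 * math.pi)
                    for x, y in polygon_points)
    # The arc not covered by the landmark is the largest gap between
    # consecutive vertex angles; the c-interval is its complement.
    gaps = [(angles[(n + 1) % len(angles)] - a) % (2 * math.pi)
            for n, a in enumerate(angles)]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    start = angles[(widest + 1) % len(angles)]
    end = angles[widest]
    return start, end  # closed arc from start to end, counter-clockwise

# An illustrative rectangular landmark seen from a junction at the origin
print(c_interval((0, 0), [(1, 1), (2, 1), (2, 2), (1, 2)]))
```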
Fig. 5. Cyclic intervals of two landmarks around a junction give the order in which one would see them standing at the junction and turning anti-clockwise around 360° (objects from the sketch map in Figure 1).
The approach taken is similar to the one given in the last section. The reference point is a junction j. For a landmark adjacent to a street segment incident to j, the c-interval of the landmark is its projection onto a (unit) circle centered at j. The projection is given by sweeping the 360° view at j in the counter-clockwise direction and recording the positions of the sweep line on the circle whenever a start- or end-point of a landmark is encountered. The arc from the mark of the start-point to the
mark of the end-point is the c-interval of the corresponding landmark. The c-intervals are defined as closed intervals on a real circle. Figure 5 shows the c-intervals corresponding to two landmarks around a junction.

4.5 Topology of City Blocks

Like landmarks, city blocks in the sketch map are stored as polygons, and their topological relations are interpreted as relations in RCC-8 [7]. We use the procedure presented in Bennett et al. [3] to evaluate the RCC-8 relations of the set of polygons derived from the sketch map. In that approach, polygons are represented as terms in a Closure Algebra4 (CA). Each term is the intersection of finitely many half-planes formed by the lines passing through adjacent pairs of vertices of the polygon. The corresponding RCC-8 relations are derived from the terms of the CA by testing the emptiness of their intersections.

City blocks are delineated by street segments. The outline of a city block is formed by street segments and the sections of the border of the drawing surface adjacent to it. The city block is first represented as a polygon whose vertices include the end-points of the street segments adjacent to it and any nodes at the border of the drawing surface. Because city blocks are defined based on street segments, and because the street network in a sketch map is generally incomplete, sketch maps do not contain many closed city blocks. In order to maximize the number of city blocks detected, street segments need to be extended to the border of the drawing surface. The procedure to do this extends all street segments with end-points oriented towards the border of the drawing surface, and with no other objects in their path, at the same rate. This is done until either the border or another street segment extension is encountered. The end-point of the extension becomes the new end-point of the extended street segment. It is worth noting at this point that while the street network defines the topology of city blocks, these two structures are not dual to each other, since some street segments are not part of the city block topology.

By their definition, no two city blocks can overlap, and a city block is constituted by a connected region. The adjacency relations between city blocks are represented using an adjacency matrix [30]. Each boundary street segment in the sketch map corresponds to a value of 1 at the intersections of the rows and columns of the matrix for the city blocks which it borders. The adjacency matrix is important both for the alignment task and for determining when city blocks can be aggregated. When comparing sketch maps, it is common to encounter aggregated city blocks due to their incompleteness. Intuitively, two externally connected (EC) regions can only be aggregated if their intersection is not a single point (a 0-meet).

4.6 Containment of Landmarks in City Blocks: Topological Relations of Landmarks

Landmarks are located in city blocks. The topological constraints on landmarks and city blocks together allow us to partially constrain the possible locations of the landmarks. While city blocks are non-overlapping, a landmark may overlap several
4 A CA is a Boolean Algebra augmented with a closure operator. For details refer to Bennett et al. [3].
city blocks. The RCC-8 relations between landmarks are derived in the same way as those for city blocks. The RCC-8 constraints between landmarks and city blocks form the QCN for the topological information of a sketch map. A landmark can be part of a city block but not the other way around; again, this constraint is difficult to represent formally in a general theory like RCC.
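The containment check itself reduces to point-set tests on the polygons. A minimal sketch using the Shapely library (our illustration, not the closure-algebra procedure of Bennett et al. [3]) derives the relevant RCC-8 relations directly from geometric predicates:

```python
# pip install shapely
from shapely.geometry import Polygon

def rcc8(a: Polygon, b: Polygon) -> str:
    """RCC-8 relation between two simple polygons."""
    if a.equals(b):
        return "EQ"
    if not a.intersects(b):
        return "DC"
    if a.touches(b):               # boundaries meet, interiors disjoint
        return "EC"
    if a.within(b):                # tangential vs. non-tangential part
        return "TPP" if a.boundary.intersects(b.boundary) else "NTPP"
    if b.within(a):
        return "TPPi" if a.boundary.intersects(b.boundary) else "NTPPi"
    return "PO"                    # remaining case: partial overlap

landmark = Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])
city_block = Polygon([(0, 0), (6, 0), (6, 6), (0, 6)])
print(rcc8(landmark, city_block))  # -> 'NTPP': landmark inside the block
```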
5 Evaluation of the Formalization

The evaluation of the proposed representational models was performed using a set of three test sketch maps (we will call them S1, S2, and S3 – Figures 7-9 at the end of this paper). All three sketch maps depict the same area. The qualitative representations derived from the sketch maps were compared with the same information derived from a metric map (RM) of the sketched area (Figure 10). For each sketch map, a hypothesis of the matching elements between the sketch map and RM was initially generated based on a visual analysis. A match hypothesis is an a priori mapping of objects in the sketch map to those in the metric map. The hypothesis was then used to compare the constraints in the sketch map representations to those in the RM representations. The orientation of the street network (Section 4.2) was not evaluated because a visual analysis confirmed that the choices left or right were always correct.

5.1 Street Network Topology

As expected, the sketch maps contained less information than RM. A total of 34 street segments and 17 junctions were identified in RM, as compared to 11, 13, and 21 street segments and 5, 5, and 9 junctions for S1, S2, and S3, respectively. Table 3 shows a summary of the evaluation. In all but two cases the corresponding junctions were not present in the sketch maps because one or more street segments forming the junction were omitted. For the other two cases the junctions were omitted because the street segments forming the junction were not completed and therefore did not have a common end-point. For this same reason, some street end-points in the sketch maps could not be matched with street end-points in RM; these corresponded to intermediate points of the matching street segments in RM. A possible solution to representing this fact is to include the DRA position i in the set of acceptable position labels of DRA7 during alignment. The label i says that the start-/end-point of one dipole lies inside another dipole (which point is being referred to depends on which of s_B, e_B, s_A, or e_A gets the label i).

Table 3. Summary of alignment of dipoles generated from S1, S2, and S3 with dipoles from RM

                                                         S1    S2    S3
No. of matched dipoles                                   11    10    21
No. of conflicting dipole constraints                    0     0     2
No. of dipoles corresponding to aggregated
street segments in RM                                    1     2     3
Ave. no. of street segments per aggregated segment       3     2.5   2
Some street segments in the sketch maps matched an aggregated group of street segments in RM under the given match hypothesis. Aggregation of street segments was done by visual inspection; it cannot immediately be inferred in DRA7. The overall evaluation shows that there are very few conflicts between relations from the sketch maps and those from RM (see Table 3). The representations of the sketch maps were highly accurate and suggest that the street network was an important part of the frame of reference used during sketching.

5.2 Order of Landmarks Along Streets

Interval relations for landmarks along a path on a single street matched exactly for two of the three tested straight streets. On the other hand, for one path composed of segments from three different streets (with a sharp angle at the two junctions), the interval relations for landmarks were not entirely consistent with RM (Figure 6). But relations between non-adjacent landmarks were always consistent (intervals 1 and 3 in Figure 6). The inconsistent relations were always within a conceptual neighborhood distance of at most three relations, with the largest distances occurring around the junction. In general, the order of landmarks from one city block to another was always correct.
Fig. 6. Schematic of interval relations of four landmarks (shown in Figure 4 for S1). The original intervals from Figure 4 correspond to the intervals labeled 1 and 2, respectively.
5.3 Order of Landmarks around Junctions

Interval relations for landmarks around a junction were tested for one junction (Figure 5) in RM, S1, and S2; S3 had no landmarks around the selected junction. Table 4 summarizes the results. For the tested junction there was only one conflict. It is likely that the distinguishable categories for this case are coarser than those of CIA, because the positioning of landmarks around a junction depends on other factors as well.

Table 4. Summary of alignment of order relations at a junction generated from S1 and S2 with corresponding relations from RM

                                       RM    S1    S2
No. of landmarks around junction       5     4     4
No. of landmarks matched with RM       5     3     3
No. of conflicting relations           0     0     1
5.4 Topology of City Blocks

As with the street segments, fewer city blocks could be obtained from the sketch maps than from RM. This was the result of incomplete street information. But once the dipole extension strategy described in Section 4.5 above was applied, there was a dramatic increase in the number of city blocks obtained from the sketch maps (see Table 5). Nonetheless, the number of city blocks in RM that had to be aggregated to match a city block in the sketch maps was large. The one inconsistency between the adjacency matrices of S3 and RM was a result of street segments not meeting at a junction but both ending at the boundary of the drawing surface.

Table 5. Summary of comparison of topological relations for S1, S2, S3, and RM

                                               RM      S1       S2      S3
City blocks from original dipoles              13      2        2       2
City blocks from extended dipoles              17      6        5       12
Aggregated city blocks in RM based on match
(Ave. no. of city blocks per aggregate)        0 (0)   4 (3.8)  2 (7)   3 (2.7)
Conflicting adjacency relations                0       0        0       1
Fig. 7. Sketch map S1
5.5 Containment of Landmarks in City Blocks: Topological Relations of Landmarks

The only RCC-8 relation that could be obtained between landmarks was DC, and between landmarks and city blocks it was NTPP. Relations between objects in the
sketch maps matched the corresponding relations in RM in all cases. It is worth noting here that our definition of city blocks, based on straight lines connecting pairs of junctions, represents an overly simplified view of sketch maps and may not be suitable for more complex sketches with curved street objects. It is in part due to this simplicity that there were no landmarks overlapping more than one city block.
Fig. 8. Sketch map S2
Fig. 9. Sketch map S3
Fig. 10. Image of metric map RM. The original map was an extract from the Deutsche Grundkarte 1:5000 (DGK 5).
6 Conclusions and Future Work

This paper presents a set of qualitative spatial representations for sketch maps to be used for aligning them with metric maps and possibly other sketch maps. Six aspects of sketch maps have been identified as being robust against typical schematizations and distortions in sketch maps. For each aspect a representation has been proposed, and five of these were evaluated. The evaluations show that all the proposed representations can be used reliably for making comparisons in some scenarios, but they still lack the flexibility required to deal with sketched data. It was observed, for example, that sketch map data needs to be represented at a coarse level, but at the same time, for purposes of aligning it to other data, the representations need to be expressive enough to introduce new distinctions when necessary. Matching of street segments is a case in point: incomplete street segments must be compared with complete and aggregated street segments.

In the present work the focus was on resolving problems arising from cognitive distortion and schematization. But other problems exist that need to be addressed as well. Spatial aggregation of geographic features represented in sketch maps is particularly common during sketching. Every sketch map contains aggregated objects, perhaps because as human beings we each tend to carve up the world in our own different ways. As such, there is a need to develop methods for explicitly expressing aggregation as an operator on primitive entities for the different spatial calculi; this forms part of our future work. The effects of missing or additional information also need to be addressed. If a sketch omits a street segment, this may lead to the appearance of a new city block corresponding to the aggregate of the city blocks that were separated by the missing street segment. Another consideration for the future is to be
able to represent the curviness of a street and to characterize turns at junctions as either straight on or some other turn direction within a formal theory; currently, this is lacking. Finally, in this work we considered only urban sketch maps. A last consideration looking forward is to investigate suitable representations for sketch maps in different settings, such as rural areas.
References
1. Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26, 832–843 (1983)
2. Aurenhammer, F.: Voronoi diagrams – a survey of a fundamental geometric data structure. ACM Computing Surveys 23, 345–405 (1991)
3. Bennett, B., Isli, A., Cohn, A.: A system handling RCC-8 queries on 2D regions representable in the closure algebra of half-planes. In: Mira, J., del Pobil, A., Ali, M. (eds.) IEA/AIE 1998. LNCS, vol. 1415, pp. 281–290. Springer, Heidelberg (1998)
4. Blaser, A.: Geo-spatial Sketches. University of Maine, Department of Spatial Information Science and Engineering and National Center for Geographic Information and Analysis (1998)
5. Blut, C.: Spatial Analysis of Sketch Maps: The Effect of Spatial Cognitive Distortions on Qualitative Mapping of Sketch Maps to Metric Maps. University of Muenster (2010)
6. Chase, W.G., Chi, M.T.H.: Cognitive skill: Implications for spatial skill in large-scale environments. University of Pittsburgh, Learning Research and Development Center (1979)
7. Cohn, A.G., Bennett, B., Gooday, J., Gotts, N.M.: Qualitative Spatial Representation and Reasoning with the Region Connection Calculus. GeoInformatica 1, 275–316 (1997)
8. Cohn, A.G., Hazarika, S.M.: Qualitative Spatial Representation and Reasoning: An Overview. Fundam. Inform. 46, 1–29 (2001)
9. Egenhofer, M.J.: Query processing in spatial-query-by-sketch. Journal of Visual Languages and Computing 8, 403–424 (1997)
10. Egenhofer, M.J., Dube, M.P.: Topological Relations from Metric Refinements (2009)
11. Forbus, K., Usher, J., Chapman, V.: Qualitative Spatial Reasoning About Sketch Maps. In: Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence (2003)
12. Freksa, C.: Qualitative spatial reasoning. In: Cognitive and Linguistic Aspects of Geographic Space, pp. 361–372. Kluwer, Dordrecht (1991)
13. Goodchild, M.F.: Citizens as sensors: The world of volunteered geography. GeoJournal 69, 211–221 (2007)
14. Holyoak, K.J., Mah, W.A.: Cognitive reference points in judgments of symbolic magnitude. Cognitive Psychology 14, 328–352 (1982)
15. Kahneman, D., Tversky, A.: Intuitive prediction: Biases and corrective procedures. Defense Technical Information Center (1977)
16. Kopczynski, M., Sester, M.: Representation of Sketch Data for Localisation in Large Data Sets. In: XXth Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Istanbul, Turkey (2004)
17. Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W.: Geographic Information Systems and Science. John Wiley and Sons, West Sussex (2005)
18. McNamara, T.P., Diwadkar, V.A.: Symmetry and Asymmetry of Human Spatial Memory. Cognitive Psychology 34, 160–190 (1997)
19. Milgram, S., Jodelet, D.: Psychological maps of Paris. Environmental Psychology, 104–124 (1976)
20. Moratz, R., Renz, J., Wolter, D.: Qualitative spatial reasoning about line segments, pp. 234–238 (2000)
21. Moratz, R., Dylla, F., Frommberger, L.: A relative orientation algebra with adjustable granularity (2005)
22. Nedas, K.: Semantic similarity of spatial scenes. University of Maine (2006)
23. Nedas, K., Egenhofer, M.: Spatial-Scene Similarity Queries. Transactions in GIS 12, 661–681 (2008)
24. Osmani, A.: Introduction to Reasoning about Cyclic Intervals. In: Imam, I., Kodratoff, Y., El-Dessouki, A., Ali, M. (eds.) IEA/AIE 1999. LNCS (LNAI), vol. 1611, pp. 698–706. Springer, Heidelberg (1999)
25. Renz, J., Nebel, B.: Qualitative Spatial Reasoning using Constraint Calculi. In: Aiello, M., Pratt-Hartmann, I., Benthem, J.V. (eds.) Handbook of Spatial Logics, pp. 161–215. Springer, Heidelberg (2007)
26. Sadalla, E.K., Staplin, L.J.: The Perception of Traversed Distance: Intersections. Environment & Behavior 12, 167–182 (1980)
27. Schwering, A., Wang, J.: Sketching as Interface for VGI Systems. In: Geoinformatik 2010, Kiel, Germany (2010)
28. Skubic, M., Blisard, S., Bailey, C., Adams, J.A., Matsakis, P.: Qualitative analysis of sketched route maps: translating a sketch into linguistic descriptions. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34, 1275–1282 (2004)
29. Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cognitive Psychology 10, 422–437 (1978)
30. Theobald, D.M.: Topology revisited: representing spatial relations. International Journal of Geographical Information Science 15, 689–705 (2001)
31. Thorndyke, P.W.: Distance estimation from cognitive maps. Cognitive Psychology 13, 526–550 (1981)
32. Tversky, B.: Some Ways that Maps and Diagrams Communicate. In: Spatial Cognition II, Integrating Abstract Theories, Empirical Studies, Formal Methods, and Practical Applications, pp. 72–79. Springer, Heidelberg (2000)
33. Tversky, B.: Navigating by mind and by body. Spatial Cognition III, 1033–1033 (2003)
34. Tversky, B.: How to get around by mind and body: Spatial thought, spatial action. In: Cognition, Evolution, and Rationality: A Cognitive Science for the XXIst Century. Routledge, London (2005)
35. Wallgrün, J.O., Wolter, D., Richter, K.-F.: Qualitative matching of spatial information. In: 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS 2010), pp. 300–309. ACM, New York (2010)
36. Wang, J.: How human schematization and systematic errors take effect on sketch map formalizations. Master Thesis, University of Münster (2009)
Scalable Navigation Support for Crowds: Personalized Guidance via Augmented Signage

Fathi Hamhoum and Christian Kray

Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
[email protected]
Abstract. Navigating unfamiliar places is a common problem people face, and there is a wealth of commercial and research-based applications, particularly for mobile devices, that provide support in these settings. While many of these solutions work well on an individual level, they are less well suited for very crowded situations, e.g., sports matches, festivals and fairs, or events such as pilgrimages. In a large crowd, attending to a mobile device can be hazardous, the underlying technology might not scale well enough, and some people might be excluded because they do not have access to a mobile device. Public signage does not suffer from these issues, and consequently, people frequently rely on signage in crowded settings. However, a key disadvantage of public signage is that it does not provide personalized navigation support. In this paper, we therefore investigate augmented signage as a means to provide navigation support for large crowds. We introduce a scalable signage-based approach and present results from a comparison study contrasting two designs for augmented signage with a base case. The results provide initial evidence that such a system could be easily usable, may help to reduce task load, and has the potential to improve navigation performance.

Keywords: Augmented signage, navigation support, crowd interfaces, user studies.
1 Motivation
Navigating in an unfamiliar environment can be a daunting task regardless of the means of transportation. Pedestrians traditionally rely on a variety of tools to support them in this task, e.g., maps, static signage, or other people familiar with the area. In this context, various types of computer-based guide systems have been proposed (cf., e.g., [6,22,24]) and commercialized (cf., e.g., [1]). Most frequently, such systems take the form of mobile guides that dynamically display maps enriched with information such as the user's current location and the proposed route to a target location [1]. Dynamic (public) signage is an alternative approach that has been proposed for use either in conjunction with mobile devices or without them [23,14]. While these systems have been shown to work well on an individual level (e.g., a single user navigating urban environments), some settings pose significant challenges for such approaches. In particular, if users wish to navigate in areas that are very crowded, several key problems may arise:
– Difficulty of using a mobile phone in a crowd. Depending on the density of the crowd, it can be difficult (or even hazardous) to take out a mobile phone and interact with it, since people need to pay close attention to what is going on around them. The movement of the crowd may interfere with the interaction as well.
– Lack of scalability of the underlying technology. If a large number of people try to access a wireless network (e.g., Bluetooth, WiFi, GSM) in a very confined area, the frequency of errors, low bandwidth, high latency, or loss of connection can increase dramatically, up to the point where the system may become unusable.
– Exclusion of some users. It is quite likely that a percentage of the crowd will not have a mobile phone: they might not own one, or might not have brought it for fear of it being stolen or lost. Certain events, such as pilgrimages, might also deter people from using a mobile phone.

In order to address these problems and to provide a system that large numbers of people can use simultaneously to get personalized navigation support, we are investigating the use of (dynamic) public signage as a means to deliver this type of information. In this paper, we present an approach that uses a spatial partitioning in combination with standard signage (either static or dynamic), which we augment with additional information to enable users to extract personalized directions. As an added benefit, the approach provides means to influence the flow of people through space. We also report on a user study we conducted to investigate different designs of augmented signage for navigation support.

In the following, we first review related work before describing the approach and the corresponding interface designs. We then report on the study we carried out to evaluate these designs, before discussing the results and their implications. The paper concludes by highlighting key contributions and future work.
2 Background
The way in which people navigate in situ and how navigational instructions can be conveyed is an integral part of spatial cognition research. Relevant topics covered in this area include how mental models of space relate to different representations of space [26,27] and the modelling of route knowledge in general [11]. Further key aspects relate to the role and type of landmarks [12] and spatial relations [13], as well as how to best integrate them into navigational instructions [21] that people find easy to follow.

Static signage [2] remains the most common means to assist people in navigating unfamiliar environments. When designed well and placed appropriately [28], it can be very helpful, and there is also evidence that it can reduce the cognitive effort required to navigate an unfamiliar environment [10]. In addition, static signage scales well – provided it is of sufficient size and placed appropriately – and can also provide some rudimentary means of flow control, which is particularly important in the context of very dense crowds [9]. However, issues such as illegibility, ambiguity, inaccuracy, and unreliability [2] can greatly reduce its
usefulness. Either way, a key drawback of static signage (and of static maps such as you-are-here maps [16]) is the lack of adaptability to individual users and changing circumstances.

Mobile guides [1,3,6] do not suffer from this problem. Using location information (which can be sensed fairly accurately, e.g., via GPS or RF technologies), high-resolution displays, and possibly also information about the owner of a mobile device, they can provide dynamic navigation support that is adapted to the individual user [4,15]. These systems, however, do require people to carry a sensor-equipped mobile device capable of running the necessary software, and their users might also have to attend to them frequently (e.g., to retrieve the next set of directions). In contrast, static signage is embedded in the real world and thus does not distract the users' attention from their environment.

The use of public displays as dynamic signage is an alternative option that has been investigated as a means to support navigation. While some approaches (such as the GAUDI system [14]) rely solely on these displays to provide navigation support, others combine mobile devices with public displays (e.g., the Rotating Compass [23]) to enable 'eyes-free' use based on cross-modal (tactile) cues [19]. In the context of a very crowded setting, the former suffer from a lack of scalability in terms of personalizing the display content, whereas the latter face issues with respect to perceiving tactile cues, as well as with the scalability of the underlying technology if large numbers of users were to use it.

In summary, while all of the approaches described above do offer some benefits, there are also some key drawbacks that put their usefulness in question when applied to crowded scenarios. In particular, the need for mobile phones, the reliance on vibro-tactile cueing, and the limitation to single users undermine their use for supporting navigation for large and heterogeneous crowds. In the following, we therefore outline a design for a public signage system that takes some cues from these systems but avoids the shortcomings identified above.
3 Personalized Navigation Support for Crowds
A number of key requirements quickly emerge when considering how to provide personalized navigation support to large crowds. Obviously, a feasible solution needs to be easy to use and learn, ideally with little to no training. In addition, it is clearly appropriate to design a system that enables the largest possible percentage of people to benefit from it rather than excluding particular groups (e.g., people who do not own a mobile phone, do not speak the local language, or cannot read well). Consequently, it is desirable to minimize technical requirements (e.g., the use of specific sensors) to enable use at a large scale and also at sites without much technical infrastructure. Furthermore, such a system should not function at the expense of other people who are not using it (for example, by interfering with the standard use of the locations where it is deployed). Finally, it would be beneficial if the system allowed for some degree of flow management, e.g., to avoid people overcrowding particular areas.
Our solution is inspired by a signage system described in Ender's Game, a novel by Orson Scott Card [5], which uses color-coded lines to provide directions to different groups in a large training facility. The basic principle of using color codes that map to areas and routes also underlies our approach, but instead of this coding scheme replacing the traditional signs, we embed the code into standard signage in order to preserve the original artefact and its function. The two key components of the proposed system are a spatial partitioning (of the real world) with a corresponding mapping to color codes, and augmented signage that embeds color codes into the standard signs used for navigation.

3.1 Spatial Partitioning
In order to provide large numbers of people with personalized directions, it is worthwhile to consider where they might want to go. In principle, every single person in a crowd might want to go to a different destination, but in practice the number of actual destinations is often much smaller. For example, after a football match the spectators might want to go to a relatively small number of parking areas or public transport hubs. Similarly, if a concert hall has to be evacuated, there will be a limited number of assembly spots and escape routes that people have to follow depending on where they are at the time of the emergency. Finally, at large events such as the Olympic Games or pilgrimages, large groups of people might have the same goal (e.g., a particular stadium or site). Consequently, in many cases it will be sufficient to provide guidance to a relatively small number of destinations.

These destinations in turn can be used to partition space. Figure 1 shows an example where an inner city area around a football stadium has been partitioned into six areas that were then mapped to specific colors. Such a mapping enables not only the augmentation of existing signage (see 3.2) but also facilitates navigation, as users only need to remember a particular color code rather than a sequence of names of intermediate and final destinations.

In order to increase the number of possible partitions beyond the number of clearly identifiable colors, it is possible to assign symbols to sub-partitions. Figure 2 shows an example where the large red area has been further divided into four subregions. Each subregion has been mapped to a symbol (e.g., triangle, heart), and thus people will need to remember (and follow) a combination of color and symbol (e.g., red circle). While the number of clearly identifiable symbols is also limited, combining them with colors makes it possible to create mappings that are more fine-grained; for example, using eight colors and ten symbols it is possible to directly address 80 areas. Since people need to memorize these codes, it is important to ensure that different combinations can be easily distinguished, which in turn imposes an upper limit on how many combinations are feasible in practical use. Either mapping (color only or colored symbols) can then be embedded into signage as described in the following.
Fig. 1. Example partition of an inner city area around a football stadium (marked by an ‘X’ on the map) - map generated from openstreetmap [20]
Fig. 2. Example sub-partition using four symbols to divide area corresponding to one color - map generated from openstreetmap [20]
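To make the coding scheme concrete, the following minimal Python sketch (not part of the original paper; the destination names are hypothetical) enumerates (color, symbol) codes and assigns them to destinations:

    # Sketch of the Section 3.1 coding scheme: destinations are mapped to
    # short visual codes built from colors and symbols. With the eight
    # colors and eight symbols below, up to 64 areas can be addressed.
    from itertools import product

    COLORS = ["blue", "red", "yellow", "brown", "green", "black", "orange", "gray"]
    SYMBOLS = ["circle", "star", "triangle", "moon", "heart", "square", "cross", "rectangle"]

    def assign_codes(destinations):
        """Assign each destination a distinct (color, symbol) code."""
        codes = list(product(COLORS, SYMBOLS))
        if len(destinations) > len(codes):
            raise ValueError("not enough distinguishable codes")
        return dict(zip(destinations, codes))

    # Hypothetical destinations around a stadium:
    mapping = assign_codes(["north car park", "metro station", "assembly point A"])
    print(mapping["metro station"])  # -> ('blue', 'star')

In a color-only deployment, the symbol component would simply be dropped, shrinking the address space to the number of reliably nameable colors.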
3.2 Augmented Signage Design
One key concern in designing the system to provide personalized guidance to large numbers of users was to preserve the main function that public signage provides to all nearby people. We thus chose to augment existing signage rather
than replacing it with something that would only be readable (and therefore usable) by people who have been taught how to use it. Consequently, we propose a system where people can use visual cues that are embedded into standard signage to extract personalized information (i.e., how to get to a specific target location or partition). By remembering a simple code, individuals can look at a sign and use the embedded visual cues to infer which way they need to go in order to reach their desired target destination. The visual cue is constructed based on the color mapping and spatial partitioning described above. The 'individual' code could initially be acquired in several different ways: it could be printed on entrance tickets, it could be agreed on by a group of people who want to meet up after an event, or it could be transmitted by a dynamic display at strategic locations (e.g., on the main paths leading to the event). In this paper we are mainly concerned with the augmentation of standard signage and with scalability, and we therefore assume that people will be in possession of the correct code when they are looking at the signs.

We created two designs for the augmented signage. The first one adds colored circles to items shown on a sign (see Figure 3, top left). Each color corresponds to a specific destination, e.g., 'red' might correspond to a specific Metro Station or city area. In order to infer the direction to follow at a sign, a user would first have to remember the color, then find the red circle on the sign, and finally follow the arrow shown next to the entry near the circle (regardless of whether that entry reads 'Metro Station' or not). For example, given the color red and the sign shown at the top left of Figure 3, a user would have to walk straight ahead to reach their destination. The second design combines symbols and colors (see Figure 3, top right), so that a user needs to remember, for example, 'green heart' or 'purple square'. The process of extracting individual information is the same as with the first design. Using the example sign shown at the top right of Figure 3, a user following 'purple cross' would thus have to walk straight ahead to reach their destination. In either design, the cues could be changed dynamically, e.g., to provide updated directions, to control the flow of people, to include new destinations, or to multiplex multiple destinations (see Section 5 for more detail). The user study reported in Section 4 evaluates and contrasts these two designs against a base case (Figure 3, bottom left).

Fig. 3. Augmented signage designs (top row), base case (bottom left) and study setup (bottom right)
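A sign under either design can be modeled as a list of entries, each carrying an arrow and the visual codes attached to it. The following sketch (ours; the sign layout and entry names are hypothetical, not taken from the study material) shows how a direction is extracted from a remembered code while the textual labels are ignored, as the approach intends:

    # Reading an augmented sign (Section 3.2): the traveler matches their
    # remembered code against the cues on the sign and follows that arrow.
    SIGN = [
        {"label": "Metro Station", "arrow": "N",  "codes": {("red", None)}},
        {"label": "City Hall",     "arrow": "E",  "codes": {("blue", None), ("purple", "cross")}},
        {"label": "Stadium",       "arrow": "NW", "codes": {("yellow", "heart")}},
    ]

    def direction_for(sign, color, symbol=None):
        for entry in sign:
            if (color, symbol) in entry["codes"]:
                return entry["arrow"]
        return None  # code not on this sign: a potential disorientation event

    print(direction_for(SIGN, "purple", "cross"))  # -> 'E'
    print(direction_for(SIGN, "red"))              # -> 'N'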
3.3 Flow Management
Since flow control and crowd management play an important role in ensuring the safe running of large events, it makes sense to consider augmented signage in combination with a spatial partitioning in this context. Even when disregarding dynamic signage (see Section 5), the approach described above provides some beneficial features to shape the flow of people. Figure 4 provides an example illustrating this. The gray areas correspond to streets (numbered from one to four) and the colored arrows to routes suggested via augmented signage. Assuming the destinations corresponding to red, blue and yellow are all located to the North of the streets shown, then in principle every person walking along street four from the West could turn into street one. If street one is narrow, this could potentially create congestion. Using the color coding, it would be possible to control where people turn. In the example shown, the augmented signage placed at the intersection of streets one and four would annotate the North-pointing arrow with red and the East-pointing arrow with blue, yellow and purple. The sign placed at the intersection of streets two and four would add a blue annotation to the North-pointing arrow and yellow and purple annotations to the East-pointing arrow. Finally, the augmented signage placed at the intersection of streets three and four would add yellow to the North-pointing arrow, while the East-pointing one would solely be annotated with purple. Therefore, only people following the red color code would turn at street one, whereas those following blue would turn at street two and those following yellow at street three, thereby distributing the crowd more evenly among streets one to three despite three out of four destinations being located in the same direction.

It is worth noting that this would be difficult to achieve with standard signage without compromising its regular use, as it would require the removal or redirection of some destinations on a number of signs. In the example given in Figure 4, a sign placed at the intersection of streets one and four would have to align destination names in such a way that only a certain number of people would take street one while the majority would keep following street four. Depending on the topology and arrangement of destinations, this is likely to result in longer than necessary routes in everyday usage (i.e., non-crowded situations) and could also cause confusion, as signs would no longer direct people along the shortest routes.

Fig. 4. Example illustrating flow management using color coding
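The per-intersection annotations from this example can be written down directly as a lookup table; the sketch below (ours, with street names as in the text) shows how travelers carrying different color codes are split across the three northbound streets:

    # Flow management via color-annotated arrows (Figure 4): each sign
    # lists, per arrow, the color codes that should follow it.
    ANNOTATIONS = {
        ("street 1", "street 4"): {"N": ["red"], "E": ["blue", "yellow", "purple"]},
        ("street 2", "street 4"): {"N": ["blue"], "E": ["yellow", "purple"]},
        ("street 3", "street 4"): {"N": ["yellow"], "E": ["purple"]},
    }

    def arrow_for(intersection, color):
        """Which arrow a traveler with a given color code follows here."""
        for arrow, colors in ANNOTATIONS[intersection].items():
            if color in colors:
                return arrow
        return None

    # People following 'blue' keep heading East at street 1, then turn North at street 2.
    print(arrow_for(("street 1", "street 4"), "blue"))  # -> 'E'
    print(arrow_for(("street 2", "street 4"), "blue"))  # -> 'N'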
4 User Study
In order to determine whether the designs described in Section 3.2 would be acceptable, and also to gather feedback from potential end users, we conducted a lab-based study contrasting a base condition with the two designs. More specifically, we were interested in finding answers to the following two questions:

– Can people 'read' the system and use it to infer the right direction reliably and quickly?
– How do the three conditions (arrows only, colored circles, colored symbols) compare in terms of task completion time, errors, disorientation events, usability, satisfaction, and workload?

4.1 Participants
We recruited 18 participants (9 female and 9 male) from around the University. They were aged between 22 and 52 years, with a mean age of 35 years.
4.2 Stimuli
For each of the three conditions, we created 12 signs that each showed four to nine destinations; for each destination, an arrow pointing in one of eight directions (cardinal and ordinal) was included. In the first condition (Arrow), no augmentation was added to the signs (see Figure 3, bottom left). In the second condition (Color), the signs were augmented with a number of colored circles (see Figure 3, top left), and in the third condition (Symbol), the signs were augmented with a number of colored symbols (see Figure 3, top right). We used nine different colors – Blue, Red, Yellow, Brown, Green, Black, Orange, Gray, and Pink – in the second and third conditions, and eight different symbols – Circle, Star, Triangle, Moon, Heart, Square, Cross, and Rectangle – in the third condition. We chose those colors and symbols based on how easily they could be distinguished from one another and also on whether they could be easily verbalized, which we hoped would facilitate remembering them.

The signs were played back via an automated slideshow that presented each sign for 20 seconds before turning the screen black. The signs were shown on a 50” plasma screen, and participants were instructed to stand at the center of a circular mat placed in front of the screen. The circular mat had a diameter of one meter and eight marks around its edge to indicate the cardinal and ordinal directions (see Figure 3, bottom right).

4.3 Procedure
After a brief introduction, each participant first received a short questionnaire to gather some background information. We allowed a maximum of ten minutes for this, and everyone completed it within that timeframe. Next, each participant was exposed to the three conditions we were testing. Each condition consisted of a brief explanation of the task participants had to perform, followed by twelve trials. After completing the final condition, participants were asked to fill in another short questionnaire, were debriefed, and received a small payment.

Each trial was structured as follows: the experimenter first verbally provided the participant with a target destination or visual code, depending on the condition. In the Arrow condition, the experimenter would instruct participants by saying “Your destination is X”, where ‘X’ would be a destination such as “the general hospital” or “the civic center”. In the Color and Symbol conditions, the instructions given would be “To get to your destination, you have to follow Y”, where ‘Y’ would be either a color (e.g., “blue”) or a colored symbol (e.g., “pink moon”, “yellow square”).

Once participants indicated (verbally, by nodding, or via a gesture) that they had understood the instructions, the experimenter triggered the display of the next sign on the plasma screen. Participants then had to scrutinize the sign to determine in which direction they would have to move to reach the destination they had just been given. When they had figured out the direction, they had to put one of their feet onto the mark around the edge
of the circular mat that corresponds to that direction. Once this was done, the experimenter moved on to the next trial by blanking the screen and providing the next instruction. Each sign was shown for a maximum of 20 seconds, after which the screen would go blank automatically. If people did not select a direction within that time, we classified this as a disorientation event. We recorded all trials on video and used the footage to take the following measurements: disorientation events (no direction selected within the 20 seconds a sign is displayed), errors (selection of an incorrect direction), and completion time (time from the appearance of a sign until one foot made contact with one of the marks corresponding to the directions).

4.4 Results
As part of the initial questionnaire, participants filled in the Santa Barbara Sense of Direction questionnaire. The results revealed an average score of 3.55, which is very similar to the score of 3.6 that Hegarty et al. [8] reported for their 211 participants.

Timing. When analyzing the task completion time (see Figure 5), we found significant differences between the three conditions. A one-way ANOVA indicates that the average values for the three conditions are significantly different, with a p-value of 0.001 (α < 0.01). A post-hoc Tukey test revealed significant differences between all conditions, with p-values of 0.001 (α < 0.01). Overall, participants were fastest in the Color condition, followed by the Symbol condition, with the Arrow condition being the slowest.
Fig. 5. Average task completion time per condition (with confidence intervals)
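For readers wanting to replicate this kind of analysis, a one-way ANOVA with a post-hoc Tukey test can be run in a few lines of Python; this is an illustrative sketch only, and the arrays below are placeholders, not the data collected in the study:

    # Illustrative comparison of per-condition completion times; all values
    # are hypothetical placeholders, not the study data.
    import numpy as np
    from scipy.stats import f_oneway
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    arrow = np.array([9.1, 8.7, 10.2, 9.5])   # hypothetical times (s)
    color = np.array([4.2, 4.8, 5.1, 4.5])    # hypothetical
    symbol = np.array([6.3, 5.9, 6.8, 6.1])   # hypothetical

    f_stat, p_value = f_oneway(arrow, color, symbol)  # one-way ANOVA
    print(f_stat, p_value)

    times = np.concatenate([arrow, color, symbol])
    groups = ["Arrow"] * 4 + ["Color"] * 4 + ["Symbol"] * 4
    print(pairwise_tukeyhsd(times, groups, alpha=0.01))  # post-hoc Tukey test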
Errors and Disorientation. The overall number of errors and disorientation events we recorded was very low. In total, only two errors occurred throughout the entire study. Both errors were observed in the Symbol condition, whereas no errors at all were recorded in either the Color or the Arrow condition. Disorientation events were counted when participants failed to select a direction within the 20 seconds that each sign appeared on the screen. Overall, we observed this only once, during the Symbol condition. No disorientation events occurred during the Arrow and Color conditions.

Usability, Satisfaction, Ease of Use, and Learnability. The responses to the selected questions from the USE questionnaire [18,17] are depicted in Figure 6, which shows the average results for all participants. In all four categories, the Color condition is rated best, followed by the Symbol condition, whereas the Arrow condition is consistently rated lowest. This indicates that participants found the Color condition to be the easiest to learn and use, and they also rated it best in terms of usability and satisfaction. The Symbol condition was rated second best in all four categories, and the Arrow condition worst (attracting particularly low scores in the usability and satisfaction categories). A one-way ANOVA indicates that the average aggregated values for the three conditions are significantly different, with a p-value of 0.001 (α < 0.01). A post-hoc Tukey test revealed significant differences between all conditions, with p-values of 0.001 (α < 0.01).
Fig. 6. Average results for subset of USE questionnaire [18] using a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree)
Workload. Individual results for the different categories of the NASA Task Load Index (TLX) questionnaire [7] are depicted in Figure 7, which shows the mean results of all participants. The combined TLX scores for the different conditions were as follows: the Arrow condition had the highest average of 3.46 with a standard deviation (SD) of 0.28, the Symbol condition had the second highest average of 2.90 (SD = 0.37), and the Color condition had the lowest average of 2.13 (SD = 0.16). Perceived task load was thus highest for the Arrow condition,
lowest for the Color condition, and second lowest for the Symbol condition. A one-way ANOVA indicates that the combined scores for the three conditions are significantly different, with a p-value of 0.001 (α < 0.01). A post-hoc Tukey test revealed significant differences between all conditions, with p-values of 0.001 (α < 0.01).
" !! # !! $ !! % !! & !!
Fig. 7. Mean results for NASA TLX questionnaire broken down by category and using a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree)
Fig. 8. Box plot showing the median of each condition for the NASA TLX questionnaire
Preference. In the final questionnaire, participants were asked to rank the conditions according to preference. The results are depicted in Figure 9. The average rank for the Color condition was 1.4, and 2.2 for both the Arrow and the Symbol condition (using an unweighted rank average). Twelve of the 18 participants (67%) ranked the Color condition as their first choice, 6 of the 18 (33%) ranked the Arrow condition as their second choice, and 9 of the 18 (50%) ranked the Symbol condition as their
third choice. Overall, the Color condition was clearly the most preferred option, while the Arrow and Symbol conditions were ranked similarly (depending on how the different ranks are weighted).

Fig. 9. Post-study ranking of the three conditions
5 Discussion
While the study provides some initial insights into whether people can use augmented signage, there are some shortcomings that limit the generality of the results we obtained. One of these issues relates to the lack of realism: while participants had to physically move to select a direction, this movement was very small compared to the amount of walking that could be required if the system were deployed in the real world. Another related issue is the lack of crowdedness, which is hard to emulate in the lab without compromising the safety of participants. Finally, we varied the overall complexity of the signs within very narrow boundaries (i.e., between four and nine items per sign), whereas in the real world, sign complexity can vary much more widely, e.g., in terms of the layout, the number of items, and the directional indicators being used. A real-world test with an actual deployment would overcome these issues but could also incur considerable safety implications in the case of dense crowds. A more realistic simulation environment [25] could thus be a sensible intermediate step.

Nevertheless, the results are generally quite encouraging in a number of ways: we recorded very few errors or disorientation events, which indicates that participants were able to use the augmented signage well. Completion times for both augmented signage conditions were lower than for the non-augmented case, and the ratings in the different USE categories were also higher. The same is true for the workload, which was lower for augmented signage than for non-augmented signage. In general, the Color condition scored higher than the Symbol condition in all these tests. In terms of overall preference, the Color condition was clearly the most favored one; the Arrow and the Symbol condition tied in second place. It is worth noting that the few errors and
disorientation events we recorded during the study occurred only in the Symbol condition, despite its workload being rated lower than that of the Arrow condition.

Compared to existing systems (see Section 2) and in the context of the requirements identified in Section 3, the proposed approach offers a number of benefits. It does not require users to carry a mobile device, it works well with static signage, and it maintains the original purpose and function of the signs. The use of colors and/or symbols also makes the system accessible to people who do not speak the local language or who cannot read at all. The study outcomes thus provide initial evidence that augmented signage is usable, easy to learn, and enjoyable without incurring large penalties in terms of usage time, error rates, or disorientation events.

While both augmented signage designs can provide personalized directions to any number of users, their scalability is limited by the number of simultaneous destinations they support. The colored circle design allows for as many destinations as there are colors, which realistically limits it to about ten destinations (primary plus secondary colors, leaving out the background color), as it would be difficult for people to remember specific shades of colors or colors that they cannot reliably name. The colored symbol design significantly extends the number of simultaneous destinations, effectively multiplying it by the number of symbols used – we used eight symbols and nine colors in the study reported above, resulting in up to 72 concurrent destinations. However, to display that many destinations on a single sign, it would be necessary to rotate through them; if they were all displayed simultaneously, individual symbols might be too small to be recognized well. In order to further increase the number of destinations, hierarchical approaches or multi-color codes could be used, which would most likely have an impact on usability.

Using dynamic augmented signage can not only help with scalability, it can also provide means to assign visual codes to groups of people, e.g., by cycling through them over time so that passers-by pick up codes depending on when they walk past the sign. In addition, dynamic signage would enable real-time adaptations, which would be beneficial in terms of responding to changes in the environment or crowd behavior. For example, the size of target areas could be adjusted dynamically in response to how many people are assigned to them. Combining dynamic displays with various sensors (e.g., flow rate sensors, people counters, presence sensors) could potentially fully automate this process.

Another interesting option would be to combine the two augmented signage designs presented in Section 3. By collocating all destinations corresponding to a particular color, signage could be augmented hierarchically: signs outside the area corresponding to a specific color would include only color codes, not symbols, for that color, while signs inside that area would include the symbols. This would simplify the signs without reducing the number of addressable destinations, and would also account for people's overall preference for the Color condition by minimizing their exposure to the colored symbols.
6 Conclusions
In this paper, we presented the design and initial evaluation of an approach to provide personalized navigation support to large crowds via augmented signage. The approach is based on a spatial partitioning and a mapping of destinations to visual codes, which are used to augment regular signage. People can use these codes to extract individual directions from signs. The proposed approach addresses several shortcomings of existing approaches, such as the requirement for everyone to carry mobile phones or the limited scalability in terms of the number of simultaneous users. We presented two different designs for augmented signage, which we compared against a base condition in a lab-based user study. The outcomes of this study provide initial evidence that augmented signage can be used successfully with little training and without incurring large penalties in terms of delays, workload, errors, or disorientation. In general, the color-only condition was preferred, followed by the colored-symbol condition, which supports a considerably larger number of concurrent destinations. The base case (text and arrow only) was consistently rated lowest in almost all tests. We also reviewed the potential of our approach in terms of managing people flow, and discussed different ways to improve and expand the system.

Based on these initial results, our next step will be to apply this approach to a specific scenario and to evaluate it under more realistic conditions. In order to achieve this, we intend to investigate its use in the context of the pilgrimage that brings many Muslims to Mecca every year. This scenario poses a number of challenges, such as the scale and diversity of the crowd, that will enable us to test and refine the approach further. As it is unlikely that a research prototype could be deployed at Mecca, we also plan to look into different ways to increase the realism of lab-based studies, e.g., by simulating movement in a more convincing way or by using semi-immersive imagery.
References

1. Arikawa, M., Konomi, S., Ohnishi, K.: Navitime: Supporting pedestrian navigation in the real world. IEEE Pervasive Computing 6(3), 21–29 (2007)
2. Arthur, P., Passini, R.: Wayfinding: People, Signs, and Architecture. McGraw-Hill, New York (1992)
3. Baus, J., Kray, C., Cheverst, K.: A survey of map-based mobile guides. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-Based Mobile Services, pp. 197–216. Springer, Heidelberg (2005)
4. Baus, J., Krüger, A., Wahlster, W.: A resource-adaptive mobile navigation system. In: Gil, Y., Leake, D.B. (eds.) IUI 2002 – 2002 International Conference on Intelligent User Interfaces, pp. 15–22. ACM Press, San Francisco (2002)
5. Card, O.S.: Ender's Game. Tor Books (1985)
6. Cheverst, K., Davies, N., Friday, A., Efstratiou, C.: Developing a context-aware electronic tourist guide: Some issues and experiences. In: Proceedings of CHI 2000, Netherlands, pp. 17–24 (2000)
7. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Human Mental Workload, pp. 139–183. Elsevier, Amsterdam (1988)
8. Hegarty, M., Richardson, A.E., Montello, D.R., Lovelace, K., Subbiah, I.: Development of a self-report measure of environmental spatial ability. Intelligence 30(5), 425–447 (2002)
9. Helbing, D., Johansson, A., Al-Abideen, H.Z.: Dynamics of crowd disasters: An empirical study. Phys. Rev. E 75(4), 046109 (2007)
10. Hölscher, C., Brösamle, M., Meilinger, T., Strube, G.: Signs and maps – cognitive economy in the use of external aids for indoor navigation. In: McNamara, D., Trafton, J. (eds.) Proceedings of the 29th Annual Cognitive Science Society, pp. 377–382. Cognitive Science Society, Austin (2007)
11. Klippel, A., Tappe, H., Kulik, L., Lee, P.U.: Wayfinding choremes – a language for modeling conceptual route knowledge. Journal of Visual Languages & Computing 16(4), 311–329 (2005)
12. Klippel, A., Winter, S.: Structural salience of landmarks for route directions. In: Cohn, A., Mark, D. (eds.) COSIT 2005. LNCS, vol. 3693, pp. 347–362. Springer, Heidelberg (2005)
13. Kray, C., Blocher, A.: Modeling the basic meanings of path relations. In: Proceedings of the 16th IJCAI, pp. 384–389. Morgan Kaufmann, San Francisco (1999)
14. Kray, C., Kortuem, G., Krüger, A.: Adaptive navigation support with public displays. In: Amant, R.S., Riedl, J., Jameson, A. (eds.) Proceedings of IUI 2005, pp. 326–328. ACM Press, New York (2005)
15. Kray, C., Laakso, K., Elting, C., Coors, V.: Presenting route instructions on mobile devices. In: Johnson, W.L., André, E., Domingue, J. (eds.) Proceedings of IUI 2003, pp. 117–124. ACM Press, Miami Beach (2003)
16. Levine, M.: You-are-here maps: psychological considerations. Environment and Behavior 14(2), 221–237 (1982)
17. Lewis, J.R.: IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. Int. J. Hum.-Comput. Interact. 7(1), 57–78 (1995)
18. Lund, A.: USE questionnaire (2011), http://usesurvey.com (last accessed March 3, 2011)
19. Olivier, P., Cao, H., Gilroy, S.W., Jackson, D.G.: Crossmodal ambient displays. In: British Computer Society HCI: Engage. Queen Mary College, University of London, UK (2006)
20. Open Street Map Foundation: Open Street Map web site (2011), http://www.openstreetmap.org/ (last accessed March 4, 2011)
21. Richter, K.-F.: A uniform handling of different landmark types in route directions. In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS, vol. 4736, pp. 373–389. Springer, Heidelberg (2007)
22. Rohs, M., Schöning, J., Raubal, M., Essl, G., Krüger, A.: Map navigation with mobile devices: virtual versus physical movement with and without visual context. In: ICMI 2007: Proceedings of the 9th International Conference on Multimodal Interfaces, pp. 146–153. ACM, New York (2007)
23. Rukzio, E., Schmidt, A., Krüger, A.: The rotating compass: a novel interaction technique for mobile navigation. In: CHI 2005: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 1761–1764. ACM, New York (2005)
24. Schmid, F., Kuntzsch, C., Winter, S., Kazerani, A., Preisig, B.: Situated local and global orientation in mobile you-are-here maps. In: Proc. of Mobile HCI 2010, pp. 83–92. ACM, New York (2010)
25. Singh, P., Ha, H.N., Kuang, Z., Olivier, P., Kray, C., Blythe, P., James, P.: Immersive video as a rapid prototyping and evaluation tool for mobile and ambient applications. In: MobileHCI 2006: Proceedings of the 8th Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 264–264. ACM Press, New York (2006)
26. Tversky, B.: Cognitive Maps, Cognitive Collages, and Spatial Mental Models. In: Campari, I., Frank, A.U. (eds.) COSIT 1993. LNCS, vol. 716, pp. 14–24. Springer, Heidelberg (1993)
27. Tversky, B., Lee, P.U.: Pictorial and verbal tools for conveying routes. In: Freksa, C., Mark, D.M. (eds.) COSIT 1999. LNCS, vol. 1661, pp. 37–50. Springer, Heidelberg (1999)
28. Xie, H., Filippidis, L., Gwynne, S., Galea, E.R., Blackshields, D., Lawrence, P.J.: Signage legibility distances as a function of observation angle. Journal of Fire Protection Engineering 17(1), 41–64 (2007)
Information on the Consequence of a Move and Its Use for Route Improvisation Support

Takeshi Shirabe

School of Architecture and the Built Environment, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden
[email protected]
Abstract. This paper proposes a new method of navigational assistance in unfamiliar environments. In such environments, major concerns would normally be how to find a good route to a selected destination and how to design and communicate directions to follow that route. This may not be the case, however, if route selection criteria are not complete or subject to change during a trip. To cope with such uncertainty, the proposed method calculates, for each possible move from the current position, a single value characterizing the consequence of that move, e.g., how long it will take to reach the destination if that move is made. The paper outlines a design of a route improvisation support system equipped with this method, and underlines the merit of letting the user build up a route progressively by taking into account highly local, temporary, or personal information that is not stored in the system but collected by the user while traveling.
1 Introduction

Advances in digital information technology over the past several decades have had profound effects on the way people travel in unfamiliar or not-fully-familiar environments. It is increasingly common for drivers to use in-vehicle navigation systems to find a route to a destination and obtain a sequence of directions that help them follow that route. Many modern mobile communication devices are likewise equipped with route search and guidance functionality for pedestrians.

Navigation is an activity most of us do routinely, but it can be an interesting subject of study. One starting point for its analysis is to recognize two aspects of the activity: “planning” (or “wayfinding”) and “execution of movements” (or “locomotion”) [14]. In a typical scenario of route planning with an existing navigation system, the user specifies a destination and selects a type of “cost” (e.g., travel time, travel distance, toll payment, or fuel consumption) in terms of which routes are evaluated; the system then detects the current location and calculates a shortest (or minimum-cost) route to the specified destination. Some navigation systems present the user with multiple routes (one for each cost type) from which he chooses one prior to the trip. This functionality increases the quality of the route decision and/or the user's satisfaction with the selected route, since multiple candidate routes can be compared from different
perspectives. Some navigation systems suggest an alternative route to the user during the trip as soon as the system finds that the currently selected route has become unavailable or suboptimal. This functionality increases the chance of arriving at the destination successfully and/or quickly.

As the term planning implies, route planning deals with the future. Thus route planners should remember that the condition of a planned route may change while it is actually traversed. Methods are available for finding a shortest route through a “time-dependent” or “dynamic” network, whose topology (e.g., road opening/closure) or attributes (e.g., traffic volume) change as a function of time [15, 10, 25, 5]. Obviously, not all future conditions can be predicted in a deterministic manner. Some researchers have therefore incorporated uncertainty into road networks in such a way that the travel time of each road segment follows a known probability distribution, and have sought variations of shortest routes, such as one that achieves the earliest expected arrival [21, 9] or one that maximizes the chance of arriving on time [7].

Whatever route planning method is employed, once a route is set, the remaining execution task seems trivial: just follow the set route. Nonetheless, humans often fail to do so for various reasons, despite all kinds of navigational aids (see, e.g., [4]). The complexity of human navigation has been (at least partially) explained by theories and models developed through decades of interdisciplinary research involving, among others, psychology and cognitive science/engineering. Elements of good route directions have been identified (see, e.g., [13]), and techniques for enhancing the quality of route directions (e.g., automated landmark extraction [16], “chunking” of route elements [12], knowledge-based schematization [20], dialog-driven route direction generation [17]) have been developed.

Though limited, our review of the relevant literature suggests that existing navigation methods and systems rely on a common implicit assumption: one (and only one) route is considered at a time when route directions are given. It is this “one route at a time” assumption that makes it possible to place on the screen an arrow icon indicating the next move, or to send a vocal message like “turn left at the next intersection.” This is, of course, neither wrong nor surprising, because most users expect unambiguous route guidance.

Now imagine a traveler who needs to depart from a selected route in response to unexpected events. A conventional navigation system would provide him with a new route only when a registered type of event occurs, e.g., he fails to stay on the selected route, he informs the system of his intention to make a detour, or the system finds the currently selected route no longer optimal. However, some events are simply too local, temporary, or personal to trigger the system to seek an alternative route. For instance, an unwelcome friend is approaching from ahead, or a suspicious car is chasing from behind. Another case in which a navigation system works less effectively is when a traveler is somewhat familiar with the environment and believes that she knows better than the navigation system. All she may want from the system is an alert when there is a risk of making an unrecoverable mistake. There may be an even more chaotic situation, in which an ultimate destination is fixed but the route selection criteria are spontaneous or capricious during the trip.
For example, a tourist needs to go back to his hotel by a certain time but wants to explore the area as long as time allows.
Taking an unexpected or unplanned move without knowing its consequence would create the risk of failing to reach the destination, or of delaying arrival. It is possible for a user of a conventional navigation system to stop at every intersection and compute a new route starting with each possible move from there. This, however, would cost significant time and computing resources, which, in turn, would prevent the user from receiving timely route guidance and traveling smoothly.

The examples illustrated above by no means imply a shortcoming of current navigation technology. But they do call for new concepts of en-route travel assistance. This paper proposes a relatively simple one: to provide the traveler with information on the consequence of each possible move, wherever he is and whenever he needs or wants it. Such information may be qualitative, e.g., “don't turn left there,” or quantitative, e.g., “if you turn left there, it will take an extra 5 minutes.” This way, the traveler will not be told which way to go next but will be left to decide on the next move according to whatever preferences and/or constraints he currently has. This sort of travel support may be more appropriately regarded as relating to exploration or route improvisation¹ rather than navigation.

The remainder of the paper is organized as follows. Section 2 reviews a basic theory concerning the classic shortest path problem and its solution, which is utilized in Section 3 to develop a method of producing a simple piece of information representing the consequence of a move. Section 4 incorporates this method into a system that supports route improvisation. Section 5 concludes the paper.

¹ Route improvisation may be regarded as a combination of “undirected wayfinding” and “directed wayfinding” according to Wiener's taxonomy [23], since it may involve unplanned stops and detours before reaching a specific destination.
2 Shortest Path Problem

In this section we examine two closely related concepts concerning the shortest path problem, namely, the shortest path tree and the reduced cost. Because the shortest path problem has been studied in a wide range of contexts with different terminologies, we first define some basic terms used throughout this paper. A network is a set of nodes (representing, e.g., road intersections) and a set of arcs (representing, e.g., road segments), and is assumed to be directed (i.e., every arc is defined by an ordered pair of nodes) unless otherwise stated. A path is a sequence of arcs, and is assumed to be directed (i.e., all its arcs go in the same direction) unless otherwise stated. The origin and destination are the first and last nodes of a path, respectively. A cycle is a sequence of arcs that begins and ends with the same node but repeats no other node. A tree is a network that has a path between any two nodes but no cycle (directed or not).

The following notation is employed throughout this section. G = (N, A) denotes a network consisting of a set N of n nodes and a set A of m arcs. Associated with each arc is a constant l(i,j) representing its length. The length of a path is defined as the sum of the lengths of all arcs in that path, and d(i) denotes the length of some path from node i to the destination.
It is assumed that all networks considered in this paper contain no cycle with negative length, in order to keep the problem from being computationally intractable (or NP-hard).

2.1 Shortest Path Tree

It is useful, at least from a computational point of view, to classify the classical shortest path problem into three kinds: one-to-one, one-to-all (or all-to-one), and all-to-all. The one-to-one shortest path problem is to find a path from the origin to the destination that is no longer than any other such path. The one-to-all (or all-to-one) shortest path problem is to find a shortest path from the origin to every other node (or to the destination from every other node). Both have the same degree of computational complexity, as a solution to the former is derived from a solution to the latter². For instance, if all arc lengths are nonnegative, Dijkstra's algorithm in its original form [6] and its variant with Fibonacci heaps [1] solve the problem in O(n²) time and O(m + n log n) time, respectively. If some arcs have a negative length, one can use the Bellman-Ford algorithm [1], whose running time is O(mn). Finally, the all-to-all shortest path problem seeks a shortest path from every node to every other node. It can be solved by executing a shortest path algorithm as many times as there are nodes (once for each node as the destination), but more efficient algorithms (e.g., the Floyd-Warshall algorithm [1]) exist.

A solution to the (all-to-one) shortest path problem is not just a single path but a set of shortest paths. There may be more than one such set in some networks, but in all networks there is at least one that forms a tree (see Figure 1). From this tree, one can construct a shortest path from any node to the destination simply by tracing down the tree until the destination is reached. It has been proven that the values of d(i) (for all i ∈ N) determined by any shortest path tree satisfy the following relations [1]:

d(i) ≤ d(j) + l(i,j)   ∀(i,j) ∈ A   (1)
It is easy to see how this relation holds in a concrete network. For instance, in reference to Figure 1, Equation (1) is true for arc (1,2) because d(1) = 8, d(2) = 5 and l(1,2) = 5, as well as for arc (1,3) because d(1) = 8, d(3) = 4 and l(1,3) = 4. Notice that the latter case involves an arc belonging to a shortest path and satisfies Equation (1) as an equality. This is true in general, that is:

d(i) = d(j) + l(i,j)   ∀(i,j) in any shortest path   (2)
Conversely, any path tree whose associated d(i) (for all i ∈ N) satisfy Equation (1) is a shortest path tree. Thus Equation (1) is generally referred to as the optimality conditions for the shortest path problem.
² In practice, the one-to-one problem may be solved more efficiently by exploiting an obvious lower bound (e.g., the straight-line distance if the network is embedded in the Euclidean plane) on the length of a shortest path from each node to the destination [8].
Fig. 1. A shortest path tree (shaded) rooted at the destination (node 8). The number associated with each arc represents its length, and the number associated with each node (also underlined) represents the length of a shortest path from that node to the destination.
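As a concrete illustration (ours, not the paper's), the all-to-one problem can be solved by running Dijkstra's algorithm backwards from the destination. The sketch below uses a small graph assembled from the arc lengths and distances quoted in the text for Figure 1; the arcs (4,8) and (2,8) and the lengths of (1,5), (5,3), and (6,8) are assumptions added to complete the example.

    # All-to-one Dijkstra: compute d(i), the shortest path length from every
    # node to the destination, by searching over reversed arcs from the
    # destination. Assumes nonnegative arc lengths.
    import heapq
    from collections import defaultdict

    ARCS = {(1, 2): 5, (1, 3): 4, (1, 5): 2, (5, 3): 2, (3, 5): 3,
            (3, 4): 1, (3, 6): 3, (6, 8): 1, (4, 8): 4, (2, 8): 5}

    def all_to_one(arcs, dest):
        reverse = defaultdict(list)
        for (i, j), length in arcs.items():
            reverse[j].append((i, length))
        d = {dest: 0}
        heap = [(0, dest)]
        while heap:
            dist, j = heapq.heappop(heap)
            if dist > d.get(j, float("inf")):
                continue  # stale heap entry
            for i, length in reverse[j]:
                if dist + length < d.get(i, float("inf")):
                    d[i] = dist + length
                    heapq.heappush(heap, (d[i], i))
        return d

    d = all_to_one(ARCS, dest=8)
    print(d[1], d[3], d[6])  # -> 8 4 1, matching the values quoted in the text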
2.2 Reduced Cost

Consider another variable l̄(i,j) that is defined as:

l̄(i,j) = l(i,j) + d(j) − d(i)   (3)

Then the optimality conditions are rewritten as:

l̄(i,j) ≥ 0   ∀(i,j) ∈ A   (4)
The nonnegativity of the optimal l̄(i,j) has an important implication. When l̄(i,j) is optimal, d(i) is equal to the length of a shortest path from node i to the destination, and l(i,j) + d(j) is equal to the length of a shortest path from node i to the destination subject to the condition that the path begins with arc (i,j) (see Figure 2). Thus the optimal l̄(i,j) can be intuitively understood as representing how much the inclusion of arc (i,j) would increase the shortest path length from node i to the destination, and it is generally referred to as the reduced cost of arc (i,j).
Fig. 2. Analysis of the reduced cost of an arc. A shortest path from node i to node n via arc (i,j) is longer than a shortest path from node i to node n without such a restriction by l(i,j) + d(j) − d(i) = l̄(i,j). Note that the dashed arrows represent shortest paths, and the solid arrow represents an arc.
The interpretation of the optimality conditions in terms of reduced costs helps us find one or more shortest paths from any chosen node to the destination, because any sequence of arcs with zero reduced cost is a shortest path. For instance, two such sequences (1-3-6-8 and 1-5-3-6-8) connect nodes 1 and 8 in Figure 3.
Fig. 3. A network with reduced costs. It is implied that paths 1-3-6-8 and 1-5-3-6-8 are shortest paths because they consist only of arcs with zero reduced cost
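The following short sketch (ours) computes reduced costs per Equation (3) for the small example network; the d-values are those quoted in the text, and the lengths of arcs (1,5), (5,3), and (6,8) are inferred from the shortest paths named in the captions of Figures 1 and 3, so they should be read as assumptions.

    # Reduced cost: l(i,j) + d(j) - d(i); it is zero on shortest-path arcs,
    # so chaining zero-cost arcs recovers, e.g., the path 1-3-6-8.
    ARCS = {(1, 2): 5, (1, 3): 4, (1, 5): 2, (5, 3): 2, (3, 5): 3,
            (3, 4): 1, (3, 6): 3, (6, 8): 1}
    D = {1: 8, 2: 5, 3: 4, 4: 4, 5: 6, 6: 1, 8: 0}

    def reduced_costs(arcs, d):
        return {(i, j): length + d[j] - d[i] for (i, j), length in arcs.items()}

    rc = reduced_costs(ARCS, D)
    print(rc[(1, 3)], rc[(3, 6)], rc[(6, 8)])  # -> 0 0 0 (path 1-3-6-8 is shortest)
    print(rc[(1, 2)])                          # -> 2 (starting with arc (1,2) adds 2)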
3 Method of Producing Information on the Consequence of a Move

Our approach to supporting route improvisation is to keep the traveler informed of the consequence of every possible move from the current location. Obviously, there is a limit on the amount of information a traveling person is capable of processing. Also, there is a limit on the length of time for which a traveling person is willing to wait before information becomes available. Thus, it is crucial to design information products that are small in volume and easy to compute, but that represent the consequences of alternative moves effectively.

3.1 Consequence Values

We generally refer to a single value relevant to the consequence of a move as the consequence value of that move. One such (and perhaps the most straightforward) value indicates how long it takes to reach a destination if a certain move is made. Letting c(i,j) denote this value for the move of taking arc (i,j), it is defined as:

c(i,j) = l(i,j) + d(j)   (6)
This is easy to derive from the output of the all-to-one shortest path algorithm discussed in the previous section. For a numerical example, suppose that we are at node 3 in the network illustrated in Figure 1. The consequence value of moving to node 4 is 5, as the length of arc (3,4) is 1 and the shortest path length from node 4 to the destination is 4. Similarly, the consequence values of moving to node 5 and to node 6 are 9 (= 3 + 6) and 4 (= 3 + 1), respectively. One can easily extend the notion of consequence value so that the length of a path can be measured in terms of a variety of other attributes, such as time, (monetary) cost,
and energy consumption. Then, letting l(i,j,a) denote the length of arc (i,j) in terms of attribute a and d(j,a) the shortest path length from node j to the destination in terms of attribute a, the consequence value c(i,j,a) of taking arc (i,j) in terms of attribute a is defined as:

c(i,j,a) = l(i,j,a) + d(j,a)   (7)
In practice, it often takes some extra cost to turn at an intersection [3, 11, 24]. To see how such costs can be included in the computation of consequence values, let us modify our notation as follows: p(i, j, a) denotes the cost incurred by turning from arc i to arc j in terms of attribute a, l(j, a) denotes the length of arc j in terms of attribute a, d(j, a) denotes the shortest path length (with turn costs included) from the head of arc j to the destination in terms of attribute a, and c(i, j, a) denotes the consequence value of taking a turn from arc i to arc j in terms of attribute a. Then c(i, j, a) is defined as:

c(i, j, a) = p(i, j, a) + l(j, a) + d(j, a)    (8)
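Including turn costs changes only the lookup, since d(j, a) already absorbs downstream turn penalties. A sketch under the arc-based notation just introduced (all table contents would be hypothetical):

```python
def consequence_value_with_turns(i_arc, j_arc, a, p, l, d):
    """c(i, j, a) = p(i, j, a) + l(j, a) + d(j, a), where i_arc and j_arc are
    arcs rather than nodes, p holds turn penalties, and d(j, a) is measured
    from the head of arc j with turn costs included."""
    return p[(i_arc, j_arc, a)] + l[(j_arc, a)] + d[(j_arc, a)]
```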
In the remainder of this paper, to avoid notational complexity, it is assumed that there is no turn penalty, so that consequence values are given by Equation (7). This assumption does not alter the essence of the consequence value.

3.2 Consequence Values Relative to a Particular Move

Consequence values have so far been measured in absolute terms. They may not be the most effective when their relative advantages (or disadvantages) need to be directly compared. Such a comparison can be facilitated by designating one particular move as the basis and expressing the consequence values of all other moves relative to it. This leads to another form of consequence value, here denoted as c(i, j, k, a) and given by:

c(i, j, k, a) = c(i, j, a) − c(i, k, a)    (9)
Clearly, the derived value represents the excess cost in terms of attribute a incurred by taking arc (i, j) instead of arc (i, k). Node k may be selected so that arc (i, k) is the next segment of the route currently intended or recommended to follow. If this route coincides with a shortest path in terms of the same attribute a, then c(i, j, k, a) is identical to the reduced cost defined in Section 2. It may be equally useful to designate some other route (which may be optimal in terms of a different attribute) as the route for comparison. For instance, a driver who is currently following a fastest route may want to know how much gas will be saved by making a detour. In this case, a should be set to fuel consumption while arc (i, k) remains a segment of the fastest route.
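Equation (9) is likewise one subtraction. The sketch below, with hypothetical fuel figures and the same dictionary conventions as above, mirrors the detour example: a negative value means the candidate move actually saves fuel relative to the next segment of the fastest route.

```python
def relative_consequence(i, j, k, a, l, d):
    """c(i, j, k, a) = c(i, j, a) - c(i, k, a): extra cost, in attribute a,
    of taking arc (i, j) instead of the reference arc (i, k)."""
    c = lambda n: l[(i, n, a)] + d[(n, a)]
    return c(j) - c(k)

# Hypothetical fuel figures; arc (3, 6) is the next leg of the fastest route.
l = {(3, 4, "fuel"): 0.4, (3, 6, "fuel"): 0.6}
d = {(4, "fuel"): 1.8, (6, "fuel"): 2.1}
print(relative_consequence(3, 4, 6, "fuel", l, d))  # approx. -0.5: detour saves fuel
```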
3.3 Consequence Values Relative to an Extramural Value

One can look at consequence values from a different reference point, determined externally, by translating them so that this point becomes the new origin. This process can be generalized to a linear transformation of c(i, j, a) expressed by the following formula with two parameters α and β:

c(i, j, a, α, β) = α · c(i, j, a) + β    (10)
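In code, Equation (10) is again a one-liner; the settings in the sketch below follow the readings discussed in the next paragraph (expected arrival time and remaining spare time), with hypothetical numbers.

```python
def transformed_consequence(i, j, a, alpha, beta, l, d):
    """c(i, j, a, alpha, beta) = alpha * c(i, j, a) + beta."""
    return alpha * (l[(i, j, a)] + d[(j, a)]) + beta

l = {(3, 6, "time"): 3}
d = {(6, "time"): 1}
now = 18 * 60  # 6:00 PM, in minutes since midnight
# alpha = 1, beta = current time: expected arrival time if arc (3, 6) is taken.
print(transformed_consequence(3, 6, "time", 1, now, l, d))   # 1084, i.e., 6:04 PM
# alpha = -1, beta = due time minus current time: remaining spare time.
print(transformed_consequence(3, 6, "time", -1, 30, l, d))   # 26 minutes to spare
```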
We regard c(i, j, a, α, β) as yet another variation of consequence value. The utility of this transformation depends on what attribute is chosen, as well as on how the parameters are set. If the consequence value is measured in terms of time, one may set α to 1 and β to the current time; the resulting value indicates the expected arrival time if arc (i, j) is taken. Alternatively, one may set α to −1 and β to the due arrival time minus the current time to see how much spare time one can have without failing to reach the destination in time if arc (i, j) is taken. Those who are concerned about the availability of gas may set α to −1 and β to the amount of fuel currently stored in the tank; the resulting value indicates how much gas is expected to be left on arrival at the destination if arc (i, j) is taken. A value of 0 or smaller is a warning that making that move entails stopping to refill the tank on the way.

3.4 Generalization of Consequence Values

We conclude this section with a possible generalization of consequence values. While the shortest path length (or a variation of it) is useful in a practical sense and efficient with respect to computational complexity, it can be seen as just one of many possible parameters that summarize (a subset of) the population of all routes starting with a certain move (Figure 4). Such parameters may be classified into two kinds: one concerning (sequences of) arcs and the other concerning nodes. An example of the former is the number of alternative routes of a certain length or shorter; an example of the latter is the number of points of interest reachable within a certain time.
Fig. 4. Population of routes (represented by an area bounded by a curve) associated with each move (represented by an arrow) from the current location (represented by a circle)
However simple they may sound, some parameters would require a significantly large amount of computation. For example, to find the number of drug stores within a certain travel time after each move, fastest times need to be calculated from the corresponding arc to all drug stores. Although a shortest path algorithm runs relatively fast, it may not be fast enough to keep up with the user's speed of travel. An alternative is to precompute shortest paths from all nodes to all other nodes and store them in a database. If the attribute of interest is stable (e.g., geometric length), this works as long as the database has enough capacity to store all the shortest paths and organizes them in an efficient structure, such as a “shortest path quadtree” [18] or “path oracle” [19], that allows quick access to all shortest paths from any chosen node.
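A naive version of the precomputation idea, reusing the toy arcs and the all_to_one_shortest() function from the sketch in Section 2, is shown below (one Dijkstra per destination; note that this sketch stores only distances, not the paths themselves, and that the cited shortest path quadtree and path oracle are far more compact structures).

```python
def precompute_all_pairs_distances(arcs, nodes):
    """Distance table: table[dest][v] = shortest path length from v to dest."""
    return {dest: all_to_one_shortest(arcs, dest) for dest in nodes}

table = precompute_all_pairs_distances(arcs, nodes={1, 3, 4, 5, 6, 8})
print(table[8][1])  # shortest path length from node 1 to node 8
```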
4 Design of a Route Improvisation Support System

The method of computing consequence values is intended to be integrated into a system that supports route improvisation. This section describes a design of such a system. In doing so, the following scenario is used for ease of exposition.

• The user is a pedestrian and may move in both directions on any road segment, so that all arcs are undirected.
• Travel time is the only attribute of interest. This simplifies the notation: l(i, j) abbreviates l(i, j, travel time), and all other symbols should be interpreted similarly.

The system admittedly resembles an existing navigation system. Therefore, details will be given only for the distinctive components, i.e., those that calculate and communicate consequence values.

4.1 Overall Structure

Figure 5 is a schematic diagram showing the overall arrangement of one possible implementation of the system.
[Figure 5 diagram: boxes for User Input Device, Database, Location Detector, Optimal Path Finder, Consequence Value Calculator, Storage Medium, and Output Device, connected by data flows numbered 1–9.]
Fig. 5. A design of a route improvisation support system. The boxes represent components of the system and the arrows represent flows of data. Each data flow is numbered to facilitate its explanation in the text.
A database stores a digital map that includes at least a (pedestrian) network in which each arc is assigned the time it takes to walk through it. The map may additionally include other data useful for supporting orientation, place recognition, and route improvisation, such as locations of landmarks, pictures of intersections, and points of interest. A location detector detects the current location of the user by using existing technology such as a global positioning system, an indoor sensor network, or an odometer and a gyroscope. Through a user input device, the user inputs an identifier of a destination, e.g., a postal address or a place name. The database transforms the identifier to a location (coinciding with a node or a point on an arc) on the map.

According to the initial location (arrow 1) and the selected destination (arrow 2), the database retrieves a portion of the map relevant to the present trip. The map portion should be sufficiently small that the subsequent tasks will be completed in a reasonable amount of time but sufficiently large that a fastest path is guaranteed to be found from every node that the user may go through on the way to the destination. Given the selected destination and the retrieved map portion (arrow 3), an optimal path finder builds a tree of fastest paths rooted at the destination and saves it in a storage medium (arrow 4). Route improvisation support may begin as soon as the algorithm finds fastest paths from all nodes within a certain proximity of the current location, while the rest of the computation is being performed.

In response to changes in the user's location detected by the location detector (arrow 5), a consequence value calculator takes a relevant portion of the fastest path tree from the storage medium (arrow 6) and arc travel times from the database (arrow 7) and calculates a consequence value for each possible move from the current location. The result, together with any relevant data (e.g., locations of landmarks and background images) taken from the database (arrow 7), is transmitted to an output device (arrow 8). The user can always use the user input device to send the consequence value calculator a command (arrow 9) to change the form of consequence values.

4.2 Calculation of Consequence Values

Like a navigation system, a route improvisation system must provide the user with the sought information in a timely manner, in accordance with the user's current location. A consequence value should be computed for each of the moves that may possibly be made next. Judging which moves qualify as next moves, however, involves ambiguity or subjectivity. One way to automate this judgment relies on two variables, one representing the travel time from the current location to the head of the arc being traversed and the other representing the travel time from the current location to the tail of this arc. These variables are denoted by l(x, i) and l(x, h) in Figure 6, and how they are used is described below.

In reference to Figure 6, suppose that the user is currently located at location x and approaching a four-way intersection (node i). If l(x, i) does not exceed a predetermined threshold, that is, if the current position is sufficiently close to node i, then the possible next moves are going straight through i (x to node i to node j2), turning left at
i (x to node i to node j1), and turning right at i (x to node i to node j3). Otherwise, going to node i (not beyond it) is considered to be the only such move. Similarly, l(x, h) determines which backward moves (i.e., moves in the direction of node h) should be considered as possible next moves.
Fig. 6. Portion of a network near the user’s current location (represented by a circle labeled with x). The value associated with each node (represented by another circle) indicates the shortest path length (in terms of travel time) from that node to the destination, and the value associated with each arc (represented by a line segment) indicates the travel time of that arc. Note that arc (h,i) is dynamically divided into two arcs (x,i) and (x,h) as the user moves.
It is easy to turn the aforementioned rules and Equations (6)–(10) into a procedure for generating consequence values in accordance with the user's location. The example below identifies all possible next moves from a given location x and calculates their consequence values relative to that of the optimal move; it can be easily adapted for other forms of consequence values. Note that all fastest path lengths d(·) were calculated prior to the trip and that threshold is a predetermined constant.
if l(x, i) ≤ threshold and l(x, h) ≤ threshold then
    min := minimum of { l(x, i) + l(i, j) + d(j) for all nodes j (≠ h) adjacent to i }
                     ∪ { l(x, h) + l(h, k) + d(k) for all nodes k (≠ i) adjacent to h }
    for each node j (≠ h) adjacent to i
        c(i, j) := l(x, i) + l(i, j) + d(j) − min
    endfor
    for each node k (≠ i) adjacent to h
        c(h, k) := l(x, h) + l(h, k) + d(k) − min
    endfor
else if l(x, i) ≤ threshold and l(x, h) > threshold then
    min := minimum of { l(x, i) + l(i, j) + d(j) for all nodes j (≠ h) adjacent to i }
                     ∪ { l(x, h) + d(h) }
    for each node j (≠ h) adjacent to i
        c(i, j) := l(x, i) + l(i, j) + d(j) − min
    endfor
    c(x, h) := l(x, h) + d(h) − min
else if l(x, i) > threshold and l(x, h) ≤ threshold then
    min := minimum of { l(x, i) + d(i) }
                     ∪ { l(x, h) + l(h, k) + d(k) for all nodes k (≠ i) adjacent to h }
    c(x, i) := l(x, i) + d(i) − min
    for each node k (≠ i) adjacent to h
        c(h, k) := l(x, h) + l(h, k) + d(k) − min
    endfor
else
    min := minimum of { l(x, i) + d(i), l(x, h) + d(h) }
    c(x, i) := l(x, i) + d(i) − min
    c(x, h) := l(x, h) + d(h) − min
endif
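The procedure translates directly into Python. In the sketch below (a transcription under the same assumptions; the adjacency and length tables are placeholders), the four branches collapse into two independent tests, one per end of the arc being traversed:

```python
def next_move_consequences(x_i, x_h, i, h, adj, l, d, threshold):
    """Consequence values, relative to the optimal move, for all possible
    next moves from a point x on arc (h, i).  x_i = l(x, i), x_h = l(x, h);
    adj[n] is the set of nodes adjacent to node n; l[(u, v)] is arc travel
    time; d[n] is the precomputed fastest path length from n to the
    destination."""
    c = {}
    if x_i <= threshold:                 # close to i: moves through i
        for j in adj[i]:
            if j != h:
                c[(i, j)] = x_i + l[(i, j)] + d[j]
    else:                                # far from i: going to i is the only move
        c[("x", i)] = x_i + d[i]
    if x_h <= threshold:                 # close to h: backward moves through h
        for k in adj[h]:
            if k != i:
                c[(h, k)] = x_h + l[(h, k)] + d[k]
    else:                                # far from h: going back to h is the only move
        c[("x", h)] = x_h + d[h]
    best = min(c.values())
    return {move: value - best for move, value in c.items()}
```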
As seen above, the computation of a consequence value requires only a few elementary operations. The result should therefore be immediately available and ready to be communicated to the user.

4.3 Communication of Consequence Values

Here we present an example of how consequence values may be communicated through a graphical interface. It should be understood that it serves the purpose of illustration, not of restriction. This is particularly so since navigational information is
often more effectively communicated vocally than graphically [22], and this may be true for route improvisation support, too. The output device has a screen on which different forms of consequence values can be displayed (Figure 7) in response to a command sent by the user through a keyboard, a touch screen, or any other medium. It is assumed that the user is currently approaching an intersection from the south and has the options of going straight through the intersection, turning left at the intersection, turning right at the intersection, and turning around immediately.
[Figure 7 panels: (a) travel times to the destination for each move (20, 25, 28, and 30 MINS); (b) the corresponding expected arrival times (6:20, 6:25, 6:28, and 6:30 PM); (c) values relative to the optimal move (±0, +5, +8, and +10 MIN); (d) as in (c), with the optimal move shown as an arrow.]
Fig. 7. Graphical communication of consequence values. It is assumed that the user’s current location (denoted by a dot) is sufficiently close to the intersection located at the center and sufficiently far from the intersection partially seen at the bottom.
In Figure 7(a), a consequence value is placed on the road segment onto which its corresponding move turns. For example, the label “25 MINS” is associated with the move of turning right at the intersection and indicates that it will take 25 minutes to reach the destination if the right turn is made there. An answer to such a what-if question is given for all other moves. The figure also suggests that the optimal move is the
one going straight since it is labeled with the smallest value. If arrival time is crucial in improvising a route, the user may choose to translate these consequence values by the current time (assumed to be 6:00 PM in Figure 7(b)). As mentioned earlier, the user may find it convenient if consequence values are calculated relative to that of the optimal move. Figure 7(c) displays this form of consequence values, from which the user can quickly learn how much extra time will be incurred by each move. The optimal move is the one with the “±0 MIN” label, which could be replaced by a pictorial symbol like an arrow (Figure 7(d)). This implies that the proposed method may be integrated into an existing navigation system.
5 Conclusion

The paper introduced a new concept of travel assistance that keeps the user informed of the consequence of each possible move from the current location and lets him/her improvise a path while traveling. It employs a method of assigning each move a single value, called a “consequence value,” which answers a what-if question regarding that move, such as how long (in terms of a selected attribute) it takes to reach a destination if that move is made. A consequence value of a move can be seen as one of many possible parameters that characterize the set of routes following that move. Some parameters are easily derived from a shortest path tree rooted at the destination, which is computed by an all-to-one shortest path algorithm executed prior to the trip. Others may require an excessive amount of pre-trip computation or an excessive frequency of en-route computation. The paper also outlined a design of a route improvisation support system equipped with the proposed method.

Since the system shares many basic components with a conventional navigation system, it is possible to use the proposed method as an extension of an existing navigation system. Whichever type of implementation is chosen, however, it might be difficult to communicate a plurality of consequence values (one for each possible move) to the user effectively. The difficulties multiply if the user wants to associate multiple consequence values with each move, which may be computed using different cost types, different computing formulas, and/or even different destinations. This poses a serious design question of where to place consequence values dynamically on a limited-size screen, which resembles the problem of “dynamic map labeling” [2].

Even if such technical issues were addressed, a fundamental question regarding the usability of a route improvisation support system would remain: whether the user is able and willing to bear the cognitive load the system imposes. In the case of a conventional navigation system, the user focuses on the task of following a prescribed sequence of directions (although, as discussed earlier, its execution is not that easy). A route improvisation support system, on the other hand, does not make decisions for the user, but lets (and indeed makes) the user process the provided information and make his/her own decisions while traveling. Therefore, the usability of the proposed method and system must be tested empirically, which will certainly call for interdisciplinary research cooperation.
References

1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms and Applications. Prentice Hall, Englewood Cliffs (1993)
2. Been, K., Nöllenburg, M., Poon, S.-H., Wolff, A.: Optimizing active ranges for consistent dynamic map labeling. Computational Geometry 43(3), 312–328 (2010)
3. Caldwell, T.: On finding minimum routes in a network with turn penalties. Communications of the ACM 4(2), 107–108 (1961)
4. Ellard, C.: You Are Here: Why We Can Find Our Way to the Moon, But Get Lost in the Mall. Doubleday, New York (2009)
5. Dehne, F., Omran, M.T., Sack, J.-R.: Shortest paths in time-dependent FIFO networks using edge load forecasts. In: Proceedings of the Second International Workshop on Computational Transportation Science, IWCTS 2009 (2009)
6. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–271 (1959)
7. Fan, Y., Kalaba, R., Moore, J.: Arriving on time. Journal of Optimization Theory and Applications 127(3), 497–513 (2005)
8. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics SSC-4(2), 100–107 (1968)
9. Kamburowski, J.: A note on the stochastic shortest route problem. Operations Research 33(3), 696–698 (1985)
10. Kaufman, D.E., Smith, R.L.: Fastest paths in time-dependent networks for IVHS application. IVHS Journal 1, 1–11 (1993)
11. Kirby, R.F., Potts, R.B.: The minimum route problem for networks with turn penalties and prohibitions. Transportation Research 3, 397–408 (1969)
12. Klippel, A., Tappe, H., Habel, C.: Pictorial representations of routes: Chunking route segments during comprehension. In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition III. LNCS (LNAI), vol. 2685, pp. 11–33. Springer, Heidelberg (2003)
13. Lovelace, K.L., Hegarty, M., Montello, D.R.: Elements of good route directions in familiar and unfamiliar environments. In: Freksa, C., Mark, D.M. (eds.) COSIT 1999. LNCS, vol. 1661, pp. 65–82. Springer, Heidelberg (1999)
14. Montello, D.R.: Navigation. In: Shah, P., Miyake, A. (eds.) The Cambridge Handbook of Visuospatial Thinking, pp. 257–294. Cambridge University Press, Cambridge (2005)
15. Orda, A., Rom, R.: Minimum weight paths in time-dependent networks. Networks 21(3), 295–320 (1991)
16. Raubal, M., Winter, S.: Enriching wayfinding instructions with local landmarks. In: Egenhofer, M.J., Mark, D.M. (eds.) GIScience 2002. LNCS, vol. 2478, pp. 243–259. Springer, Heidelberg (2002)
17. Richter, K.-F., Tomko, M., Winter, S.: A dialog-driven process of generating route directions. Computers, Environment and Urban Systems 32(3), 233–245 (2008)
18. Samet, H., Sankaranarayanan, J., Alborzi, H.: Scalable network distance browsing in spatial databases. In: Proceedings of the ACM SIGMOD Conference, Vancouver, Canada, pp. 43–54 (2008)
19. Sankaranarayanan, J., Samet, H., Alborzi, H.: Path oracles for spatial networks. In: Proceedings of the 35th International Conference on Very Large Data Bases (VLDB), Lyon, France, vol. 2, pp. 1210–1221 (2009)
20. Schmid, F.: Knowledge-based wayfinding maps for small display cartography. Journal of Location Based Systems 2(1), 57–83 (2008)
21. Sigal, C.E., Pritsker, A.A.B., Solberg, J.J.: The stochastic shortest route problem. Operations Research 28(5), 1122–1129 (1980)
22. Streeter, L.A., Vitello, D., Wonsiewicz, S.A.: How to tell people where to go: comparing navigational aids. International Journal of Man-Machine Studies 22, 549–562 (1985)
23. Wiener, J.M., Büchner, S.J., Hölscher, C.: Taxonomy of human wayfinding tasks: a knowledge-based approach. Spatial Cognition & Computation 9(2), 152–165 (2008)
24. Winter, S.: Modeling costs of turns in route planning. GeoInformatica 6, 345–361 (2002)
25. Ziliaskopoulos, A., Mahmassani, H.S.: Time-dependent shortest path algorithm for real-time intelligent vehicle highway system applications. Transportation Research Record 1408, 94–100 (1993)
The Effect of Activity on Relevance and Granularity for Navigation

Stephen C. Hirtle¹, Sabine Timpf², and Thora Tenbrink³

¹ School of Information Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
[email protected]
² Institute of Geography, University of Augsburg, Augsburg 86135, Germany
[email protected]
³ Transregional Collaborative Research Center on Spatial Cognition, University of Bremen, Bremen 28359, Germany
[email protected]
Abstract. This paper addresses the role of activity in the construction of route directions. Primary to our conceptualization is that the activity at hand constrains the relevance of spatial information for task performance, as well as the level of granularity at which information is needed. In this paper, we highlight the role of activity for relevance and granularity, first based on a review of each of the components involved, and furthermore by a semantic analysis of content patterns in human-generated instructions. The analysis identifies the verbalization styles that are associated with distinct types of activities on the basis of individual keywords that may serve as indicators. We offer a strong theoretical argument for the importance of activities and provide a first step towards an operationalization of this concept, as well as implications for the development of cognitively motivated navigation systems.

Keywords: Activities, granularity, relevance, navigation, routing, word clouds.
1 Introduction

Spatial information theory has made great advances over the past two decades in delineating the ontological and cognitive structures of how humans perceive space. One focus of this line of research has been the construction of useful and efficient navigational systems that guide users through cognitively rich frameworks, such as landmarks, neighborhoods, features, and navigational history [1-4]. To be successful in building such systems, it is critical to understand the task at hand, the knowledge that users bring to bear, the expectations that users have about the space, and the physical cues that the environment provides to the user [5-6]. The approach taken is representative of a more general problem, which is to identify theoretical and practical issues that highlight how humans represent spatial information.

Consider the problem of navigating from your home to a new theater in the center of town. This rather common and mundane task might be seen as requiring a number of stages, from determining the location of the new theater to selecting the mode (or modes) of transportation and timing the travel through space so that you are not late.
Additional constraints, such as combining the trip with dinner or shopping, might also be included if time allows. Primary to our conceptualization of the problem is that the activity at hand constrains the relevance of spatial information for task performance as well as the level of granularity at which information is needed [7]. For example, while biking to work and biking for leisure would require access to the same underlying data on bicycle-friendly paths and roads, the recommended route might be quite different as a result of the planned activity. For someone hiking in a mountainous environment, elevation information would be relevant for navigation, while this may not be the case for someone driving a car in the same environment [8]. Additionally, the granularity of instructions should vary with the knowledge of the user or other needs [9]. In a familiar environment, one may only need a brief indication of the route, while in an unfamiliar environment much greater detail is needed [2]. In these and many other ways, the task setting and activity in which the navigator is engaged determine the level and structure of spatial instructions [10]. These components are not exhaustive, but they can provide a rich foundation for the understanding of navigational tasks.

In this paper we highlight the role of activity for relevance and granularity, first based on a review of each of the components involved, and furthermore by a semantic analysis of content patterns in human-generated instructions. This analysis characterizes the verbalization styles that are associated with distinct types of activities on the basis of individual keywords that may serve as indicators. Finally, implications for future navigation systems are discussed.
2 Background

In recent years, the investigation of route directions has increasingly aimed at a systematic differentiation of the conceptual aspects determining the choice of spatial components to be represented. Many accounts have jointly achieved the identification of a relatively constant set of cognitive elements that are used as building blocks in route directions, including start and end point, route segments, action and movement descriptions, reorientations, landmarks, regions and areas, and distances [11-16]. Further research (as reported below) has highlighted different levels of granularity and an impact of considerations of relevance on the ways in which these elements are chosen and represented in a description. We argue that these various influences can be comprehensively captured by understanding the activity at hand. To substantiate this claim, we first review previous findings on relevance and granularity.

2.1 Relevance

According to Sperber and Wilson [17], human cognition is generally tuned to a maximization of relevance. Humans draw from the perceptual input they receive just those bits of information that matter to them in some respect, and that relate to previous knowledge as well as current requirements, purposes, and needs. Navalpakkam and Itti [18] present a computational model of how task-dependent relevance directs visual attention with respect to scene perception,
demonstrating the conceptual bias posed by relevance already at a stage at which visually validated objective fact might have been expected. Fauconnier and Turner [19] propose such a cognitive bias towards maximal relevance to be one of several pervasive basic simplification mechanisms, which together serve to keep the available perceptual and conceptual complexity within manageable limits. Van der Henst et al. [20] demonstrate how relevance influences the ways in which indeterminate relational problems are solved by humans, namely by focusing on those solution paths that are potentially useful within a certain functional context.

In wayfinding contexts, relevance can be considered as the kind of spatial information that people need to know in order to navigate effectively. Here one might ask what spatial cues individuals pull out of the environment to aid navigation. Intuitively, those spatial elements that are frequently mentioned in route descriptions, as summarized above, should generally be those elements that route givers consider relevant for navigation. This assumption is supported by independent evidence; for instance, Janzen and van Turennout [21] used fMRI to show increased attention and recall for objects that appear at decision points in a virtual reality navigational task.

Since navigation differs according to context, considerations of relevance have also been treated in distinct ways. Within GIScience, relevance has been discussed by Purves and Jones [22] in the context of geographic information retrieval, by Raper [8] in the context of information needs, and by Reichenbacher [23] with regard to mobile services. However, to our knowledge, considerations of relevance have not been spelled out systematically across wayfinding or route description contexts, or integrated into any general theory of navigation. We propose that this can only be achieved via a systematic distinction of activity types. In particular, relevance is always a relative concept: spatial information is relevant for a specific purpose, i.e., within a particular activity context. Relevance determines the type of information to be provided at all in route descriptions. The next issue then concerns the precise way in which this information is to be provided. This has been discussed in terms of granularity, which we turn to next.

2.2 Granularity

It is a widely acknowledged fact that contexts determine the levels of granularity with which humans perceive and conceive of the world. Often, this concerns the level of detail at hand; for instance, the extent to which a glass can be truthfully referred to as empty depends decisively on the context in which this reference becomes relevant [24]. Similarly, though quite independent of degree of detail, one and the same real-world entity can be conceptualized (and referred to) in various ways depending on its current function [25]. Due to these widely recognized effects, Fonseca et al. [26] propose a framework for offering spatial information at various levels of granularity for geographic information systems. Timpf et al. [27, 28] outline how wayfinding processes are conceptually hierarchical and therefore require granularity transformations to allow for the diverse procedures ranging from conceptual planning to the fine motor execution of navigation.

The need to consider variable granularity in route directions has been documented in a recent paper by Tenbrink and Winter [29], where they argue that the level of
granularity provided in a description should vary with knowledge of the space. Their approach builds both on a long history in the communications literature that argued for efficient conversation [30] and on experimental work indicating that ideal directions are not overly wordy but provide the key details for the navigation task [11, 31, 32]. Tenbrink and Winter [29] present a model of granularity that includes multi-level granularity in terms of both 1D (linear) granularity and 2D (areal) granularity. The approach is general enough to account for previous approaches to 1D granularity, such as chunking of a route into known segments [2, 9, 33] or the constructional units of Wunderlich and Reinelt [34], which represent the partitioning of the route into an orientation phase at the start, a travel phase, and an orientation phase at the destination. It also accounts for 2D granularity, such as destination descriptions of the form 'head to airport' [35]. By comparing directions generated by humans with those generated by web-based services, Tenbrink and Winter [29] found that human route directions were much more variable, e.g., in terms of less information along simple paths as opposed to more detailed information at complex decision points.

Obviously, therefore, the choice of granularity level is influenced by previous knowledge. Notably, such knowledge is not to be considered static. For instance, building on earlier work by Norman [36] and Kuipers [37], Timpf [38] argued that mobile devices require the negotiation between knowledge in the head (what you know about the space), knowledge in the world (signage and other information that will facilitate navigation), and knowledge in the pocket (a mobile device with instructions, images, or locational information). Within this context, activities such as planning, tracking, and assessing can take place.

Similar effects of task influencing granularity have also been found in other contexts, independent of knowledge. Hölscher et al. [39] compared descriptions of one's own future route with descriptions of another person's potential route (such as a stranger in town asking for the way). They found that the imagined route traveller was systematically provided with more details about the spatial environment within each category of spatial information investigated (actions at decision points, paths, and landmarks at and between decision points). Moreover, the routes chosen differed according to task setting, indicating that the route network itself was perceived at a coarser level of granularity (i.e., more graph oriented) than for one's own preferred route. That addressees can influence the level of granularity in various ways was also found by Tenbrink et al. [40], who investigated map-based route instructions given to humans as compared to an automatic dialogue system. The dialogue system was consistently addressed on an incremental, turn-by-turn level of granularity, while instructions for humans switched back and forth between incremental and destination-based descriptions.

Altogether, again, the general activity in which spatial navigation is embedded appears to determine decisively how spatial information is conveyed, both with respect to which elements are included (relevance) and with respect to the level of detail at which they are described (granularity). We will now look at the notion of activity more closely.
2.3 Activity

Navigation is seldom an end in itself, if we disregard special types of navigation such as exploration or hiking. We usually journey because we need to be at some other
place to do something there that we cannot do here, such as shopping, working, learning, or meeting with someone. This means that the navigation process is usually embedded in some kind of activity. Depending on the activity and its purpose, different goals for the navigation process might become important, such as taking the most efficient mode of transportation, minimizing the cost, minimizing the effort in navigation, following the most scenic route, maximizing safety, learning about the history or architecture, and so on. One might even just minimize the amount of effort in thinking about the trip; e.g., out of habit, many Americans will often just get in their car and not even consider using a bicycle or walking.

From an ontological point of view, activities and tasks produce partitions of reality [41], where reality is composed of (1) those things that are relevant (the smaller, but very detailed, part) and (2) those things that are not relevant for the task at hand (the larger part). The activity leads to the relevant information being extracted at varying degrees of granularity. Four different examples are given below.

a) A person who is headed to the hospital to take a friend to the emergency room would most likely have a singular focus in mind and thus require, or even want, no details other than the fastest and most direct route, while the casual visitor who is visiting a friend in the hospital might appreciate being offered different parking options at different prices.

b) In changing mode of transportation (bus, tram, subway), the rider needs additional information, which ideally is in the form of signage at the exchange points. It is often easier to change to a more global mode (e.g., bus to subway) than to a more local mode (e.g., subway to bus), in part due to the number of options at the transfer point.

c) In taking a bus from home to downtown, the main questions are when do I need to get off and, if necessary, how do I signal the driver to stop. In contrast, subways are well designed to indicate stops, which are less numerous than bus stops, and will automatically stop at all scheduled stations. GPS-enabled buses in many jurisdictions are now much improved at announcing future stops along the route.

d) Even in straightforward travelling situations, there are often what Hirtle et al. [42] called endpoint problems. That is, the destination may be clear, but the appropriate office, parking lot, entrance way, etc., may not be well marked. This is particularly true when the final location has a generic address, such as One Microsoft Way.

A conceptual framework to describe and distinguish activities and their constituents is Activity Theory [43, 44]. The idea of activity theory is that the human mind can only be understood within the context of human interaction with the (physical) world. This interaction is socially and culturally determined. In Human-Computer Interaction (HCI), activity theory is often used when designing human-artifact interaction [45]. Activity theory further describes all human activities as hierarchical. Activity can be understood at three levels of analysis: activity, action or task, and operation. The activity contributes the purpose or motive for a collection of actions. Tasks are directed towards goals that contribute to or are related to the purpose of the activity. These goals can also be in conflict with each other. Tasks are combined from operations, which are adapted to emerging conditions depending on the goals of the actions.
Table 1. Examples of activities and their purposes, tasks, and goals
| | | |
|---|---|---|
| Activity | take a friend to emergency room at the hospital | visit a friend at the hospital |
| Purpose | get medical help fast | socialize |
| Task 1 | drive your car to the hospital or call the ambulance | drive your car to the hospital |
| Goal | as fast as possible, fastest route | optional: take shortest path |
| Task 2 | park next to emergency room | park where available |
| Goal | nearest parking space | optional: least expensive |
| Task 3 | go to emergency room | go to main entrance |
| Goal | as fast as possible | find directory or information booth |
| Task 4 | | go to friend's room |
| Goal | | try not to get lost |
In Table 1, example a) from above has been expressed within the framework of activity theory. The relevant information is determined in relation to the purpose and the derived goals of activities and tasks. As can easily be seen here, the relevance of specific information, such as the location of the emergency room, can be inferred from the activity and its purpose: in the example on the left this information is highly relevant, whereas it is completely irrelevant in the example on the right of Table 1. Granularity as derived from the activity seems less obvious. However, the sequences of tasks describe the activity at a more detailed level of information, i.e., at a higher granularity (see also [28]). Granularity will depend on both the purpose and the local complexity, where less detail may be required in certain situations and more detail in others. The granularity required in describing a route for a specific person or type of person depends on the assumed knowledge of that person and on the complexity of the environment [46] in which the activity is situated.

Table 2. Activity Model of Wayfinding from Timpf (2005)
| | | | |
|---|---|---|---|
| Activity | Wayfinding: Get from place A to place B | | |
| Tasks | Planning | Tracking | Assessing |
| Operations | Information gathering, find routes, determine constraints, determine complexity, produce instructions | Orienting, track location, compare to plan, orient yourself | Compare needed to planned time, assess instructions, determine complexity of routes |
The notion of activity is a relative one. In our example above, the activity was to get your friend to the hospital, and one of the tasks was to drive your car to the hospital. Looking at this task in detail, we can again assume that “to drive your car to the hospital” can be seen as an activity, and the tasks this requires might be planning your route, tracking where you are, and assessing whether you are fast enough. The execution of the plan, including the possibility of updating in transit, occurs through the set of planning, tracking, and assessing tasks indicated in Table 2, taken from Timpf [38].

One distinction that may be necessary is between cases where navigating is seen as a task within an activity (such as in the examples in Table 1) and cases where it is seen as the activity itself (Table 2). In the cases of exploration, sailing, hiking, or jogging, navigation can be said to be an activity in itself, whereas in everyday cases navigation is often a task to accomplish as part of some other activity.

Altogether, these thoughts suggest that the consideration of activities should be decisive for the formulation of route directions, both with respect to the kinds of information to be included, based on their relevance for the task at hand, and with respect to the level of detail in which this information is presented. In the following, we provide insights gained by examining a corpus of route directions collected from the internet.
3 Analysis of Activity-Based Directions

3.1 Examples of Directions

In order to examine the effects of granularity, relevance, and activity on direction-giving, we chose to mine the Spatially-strAtified Route Directions (SARD) Corpus that was collected at Penn State University [47]. The corpus consists of 11,254 webpages containing human-generated route directions in English from the United States, United Kingdom, and Australia. The webpages are for a large variety of sites, such as parks, hospitals, country inns, restaurants, soccer fields, and so on. Many of the directions include landmarks, counts, and corrections. For example, one set of directions includes the following segments:

At the third red light, turn right onto Highway 14/206 (East Main Street). Go under the railroad trestle and at the second red light past the trestle, turn left onto South Washington Street. Turn left on Doster Street (if you reach the Autauga Creek Bridge, you have gone too far). The library is immediately to your right.
While the directions above are rather generic, there were also examples of directions that varied in terms of relevance and granularity depending on the activity or location in mind.

Overview first. The following example indicates the role of granularity in choosing a route. The webpage describes two choices in very general terms for a drive into Mexico. A “road log” gives detailed directions by mileposts, including where services can and cannot be found along the 120 km route.
Two routes exist - the "Low Road" and the "High Road". The low road is more scenic but is prone to being washed out during the rainy season. The high road is less scenic but more reliable. During rainy seasons the upper road between San Rafael and Bahuichivo may be exceedingly muddy and impassable; river crossings on the lower road may also be impassable, especially in late afternoons and evenings when thunderstorms are most likely. … Most traffic is now using the lower road but the choice changes frequently and is seasonal. … Our road log gives details on each route

Directions for visitors. Visitors to a location are often looking for more than just the fastest route, and might also want interesting alternatives that, in turn, provide a context for the space. For example, one set of walking directions from the United Kingdom states:

Alternatively, take the 426-446 bus to St. Peter's Hospital. Alight at the bus stop as the bus leaves the hospital! Cross the road just before the roundabout. Take the small path next to the Runneymede sign then join the main pathway and head towards the Samsung building. On a nice day this is a pleasant 10 minute walk. As the path curves round you will see the Electronic Arts sign on the left. Follow the path over the lake. Reception is on the first corner of this hugely impressive modern building.

These instructions suggest that on a nice day, it would be a rather pleasant walk along a series of small footpaths that pass some impressive modern architecture. Such details are not necessary, but are included to give the visitor a sense of the area and information that can later be used for orientation.

Directions to sports fields. Directions to sports fields provide an interesting challenge in that, while the fields are often near other landmarks, they typically lack a direct numerical address. That is, they are often behind a school or past a landmark. In addition, the directions often need to direct you in some detail to the parking lots. For example, these directions from Georgia include a number of orientation remarks to help you locate the fields as you approach them from the highway:

Our main soccer complex is located 25 miles south of Hartsfield International Airport and is just off of I-85 South in Newnan, GA. If taking exit 47 off of I-85, go east at the bottom of the exit ramp. Go past the Home Depot and then you will pass the entrance to the Yamaha (both are on the left hand side of Hwy 34). Once past the Yamaha plant, the fields are on the left. Take a left at the next red light, which will be Walt Sanders Memorial Drive and will be right at the John Cullen Dodge Dealership. The soccer fields and parking lots are just behind the John Cullen Dodge.

3.2 Conceptual Analysis

As the directions were written for different purposes, it is also possible to examine the verbalization styles that are associated with distinct types of activities. The first set of instructions above was for going to a library, which is a rather generic task. There are
other kinds of tasks for which one might need navigational instructions. As a first pass at constructing a theory of activities, we identified five different types of activities that are representative of a much larger set of possible activities. Each row of Table 3 lists a different activity in a general sense. These are (1) getting somewhere urgently, (2) enjoying the natural scenery, (3) taking an educational trip or sightseeing, (4) attending a sporting event or finding a sporting location, such as a soccer field, and (5) following a trail or route for exercise. While these are distinct activities as such, they share some basic features while contrasting fundamentally in others. For example, some activities (such as getting somewhere urgently) involve time pressure, while others (such as enjoying natural scenery and sightseeing) rather suggest the opposite, if the traveler wishes to spend time in an enjoyable way. Similarly, activities differ systematically with respect to the role of the path: for some activities the navigation itself is in focus for the traveler, while for others navigation is a mere necessity in order to reach some destination. The columns of Table 3 provide an intuitive characterization of some of these basic features. A plus sign (+) indicates a likely factor to include as a positive attribute in the directions, a minus sign (−) indicates a likely factor to include as a negative attribute, and an “o” indicates that the factor is unimportant for the category.

Table 3. Attributes of directions (columns) versus the purpose of navigation (rows)
| | Time pressure | Beauty of landscape | Urban architecture | Effort | Focus on destination | Focus on the path |
|---|---|---|---|---|---|---|
| Getting somewhere urgently | + | o | o | − | + | o |
| Enjoying natural scenery | − | + | − | o | o | + |
| Educational trip / Sightseeing | − | o | + | o | + | + |
| Attending sporting event | o | o | o | o | + | o |
| Following trail for hiking or exercise | o | + | o | + | o | + |

Note: + indicates a positive attribute, − indicates a negative attribute, and o indicates a neutral or unimportant attribute. See text for details.
From this analysis, one can start to identify likely keywords for each column. Intuitively, instructions positively focused on time pressure would include terms such as fast, fastest, quick, speed, rush hour, or traffic, while those negatively focused on time pressure might include terms such as leisure or stroll. Neutral directions would favor no particular word for this category. Thus, directions focused on getting you
somewhere as quickly as possible would include phrases like “the fastest route” or “during rush hour”. In contrast, the phrase “as you stroll through” might appear in directions along a natural trail for enjoying the scenery, indicating the opposite of time pressure. Positive terms for the beauty of the landscape would include lexical indicators such as beautiful, scenic, scenery, impressive, or stunning. Positive terms for urban architecture would include interesting, innovative, impressive, architecture, or historic, while descriptions of natural scenery activities that tend to avoid buildings would be characterized by an absence of these terms. Terms for effort would include difficult, steep, effort, or careful, which people trying to reach a destination quickly might want to avoid, while people looking for exercise might seek just these indicators. Positive terms for focus on destination would include see, look for, or arrive at, while descriptions focusing on the path would normally not contain these terms. In contrast, positive terms for focus on path would include follow, enjoyable, nice route, or pleasant, which would be relatively irrelevant for descriptions focused on the destination.

One can also look at the destination and how it might fit into the activity matrix shown in Table 3. That is, how do the directions to a hospital (typically a place to get to quickly) differ from the directions to a mountain-top country inn (where a more scenic route might be preferred)? The database was mined to examine how human-generated directions might vary with the activity that is planned. This was accomplished by looking at (1) keywords that might suggest different activities (e.g., landscape, architecture, history) and (2) constraints that can be imposed on navigation (e.g., avoid, fastest, difficult, walking, rush hour).

3.3 Tabular Analysis

To get a sense of how the instructions vary across dimensions, we counted the number of route descriptions that included various keywords representing the activities, constraints, and modes of travel. Table 4 shows the number of webpages in the corpus that contained specific words or phrases. It turns out that some phrases, such as “rush hour”, are remarkably infrequent, with only 17 pages containing that phrase. At the opposite end of the spectrum, there are words like “traffic” that appear 2608 times, often in phrases like “turn right at the traffic signal” or “proceed through three traffic lights.” Keywords with over 1000 hits were disqualified from further processing.

3.4 Word Clouds

To get a sense of the characteristics of groups of directions, we chose to summarize the sets of documents using word clouds [48-50]. A sample of representative sets of directions from the categories in the tables above was collected, ignoring the extraneous material on the websites about hours, services, and other non-navigational details. That is to say that while the entire website provided the overall context for the directions, the word clouds were based on the text in the directions alone.
Table 4. Counts of selected keywords grouped in general categories of use
| Time Pressure | Beauty of Landscape | Urban Architecture | Effort | Focus on Destination | Focus on Path |
|---|---|---|---|---|---|
| Traffic 2608 | Beautiful 665 | History 1669 | Careful 175 | See 2899 | Follow 6212 |
| Quick 865 | Scenic 359 | Historic 752 | Steep 168 | Look 1582 | Pleasant 359 |
| Speed 466 | Landscape 194 | Architecture 79 | Effort 162 | Fields 614 | Enjoyable 50 |
| Fast 388 | Scenery 88 | Innovative 62 | Difficult 131 | Soccer 347 | Easy drive 15 |
| Arrive at 342 | Stunning 74 | Interesting 59 | Exercise 105 | Arrive at 342 | |
| Fastest 46 | | Impressive 43 | | Look for 820 | |
| Rush hour 17 | | | | | |
| (−) Leisure 249 | | | | | |
| (−) Stroll 74 | | | | | |

Note: Words in italics were too general to serve any discriminatory purpose.
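The counts in Table 4 can in principle be reproduced with a simple scan over the corpus. A hedged sketch in Python (the corpus itself is not distributed with the paper, so the document list and keyword list here are placeholders):

```python
import re
from collections import Counter

def pages_containing(documents, keywords):
    """Number of pages (documents) containing each keyword at least once,
    as tallied in Table 4."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered):
                counts[kw] += 1
    return counts

# Placeholder documents standing in for the 11,254-page SARD corpus.
documents = ["Take the fastest route during rush hour ...",
             "A scenic stroll along the river, then follow the path ..."]
keywords = ["fastest", "rush hour", "scenic", "stroll", "steep", "follow"]
print(pages_containing(documents, keywords).most_common())
```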
We began by looking at word clouds for some of the attributes identified in the matrices above. Contrasting directions under time pressure with directions for the beauty of the landscape, Figure 1 shows the word cloud for directions that include the word fastest, while Figure 2 shows the word cloud for directions that include the word scenic. The fastest cloud tends to be direction-oriented, with words like left, right, east, west, turn, proceed, traffic, and so on, whereas the scenic cloud adds words like pleasant, follow, approximately, and just, indicating a more leisurely journey rather than just hard directions. One can also look at differences by country. In all cases, the fastest directions consist primarily of place names, highway names, and simple directional commands. Not surprisingly, the US directions tend to be interstate-based, with words such as exits and ramps, while the Australian and UK directions tend to be road-based, with words such as fork, towards, and along.

For a more direct test of the role of activities in the relevance of spatial information, Figure 3 shows the word cloud for soccer. Typically in the US, these were directions for youth soccer leagues, directing parents to fields for competitions that are often located behind schools, in parks, and in other hard-to-find locations.
Fig. 1. Word cloud for the purpose of Time Pressure (Fastest)
Fig. 2. Word cloud for the purpose of enjoying the Beauty of Landscape (Scenic)
This is in contrast to major sporting stadiums that would be highly visible and indicated through road signage from multiple directions. The word cloud in Figure 3 conveys a different feel than the two previous clouds with words like located, approximately, entrance, follow, onto, park, parking, etc. It appears that the directions emphasized endpoint problems of how to find the fields and where to park. As a final example, Figure 4 shows the word cloud for the word hospital, which was not part of the tabular analysis. In contrast to both scenic directions and soccer directions, the directions to hospitals are much more succinct. In part, this is due to the locations of hospitals and clinics, which are typically in more urban, or possibly suburban, settings and are often clearly identified. Figure 4 implies fairly basic directions with an emphasis on left and right turns and street names.
Fig. 3. Word cloud for soccer, indicating increased use of words such as parking, approximately, follow, entrance

Fig. 4. Word cloud for hospital, indicating very basic directions focused on turns and street names
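The term frequencies underlying clouds such as those in Figures 1-4 can be tallied in a few lines; rendering them is then a job for a visualization library. A minimal sketch (the deliberately short stopword list is a hypothetical placeholder):

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "to", "of", "and", "on", "at", "in", "is",
             "you", "will", "then"}

def cloud_terms(direction_texts, top_n=25):
    """Frequency table underlying a word cloud for one group of directions."""
    counts = Counter()
    for text in direction_texts:
        for word in text.lower().split():
            word = word.strip(".,;:!?()\"'")
            if word and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)

# e.g., compare the groups behind Figures 1-4:
# cloud_terms(fastest_docs); cloud_terms(scenic_docs); cloud_terms(soccer_docs)
```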
4 Summary and Conclusions

In this paper, we have shown how activities influence the construction of route directions. Activities lead to a chosen granularity of information and to both the inclusion of relevant details and the exclusion of irrelevant details. These three components (activities, granularity, and relevance) in part account for the difficulty in automatically generating directions. Consider, for example, any current automated route guidance system. There are at most a few options for personalizing the route chosen, such as selecting the fastest or most scenic route, avoiding tolls, or setting a preference for road type. Note that these options affect only the route selection. We know of no automatic system that, when choosing the most scenic route, also highlights additional information about the area or gives additional
warnings about the difficulty of the turns. Instead, the route description settings are independent of the route selection settings. The argument in this paper is that these two aspects are in fact tied to each other in symbiotic ways.

We have outlined the importance of activity types for differences in relevance as well as granularity levels within route directions. As a first step towards an operationalization of this general idea via a keyword search, we have suggested a range of potential lexical indicators of dimensions (such as speed or beauty) that are relevant for some activities (such as getting somewhere quickly) rather than others. We adopted the use of word clouds to give an overall feel for a large set of directions under different parameters. This analysis shows how word clouds can indicate the kinds of terms that intermingle in different kinds of directions. At the same time, the approach is rather cumbersome, and there is a need to explore advanced text mining and visualization techniques to show how groups of directions vary in complexity.

Finally, the theory in Section 2 suggests that granularity can vary in directions in multiple and complex ways. At one level, coarse granularity is useful for planning activities by providing a general overview of the space. The example directions gave overall options in one case (the low or high road) and a general location in another (25 miles south of Hartsfield International Airport). At the same time, scenic directions often have a finer granularity to give visitors a sense of the surrounding area.

Overall, additional work is needed. This paper adds to a growing body of literature [e.g., 2, 7, 10, 11, 29] that argues for more ergonomic directions. Even if automated systems never meet the standards set by human-generated directions, it is still useful to delineate the differences. By doing so, a complete theory of the role of navigation in communication acts can be realized.

Acknowledgments. The third author acknowledges funding by the DFG (German Science Foundation) for the SFB/TR 8 Spatial Cognition, projects I5-[DiaSpace] and I6-[NavTalk]. The first author acknowledges sabbatical support from the University Center for International Studies at the University of Pittsburgh and the Geoinformatics Program at the University of Augsburg.
References 1. Burnett, G.E.: “Turn right at the traffic lights”: The requirement for landmarks in vehicle navigation systems. The Journal of Navigation 53, 499–510 (2000) 2. Klippel, A., Hansen, S., Richter, K.-F., Winter, S.: Urban Granularities - A Data Structure for Cognitively Ergonomic Route Directions. GeoInformatica 13, 223–247 (2009) 3. Sarjakoski, L., Nivala, A.M.: Adaptation to Context—A Way to Improve the Usability of Mobile Maps. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-Based Mobile Services, Theories, Methods and Implementations, pp. 107–123. Springer, Berlin (2005) 4. Schwartz, T., Stahl, C., Baus, J., Wahlster, W.: Seamless Resource Adaptive Navigation. In: Crocker, M., Siekmann, J. (eds.) Resource-Adaptive Cognitive Processes. Cognitive Technologies Series, pp. 239–265. Springer, Berlin (2010) 5. Frankenstein, J., Büchner, S.J., Tenbrink, T., Hölscher, C.: Influence of geometry and objects on local route choices during wayfinding. In: Hölscher, C., Shipley, T.F., Olivetti Belardinelli, M., Bateman, J.A., Newcombe, N.S. (eds.) Spatial Cognition VII. LNCS, vol. 6222, pp. 41–53. Springer, Heidelberg (2010)
6. Lynch, K.: The Image of the City. The Technology Press and the Harvard University Press, Cambridge, MA (1960) 7. Tenbrink, T., Winter, S.: Presenting spatial information: Granularity, relevance, and integration. Journal of Spatial Information Science 49 (2010) 8. Raper, J.: Geographic relevance. Journal of Documentation 63, 836–852 (2007) 9. Srinivas, S., Hirtle, S.C.: Knowledge based schematization of route directions. In: Barkowsky, T., Knauff, M., Ligozat, G., Montello, D.R. (eds.) Spatial Cognition 2007. LNCS (LNAI), vol. 4387, pp. 346–364. Springer, Heidelberg (2007) 10. Timpf, S.: Geographic activity models. In: Duckham, M., Goodchild, M.F., Worboys, M.F. (eds.) Foundations of Geographic Information Science, pp. 241–254. Taylor & Francis, London (2003) 11. Denis, M., Pazzaglia, F., Cornoldi, C., Bertolo, L.: Spatial discourse and navigation: An analysis of route directions in the city of Venice. Applied Cognitive Psychology 13, 145–174 (1999) 12. Habel, C.: Prozedurale Aspekte der Wegplanung und Wegbeschreibung. In: Schnelle, H., Rickheit, G. (eds.) Sprache in Mensch und Computer, pp. 107–133. Westdt. Verlag, Opladen (1988) 13. Couclelis, H.: Verbal directions for way-finding: space, cognition, and language. In: Portugali, J. (ed.) The Construction of Cognitive Maps, pp. 133–153. Kluwer Academic Publishers, Dordrecht (1996) 14. Denis, M.: The description of routes: A cognitive approach to the production of spatial discourse. Cahiers de Psychologie Cognitive 16, 409–458 (1997) 15. Tversky, B., Lee, P.: Pictorial and Verbal Tools for Conveying Routes. In: Freksa, C., Mark, D.M. (eds.) COSIT 1999. LNCS, vol. 1661, p. 51. Springer, Heidelberg (1999) 16. Allen, G.L.: Principles and practices for communicating route knowledge. Applied Cognitive Psychology 14, 333–359 (2000) 17. Sperber, D., Wilson, D.: Relevance—Communication and cognition. Basil Blackwell, Oxford (1986) 18. Navalpakkam, V., Itti, L.: Modeling the influence of task on attention. Vision Research 45(2), 205–231 (2005) 19. Fauconnier, G., Turner, M.: The Way We Think: Conceptual Blending and the Mind’s Hidden Complexities. Basic Books, New York (2002) 20. Van der Henst, J.-B., Sperber, D., Politzer, G.: When is a conclusion worth deriving? A relevance-based analysis of indeterminate relational problems. Thinking and Reasoning 8, 1–20 (2002) 21. Janzen, G., van Turennout, M.: Selective neural representation of objects relevant for navigation. Nature Neuroscience 7, 673–677 (2004) 22. Purves, R., Jones, C.: Geographic Information Retrieval (GIR). Computers, Environment and Urban Systems 30(4), 375–377 (2006) 23. Reichenbacher, T.: Geographic relevance in mobile services. In: Proceedings of the 2nd International Workshop on Location and the Web (LOCWEB 2009). ACM, New York (2009) 24. Smith, B., Brogaard, B.: A unified theory of truth and reference. Logique et Analyse 43, 169–170, 49–93 (2003) 25. Hobbs, J.R.: Sketch of an ontology underlying the way we talk about the world. International Journal of Human–Computer Studies 43(5/6), 819–830 (1995) 26. Fonseca, F., Egenhofer, M.J., Davis, C., Câmara, G.: Semantic granularity in ontology-driven geographic information systems. Annals of Mathematics and Artificial Intelligence 36(1-2), 121–151 (2002)
27. Timpf, S., Volta, G.S., Pollock, D.W., Egenhofer, M.J.: A Conceptual Model of Wayfinding Using Multiple Levels of Abstractions. In: Frank, A.U., Formentini, U., Campari, I. (eds.) GIS 1992. LNCS, vol. 639, pp. 348–367. Springer, Heidelberg (1992) 28. Timpf, S., Kuhn, W.: Granularity Transformations in Wayfinding. In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition III. LNCS (LNAI), vol. 2685, pp. 77–88. Springer, Heidelberg (2003) 29. Tenbrink, T., Winter, S.: Variable Granularity in Route Directions. Spatial Cognition and Computation 9, 64–93 (2009) 30. Grice, H.P.: Logic and conversation. In: Cole, P., Morgan, J. (eds.) Syntax and Semantics, vol. 3, pp. 41–58. Academic Press, New York (1975) 31. Allen, G.L.: From knowledge to words to wayfinding: Issues in the production and comprehension of route directions. In: Hirtle, S., Frank, A. (eds.) COSIT 1997. LNCS, vol. 1329, pp. 363–372. Springer, Heidelberg (1997) 32. Schneider, L.F., Taylor, H.A.: How do you get there from here? Mental representations of route descriptions. Applied Cognitive Psychology 13, 415–441 (1999) 33. Richter, K.-F.: Context-Specific Route Directions — Generation of Cognitively Motivated Wayfinding Instructions. In: DisKi 314 / SFB/TR 8 Monographs, vol. 3. IOS Press, Amsterdam (2008) 34. Wunderlich, D., Reinelt, R.: How to get there from here. In: Jarvella, R.J., Klein, W. (eds.) Speech, Place, and Action, pp. 183–201. John Wiley & Sons, Chichester (1982) 35. Tomko, M., Winter, S.: Pragmatic Construction of Destination Descriptions for Urban Environments. Spatial Cognition and Computation 9, 1–29 (2009) 36. Norman, D.A.: The psychology of everyday things. Basic Books, New York (1988) 37. Kuipers, B.J.: The ‘Map in the Head’ metaphor. Environment and Behavior 14, 202–220 (1982) 38. Timpf, S.: Wayfinding with mobile devices: decision support for the mobile citizen. In: Rana, S., Sharma, J. (eds.) Frontiers of Geographic Information Technology, pp. 209–228. Springer, London (2005) 39. Hölscher, C., Tenbrink, T., Wiener, J.: Would you follow your own route description? Cognition (in press) 40. Tenbrink, T., Ross, R.J., Thomas, K.E., Dethlefs, N., Andonova, E.: Route instructions in map-based human-human and human-computer dialogue: a comparative analysis. Journal of Visual Languages and Computing 21, 292–309 (2010) 41. Smith, B.: Ontological Imperialism. In: Talk Held at GIScience in Savannah, Georgia (2000) 42. Hirtle, S.C., Richter, K.-F., Srinivas, S., Firth, R.: This is the tricky part: When directions become difficult. Journal of Spatial Information Science 1, 53–73 (2010) 43. Leontiev, A.N.: Activity, consciousness, and personality. Prentice-Hall, Englewood Cliffs (1978) 44. Nardi, B. (ed.): Context and consciousness - activity theory and human-computer interaction. MIT press, Cambridge (1996) 45. Kaptelinin, V., Nardi, B., Macaulay, C.: The activity checklist: a tool for representing the ’space’ of context. Interactions, 29–39 (July/August 1999) 46. Heye, C., Timpf, S.: Factors influencing the physical complexity of routes in public transportation networks. In: Axhausen, K. (ed.) 10th International Conference on Travel Behaviour Research, ETH Zurich, Luzern (2003) (CD-ROM) 47. Zhang, X., Mitra, P., Xu, S., Jaiswal, A.R., Klippel, A., MacEachren, A.: Extracting route directions from web pages. In: Twelfth International Workshop on the Web and Databases, WebDB 2009 (2009)
48. Bateman, S., Gutwin, C., Nacenta, M.: Seeing things in the clouds: the effect of visual features on tag cloud selections. In: HT 2008: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pp. 193–202. ACM, New York (2008) 49. Viégas, F., Wattenberg, M.: TIMELINES Tag clouds and the case for vernacular visualization. Interactions 15, 49–52 (2008) 50. Gottron, T.: Document word clouds: Visualising web documents as tag clouds to aid users in relevance decisions. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds.) ECDL 2009. LNCS, vol. 5714, pp. 94–105. Springer, Heidelberg (2009)
I Can Tell by the Way You Use Your Walk: Real-Time Classification of Wayfinding Performance
Makoto Takemiya 1 and Toru Ishikawa 2
1 Graduate School of Interdisciplinary Information Studies, The University of Tokyo, [email protected]
2 Center for Spatial Information Science, The University of Tokyo, [email protected]
Abstract. Wayfinding activities often pose difficulty, especially for people with poor spatial abilities. If wayfinding aides can take into account individual differences during navigation, targeted assistance may be able to improve wayfinding performance. To enable this, the performance of wayfinders must first be classified. This work proposes a novel method that uses a probabilistic scoring function to classify wayfinding performance using only information available in real-time during route traversal. Training data for the classifier was algorithmically generated as routes representing different levels of wayfinding performance. This approach was tested through an empirical study in which people with different abilities walked from a start to a goal. The results show that performance of wayfinders can be reliably classified into two groups–good and poor–and that this classification can be done using only information available during route traversal. Our results suggest that environmental structure plays an important role in wayfinders’ route choice. Keywords: navigation, route choice, individual differences, spatial cognition, spatial abilities.
1 Introduction
“I’m lost,” is a phrase often heard when engaged in a wayfinding activity. Despite humans’ advanced cognitive faculties, wayfinding tasks often pose considerable difficulty, as anyone who has travelled to an unfamiliar land can attest. Wayfinding is not randomly navigating an environment, but a purposeful activity of traversing from a starting point to a goal destination. Montello [23] differentiates wayfinding from locomotion by arguing that wayfinding reflects cognitive processes going on during navigation, whereas locomotion involves only activities of the sensory and motor systems. Thus analysis of wayfinding activities can arguably provide insight into how humans perceive and reason about space. Additionally, it can be theorized that spatial abilities are likely to have a measurable effect on wayfinding performance.
To assist in wayfinding, maps, and more recently electronic devices that utilize the Global Positioning System (GPS), provide externalized spatial information to people navigating spatial environments. However, despite the cost of and advances in technology, Ishikawa et al. [16] showed that when compared with traditional maps, GPS devices can increase the time and distance needed to traverse a route, while decreasing the wayfinder’s configurational knowledge of the route travelled. Evidently, technology is not being properly utilized in many cases. One possible reason for this may be the failure of many contemporary devices to adapt to the individual needs of the people using them. Ishikawa and Montello [17] showed that significant individual differences exist in configurational understanding of routes, which suggests that significant differences may exist in individual spatial capabilities (also see [12]). Since people differ significantly in acquiring spatial knowledge and understanding spatial relations, it is likely that these differences will express themselves during wayfinding tasks in the form of various errors or sub-optimal route choices. This, in turn, implies that people will need different assistance to prevent and/or correct wayfinding errors. In order to provide the assistance that people need, individual wayfinding abilities need to be classified. Once this is done, targeted assistance can be applied. Test-based methods exist to classify the spatial abilities of individuals, such as a self-report sense-of-direction questionnaire [13], but these methods cannot practically be administered while a route is being traversed. Also, it is unlikely that feasible electronic wayfinding assistants can require users to take a spatial ability test before use. Ideally, a classification of what kind of assistance a wayfinder needs would be made automatically, based on the user’s wayfinding performance. Of special importance to providing wayfinding assistance is finding wayfinders who are more likely to make poor route choices or get lost. In any non-trivial environment, many viable routes from a starting point to a destination can be taken. However, there will naturally be some routes that can be considered sub-optimal due to demands on time or other limited resources (e.g., fuel, energy). People who take these sub-optimal routes are thus more likely to be in greater need of wayfinding assistance, given their quantitatively lower performance (with respect to resources used). Distinguishing between wayfinders who need help and those who do not is a good starting point for providing targeted wayfinding assistance. This classification should be done as soon as possible after a wayfinder begins traversing a route, in order to provide appropriate assistance. Toward this end, a real-time classification algorithm should be used to determine a wayfinder’s ability based on individual performance. The objectives of the present work were twofold: to explore the connection between spatial abilities and wayfinding performance, and to see if wayfinding performance can be reliably classified. In this paper we discuss an empirical study of spatial abilities and wayfinding performance, and present a novel approach to differentiating, in real-time, between those who need help in a wayfinding task and those who do not.
2 Empirical Study
2.1 Study Area and Routes
Our study took place in Nara, Japan, near the Kintetsu Nara train station and Nara Prefectural University. In that area, we selected two routes for participants to traverse. The terrain was completely flat and isolated enough from the station that finding participants who had never been to the study area did not prove too challenging. The two routes were on opposite sides of the university, and thus geographically isolated from each other. The first route was longer and in a more complex (cf. [8]) area than the second route. The shortest paths between the start and goal points were 1050 m and 560 m for the first and second routes, respectively. For each route, the goal was not visible from the starting point, nor vice versa, and participants could physically walk only on the streets, since there were buildings on all sides of the roads and no open spaces. There was a large variety of small streets, however, requiring the participants to choose from among many possible routes. The shortest path for the first route contained 18 decision points (see Section 3.1), where participants had to choose from among two or more possible directions to walk in. The shortest path for the second route contained 8 decision points (including the start and goal points for both routes).
2.2 Participants and Design
Thirty participants (fifteen female) took part in our study in return for monetary compensation. All participants navigated using only the provided paper maps, printed from Google Maps, without any other navigational aides, and all were unfamiliar with the area where the experiment took place.
2.3 Testing Spatial Abilities
To assess participants’ spatial abilities, we asked them to take the Santa Barbara Sense-of-Direction (SBSOD) scale and the mental rotation test. The SBSOD scale consists of 15 Likert-type statements with a 7-point scale [13], where participants expressed their agreement with each statement by circling a number from 1 (strongly agree) to 7 (strongly disagree). Seven of the statements describe good navigational abilities or tendencies, such as “I am very good at giving directions,” while the remaining eight concern poor navigational abilities or tendencies, such as “I have trouble understanding directions.” The test boasts high internal consistency and test-retest reliability, and has been used in many studies as a measure of spatial ability. Hegarty et al. [13] showed that people who scored highly on this test were good at updating their orientation and location in space when traversing an environment, making it a reasonable metric for relating spatial ability to wayfinding performance.
The mental rotation test consists of 21 questions that required the participants to mentally rotate line drawings and identify matching drawings with different orientations. Unlike the SBSOD scale, the mental rotation test had a time limit of 6 minutes, requiring participants to react as quickly and accurately as possible. This test is often used in studies of map use in different environments [20].
2.4 Procedure
At the beginning of the experiment, instructions were given (e.g., do not use your cell phone or other navigational aides during the experiment) and participants answered the SBSOD scale and the mental rotation test. They were then taken individually from the train station to the starting point of the first route. While facing north (the map was also north-oriented by the experimenter), participants were given a single A4-size map of the area before traversing each route. All participants traversed the two routes in the same order. On each map, only the starting point and the goal were marked, and the maps were not allowed to be written on. Participants were instructed to walk toward the goal and to inform the experimenter when they thought they had successfully reached it. The experimenter walked approximately 3 m behind the participants and observed their behavior, without providing any assistance. For the first route, if participants got lost and did not reach the goal after 20 minutes had passed, they were taken to the goal by the experimenter. For the second route, participants were given up to 15 minutes to reach the goal. Upon completion of the first route, all participants were taken to the second starting position along a predetermined, fixed path. It took approximately one hour to complete the written and wayfinding tasks.
3 Wayfinding Classification
3.1 Decision Points
When analyzing human wayfinding activities, it is useful to consider the decision points that wayfinders must traverse when navigating an environment. A decision point is a location in an environment, such as an intersection, where a person navigating has to decide among competing paths to continue their traversal. Choosing which path to take at a decision point requires spatial reasoning. Wayfinders presumably perform less spatial reasoning between decision points than at them, and so a route taken through an environment can be represented abstractly as the set of decision points encountered on the traversal.
3.2 Metrics
Armed with the computational abstraction of routes in terms of decision points, we next decided on what metrics to use to analyze wayfinding performance. In this work, we considered the following metrics:
Length. The distance in meters of a given route.
Number of turns. A turn was any change in bearing greater than 30°.
Total angle turned. The angular sum of all changes in bearing along the route from the starting point, up to the current decision point being measured.
Average edge length. The length divided by the number of turns plus one.
Complexity. We defined route complexity the same as in [4,26], where the complexity of a route was calculated as the sum of the complexity of all decision points along the route. At each decision point, the angle of the incident branch from the previous to the current decision point, the number of branches, and the angles between them were calculated, and complexity was assigned based on heuristics (e.g., angles separated by less than 45° were assigned a predetermined complexity value).
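As a rough illustration (ours, not the authors' code), the first four metrics can be computed from a polyline of decision-point coordinates; the 30° turn threshold follows the definition above, while the complexity heuristics of [4,26] are omitted.

```python
import math

def bearing(p, q):
    """Bearing in degrees of the segment from p to q."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def route_metrics(points, turn_threshold=30.0):
    """points: list of (x, y) decision-point coordinates along a traversal."""
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # Change in bearing at each interior decision point, folded into [0, 180].
    deltas = []
    for a, b, c in zip(points, points[1:], points[2:]):
        d = abs(bearing(b, c) - bearing(a, b)) % 360
        deltas.append(min(d, 360 - d))
    turns = sum(1 for d in deltas if d > turn_threshold)
    return {
        "length": length,
        "num_turns": turns,
        "total_angle_turned": sum(deltas),
        "avg_edge_length": length / (turns + 1),
    }
```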
3.3 Other Metrics
Specifically for analyzing routes traversed by participants, we also considered the following time-related metrics:
Time. The time in seconds required for the participant to traverse a route. The maximum allowed value was the time limit for each route.
Speed. The average meters travelled per second.
Stops. If a participant stopped moving for more than 30 seconds, for example, to study their map, this was recorded as a stop.
3.4 Synthetic Route Generation
In order for classification of wayfinders to be feasible given real-world constraints, we could not use training data from humans, since it is not possible to collect training data for every possible start and goal location for the entire world (nay, even for a single city). Instead, we implemented a heuristic-based method to synthetically generate routes that are likely to be similar to routes that humans would take. As explained in more detail in Section 3.6, since we used a conditional probabilistic scoring function for classification, even though our generated routes were not always perfectly plausible routes that humans would take, we still obtained good results. Since the goal of synthetic route generation was to generate training data for our probabilistic classifier, we needed to generate a large variety of different routes. At first we tried permuting a large number of possible routes from the graph of decision points and their connecting edges, but this created too many infeasible routes that no human would ever dream of taking, making it difficult to generate routes that were actually useful for classification. To get around this, we decided to start with a heuristic search algorithm as a baseline to find the shortest-path route and then modify it slightly to generate variations on that shortest route. We used the A* heuristic search algorithm [11], modified slightly, to search for routes from the starting point to the goal point for each
route used in our empirical study. When searching for the goal, at each decision point A* chooses the edge from the current point that has the lowest heuristic cost. We used Cartesian distance from the current decision point to the goal as the heuristic function to judge the value of each decision point, with the algorithm trying to minimize distance travelled. Since at each iteration of A* the algorithm chooses the next path to explore based on the available edges and their heuristic cost values, and since we wanted to create a good diversity of routes around the shortest route, we modified the algorithm so that as each edge was considered, there was a 10% chance that the edge would become “tabu” and thus unable to be used. This had the effect of randomly forcing the algorithm to take sub-optimal routes and created a spread in which the majority of the generated routes hovered around the shortest route, but occasionally very long routes would also exist. In order to get a good variety of routes, we generated 250 routes. Due to the introduced randomness, both “poor” routes that diverged sharply from the shortest path and “good” routes that were close to the shortest path from start to goal points were generated. To separate the good from the poor, we then clustered the generated routes into two clusters, using the KMeans++ [1] clustering algorithm with a predefined cluster size of two. From the metrics described in Section 3.2, we examined all combinations of up to two metrics and found that total angle turned alone provided the best clustering results. Thus for the KMeans++ clustering, the distance between routes was calculated from the difference between their total angles turned, and each cluster center was simply a single total-angle-turned value.
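A minimal sketch of this tabu-randomized search, under our own assumptions about the data layout (`G` maps each decision point to its neighbors, `pos` holds Euclidean coordinates): it follows the description above (A* with a straight-line heuristic, each considered edge having a 10% chance of becoming tabu), not the authors' exact implementation.

```python
import heapq
import math
import random

def generate_route(G, pos, start, goal, tabu_p=0.10):
    """A*-style search in which each considered edge may randomly become tabu."""
    h = lambda n: math.dist(pos[n], pos[goal])  # straight-line heuristic
    frontier = [(h(start), 0.0, start, [start])]
    done = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in done:
            continue
        done.add(node)
        for nbr in G[node]:
            if random.random() < tabu_p:  # edge becomes tabu: forces a detour
                continue
            g2 = g + math.dist(pos[node], pos[nbr])
            heapq.heappush(frontier, (g2 + h(nbr), g2, nbr, path + [nbr]))
    return None  # every path was blocked by tabu edges; the caller can retry
```

Calling this 250 times and running k-means (k = 2) on each route's total-angle-turned value would reproduce the clustering step described above.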
3.5 Poor Route Generation
Since there was still not enough variation to capture all the poor routes that wayfinders sometimes took, even with randomness included in the synthetic route generation, we purposely created routes to be poor. We did this by inverting the distance heuristic in the A* routing algorithm. The inversion was done by finding the decision point furthest from the goal in our graph of decision points and edges, and then subtracting each heuristic value (Cartesian distance) from that maximal distance. This had the effect that shorter distances to the goal received a higher heuristic cost than longer ones, thus inverting the heuristic score. There are other ways this inversion could have been accomplished, but this was chosen as a simple method to avoid negative heuristic values for A*. Inverting the heuristic cost function was the only change that we made to the A* router to purposefully generate poor routes. We used the same method as before of randomly marking edges of the current decision point under consideration as tabu 10% of the time, in order to create variation in the generated routes. We generated 500 routes. Since some of the generated poor routes were in fact good routes, because of the included randomness, we then compared each generated poor route to the calculated shortest route between the start and goal points. If more than half the
Fig. 1. Synthetically generated training routes for the classifier, for the first wayfinding route of our empirical study, after clustering and adding the generated poor routes. As can be seen, the good routes typically take more direct paths to the goal.
decision points overlapped, or if the length of the generated route was less than 10% greater than the shortest path, the generated poor route was added to the cluster containing good routes; otherwise the route was added to the cluster of poor routes. The final routes generated by our method for the first wayfinding route of our empirical study are shown in Figure 1. Our route generation here considered only the length of routes, but future work may want to take into account additional features, such as route complexity [4,8,26] or routing through regions that have similar complexities [25]. Techniques from the field of agent-based modeling could also be applied to synthetically generate routes.
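The heuristic inversion used for poor routes is small enough to state directly; this sketch (ours, reusing the assumed `pos` map from the earlier sketch) keeps all values non-negative as described:

```python
import math

def inverted_heuristic(node, goal, pos, max_dist):
    """Poor-route heuristic: decision points near the goal look expensive.
    max_dist is the distance from the goal to the farthest decision point."""
    return max_dist - math.dist(pos[node], pos[goal])
```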
3.6 Probabilistic Classification
The probability that a route belonged to a certain class was computed by summing up all the scores for each decision point. Scores were computed with a conditional probability function and the algorithm determined the class of decision points that was most closely associated by assigning a score to each decision point in a route. A route's class was then evaluated based on the sum of the scores for all decision points. This is shown in the following equations, adapted from [7,22]:

$$\mathrm{class}(r_i) = \begin{cases} C & \text{if } \mathrm{eval}(r_i) > 0 \\ C' & \text{otherwise} \end{cases} \qquad (1)$$

$$\mathrm{eval}(r_i) = \sum_{j} \mathrm{score}(d_j) \qquad (2)$$

$$\mathrm{score}(d_j) = \frac{P(d_j \mid C) - P(d_j \mid C')}{P(d_j \mid C) + P(d_j \mid C')} \qquad (3)$$
Related to wayfinding, d_j is the jth decision point on a route r_i to classify. If a route's score is greater than 0, it is a poor route (class C); otherwise it is good (C', the complement of C). A route's score is defined as the sum of the scores of all its decision points. Decision point scores are calculated from Equation 3. P(d_j|C) is a probability function that calculates the probability of decision point d_j given class C. This is calculated by taking the frequency of a decision point d_j in routes of a given class, and dividing it by the total number of decision points in the given class, multiplied by their respective frequencies. Thus the score will always be in the range [-1, 1]. For example, if class C represents good wayfinders and class C' represents everyone else, and the sum of the scores for all decision points in a route is greater than zero, the route is in the good group. This is very easy to conceptualize and is one of the advantages of using probabilistic classification. Also, as will be discussed with respect to real-time classification, the scoring function allows a great amount of leeway in dealing with missing information, since it can make a classification given as few as one decision point, although accuracy will suffer when not enough data are available. Since probabilistic classification depends on the generated routes to decide whether a given route is good or poor, the algorithm, without modification, cannot assign values to decision points that are not included in any of the synthetically generated routes used to train the classifier. This would happen if a wayfinder took an unusual route that deviated greatly from any of the variety of routes that were synthetically generated. In this case, it is fairly safe to say that the wayfinder is in need of some help and should thus be classified as poor. To facilitate this, we modified the score function to assign a slightly negative weight to decision points that do not appear in any of the training data. Thus if a wayfinder were to deviate drastically from the synthetically generated routes, our algorithm is capable of making a reasonable classification.
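Equations (1)–(3) translate almost directly into code. The sketch below assumes each training route is given as a collection of decision-point identifiers; the small negative score for unseen decision points implements the modification just described, though the exact weight the authors used is not stated, so the value here is illustrative.

```python
from collections import Counter

def train(poor_routes, good_routes):
    """Each route is an iterable of decision-point ids; returns P(d|C), P(d|C')."""
    def distribution(routes):
        counts = Counter(d for route in routes for d in route)
        total = sum(counts.values())
        return {d: c / total for d, c in counts.items()}
    return distribution(poor_routes), distribution(good_routes)

def score(d, p_poor, p_good, unseen=-0.1):  # Eq. (3); `unseen` weight is illustrative
    a, b = p_poor.get(d, 0.0), p_good.get(d, 0.0)
    if a == 0.0 and b == 0.0:
        return unseen  # decision point absent from all training routes
    return (a - b) / (a + b)

def classify(route, p_poor, p_good):
    """Eqs. (1)-(2): a positive sum of scores means class C (poor)."""
    total = sum(score(d, p_poor, p_good) for d in route)
    return "poor" if total > 0 else "good"
```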
3.7 Real-Time Classification
In order for classification of wayfinders to be useful, it needs to take place in real time, while a wayfinder is navigating. By “real-time classification,” we mean that, given only data from the starting point up to the current point in space that a wayfinder has traversed, a classification of wayfinding performance can be made. We accomplished this using the same method of conditional probabilistic classification as discussed in Section 3.6. Even though some data are needed to obtain an accurate classification, this can still be done in real time while a wayfinder is traversing a route. In many cases it is expected that a reasonably accurate prediction can be obtained after a participant has traversed only a small percentage of a route (see Section 4.6). We simulated real-time classification by taking traversal data from our empirical study and limiting the data available for classification to mimic a real-time situation.
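Reusing `classify` from the sketch above, this simulation amounts to classifying successively longer prefixes of each traversal:

```python
def realtime_labels(route, p_poor, p_good):
    """Classification after each decision point, mimicking the real-time setting."""
    return [classify(route[:n], p_poor, p_good) for n in range(1, len(route) + 1)]
```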
3.8 Participant Labeling Based on Wayfinding Performance
One of the problems in determining the performance of our classifier is the lack of ground truth labels that say whether or not a participant's route traversal is good or poor. A human looking at the data can judge which routes appear to be better than others, but this is not necessarily a repeatable criterion for determining performance. To test the efficacy of our classification method, we labeled participants as having poor performance if they:
1. Failed to reach the goal.
2. Took a route that involved less than half of the decision points located along the shortest route.
Fig. 2. Route traversals for all participants, shown in four panels (good and poor wayfinders for each of Route 1 and Route 2, with start and goal marked; map imagery: Gray Buildings © 2008 Zenrin, © 2011 Zenrin, © 2010 Google). Section 3.8 explains good and poor labels.
Meeting either of these conditions resulted in a participant's traversal of a route in the experiment being labeled “poor”; otherwise a participant's traversal was labeled “good.” This label was called the “ground truth,” and is abbreviated GT in the following analysis and discussion.
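Under our assumptions about the route representation (sequences of decision-point ids, plus a flag for reaching the goal), the two criteria reduce to a single predicate:

```python
def ground_truth(traversal, shortest, reached_goal):
    """'poor' if the goal was missed or fewer than half of the shortest route's
    decision points were visited; otherwise 'good'."""
    overlap = len(set(traversal) & set(shortest))
    if not reached_goal or overlap < len(set(shortest)) / 2:
        return "poor"
    return "good"
```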
It should be noted that these criteria, although reasonable given the structure of the two routes in our experiment, are somewhat arbitrary and by no means universal criteria for labeling wayfinding performance; for this study, however, we determined that they produced labels that seemed correct. This highlights the need to quantitatively evaluate a person's route traversal, but due to the somewhat fuzzy nature of such an evaluation, this is not likely to be a trivial task; thus we did not attempt to solve the problem definitively. Our classifications of participants for both routes are shown in Figure 2, so we will let the reader judge our attempt at labeling the performance of wayfinders.
4 Results
4.1 Spatial Ability Test Results
For the SBSOD scale, answers for statements showing good abilities were reversed so that high scores corresponded to better spatial abilities. After this was accomplished, SBSOD scores ranged from 2.2 to 6.2, with a mean of 3.9. It should be noted that the lowest and highest possible scores for the SBSOD test are 1 and 7, respectively. The mental rotation test was scored by giving one point for each correctly identified item and subtracting one point for each incorrectly identified item, and the scores ranged from -3 to 42, with a mean of 28.6. Of the 30 participants in our study, only 14 finished answering all the questions on the mental rotation test within the 6-minute time limit.
4.2 Route Performance
Traversal data from all participants are shown in Figure 2. Participants' times on the first route ranged from 10 minutes and 20 seconds to 20 minutes, with a mean of 14 minutes and 54 seconds. For the second route, traversal times ranged from 5 minutes and 52 seconds to 15 minutes, with a mean time of 7 minutes and 59 seconds. For both routes, 28 of 30 participants successfully reached the goal. Those who did not reach the goal used the full allotted time of 20 and 15 minutes for the first and second routes, respectively. Traversal lengths ranged from 1050 to 1976 m, with a mean of 1229 m, for the first route. For the second route, participants' traversal lengths ranged from 560 to 1412 m, with a mean of 645 m. Looking only at the differences in traversal lengths and times, it is apparent that profound individual differences exist in wayfinding performance. For the first route, 21 participants had the ground truth label of good and 9 poor. For the second route, 16 were good and 14 poor. Half of all the participants (15) had the same ground truth label for both routes. The first route required participants to travel much farther (1050 versus 560 m for the shortest routes). The average complexity values of the participants' traversals were 3.81 and 3.13 for the first and second routes, respectively (a difference of over 20%), even though the first route's shortest path was almost twice as long. Good and poor wayfinders for the first route had an average traversal length of 1168 m and 1396 m, respectively. For the second route, good wayfinders travelled an average of 577 m, while poor wayfinders travelled 722 m.
4.3 Metric Correlations
From the empirical data from our 30 participants, we computed the Pearson correlation between all of our metrics. The results are shown in Figure 3.
Fig. 3. Metric correlations. Non-white squares are significantly correlated (p < .05).
Our results show that the only significant correlations involving the SBSOD scale were with average speed of traversal and, in the case of route 1, with time. The ground truth scores were significantly correlated with time, but not speed. The lack of significant correlations between the mental-rotation score and any other metric was somewhat surprising, as the wayfinding task involved map reading. Total angle turned for route 2 was significantly negatively correlated with its corresponding ground truth (p < .01), but this was not the case with route 1 and its ground truth. This significant correlation for the second route between
the total angle turned and the ground truth may show support for a strategy of minimizing deviation from the perceived goal direction [6,15]. The reason that this was not significantly correlated for the first route likely lies in structural differences between the two routes. Frankenstein et al. [9] observed that wayfinders have a preference for longer lines of sight as well as higher numbers of visible branching paths. For the second route it is likely that wayfinders could get longer lines of sight while minimizing their angle turned. However, the first route's environment likely precluded wayfinders from achieving both criteria without taking a longer, more arduous traversal to the periphery, where participants could walk in straighter lines. Also, in support of this, the average edge length is significantly negatively correlated with the ground truth score for route 1, but positively correlated for route 2 (p < .01 for both). This shows that for route 1, better wayfinders took shorter average edges before turning, whereas for route 2, good wayfinders took longer edges. The average edge length metric does not necessarily take into account actual line of sight in the environment, but comparing the two route environments, the first route's environment seems to have more short roads than the second's. Route complexity was significantly correlated with ground truth performance for the second route (p < .01), but not the first, further suggesting the extent to which the environments for routes 1 and 2 differed. The positive correlation shows that better wayfinders took more complex traversals for the second route.
4.4 Measuring Classification Performance
In order to evaluate our wayfinding performance classifiers, we used precision, recall, and F-score, common metrics in information extraction. Precision quantifies “how precise,” or accurate, the results are, and is defined as:
$$\mathrm{precision} = \frac{\#\ \text{correctly classified routes}}{\#\ \text{correctly classified} + \#\ \text{incorrectly classified routes}} \qquad (4)$$
Recall measures “how many” out of the total number of routes that are labeled with a category are correctly classified as that category (e.g., good and poor). Recall is defined as:

$$\mathrm{recall} = \frac{\#\ \text{correctly classified routes}}{\#\ \text{routes in the category}} \qquad (5)$$
Since precision and recall are often inversely related to each other, the harmonic mean between the two is computed. This is called the F-score and is a good overall measure of the performance of the classifier:

$$\text{F-score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (6)$$
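For completeness, a direct transcription of Equations (4)–(6), given parallel lists of predicted and true labels for one category (a sketch of ours, not the authors' evaluation code):

```python
def precision_recall_f(predicted, truth, category):
    """Eqs. (4)-(6) for a single category (e.g., "GT poor")."""
    tp = sum(p == t == category for p, t in zip(predicted, truth))
    fp = sum(p == category and t != category for p, t in zip(predicted, truth))
    fn = sum(t == category and p != category for p, t in zip(predicted, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```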
4.5 Probabilistic Route Classification
Before attempting to classify route traversals in real-time, we first used probabilistic classification to classify complete routes from our participants. For the good and poor classifications, we based the correctness of the classification on the SBSOD score and the ground truth labels. Our threshold for the SBSOD score was 4.0 (the midpoint of the 7-point scale): participants scoring at or below 4.0 were classified as having poor ability, and thus as more likely to need assistance when navigating environments, while scores above 4.0 were considered good. The “GT good” and “GT poor” labels were determined as in Section 3.8. F-scores from the overall classification, averaged over 100 runs, with respect to good, poor, GT good, and GT poor, are shown in Figure 4. The results with respect to the SBSOD score were fairly poor. F-scores for the poor classification never rose above 0.55, while those for the good classification were 0.65 and 0.54 for the first and second routes, respectively, which is not much higher than chance. Performing the same classification with respect to the GT labels yielded F-scores of 0.99 for the first route and 0.97 for the second route, for both good and poor classifications. Thus if the GT labels are accepted as the measure of wayfinding performance, the results demonstrate the feasibility of using a conditional probabilistic scoring function to classify wayfinding performance.
Recall
F-Score
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
Poor
Good
GT Poor
Route 1
GT Good
Poor
Good
GT Poor
GT Good
Route 2
Fig. 4. Classification of complete routes traversed by participants. Results are averaged from 100 runs of the program.
4.6 Real-Time Probabilistic Route Classification
Figure 5 shows the results of our real-time route classification. As with the overall route classification, the classification with respect to GT outperformed classification with respect to the SBSOD scale. Whereas overall classification used all the decision points from participants’ traversals, the real-time classification used only decision points from the starting point up to the current point where the classification was performed. We simulated real-time by taking the participants’
Fig. 5. Real-time route classification with respect to ground truth and with respect to SBSOD, for Routes 1 and 2 (x-axis: decision points; y-axis: F-score value; bars: number of participants). The x-axis numbers the nth decision point, where only decision points 1 to n are classified. The bars show the number of participants to classify at each iteration, which decreases as participants reach the goal and stop their traversals. Plotted F-scores are averaged from 100 runs.
routes and iterating through all the decision points used, starting at the first point and continuing until the maximum point (35 and 25 for the first and second routes, respectively) was reached. Since some participants took routes with fewer decision points than others, Figure 5 also shows the number of participants classified at each iteration, as the bar graph behind the F-score line graphs. Once several decision points were available, classification performance increased and was comparable to the overall classification results (Figure 4). Again, classification with respect to the SBSOD scale was poor, but the GT classification became very good after three decision points for the first route and seven decision points for the second route. As participants began to finish (and thus were no longer classified), the classification performance became more erratic and was no longer consistently good, although it eventually improved again later on. This was likely caused by the remaining wayfinders being somewhat harder to classify. In some cases the wayfinders came close to or actually reached the goal and then continued to walk past it, so decision points near the goal were likely to have confused the classifier until the wayfinder went far enough from the goal to be correctly classified.
5 Discussion
5.1 Relating Spatial Abilities with Wayfinding Performance
One objective of our work was to study the relationship between spatial abilities, as measured by the SBSOD and mental-rotation scores, and wayfinding performance, as measured by our metrics (length, total angle turned, etc.). Speed was the only wayfinding performance metric significantly correlated with the SBSOD for both routes. Speed can actually be thought of as incorporating several aspects of wayfinding, spanning physical, emotional, and spatial abilities. Physically, different people are in different physical condition and can thus travel faster or slower than others. People also have different heights and leg lengths, which may affect the natural pace at which a person walks. Emotionally, people who are more confident in their ability to navigate are likely to go faster. With regard to abilities, people with better spatial abilities are less likely to have to slow down to ponder where they are or where they should go next. Thus the self-reported abilities in the SBSOD seem to be a good indicator of some overarching factors that temporally influence wayfinding. With respect to classification, as shown in Figures 4 and 5, classifying wayfinders with respect to the SBSOD scale resulted in low classification performance, with the highest F-score still below 0.70. Since our work empirically studied functional wayfinding performance in an outdoor environment, the SBSOD may not have measured spatial abilities relevant to our task. People who have a high SBSOD or mental-rotation score tend to do better on “survey” tasks about routes or map use in an environment, so it is likely that our wayfinding task in a novel environment did not draw enough on these skills for them to be relevant.
5.2 Real-Time Performance Classification
In this work, in addition to demonstrating that wayfinders can be reliably classified after completing route traversal, we have also shown the efficacy of classifying wayfinding performance using only information available during route traversal. The fact that wayfinding performance can be classified during real-time traversal using only a relatively small amount of information is quite revealing, in that it suggests: a) that decision points alone contain enough information to reliably classify wayfinding performance, and b) that not all decision points contain the same amount of information. At a functional level of abstraction, decision points are the places in an environment where a wayfinder must make a decision before route traversal can continue. Reasoning using decision points alone is computationally convenient, since models of wayfinding can ignore many specifics such as time, and areas between decision points do not have to be taken into account. In reality, our method ignores quite a large amount of information about route traversals, such as landmarks, average speed, non-structural information about the environment, information about the environment between decision points, and even the ordering of the decision points themselves. Including such information would likely increase classification performance, but including only the existence of decision points in a traversal still provided enough information for classification with respect to ground truth wayfinding performance. This supports findings such as those of Hillier and Iida [14], which suggest that the geometrical and topological structure of an environment has a larger role to play than metric properties such as distance. Anecdotally, our findings also lend credence to Simon's thesis that the complex behavior that humans exhibit is in part a reflection of the complexity of the environment [27], since human behavior can be classified in terms of environmental features. The viability of real-time classification also suggests that there are decision points with high salience (e.g., with respect to information gain for classification). This can be clearly seen in Figure 5, where for the first route, participants were reliably classified (GT good and GT poor) given only the first three decision points, whereas seven were required for the second route to reach a comparable level. This is likely caused by the structure of the environment: whereas most good and poor wayfinders diverged at the second or third decision point for the first route, the second route did not have a clear divergence until the fifth decision point (counting the starting point for each route). This can be seen on the graph for classification of route 2 with respect to GT, where points 5 and onward show an increasing score for classifying poor wayfinders. These results in part mirror Golledge [10] in that the structure of the environment played a large role in wayfinders' route choice. Since our probabilistic classifier scored decision points based on frequency in the generated routes, it is likely that the early decision points traversed by our participants were present in more good than poor synthetically generated routes, thus classifying all wayfinders as “good” until enough decision points were processed to calculate a more correct score. This delay is in part dependent on the
structure of the environment. For environments where many possible routes can be taken from the starting point, as was the case with the first route, a correct classification seems to need very little data, since good and poor wayfinders diverge early on. For environments like the second route, where wayfinders took the same route at the beginning and then diverged later on, a correct classification takes longer, since the data are ambiguous early on. This is one of the weaknesses of our method, and ways to address it should be explored in future work, such as giving higher weights to decision points that may be more important. That being said, this is not a serious problem, since this work is focused on finding people with outlying performance who need help. At the beginning of the second route, due to the structure of the environment, almost every wayfinder (with the exception of one person) traversed the same first four decision points. Thus it seems reasonable that people with outlying performance cannot be found until outlying performance actually begins to be demonstrated. The fact that wayfinders can be consistently classified based on performance shows support for large individual differences in wayfinding ability. Studying this further, perhaps with the aim of quantifying or qualitatively specifying the relationship between the salience of decision points (i.e., further analysis of environmental structure) and the impact on wayfinding performance, is a likely next step for future work.
5.3 Implications for Electronic Navigational Aides
Due to the apparent efficacy of real-time wayfinding classification, applications to navigational assistance are interesting to consider. Implementation of our method may enable navigational software, for instance, to automatically adapt to the ability and needs of the user by changing the way information is presented. Although future work will have to identify the specifics of this, if done correctly, adapting the presentation of information to the individual abilities of users could have a positive impact on both the efficiency with which routes can be travelled and the enjoyment that users have when performing wayfinding activities. The importance of adapting to individual needs has been shown in that people who have different abilities tend to employ different strategies when navigating [19]; also, men [5] and older people [2] tend to prefer cardinal coordinate systems. Cognitively impaired individuals also require wayfinding information to be displayed differently [21]. The efficacy of verbal versus spatial guidance has likewise been shown to differ with abilities [3]. Our results suggest that whether wayfinders need help is unlikely to be determinable a priori, since fully half our participants did not have the same ground truth label on both routes. This means that users who have trouble navigating at one time may or may not have trouble later on. It is likely that the need for wayfinding assistance is largely determined by the environment. In addition to adaptation, real-time classification of wayfinders based on decision points could also be used to improve knowledge about traveled routes.
Ishikawa et al. [16] showed that GPS users suffer from poor configurational knowledge of routes when compared to traditional paper-map users. Decision points with a high salience for classification may naturally have a high salience in the environment. Thus bringing these high-salience decision points to the attention of wayfinders using electronic navigation aides, such as GPS devices, during navigation may help to improve knowledge about routes. An approach like this could also consider landmark information, similar to [24]. Exploring the merits of this is left to future work.
6 Conclusions
We conducted an empirical study to explore the relationship between spatial abilities and wayfinding performance. Although tests such as the SBSOD scale and the mental rotation test attempt to quantify spatial abilities relating to survey understanding of environments, scores from our participants seem to have been only weakly related to wayfinding performance. Our classification of wayfinders' performance with respect to actual route performance (i.e., ground truth) showed the importance of environmental structure, since we performed probabilistic classification using only existential information about decision points traversed on a route, without even including ordering information. This is not to say that non-structural aspects are unimportant to wayfinding, but it does highlight the importance of environmental structure for human spatial cognition. Furthermore, we were able to classify wayfinders consistently given only the first few decision points of a traversal, suggesting that certain decision points in an environment may play key roles in wayfinding and that focusing on these high-salience points may help to specify which parts of an environment humans use or ignore, while taking into account individual differences. Due to the apparently weak correlation between the SBSOD and mental-rotation scores and human wayfinding performance, finding good ways to quantify human spatial abilities and how they relate to wayfinding performance leaves ample room for future work. The SBSOD and mental-rotation scores are typically related to configurational understanding of routes, which we did not study explicitly here. Thus other ways of measuring abilities related to functional route performance should be studied. One possible direction of study may be to use non-invasive neuro-imaging techniques to quantitatively measure brain activity. For example, recent studies of neuronal activity have shown that for virtual-environment navigation, the entorhinal cortex likely processes information about the current wayfinding context (e.g., “I am facing north” or “I just turned left”) [18]. Thus one approach could be to have participants navigate a virtual environment in an fMRI scanner and then analyze the brain activity data to determine quantitative measures of spatial abilities that relate to wayfinding in particular. This may help determine important neurophysiological aspects involved in wayfinding.
Acknowledgements. We would like to thank Kai-Florian Richter and Masako Tamaki for reviewing a draft of this manuscript and four anonymous reviewers for the thoughtful comments. Thanks are also due to our study participants, especially from the Advanced Telecommunications Research Institute International and Nara Women’s University.
References 1. Arthur, D., Vassilvitskii, S.: K-Means++: The Advantages of Careful Seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, pp. 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia (2007) 2. Baldwin, C.: Individual Differences in Navigational Strategy: Implications for Display Design. Theoretical Issues in Ergonomics Science 10(5), 443–458 (2009) 3. Baldwin, C., Reagan, I.: Individual Differences in Route-Learning Strategy and Associated Working Memory Resources. Human Factors: The Journal of the Human Factors and Ergonomics Society 51(3), 368–377 (2009) 4. Bojduj, B., Weber, B., Richter, K.-F., Bertel, S.: Computer Aided Architectural Design: Wayfinding Complexity Analysis. In: 12th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2008, pp. 919–924. IEEE, Los Alamitos (2008) 5. Dabbs, J., Chang, E., Strong, R., Milun, R.: Spatial Ability, Navigation Strategy, and Geographic Knowledge among Men and Women. Evolution and Human Behavior 19(2), 89–98 (1998) 6. Dalton, R.: The Secret is to Follow Your Nose. Environment and Behavior 35(1), 107 (2003) 7. Dave, K., Lawrence, S., Pennock, D.M.: Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In: Proceedings of the 12th International Conference on World Wide Web, WWW 2003, pp. 519–528. ACM, New York (2003) 8. Duckham, M., Kulik, L.: “Simplest” Paths: Automated Route Selection for Navigation. In: Kuhn, W., Worboys, M.F., Timpf, S. (eds.) COSIT 2003. LNCS, vol. 2825, pp. 169–185. Springer, Heidelberg (2003) 9. Frankenstein, J., Büchner, S., Tenbrink, T., Hölscher, C.: Influence of Geometry and Objects on Local Route Choices during Wayfinding. In: Hölscher, C., Shipley, T., Olivetti Belardinelli, M., Bateman, J., Newcombe, N. (eds.) Spatial Cognition VII. LNCS, vol. 6222, pp. 41–53. Springer, Heidelberg (2010) 10. Golledge, R.: Path Selection and Route Preference in Human Navigation: A Progress Report. In: Kuhn, W., Frank, A. (eds.) COSIT 1995. LNCS, vol. 988, pp. 207–222. Springer, Heidelberg (1995) 11. Hart, P., Nilsson, N., Raphael, B.: A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics 4(2), 100–107 (1968) 12. Hegarty, M., Montello, D., Richardson, A., Ishikawa, T., Lovelace, K.: Spatial Abilities at Different Scales: Individual Differences in Aptitude-Test Performance and Spatial-Layout Learning. Intelligence 34(2), 151–176 (2006) 13. Hegarty, M., Richardson, A., Montello, D., Lovelace, K., Subbiah, I.: Development of a Self-Report Measure of Environmental Spatial Ability. Intelligence 30(5), 425–447 (2002)
14. Hillier, B., Iida, S.: Network and Psychological Effects in Urban Movement. In: Cohn, A.G., Mark, D.M. (eds.) COSIT 2005. LNCS, vol. 3693, pp. 475–490. Springer, Heidelberg (2005)
15. Hochmair, H., Frank, A.U.: Influence of Estimation Errors on Wayfinding-Decisions in Unknown Street Networks – Analyzing the Least-Angle Strategy. Spatial Cognition and Computation 2, 283–313 (2002)
16. Ishikawa, T., Fujiwara, H., Imai, O., Okabe, A.: Wayfinding with a GPS-Based Mobile Navigation System: A Comparison with Maps and Direct Experience. Journal of Environmental Psychology 28(1), 74–82 (2008)
17. Ishikawa, T., Montello, D.: Spatial Knowledge Acquisition from Direct Experience in the Environment: Individual Differences in the Development of Metric Knowledge and the Integration of Separately Learned Places. Cognitive Psychology 52(2), 93–129 (2006)
18. Jacobs, J., Kahana, M.J., Ekstrom, A.D., Mollison, M.V., Fried, I.: A Sense of Direction in Human Entorhinal Cortex. Proceedings of the National Academy of Sciences 107(14), 6487–6492 (2010)
19. Kato, Y., Takeuchi, Y.: Individual Differences in Wayfinding Strategies. Journal of Environmental Psychology 23(2), 171–188 (2003)
20. Liben, L., Downs, R.: Understanding Person-Space-Map Relations: Cartographic and Developmental Perspectives. Developmental Psychology 29(4), 739–752 (1993)
21. Liu, A., Hile, H., Borriello, G., Brown, P., Harniss, M., Kautz, H., Johnson, K.: Customizing Directions in an Automated Wayfinding System for Individuals with Cognitive Impairment. In: Proceedings of the Eleventh International ACM SIGACCESS Conference on Computers and Accessibility, pp. 27–34. ACM, New York (2009)
22. Liu, B.: Web Data Mining. Springer, Heidelberg (2007)
23. Montello, D.: Navigation. In: The Cambridge Handbook of Visuospatial Thinking, vol. 18, pp. 257–294 (2005)
24. Raubal, M., Winter, S.: Enriching Wayfinding Instructions with Local Landmarks. In: Egenhofer, M., Mark, D. (eds.) GIScience 2002. LNCS, vol. 2478, pp. 243–259. Springer, Heidelberg (2002)
25. Richter, K.F.: Adaptable Path Planning in Regionalized Environments. In: Hornsby, K.S., Claramunt, C., Denis, M., Ligozat, G. (eds.) COSIT 2009. LNCS, vol. 5756, pp. 453–470. Springer, Heidelberg (2009)
26. Richter, K.F., Weber, B., Bojduj, B., Bertel, S.: Supporting the Designer's and the User's Perspectives in Computer-Aided Architectural Design. Advanced Engineering Informatics 24(2), 180–187 (2010)
27. Simon, H.: The Sciences of the Artificial. The MIT Press, Cambridge (1996)
From Video to RCC8: Exploiting a Distance Based Semantics to Stabilise the Interpretation of Mereotopological Relations

Muralikrishna Sridhar, Anthony G. Cohn, and David C. Hogg

University of Leeds, UK
{krishna,agc,dch}@comp.leeds.ac.uk

The financial support of the EU Framework 7 project Co-friend (FP7-ICT-214975) and the DARPA Mind's Eye program (project VIGIL, W911NF-10-C-0083) is gratefully acknowledged, the latter also supplying the "person-ball verb" videos.
Abstract. Mereotopologies have traditionally been defined in terms of the intersection of point sets representing the regions in question. Whilst these semantic schemes work well for purely topological aspects, they do not give any semantic insight into the degree to which the different mereotopological relations hold. This paper explores the idea of a distance-based interpretation for mereotopology. By introducing a distance measure between x and y, and for various Boolean combinations of x and y, we show that all the RCC8 relations can be distinguished. We then introduce a distance measure which combines these individual measures and which we show reflects different paths through the RCC8 conceptual neighbourhood – i.e., the measure decreases/increases monotonically given certain monotonic transitions (such as one region expanding). There are several possible applications of this revised semantics; in the second half of the paper we explore one of these in some depth – the problem of abstracting mereotopological relations from noisy video data, such that the sequences of qualitative relations between pairs of objects do not suffer from "jitter". We show how a Hidden Markov Model can exploit this distance-based semantics to yield improved interpretation of video data at a qualitative level.
1 Introduction
Mereotopologies have traditionally been defined in terms of the intersection of point sets representing the regions in question. This is true for RCC8 [14], the 4- and 9-intersection calculi [3,6], and indeed many other mereotopologies covered in [2]. Alternatively, Galton [8] gives a semantics in which the eight RCC relations are distinguished by whether all, some, or none of x is inside y, and vice versa, and whether there are shared boundary points or not. Whilst these semantic schemes work well for purely topological aspects, they do not give any semantic insight into the degree to which the different mereotopological relations hold. The authors in [1] and [5] provide a way of describing
relations between regions which have uncertain boundaries. However, neither provides an alternative semantics for mereotopology, nor do they address the issue of learning a robust transformation function from metric to qualitative relations. For EC [2] and PO [9], more granular calculi have been designed which distinguish the degree to which these relations hold; however, to our knowledge, the other RCC8 relations have not been so refined. In any case, these refinements are at the calculus level (rather than the semantic level) and are discretised (a finite number of refinements), with no metric on the refinements. For the other relations, the degree of holding has not been covered at all in the semantics of the relationship (though there are fuzzy versions of RCC [16]). This paper explores the idea of a distance-based interpretation for mereotopology. (Throughout, we use "distance" as a numerical description of how proximal regions are, without necessarily assuming it is a metric; what is important, as described below, is that it captures certain monotonicity properties over the conceptual neighbourhood graph.) By introducing a distance measure between x and y, and for various Boolean combinations of x and y, we show that all the RCC8 relations can be distinguished. We then introduce a distance measure which combines these individual measures and which we show reflects different paths through the RCC8 conceptual neighbourhood graph (CNG) – i.e., the measure decreases/increases monotonically given certain monotonic transitions (such as one region expanding), as previously discussed in [4]. There are several possible applications of this revised semantics. First, we note that [11] presents a way of approximating constraint satisfaction in reduced-expressivity constraint networks by using the idea of a relation 'almost' holding – the semantics presented here would fit well with that technique. A second application, which we explore more extensively, is to abstracting qualitative spatial relations from video data. In video data, objects are frequently represented by shape abstractions such as minimum bounding rectangles (MBRs). However, visual noise and other errors introduced by video processing frequently result in instability in mereotopological relations over time when these are defined point-set theoretically – i.e., 'jitter' can result as relations change frequently depending on the exact position and size of the MBR. We show how using a distance-based metric can result in a much more stable qualitative spatial abstraction; the distance-based semantics can be used to decide when to transition between relations. We show that good transition points can be learnt automatically by training an HMM [13]. The impact of this improved technique for abstracting qualitative spatial relations from noisy video data can be demonstrated in a procedure to learn event classes from video data. The rest of the paper is structured as follows. Section 2 introduces the proposed distance-based semantics for RCC. Section 3 proposes a way of using this semantics within an HMM-based framework, in order to handle noise in video. Section 4 describes experiments on real video data validating the effectiveness of the proposed approach in handling noise, in relation to the traditional approaches. In sections 5 and 6 we summarize the work and point out certain limitations that provide insights into interesting directions for future research.
2 A Distance Based Semantics for RCC
In [14], the binary primitive C(x, y), read as "x is connected to y", was introduced with the semantics that the closure of the region x shares a point with the closure of the region y. From this primitive C(x, y), the set R of eight jointly exhaustive and pairwise disjoint RCC8 relations can be defined: R = {DC, EC, PO, TPP, NTPP, EQ, TPPi, NTPPi}. The 4-intersection model of [6], from which an essentially equivalent set of eight relations can be derived, is defined in terms of examining the patterns of intersection between the interior and boundary point sets of a pair of regions x and y: each relation is characterised by a particular combination of ∅ and ¬∅ symbols, denoting empty and non-empty intersections respectively.
Fig. 1. The RCC8 relations between x (green) and y (orange), along with their conceptual neighbourhood. The six circles relating the various Boolean combinations of x and y are also depicted when they have non-zero diameter.
In this section, we introduce an alternative semantics for RCC8 (and thus effectively also for Egenhofer's relations). For the sake of simplicity, we restrict our analysis to rectangular one-piece regions with no holes. We discuss the limitations of the proposed framework for other shapes in section 6, pointing to possible ways of generalizing our current approach. We confine our attention
here to rectangles aligned to two orthogonal axes, which naturally correspond to the rectangular bounding boxes obtained using low-level video analysis. It is worth noting that addressing the noise arising from low-level video analysis partly inspired the proposed approach. Our point of departure from previous work is to note that the standard semantics says nothing about how far apart two regions are when they are disconnected (DC). In much earlier work, a CanConnect(x, y, z) relation was introduced [10], which holds when the rigid body x is sufficiently large to be able to connect regions y and z if translated into a suitable position. This gives rise to the idea of measuring the degree of disconnection between x and y by a third region. We wish to choose a canonical shape for this region, and the obvious choice is the n-sphere for n-dimensional space. Although RCC can be interpreted in arbitrary dimensions, for the sake of simplicity we restrict our attention in this paper to 2D; we thus measure the degree of disconnectedness between a pair of regions x and y by the smallest circle which can connect them. We will call this circle c1(x, y). As x and y approach each other, the diameter of c1(x, y) will decrease until they become EC, at which point the diameter of c1(x, y) is zero (technically a circle has to have a non-zero diameter, but for convenience we here refer to a point as a circle of diameter 0). Inspired by this, we now introduce further circles to provide a measure for the other RCC8 relations. Considering the RCC conceptual neighbourhood, the next relation to hold after EC is PO. A circle c2(x, y), being the largest circle in the intersection of x and y, neatly captures the degree to which x and y partially overlap – as x and y transform/translate towards TPP/TPPi/EQ, so the diameter of c2(x, y) will increase. (In some cases, the diameter of c2 and the other circles defined below may not change for prolonged periods – e.g., if we were to take two rectangles of the same height and translate one of them horizontally inside the other. However, since the principal purpose of the work is to give more information near the relation boundaries, it is sufficient that the diameters of these circles change significantly near those boundaries.) If we now consider the value of the expression |c1(x, y)| − |c2(x, y)| (where |...| denotes the diameter of the circle), then it can easily be seen that it will start off positive, reduce to 0 when EC(x, y) holds, and then become negative for all the other relations. To distinguish all eight relations, we need to introduce further measurements. We can do this by considering the other Boolean combinations of x and y. Thus c3(x, y) denotes the smallest circle which can connect y to the complement of x; dually, c4(x, y) denotes the smallest circle which can connect x to the complement of y; c5(x, y) denotes the largest circle in the region x − y; and c6(x, y) denotes the largest circle in y − x. These circles are all depicted for the RCC8 relations in figure 1. To show that these six circles are sufficient to distinguish all the RCC8 relations, consider Figure 2.
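Before turning to the table, it may help to see how two of these diameters can be computed for the axis-aligned rectangles to which the paper restricts itself. The sketch below is ours, not the authors' implementation: it assumes rectangles represented as (left, bottom, right, top) tuples, computes |c1| as the separation between the rectangles and |c2| as the shorter side of their intersection, and omits |c3|–|c6|, which involve largest empty circles in set differences and complements and need more geometry.

```python
# Hedged sketch: two of the six circle diameters for axis-aligned rectangles.
import math

def c1_diameter(x, y):
    """Diameter of the smallest circle connecting rectangles x and y:
    equal to the separation between them, and 0 when they touch/overlap."""
    dx = max(0.0, x[0] - y[2], y[0] - x[2])   # horizontal gap
    dy = max(0.0, x[1] - y[3], y[1] - x[3])   # vertical gap
    return math.hypot(dx, dy)

def c2_diameter(x, y):
    """Diameter of the largest circle inside the intersection of x and y:
    the shorter side of the intersection rectangle, 0 if it is empty."""
    iw = min(x[2], y[2]) - max(x[0], y[0])    # intersection width
    ih = min(x[3], y[3]) - max(x[1], y[1])    # intersection height
    return max(0.0, min(iw, ih))
```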
           c1(x,y)     c2(x,y)     c3(x,y)     c4(x,y)     c5(x,y)     c6(x,y)
           minC(x,y)   maxC(x∩y)   minC(−x,y)  minC(x,−y)  maxC(x−y)   maxC(y−x)
  DC         +           0           0           0           +           +
  EC         0           0           0           0           +           +
  PO         0           +           0           0           +           +
  TPP        0           +           0           0           0           +
  NTPP       0           +           0           +           0           +
  EQ         0           +           0           0           0           0
  TPPi       0           +           0           0           +           0
  NTPPi      0           +           +           0           +           0

Fig. 2. A table showing whether the diameters of the six circles are 0 or non-zero (+) for the RCC8 relations. minC(x, y) denotes a minimum-sized circle which can connect x and y, and maxC(x) denotes the maximal-sized circle which can fit into x.
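Read as a decision procedure, the table amounts to a lookup from the zero/non-zero pattern of the six diameters to a relation. A direct transcription might look as follows (a sketch; the tolerance eps is our assumption, included to absorb numerical noise in the diameters):

```python
# RCC8 classification from the six circle diameters, per the table in Fig. 2.
# Keys encode (|c1|>0, |c2|>0, |c3|>0, |c4|>0, |c5|>0, |c6|>0).
RCC8_FROM_CIRCLES = {
    (True,  False, False, False, True,  True ): "DC",
    (False, False, False, False, True,  True ): "EC",
    (False, True,  False, False, True,  True ): "PO",
    (False, True,  False, False, False, True ): "TPP",
    (False, True,  False, True,  False, True ): "NTPP",
    (False, True,  False, False, False, False): "EQ",
    (False, True,  False, False, True,  False): "TPPi",
    (False, True,  True,  False, True,  False): "NTPPi",
}

def rcc8_relation(diameters, eps=1e-9):
    """Classify from (|c1|, ..., |c6|); None for patterns the table rules out."""
    key = tuple(d > eps for d in diameters)
    return RCC8_FROM_CIRCLES.get(key)
```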
By inspection of the columns labelled c1(x, y) to c6(x, y), it can be seen that each row is unique, and thus the RCC8 relation which holds can be determined by inspection of these six circles and whether their diameters are zero or non-zero, in just the same way as the 4-intersection model allows the eight Egenhofer relations to be distinguished. Here we require six values (one per circle) to characterise each relation, rather than the four of the 4-intersection model (though fewer than the nine of the 9-intersection model). This thus gives an alternative way of defining the standard set of eight mereotopological relations, which differs from the RCC8 definitions based on C(x, y), from the 4-/9-intersection models, and indeed from the modal semantics found, e.g., in [15]. This may have some theoretical interest, but our purpose in defining this semantics was to address a problem arising in abstracting RCC8 relations from video (in fact, the problem occurs even in the simpler RCC-5 calculus – see our earlier work, in which we briefly outline a much simpler version of the present approach for RCC-5 [18,19]). Typically in video interpretation, objects/blobs are identified and then tracked, such that a unique identifier can be associated with the different positions of the object at different times. However, it is not just the position of the object which can change: the shape can change too, either because it actually changes shape (as in the case of a person changing posture), or because, in the image plane, an object appears to get larger as it moves closer to the camera, or because visual noise results in the object detection software assigning a different shape to the object in different frames. It is this final problem which particularly concerns us, since the changes are not "real changes" but rather artefacts of the software system. Often the size/position of the object will change rapidly from frame to frame, resulting in "jitter". Such problems are well known and endemic in computer vision. The use of shape abstraction primitives, such as bounding boxes or the convex hull of an object, can help alleviate these problems, but the issue still remains. The problem of interest to us here is when this jitter causes undesirable changes of spatial relation – for example, when two objects approach each other, the RCC8 relation between the bounding boxes may not simply transition from DC to EC and then to PO following the arcs in the RCC8 conceptual neighbourhood. Rather, there is likely to be a jittering of relations, such as DC, EC, PO,
EC, DC, PO, EC, PO, DC, EC, PO (where each of these relations indicates that it holds over some maximal interval with no intervening relations). There are a variety of computer-vision smoothing techniques (such as the Kalman filter [20]) which can be applied to the tracks and shape abstractions and which can help reduce such jitter; however, we have not found these to be satisfactory, possibly because they are not specifically aimed at the discretisations of a qualitative calculus, but generally simply aim to smooth in the continuous spaces found in low-level computer vision representations. Our approach, detailed in the second half of this paper, is to try to learn when to transition from one relation to another. We do this by training a Hidden Markov Model (HMM) with one state for each relation. In order to build such an HMM, it is convenient to have a variable which can be used to assess which state holds. The idea of using the distance-based semantics for RCC8 is attractive in this regard, since it allows the possibility of combining the different circle measures to produce an overall measure which can be used to decide when to transition from each relation; a vector of measures could also be used, e.g., taking each circle diameter individually as an input, but this makes the learning task more difficult. So we turn to the question of how to combine the circular measures to provide an appropriate value for the HMM. Consider the path through the conceptual neighbourhood DC, EC, PO, TPP, NTPP – this can be seen in the first five rows of the table in figure 2. Initially c1, c5, c6 all have positive diameter whilst the others have zero diameter. Then c1 becomes zero, followed by c2 becoming non-zero, then c5 becoming zero, and finally c4 becoming non-zero. Thus c1 and c5 have become zero, whilst c2 and c4 have become non-zero; c3 and c6 remain unchanged. Note that all changes are qualitatively monotonic along this path, i.e., there is no change back to a previous qualitative value. It may also be remarked that, assuming the change in size/position of x and y is monotonic (i.e., x/y change smoothly and without "reversal" between the relations DC and NTPP), then the actual metric value of the ci(x, y) will in general be monotonic too (some limitations on this monotonicity are discussed in section 6); e.g., consider |c5(x, y)|: its value will be constant whilst DC and EC hold, but once PO starts to hold, the diameter of c5(x, y) will steadily reduce until it becomes zero on the transition to TPP. For the dual path, DC, EC, PO, TPPi, NTPPi, a similar analysis applies, but with the roles of c3/c4 flipped, and similarly for c5/c6. (We could also analyse the other paths through the conceptual neighbourhood of RCC8, and indeed the various processes which engender them, following the analysis of [4].) This leads to a formulation of a measure d(x, y) in which c1, c5 and c6 contribute positively to the measure and c2, c3, c4 negatively:

d(x, y) = |c1(x, y)| + |c5(x, y)| + |c6(x, y)| − |c2(x, y)| − |c3(x, y)| − |c4(x, y)|

However, this measure is dependent on the absolute sizes of x and y; e.g., if x and y are both scaled to twice their size, then |c5(x, y)| will double too. Thus
it is appropriate to normalise the measure. Rather than normalise the entire expression, it is better to normalise each component; this also allows for the possibility of x or y changing size over time (such as when the entire transition from DC to NTPP is caused by an expansion of the region y [4]).
Fig. 3. Illustration of circles c1 to c6 as the relationship changes from DC to NTPP. The thick line represents the normalized region-based distance obtained using c1 to c6 (original in colour).
We thus consider the normalisation to be applied to each of the ci(x, y) expressions. The term c1(x, y) is independent of the size of x and y but is dependent on the size of the universe (we assume here a bounded universe, such as the field of view in a camera image); thus we divide |c1(x, y)| by δ(x ∪ −x), where δ(x) denotes the diameter of the largest circle inside the bounding box of x, as a measure of the size of x. For c2(x, y), the maximum value will be attained when the smaller region is part of the other, so normalising by dividing by min(δ(x), δ(y)) is appropriate. c3(x, y) has a non-zero diameter only when NTPPi(x, y) holds, i.e., when y lies strictly inside x (cf. Figure 2). The maximum value for |c3(x, y)| is 0.5 ∗ (δ(x) − δ(y)), i.e., when the circles on which δ(x) and δ(y) are based are concentric; so this suggests normalising |c3(x, y)| by dividing by 0.5 ∗ (δ(x) − δ(y)). c4(x, y) is dual to c3(x, y), so that term should be divided by 0.5 ∗ (δ(y) − δ(x)). For the |c5(x, y)| term, the maximum value is δ(x), so this is the normalising factor; dually, the |c6(x, y)| term should be divided by δ(y). This results in a revised, normalised distance measure:
d(x, y) = |c1(x, y)| / δ(x ∪ −x) + |c5(x, y)| / δ(x) + |c6(x, y)| / δ(y)
          − |c2(x, y)| / min(δ(x), δ(y)) − |c3(x, y)| / (0.5 ∗ (δ(x) − δ(y)))
          − |c4(x, y)| / (0.5 ∗ (δ(y) − δ(x)))                              (1)
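Given the six diameters and the sizes δ(x), δ(y) and δ(x ∪ −x), equation (1) is a one-line computation. The sketch below is a direct transcription; the zero-guard for the degenerate δ(x) = δ(y) case is our assumption (in that case c3 and c4 are themselves zero, so the guarded terms contribute nothing):

```python
# Normalized region-based distance, equation (1).
# c is the list [|c1|, ..., |c6|]; delta_x, delta_y, delta_u stand for
# δ(x), δ(y) and δ(x ∪ −x) respectively.
def region_based_distance(c, delta_x, delta_y, delta_u):
    def safe(num, den):
        # the c3/c4 denominators can be zero or negative exactly when the
        # corresponding numerator is zero, so returning 0 is harmless
        return num / den if den > 0 else 0.0
    return (c[0] / delta_u
            + c[4] / delta_x
            + c[5] / delta_y
            - c[1] / min(delta_x, delta_y)
            - safe(c[2], 0.5 * (delta_x - delta_y))
            - safe(c[3], 0.5 * (delta_y - delta_x)))
```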
The normalized measures for each of the circles c1 to c6, together with the normalized region-based distance d(x, y), are shown in figure 3. Below we refer to d(x, y) as the Region Based Distance (RBD) between x and y.
3 Application to Handling Noise in Video
Recent work [17,19] has shown the benefits of qualitative spatio-temporal relationships in representing and learning about human activities. In [19], activities are regarded as being composed of events, which are modelled as interactions between a set of objects in space and time. Events such as unloading, representing interactions between trolleys, planes and loaders, were learned from videos of aircraft activities. Interactions were represented in terms of a graph structure that captures the temporal evolution of qualitative spatial relationships between the respective pairs of objects. These pairwise spatial relationships were computed from their tracks, where a track is simply a temporal sequence of minimum bounding rectangles (MBRs) covering each object. However, visual noise and other errors introduced by video processing frequently result in instability in mereotopological relations over time when these are defined point-set theoretically – i.e., 'jitter' can result as relations change frequently depending on the exact position and size of the MBR. The following describes a solution to this problem using an HMM that overlays a temporal model in order to regularize these rapidly flipping spatial relationships. The states of the HMM are labelled by the RCC8 relationships. The observations are a sequence of region-based distances between the respective pairs of object MBRs. The relation between the states and the observations is modelled by an observation model for each state. With a trained HMM, it is possible to predict the most likely sequence of spatial relationships given a sequence of observed RBDs. The regularizing effect of the HMM is achieved by defining transition probabilities on RCC8 in such a way that objects are encouraged to remain in the same state, while the allowed transitions are constrained by the connections in the RCC8 CNG. In other words, the HMM prevents rapidly flipping transitions by encouraging transitions to take place only when there is sufficiently compelling evidence from the observations to proceed to the next state. We show that good transition probabilities and observation models can be learnt automatically by training the HMM on manually annotated training videos. In this manner, the HMM learns a temporal model that can be regarded as an approximation of the way humans perceive these spatial transitions. The following describes the proposed approach more formally.

Optimal Spatial Sequence for a Pair of Tracks

1. Let τ = (..., ot, ...) and τ′ = (..., o′t, ...) be a pair of tracks. It is assumed that all the corresponding MBRs ot ∈ τ and o′t ∈ τ′ are observed together (we are interested in computing the spatial relationships only when the objects are observed together [19]).
2. Let D(τ, τ′) = (..., d(ot, o′t), ...) be an observed sequence of RBDs between (τ, τ′). Here d(ot, o′t) is the RBD (equation 1) between the corresponding MBRs (ot, o′t) at time t.
3. Let S(τ, τ′) = (..., s(ot, o′t), ...) be a hypothesized sequence of qualitative spatial relationships between (τ, τ′). Here s(ot, o′t) ∈ R is the hypothesized spatial relationship between the corresponding MBRs (ot, o′t) at time t.

The goal is to use an HMM model Θ to predict the most likely sequence of spatial relationships Ŝ(τ, τ′), given a sequence of observed distances D(τ, τ′) between the tracks τ and τ′:

Ŝ(τ, τ′) = arg max over S(τ, τ′) of P(S(τ, τ′) | D(τ, τ′), Θ)
An HMM to Obtain an Optimal Spatial Sequence

In order to address this problem, we formulate an HMM that models the joint probability distribution of the observed and hidden states, as given by the tuple Θ = (R, A, B, π), where:

1. R is the set of states of the HMM. They correspond to the spatial states in RCC8.
2. A = (aij)ij is the state transition matrix representing the probability aij = P(st_{t+1} = sj | st_t = si) of a transition from state st_t = si ∈ R to st_{t+1} = sj ∈ R. Only those transitions that are physically possible, as given by the CNG of RCC8, have non-zero transition probabilities.
3. B = (bi(δt))it is the observation model, where bi(δt) represents the probability P(δt | st_t = si) of observing an RBD δt while being in state st_t = si ∈ R. The observation model for each state si ∈ R is modelled as a normal distribution N(μi, σi²), where μi represents the mean RBD for the state si and σi² represents the variance of this distance for that state. (While non-symmetric distributions may be better suited as observation models, the normal distribution offers simplicity with respect to learning the parameters of the model; we have also observed experimentally that the results with the normal distribution closely approximate those arising from such non-symmetric distributions.)
4. π = (πi)i is the initial state distribution, where πi represents the probability of state si ∈ R being the initial state.

The above HMM model Θ is trained with a dataset of sequences of region-based distances (between tracks) that are manually annotated with the subjectively correct spatial relations. The Baum-Welch algorithm [13] is used to learn the parameters of the model Θ. For a given pair of co-temporal tracks, a Viterbi decoder [13] is used to find the most likely sequence of spatial relationships.
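A sketch of the decoding step is given below. It is not the authors' implementation: the CNG adjacency written here is one common rendering of the RCC8 conceptual neighbourhood graph (it should be adjusted to match Fig. 1), and the parameters (transition matrix, per-state Gaussian means/variances, priors) are assumed to come from Baum-Welch training on annotated segments.

```python
# Viterbi decoding over RCC8 states with Gaussian RBD emissions.
# CNG below is an assumed rendering of the conceptual neighbourhood graph;
# self-transitions are always allowed, which is what encourages the HMM to
# stay in a state unless the observations compel a change.
import math

STATES = ["DC", "EC", "PO", "TPP", "NTPP", "EQ", "TPPi", "NTPPi"]
CNG = {"DC": {"EC"}, "EC": {"DC", "PO"}, "PO": {"EC", "TPP", "TPPi"},
       "TPP": {"PO", "NTPP", "EQ"}, "NTPP": {"TPP"},
       "EQ": {"TPP", "TPPi"}, "TPPi": {"PO", "NTPPi", "EQ"},
       "NTPPi": {"TPPi"}}

def log_gauss(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def viterbi(rbds, trans, mu, var, prior):
    """Most likely RCC8 state sequence for a sequence of observed RBDs."""
    col = {s: math.log(prior[s]) + log_gauss(rbds[0], mu[s], var[s])
           for s in STATES}
    backptrs = []
    for obs in rbds[1:]:
        new, ptr = {}, {}
        for s in STATES:
            # predecessors limited to s itself and its CNG neighbours
            preds = [p for p in STATES if p == s or s in CNG[p]]
            best = max(preds, key=lambda p: col[p] + math.log(trans[p][s]))
            ptr[s] = best
            new[s] = (col[best] + math.log(trans[best][s])
                      + log_gauss(obs, mu[s], var[s]))
        col = new
        backptrs.append(ptr)
    state = max(col, key=col.get)          # best final state, then backtrack
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```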
4 Experiments
The following paragraphs describe the two video datasets and the evaluation of the proposed approach on them.
Datasets

Two real video datasets are used to evaluate the proposed framework. The first consists of activities in an airport apron showing the servicing of aircraft between flights. The second consists of activities representing simple verbs such as throw (a ball), catch, etc. We henceforth refer to these as the airport apron dataset and the person-ball verbs dataset respectively. The processing of the real datasets involved two stages: detection and tracking. For the first stage, a multi-class object detector [12] based on HOG features was trained on a separate part of the dataset and applied to each frame of the rest of the dataset. The trained classes for the aircraft apron dataset were (i) plane, (ii) trolley, (iii) loader, (iv) bridge, and (v) plane-puller. The trained classes for the person-ball verbs dataset were person and ball. The second stage involved applying our implementation of the tracking technique reported in [21] to the detected blobs. We chose this technique since it performs global optimization to obtain the most likely set of tracks.

Evaluation of the HMM Based Procedure

The following experiment evaluates the proposed framework and compares it with the point set intersection technique, which is regarded as the baseline. In order to train and test the HMM, the tracked dataset is randomly divided into two parts: the first, consisting of two thirds, is used for training, and the remainder for testing. Ten such random partitions are created for evaluation. The training data is hand annotated by associating pairs of tracks in the training set with a corresponding sequence of spatial relationships (this is because the purpose of the HMM is to learn a mapping from a pair of tracks to a corresponding sequence of spatial relationships). These annotations are subjectively assigned by the annotators. A part of an annotated sequence is shown in figure 4. Instead of labelling the entire data of several thousand frames, only those segments where there are changes in spatial relationships are considered for training and testing the HMM. This is because the main purpose of the HMM is to learn stable transitions between the spatial states, rather than parts where there is high certainty about the spatial states. For the airport apron dataset, a total of 27 training segments and 14 test segments were prepared. For the person-ball verbs dataset, a total of 10 training segments and 5 test segments were prepared. In these segments, those pairs of tracks for which the spatial relationships change are first identified. These pairs are subjectively labelled with the appropriate spatial relationship for each frame in which both tracks are observed. The segments are also provided with the respective episodes for this sequence of spatial relationships; episodes are defined in [17] as maximal intervals within which the same spatial relation holds, with a different spatial relation holding immediately before and after. One such segment is illustrated in figure 4.
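The episode construction itself is simple run-length encoding of the per-frame relation sequence. A sketch (ours), following the definition from [17] quoted above:

```python
# Collapse a per-frame relation sequence into episodes: maximal runs of
# frames over which the same relation holds.
def episodes(relations):
    """Return [(relation, start_frame, end_frame)] for a per-frame sequence."""
    out = []
    for t, rel in enumerate(relations):
        if out and out[-1][0] == rel:
            out[-1] = (rel, out[-1][1], t)   # extend the current episode
        else:
            out.append((rel, t, t))          # start a new episode
    return out

# e.g. episodes(["DC", "DC", "EC", "PO", "PO"])
#      -> [("DC", 0, 1), ("EC", 2, 2), ("PO", 3, 4)]
```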
Fig. 4. A segment with which the HMM-based procedure for obtaining spatial relationships is trained and evaluated. Some images from a segment that has been manually annotated for training and evaluating the HMM are shown. An example of an annotation, in the form of a sequence of spatial relationships, is shown for a pair of tracks corresponding to a loader and a trolley respectively.
Fig. 5. Some images sampled from the video footage of a scene depicting a person jumping over a ball, with bounding boxes overlaid. At the top of each image are two bands showing the spatial relationships (PO in blue and DR in red) up until the time of the depicted frame. For this video sequence, it can be observed that the HMM based on the distance-based semantics has eliminated the noise arising from jitter of the bounding boxes. The noise is evident in the band corresponding to the traditional point set intersection (PSI) based approach. For example, in the middle frame of the bottom row, it can be seen that prior to this point both the PSI and HMM approaches have inferred a DR relationship, but the PSI relationship jitters back to PO in this frame, whilst the HMM relationship remains stable at DR.
The HMM is trained on the training segments for each random partition. The trained HMM is then applied to the test segments for the corresponding random partition. This gives rise to a sequence of spatial relationships between pairs of tracks in the test segments. A corresponding sequence of episodes is constructed from the inferred sequence of spatial relationships, for the sake of the evaluation described below. Two such sequences, for the verb catch and for an event where the trolley detaches from a loader, are illustrated in figure 6.
Fig. 6. Examples of correctly inferred spatial relationships for the airport and person-ball verbs datasets. Each image is a sample from the sequence of images corresponding to the interval during which the spatial relationships given below it hold. These spatial relationships have been inferred by the proposed approach. Note that the relation in the bottom row is PO according to the PSI semantics, but is inferred as EC by the HMM.
Qualitative Evaluation. The performance of the HMM is evaluated qualitatively by examining the sequence of qualitative relationships obtained using the proposed approach and comparing it with the corresponding sequence obtained using the traditional point-set-intersection-based computation. Figure 5 illustrates such a comparison, as explained in the corresponding caption. It can be seen that for this video sequence, the use of an HMM with the distance-based semantics plays a significant role in eliminating noise arising from jitter of the bounding boxes. Other examples of correctly inferred spatial relationships for the airport and person-ball verbs datasets are shown in Figure 6.

Quantitative Evaluation. The proposed approach can be quantitatively evaluated in two ways. The first evaluates the extent to which the HMM outputs a correct sequence of episodes. The accuracy is reported in terms of the mean and variance of the percentage of test segments for which the sequence of episodes exactly corresponds to the ground truth, across the 10 random train-test partitions. These results for the proposed HMM-based approach and the traditional point-set-intersection-based approach on the two datasets are reported in Table 1.
Table 1. Results evaluating the extent to which the inferred sequence of episodes exactly corresponds to the ground truth. Each entry gives the mean and variance respectively.

                          Aircraft Apron    Person-Ball Verbs
  HMM                     82.3%, 7.1%       90.6%, 6.2%
  Point Set Intersection  30.8%, 13.2%      67.9%, 10.4%
Table 2. Results evaluating the extent to which the outputted episodes temporally align with those of the ground truth. Each entry gives the mean and variance respectively.

                          Aircraft Apron    Person-Ball Verbs
  HMM                     66.1%, 2.2%       81.1%, 2.0%
  Point Set Intersection  27.8%, 11.2%      57.8%, 8.3%
The second evaluation examines the extent to which the outputted episodes temporally align with those of the ground truth. This evaluation is restricted to those segments whose sequence of episodes obtained from the HMM matches the ground truth, because the purpose is to understand the extent of deviation in temporal alignment despite the episodes having been matched correctly. A good alignment ensures a reduced chance of structural differences in the temporal relationships (amongst the episodes) between the ground truth and the output of the HMM. Accuracy is measured in terms of the mean and variance of the percentage of temporal overlap between the outcome of the HMM and the ground truth, across the 10 random partitions. These results for the proposed HMM-based approach and the traditional point-set-intersection-based approach on the two datasets are reported in Table 2. It can be concluded that the HMM significantly outperforms the traditional point-intersection-based technique. In particular, the potential of the HMM-based approach using RBDs on RCC8 for inducing stable sequences of qualitative spatial relationships from video data has been demonstrated.
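The paper does not spell out the exact temporal-alignment formula, so the following is an assumption: one natural reading is frame-wise agreement between the inferred and ground-truth relation sequences, averaged over a segment.

```python
# Assumed reading of the temporal-overlap measure: the percentage of
# co-temporal frames on which the inferred relation matches the ground truth.
def temporal_overlap(inferred, ground_truth):
    assert len(inferred) == len(ground_truth)
    agree = sum(a == b for a, b in zip(inferred, ground_truth))
    return 100.0 * agree / len(inferred)
```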
5 Summary
This paper has explored the idea of a distance-based interpretation for mereotopology. By introducing a distance measure between two regions x and y, and for various Boolean combinations of x and y, we showed that all the RCC8 relations can be distinguished. We then introduced a distance measure which combines these individual measures and which we showed reflects different paths through the RCC8 conceptual neighbourhood (i.e., the measure decreases/increases monotonically given certain monotonic transitions, such as one region expanding). In contrast to traditional definitions of mereotopologies in terms of point-set intersections,
our region-based distance measures the degree to which the different mereotopological relations hold. We have demonstrated how a Hidden Markov Model can exploit this distance-based semantics to yield improved interpretation of video data at a qualitative level.
6 Limitations and Future Work
There are a number of avenues of further work that might fruitfully be explored. For simplicity, in this work we limited the regions considered to bounding boxes aligned to orthogonal axes. If this assumption is relaxed, then the RBD measure can fail to work as expected. Figure 7 illustrates three such regions, where c5 remains constant as the orange shape translates from left to right. One possible solution that we are currently investigating is to formulate a semantics based on separate analyses of the projections of a region along the horizontal and vertical coordinate axes respectively.
Fig. 7. Examples of regions where the proposed distance measure between regions can fail as the smaller orange shape translates left to right. Although c2 and c4 will change as expected, c5 will remain constant!
Again for simplicity, in this paper we only considered computing the most probable RCC8 relation for each pair of regions in isolation. In general, there will be multiple objects, and it is necessary to ensure that their spatial relationships are globally consistent. For example, consider three regions x, y, z and the three relations R1(x, y), R2(y, z), R3(x, z). The proposed approach does not ensure path consistency, i.e., the most probable interpretations of R1(x, y), R2(y, z) and R3(x, z) might not be mutually consistent – e.g., R1 = R2 = TPP whilst R3 = TPPi (a toy version of this check is sketched below). Thus an interesting possibility for future work is to couple the HMMs between each pair of objects and introduce constraints between them. It would be worthwhile to evaluate whether such a coupling improves performance in abstracting stable qualitative spatial relations between multiple objects. An alternative way of ensuring a globally consistent set of spatial relations is to check for this property on the most probable set of spatial relations, and, if an inconsistency is detected, to choose the most probable set with this property.
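To make the path-consistency point concrete, here is a toy check using a two-entry excerpt of the standard (much larger) RCC8 composition table; only the entries needed for the example above are included, and a full implementation would require the complete table.

```python
# Tiny excerpt of the well-known RCC8 composition table (our sketch).
COMPOSE = {
    ("TPP", "TPP"): {"TPP", "NTPP"},
    ("NTPP", "NTPP"): {"NTPP"},
}

def triangle_consistent(r_xy, r_yz, r_xz):
    """Is R3(x,z) compatible with the composition of R1(x,y) and R2(y,z)?"""
    return r_xz in COMPOSE[(r_xy, r_yz)]

# TPP(x,y) and TPP(y,z) compose to {TPP, NTPP}, so the most probable
# per-pair choice R3 = TPPi is globally inconsistent:
assert not triangle_consistent("TPP", "TPP", "TPPi")
```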
Another direction for future work is to carefully analyse, in terms of the RBD measure defined here, all the different processes over the conceptual neighbourhood graph to be found in [4], and to build a system which can recognise such processes reliably from video data. Another avenue of research is to explore different formulations of the RBD from the six individual circle metrics: should they be weighted (are some not relevant or useful)? It might appear that c2 is less useful, since c5 and c6 capture much of what c2 does, except in very particular cases of transition. It will also be interesting to investigate whether the HMM can be used to learn topological relationships [7], and to what extent such learnt relationships are qualitatively interesting, or how they compare to qualitatively interesting topologies such as RCC8 when applied to video event analysis tasks [19]. Mereotopologies such as RCC8 are not the only kind of qualitative spatial calculus, and another avenue of research would be to use HMMs to learn when to transition between the relations of qualitative spatial calculi other than mereotopologies. Finally, we note that if a probabilistic approach to QSR is desired, whereby, for example, RCC8 relations have probabilities attached to them (for use in stochastic logic programming), then the HMM could provide such probabilities.
References

1. Cohn, A.G., Gotts, N.M.: Representing spatial vagueness: a mereological approach. In: Proceedings of the 5th Conference on Principles of Knowledge Representation and Reasoning, KR (1996)
2. Cohn, A.G., Varzi, A.: Mereotopological connection. Journal of Philosophical Logic 32, 357–390 (2003)
3. Egenhofer, M.J.: Reasoning about binary topological relations. In: Günther, O., Schek, H.-J. (eds.) SSD 1991. LNCS, vol. 525. Springer, Heidelberg (1991)
4. Egenhofer, M.J., Al-Taha, K.K.: Reasoning about gradual changes of topological relationships. In: Frank, A.U., Formentini, U., Campari, I. (eds.) GIS 1992. LNCS, vol. 639, pp. 196–219. Springer, Heidelberg (1992)
5. Egenhofer, M.J., Dube, M.P.: Topological relations from metric refinements. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (2009)
6. Egenhofer, M.J.: A formal definition of binary topological relationships. In: Litwin, W., Schek, H.-J. (eds.) FODO 1989. LNCS, vol. 367. Springer, Heidelberg (1989)
7. Galata, A., Cohn, A.G., Magee, D.R., Hogg, D.C.: Modeling interaction using learnt qualitative spatio-temporal relations and variable length Markov models. In: Proceedings of the European Conference on Artificial Intelligence, ECAI (2002)
8. Galton, A.: Towards a qualitative theory of movement. In: Kuhn, W., Frank, A.U. (eds.) COSIT 1995. LNCS, vol. 988. Springer, Heidelberg (1995)
9. Galton, A.: Modes of overlap. Journal of Visual Languages and Computing 9, 61–79 (1998)
10. de Laguna, T.: Point, line and surface as sets of solids. The Journal of Philosophy 19, 449–461 (1922)
11. Li, S., Cohn, A.G.: Reasoning with topological and directional spatial information. Computational Intelligence (to appear)
12. Ott, P., Everingham, M.: Implicit color segmentation features for pedestrian and object detection. In: Proceedings of the International Conference on Computer Vision, ICCV (2009)
13. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 257–286 (1989)
14. Randell, D., Cui, Z., Cohn, A.: A spatial logic based on regions and connection. In: Proceedings of the 3rd International Conference on Knowledge Representation and Reasoning (1992)
15. Renz, J.: A canonical model of the region connection calculus. Journal of Applied Non-Classical Logics (JANCL) 12, 469–494 (2002)
16. Schockaert, S., Cock, M.D., Kerre, E.E.: Spatial reasoning in a fuzzy region connection calculus. Artificial Intelligence 173(2), 258–298 (2009)
17. Sridhar, M., Cohn, A.G., Hogg, D.C.: Learning functional object-categories from a relational spatio-temporal representation. In: Proceedings of the European Conference on Artificial Intelligence, ECAI (2008)
18. Sridhar, M., Cohn, A.G., Hogg, D.C.: Discovering an event taxonomy from video using qualitative spatio-temporal graphs. In: Proceedings of the European Conference on Artificial Intelligence (ECAI). IOS Press, Amsterdam (2010)
19. Sridhar, M., Cohn, A.G., Hogg, D.C.: Unsupervised learning of event classes from video. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 1631–1638. AAAI, Menlo Park (2010)
20. Welch, G., Bishop, G.: An introduction to the Kalman filter. Tech. rep., University of North Carolina at Chapel Hill (1995)
21. Yu, Q., Medioni, G.: Integrated detection and tracking for multiple moving objects using data-driven MCMC data association. In: IEEE Workshop on Motion and Video Computing (2008)
Decentralized Reasoning about Gradual Changes of Topological Relationships between Continuously Evolving Regions

Lin-Jie Guan and Matt Duckham

Department of Infrastructure Engineering, The University of Melbourne, Victoria 3010, Australia
[email protected], [email protected]
Abstract. A key challenge facing many applications of new geosensor network technology is to derive meaningful spatial knowledge from low-level sensed data. This paper presents a formal model for representing and computing changes in topological relationships between continuously evolving regions monitored by a geosensor network. A definition of "continuity" is used to constrain region evolution and enables the local detection of node state transitions in the network. The model provides a computational framework for the detection of global, high-level, qualitative relationship changes from local, low-level, quantitative sensor measurements. An efficient decentralized algorithm is also designed and implemented to detect relationship changes, and its computational efficiency is evaluated experimentally using simulation.
1 Introduction
A geosensor network (GSN) is a wireless network of tiny, sensor-enabled computing devices, called sensor nodes [1]. The connected nodes distributed over a geographical surface can collaboratively monitor dynamic environmental phenomena at fine spatial and temporal granularities. They are tasked to detect spatiotemporal changes, and to answer high-level spatial and spatiotemporal queries about dynamic geographic phenomena. Real-time queries about gradual changes in topological relationships between two spatial regions are of special interest in many applications. The conventional models for representing and computing with topological relationships use the 9-intersection model [2], RCC [3], or CBM [4]. However, there are three challenges to directly applying these approaches in a GSN. First, a GSN requires decentralized algorithms to compute spatiotemporal information inside the network, to conserve energy. Conventional models are centralized in the sense that they require all sensor data to be collected and processed in a centralized server, like a GIS or spatial database. Second, conventional models of topological relations assume information about regions in continuous space. By contrast, our information about such regions is always at limited spatial (and temporal) granularity. A discrete model of topological relationships between regions is needed for correctly interpreting the changes to our finite-granularity
observations of those regions. Third, conventional models adopt high-level qualitative approaches to handling spatial information, but a GSN has only low-level quantitative sensor measurements about dynamic environmental phenomena. Therefore, there exists a gap between the global, high-level, qualitative spatial information and the local, low-level, quantitative sensor measurements in the context of a GSN. A GSN should allow users to interact with the network using high-level queries, avoiding the need to understand low-level details. The goal of this research is to bridge this gap and design a decentralized, efficient computational framework to detect relationship changes between continuously evolving spatial regions monitored by a GSN. Common deformations of a region (e.g., scaling, translation) can change the topological relationship between two regions. This paper focuses on two types of scaling deformation (a region's expansion and contraction) and systematically investigates a decentralized computational framework for inferring relationship changes in a geosensor network. After a review of the relevant literature (Section 2), Section 3 describes a formal model for the decentralized representation of spatial regions and topological relationships in a GSN. The key contribution of this paper is the design and development of a computational framework to reason about gradual changes in topological relationships of continuously evolving regions in a decentralized manner (Section 4). A decentralized algorithm is provided to infer relationship changes (Section 5) and evaluated by simulations (Section 6). The algorithm relies only on local information about a node's own state and its immediate (one-hop) neighbors', with no centralized control. Further, the approach can operate without coordinate information about node locations, an important consideration in many practical applications where on-board node positioning systems may be unavailable or unreliable. A discussion of results and a conclusion are provided in Section 7.
2 Background
Wireless communication technology enables the networking of large numbers of sensing devices embedded in our physical environment, such as forests, the ocean, high-rise buildings, or highways [5]. A GSN can provide fine-grained monitoring of dynamic spatial entities, like regions, across a wide variety of applications. In many of those applications, qualitative spatiotemporal queries are expected to be important, for example queries about the topological relationships between dynamic regions. For example, a GSN capable of sensing environmental parameters like temperature, humidity, fuel load, and wind in a forest might be tasked with monitoring bushfire susceptibility. In such an application, changes to the topological relationships between regions of high fuel load, low humidity, and high temperature (e.g., from disjoint to overlap to contain) will be highly salient. The resource constraints on sensor nodes require that algorithms for generating information about such high-level qualitative changes operate in the network, in order to maximize the overall network lifetime [6]. This paper provides a decentralized approach to processing spatiotemporal information in a geosensor network.
Many existing works use traditional numerical operations (arithmetic operations such as Min, Max, Average, and Sum) to retrieve and manipulate sensed data [7]. However, extracting high-level spatial knowledge requires more sophisticated spatial operations [8]. Designing operations for processing qualitative spatiotemporal information (e.g., topological relationships) is not only of interest here but also cognitively important [9]. Reasoning techniques should also be exploited when designing operations for processing spatiotemporal information, because incompleteness and imprecision are fundamental characteristics of sensor data acquired by a geosensor network [10]. Modeling and processing spatiotemporal information in a geosensor network has already received considerable attention. An event-based approach to detecting topological changes such as region (or hole) merging, splitting, appearance, and disappearance was proposed in [11]. Although that research focuses on detecting regions' incremental changes, the authors also give a definition of "continuity" to constrain regions' evolution. Similarly, in this paper we require that, between any two consecutive time steps, only region boundaries can transition to region interiors or exteriors. Other related work has designed and implemented efficient decentralized algorithms to detect topological changes, where the coordinates of each node are generally assumed to be available in the network [12–16]. Recent work has also investigated tracking multiple objects simultaneously to detect their topological changes [17, 18]. Other related work has studied the static topological structure [19] and dynamic topological changes (e.g., merge, split, appear, disappear [13]) within complex areal objects. Finally, recent related work has also studied static topological relationships between heterogeneous regions [20, 21]. Thus, in contrast to previous work, this paper investigates dynamic topological relationships between heterogeneous regions (e.g., overlap, meet, equal, etc.), using the conceptual neighborhood graph to constrain gradual changes. Further, unlike most previous work cited above, our algorithm can operate without coordinate position or cyclic ordering information, relying solely on qualitative information about the identities of each node's immediate (one-hop) neighbors.
3 Formal Model
This section outlines a formal model for representing spatial regions and their topological relationships in a geosensor network. Most of the definitions are derived directly from related work in this area [11, 14]. The paper concerns gradual changes in topological relationships between two spatial regions; however, the model can simply be extended to deal with three (or more) spatial regions simultaneously.
3.1 Preliminaries
A geosensor network is modeled as a planar graph G = (V, E), where V denotes a set of geosensor nodes and E denotes a set of direct (one-hop) communication
links between nodes. Each node can collect a large volume of quantitative sensor measurements by locally sensing dynamic geographic phenomena over time. Given a threshold, qualitative sensor information can be computed locally, such as "high temperature" (e.g., above 40°C) or "low humidity" (e.g., below 30%). The qualitative sensor information can then be modeled as a sensor function of the form s : V → {0, 1}. The set of geosensor nodes for which s(v) = 1 approximates a spatial region, where each node can locally decide whether it is inside or outside the spatial region. Choosing an appropriate threshold for defining spatial regions will depend on the specific application. In this paper we do not consider cases where the threshold is not crisp; this is, however, a topic of ongoing work, for example studying the topological changes to regions with broad boundaries [22]. Each node v can exchange its qualitative sensor information s(v) with its one-hop neighbors, denoted nbr(v), where nbr(v) = {v′ | {v, v′} ∈ E}. Thus, neighboring nodes can collaboratively determine whether they are boundary nodes, modeled as a function b : V → {0, 1}. A node v is a boundary node (b(v) = 1) if there exists a v′ ∈ nbr(v) such that s(v) ≠ s(v′); otherwise, the node is a non-boundary node (b(v) = 0). As shown in Figure 1, combining the two local values s(v) and b(v) distinguishes four different states for node v with respect to a spatial region: interior node (s(v) = 1, b(v) = 0), inner boundary node (s(v) = 1, b(v) = 1), outer boundary node (s(v) = 0, b(v) = 1), and exterior node (s(v) = 0, b(v) = 0).
Fig. 1. A geosensor network showing a set of nodes, direct communication links between geosensor nodes, and sensed data (s(v) = 1 shown as a gray node; s(v) = 0 shown as a white node; b(v) = 1 shown as nodes with a thick stroke; b(v) = 0 shown as nodes with a thin stroke). The region boundary separates interior, inner boundary, outer boundary, and exterior nodes.
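A sketch of the purely local computation this implies (ours, not the paper's algorithm): each node needs only its own reading and the readings reported by its one-hop neighbors to classify itself into one of the four states.

```python
# Local node-state classification from a node's own reading s(v) and the
# readings of its one-hop neighbors, following the definitions above.
def node_state(s_v, neighbor_readings):
    """s_v and neighbor_readings are 0/1 values of the sensor function s."""
    b_v = 1 if any(s_u != s_v for s_u in neighbor_readings) else 0
    return {
        (1, 0): "interior",
        (1, 1): "inner boundary",
        (0, 1): "outer boundary",
        (0, 0): "exterior",
    }[(s_v, b_v)]

# e.g. a node sensing the region (s=1) with at least one neighbor outside:
# node_state(1, [1, 0, 1]) -> "inner boundary"
```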
3.2 Region Topology
The model above can simply be extended to consider two (or more) regions by assuming two (or more) sensor functions for sensing regions A and B, sA : V → {0, A} and sB : V → {0, B}.
An obvious way to describe a set of fundamental topological relationships between different sensor regions monitored by a geosensor network is to adapt the well-known 4-intersection model (or the 9-intersection model, which is equivalent in the case of areal objects [23]). First, define the "interior" A° of a sensor region A as the set of nodes that are inside region A but are not boundary nodes, A° = {v ∈ V | s(v) = 1 ∧ b(v) = 0}. Next, define the "boundary" ∂A of a sensor region A as the set of nodes that are inside the region and are boundary nodes, ∂A = {v ∈ V | s(v) = 1 ∧ b(v) = 1}. For two regions A and B, examining the intersections between the sets A°, ∂A and B°, ∂B leads directly to the familiar eight topological relationships between regions: disjoint (D), contains (C), inside (I), equal (E), meet (M), covers (V), covered by (B), and overlap (O) (cf. [24]).
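For illustration, the mapping from the discrete interiors and boundaries to the eight relations can be written as a 4-intersection lookup. This sketch is ours and is deliberately centralized (the paper's goal is a decentralized computation); granularity effects can also produce patterns outside this table, for which None is returned.

```python
# Centralized illustration of the 4-intersection scheme described above.
# A and B map each node v to its (s(v), b(v)) pair for that region.
def region_relation(A, B, nodes):
    intA = {v for v in nodes if A[v] == (1, 0)}
    bndA = {v for v in nodes if A[v] == (1, 1)}
    intB = {v for v in nodes if B[v] == (1, 0)}
    bndB = {v for v in nodes if B[v] == (1, 1)}
    # emptiness pattern of (A°∩B°, A°∩∂B, ∂A∩B°, ∂A∩∂B)
    pattern = (bool(intA & intB), bool(intA & bndB),
               bool(bndA & intB), bool(bndA & bndB))
    return {
        (False, False, False, False): "disjoint",
        (False, False, False, True ): "meet",
        (True,  True,  True,  True ): "overlap",
        (True,  False, False, True ): "equal",
        (True,  False, True,  False): "inside",
        (True,  False, True,  True ): "covered by",
        (True,  True,  False, False): "contains",
        (True,  True,  False, True ): "covers",
    }.get(pattern)
```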
3.3 From Node States to Region Topology
The discussion above in Section 3.2 provides a global definition of the topological relationships between sensor regions in a geosensor network. However, individual nodes do not have access to this global information, only to local information assumed to include their own state and those of their immediate (one-hop) neighbors. Thus, each node can locally sense whether it is in or out of some sensed region, and can also determine whether it is at the boundary of a region by communicating with its immediate one-hop neighbors. Consequently, each node has access to four bits of information: whether it can sense A or B, and whether it is a boundary node of A or B. A node's state can therefore be concisely represented as a 4-bit binary number. In this paper we adopt the convention that for two sensed binary variables A and B, a node's state, denoted ns, is represented using the truth values for the tuple (sA(v), bA(v), sB(v), bB(v)). For example, a node with state 1010 is inside both regions A and B, but is not a boundary node of either region. Similarly, a node with state 0111 is at the boundary of but not inside region A (an outer boundary node of region A), and is inside region B and at its boundary (an inner boundary node of region B). Figure 2 illustrates a topological configuration (overlap) of two sensor regions that exhibits all 16 possible node states. Table 1 shows the knowledge a single node can infer about the possible global topological configurations using purely its own 4-bit state. As indicated in the table, some node states are consistent with a large number of possible topological relationships between the spatial regions. The next section will provide a computational framework to detect changes in topological relationships between continuously evolving regions from these node states.
3.4 Topology in Discrete Spaces
The problem of topology for discrete spaces has been an active topic of study. In digital topology, for example, the question of the correct interpretation of topology based on raster images has been thoroughly investigated [25]. A raster image can be modeled as a graph, where each raster cell is a node and edges connect adjacent cells.
Fig. 2. The overlapping relationship between two regions exhibiting all 16 possible node states; these node states are labeled on the corresponding geosensor nodes shown as dark nodes
Table 1. Candidate topological relationships that can be inferred by a single node
     Node state   Candidate relationships
 1   1111         {B, E, O, V}
 2   1110         {B, I, O}
 3   1011         {C, O, V}
 4   1100         {C, D, M, O, V}
 5   0011         {B, D, I, M, O}
 6   1010         {B, C, E, I, O, V}
 7   1000         {C, D, M, O, V}
 8   0010         {B, D, I, M, O}
 9   0000         {B, C, D, E, I, M, O, V}
10   0100         {C, D, M, O, V}
11   0001         {B, D, I, M, O}
12   0110         {B, I, O}
13   1001         {C, O, V}
14   0111         {B, I, M, O}
15   1101         {C, M, O, V}
16   0101         {B, D, E, M, O, V}
Such a graph-based model of a raster is closely related to our model of a geosensor network, although in a geosensor network we cannot rely on the regular neighborhoods that cells in a raster image have (i.e., 4 or 8 neighbors for every cell, ignoring edge effects). A known theoretical problem of digital topology is the lack of one-dimensional Jordan boundaries, a foundation of topology in continuous spaces [26]. In overcoming this limitation, there are two closely related alternatives. One is to treat the boundary of a region explicitly as the set of elements that are in (or out of) that region but have neighboring elements that are out of (or in) the region. This is the approach adopted in [27], and in the context of geosensor networks [21] adopts the same interpretation. However, as a result of granularity effects, this interpretation leads to a somewhat counter-intuitive result: for discretized regions to meet in the discrete space, they must overlap in the continuous space (Figure 3a).
Fig. 3. Comparing discrete definitions of the meet relationship between two regions: (a) explicit boundary; (b) implicit boundary (overlap); (c) implicit boundary (disjoint); (d) implicit boundary (meet). Regions that meet in the discrete space must overlap in the continuous space (a), in the interpretation where the boundary is composed of nodes inside the region with neighbors outside the region (bold nodes). If instead the boundary is composed of pairs of neighbors, where one node is inside and one node is outside the region (bold edges), regions that meet in the discrete space may overlap (b), be disjoint (c), or meet (d).
The alternative is to define implicit boundaries between discrete elements, where the boundary itself is regarded as a pair of neighboring boundary elements, one of which is in the region and one of which is out of the region. In the domain of digital topology, [28] adopts this approach. Again, due to granularity effects, it is still possible that regions that meet in the discrete space in actuality overlap, or indeed are disjoint, in the continuous space, as well as truly meet (see Figures 3b, 3c, and 3d, respectively). Ultimately, there is no "correct" interpretation: limited granularity means some distortions are inevitable. However, this paper, like our previous work [20], adopts the latter, implicit-boundary interpretation, in contrast to [21]. It has been argued that this second interpretation, of pairs of neighboring nodes, is more cognitively "natural" [28], since it does not lead to the counter-intuitive result that discretized regions that meet in the discrete space must overlap in the continuous space. Further, the implicit interpretation does not require an arbitrary decision about whether to treat the boundary as the set of elements inside the region with neighbors outside, versus the set of elements outside the region with neighbors inside.
4 Computational Framework for Decentralized Reasoning
The model above provides a static view of node states and the topological relationship between two spatial regions at a point in time. When considering continuously evolving spatial regions monitored by a geosensor network, the model clearly needs extension: region deformations can lead to gradual changes in the topological relationship between two regions. The simplest way to represent spatiotemporal geographic phenomena is as a sequence of snapshots, i.e., static states of the spatial entities monitored by a geosensor network. Thus, the formal model can be extended with a set of timestamps T for every sensor function (e.g., sA : T × V → {0, A}). Some changes in sensed data may lead to a node changing its state (see Table 1). These local state transitions can help in locally detecting global changes in topological relationships. This section presents a computational framework for detecting global, high-level changes in topological relationships from such local, low-level state transitions.
4.1 Continuity of Region Evolutions
Given a dynamic geographic phenomenon monitored at an appropriately fine level of temporal granularity, evolving spatial entities can exhibit various degrees of continuity, such as interior-continuity, boundary-continuity, or size-continuity [29, 30]. In modeling the evolution of dynamic geographic phenomena, these three types of continuity can be used to classify the continuous evolution of a spatial region, including a region's expansion and contraction. The objective of this research is to detect changes in topological relationships between continuously evolving regions. It is therefore necessary to adopt a definition of "continuity" that constrains a region's evolution in the context of a geosensor network. Let R : T → 2^V be a temporally evolving region, where R(t) is the (set of nodes in the) region at time t ∈ T. Supposing R only undergoes a scaling type of deformation over time, a region's evolution can be modeled as follows:

Definition 1. Two time steps t′, t ∈ T are consecutive if t′ < t and there does not exist a t″ ∈ T such that t′ < t″ < t. For two consecutive time steps t′, t ∈ T, the region has expanded if R(t′) ⊂ R(t); the region has contracted if R(t) ⊂ R(t′); otherwise the region is unchanged.

Each node can locally detect changes in its sensed value when a spatial region expands or contracts over time. Its state might then undergo a transition between exterior node, outer boundary node, inner boundary node, and interior node for the spatial region. Thus, the definition of region continuity can be specified as follows:

Definition 2. For consecutive time steps t′, t ∈ T, a spatial region is evolving continuously if for all nodes v ∈ R(t′) − R(t), s(t′, v) = b(t′, v) = 1 and s(t, v) = 0 and b(t, v) = 1; and for all nodes v ∈ R(t) − R(t′), s(t, v) = b(t, v) = 1 and s(t′, v) = 0 and b(t′, v) = 1.
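The two definitions translate directly into a check over consecutive snapshots. The following sketch (hypothetical, not from the paper) assumes the sensor and boundary functions are indexed first by time step and then by node:

    # Sketch of the continuity test of Definition 2: every node leaving the
    # region must pass from inner boundary (s=1, b=1) to outer boundary
    # (s=0, b=1), and symmetrically for nodes joining the region.
    def is_continuous(R_prev, R_next, s, b, t_prev, t_next):
        left   = R_prev - R_next     # nodes the region contracted away from
        joined = R_next - R_prev     # nodes newly covered by the region
        ok_left = all(s[t_prev][v] == 1 and b[t_prev][v] == 1 and
                      s[t_next][v] == 0 and b[t_next][v] == 1 for v in left)
        ok_joined = all(s[t_next][v] == 1 and b[t_next][v] == 1 and
                        s[t_prev][v] == 0 and b[t_prev][v] == 1 for v in joined)
        return ok_left and ok_joined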
Figure 4 illustrates the four node states and their possible transitions when a spatial region undergoes continuous evolution over time. Continuous evolution of a spatial region always involves state transitions between inner boundary nodes and outer boundary nodes: when a region expands continuously, outer boundary nodes transition to inner boundary nodes; when a region contracts continuously, inner boundary nodes transition to outer boundary nodes. Note that the definition of continuity excludes the possibility of region appearance or disappearance. While this is a reasonable restriction in the context of scaling deformations [31], the constraint may clearly be violated by changes to real-world phenomena (e.g., the appearance or disappearance of regions of high/low moisture or temperature). However, it is expected that prior work that already monitors region appearance and disappearance (e.g., [14, 15]) could be integrated into our current framework; this is a topic for ongoing research.
Fig. 4. The transitions between neighboring states for a continuously evolving region monitored by a geosensor network: exterior node (00), outer boundary node (01), inner boundary node (11), interior node (10)
Considering two spatial regions monitored by a geosensor network, continuous expansion or contraction of one region can lead to a gradual change in the topological relationship between the two regions. The possible relationship changes between two spatial regions have already been investigated by other researchers [32], as shown in Figure 5. This conceptual neighborhood graph assumes that only one region evolves at a time. Each edge connecting two topological relationships indicates a possible relationship change over time caused by a region's expansion or contraction. Note that the relationship change between overlap and equal is only feasible when a region undergoes a translation type of deformation, which is beyond the scope of this paper and will be investigated in future work. The following sections describe how to reason about gradual changes in these topological relationships in the network.
4.2 Node State Transitions
The definition of "continuity" above constrains a region's evolution and defines the transitions between the four states for a single region. Intuitively, the same definition can be applied to the transitions between the 16 node states for two spatial regions. The state transition diagram in Figure 6 illustrates the possible state transitions: each edge connects two neighboring node states and indicates a possible state transition between them under the continuous evolution of the spatial regions.
Fig. 5. The conceptual neighborhood graph describing possible relationship changes between two spatial regions under the scaling type of deformations, adapted from [32]
Fig. 6. The state transition diagram exhibiting possible transitions among the 16 node states; node states sharing the same properties are grouped together and assigned a gray-scale shade (white: group I; light gray: group II; dark gray: group III; darkest: group IV)
The 16 node states are classified into four groups according to their roles and properties in a geosensor network (a sketch of this classification follows the list):
• Group I (0000, 0010, 1000, 1010): nodes with states in this group cannot sense any part of either boundary.
• Group II (0011, 0001, 1100, 0100): nodes can sense the (inner or outer) boundary of one region only, which means these nodes can detect the continuous evolution of a single region in the network.
• Group III (1110, 1011, 1001, 0110): nodes can sense both regions in the network, but the inner or outer boundary of only one region.
• Group IV (1111, 1101, 0111, 0101): nodes can sense the inner and/or outer boundary of both regions. The nodes in this group are especially important for detecting deformation in both regions, and hence relationship changes, in a geosensor network.
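As a compact restatement of this grouping, a sketch in Python (hypothetical, derived from the state lists above):

    # Classify a 4-bit node state (sA, bA, sB, bB) into one of the four groups.
    def group(state):
        sA, bA, sB, bB = state
        if bA and bB:
            return 'IV'    # at the boundary of both regions
        if (bA and sB) or (bB and sA):
            return 'III'   # involved with both regions, boundary of only one
        if bA or bB:
            return 'II'    # at the boundary of one region only
        return 'I'         # senses no boundary at all

    # e.g. group((1, 1, 1, 1)) == 'IV'; group((1, 0, 1, 0)) == 'I'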
Table 2. The node states in group IV are derived from Table 1 to characterize each topological relationship
     Topological relationship   Node states in group IV
 1   disjoint (D)               0101
 2   contains (C)               1101
 3   inside (I)                 0111
 4   equal (E)                  1111, 0101
 5   meet (M)                   1101, 0111, 0101
 6   covers (V)                 1111, 1101, 0101
 7   coveredBy (B)              1111, 0111, 0101
 8   overlap (O)                1111, 1101, 0111, 0101
Table 2 illustrates the characterization of each topological relationship by a distinct set of node states in group IV. Because no two relationships have identical sets of states, if the topological relationship has changed, then a state transition must occur between node states in group IV. Conversely, if state transitions occur between the node states in group IV, the topological relationship could still have remained the same. Therefore, a state transition for nodes in group IV is a necessary but not a sufficient condition for a topological relationship change. The next section analyzes the various types of relationship changes based on the transitions between the node states in group IV (a sketch of the Table 2 lookup follows).
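Assuming the reconstruction of Table 2 above, the characterization can be held as a simple lookup, which is all a node needs for the consistency checks described in the following sections (a hypothetical sketch):

    # Group IV states permitted by each topological relationship (Table 2).
    GROUP_IV_STATES = {
        'D': {'0101'},
        'C': {'1101'},
        'I': {'0111'},
        'E': {'1111', '0101'},
        'M': {'1101', '0111', '0101'},
        'V': {'1111', '1101', '0101'},
        'B': {'1111', '0111', '0101'},
        'O': {'1111', '1101', '0111', '0101'},
    }

    def consistent(state, relationship):
        """A group IV node state is consistent with a candidate relationship
        iff Table 2 lists that state for the relationship."""
        return state in GROUP_IV_STATES[relationship]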
4.3 Inferring Relationship Changes
Both the regions' dynamism (which region evolves) and the scaling type (expansion or contraction) influence the transitions between node states in group IV, as well as the relationship changes in a geosensor network. Different combinations of dynamism and scaling type can result in the same relationship change but very different state transitions. Therefore, it is necessary to systematically investigate the effects of continuous deformations of either region A or B that lead to relationship changes between the two spatial regions. Table 3 gives the result of an exhaustive analysis of these effects, showing the possible relationship changes that can be inferred when nodes in group IV locally detect state transitions in the network. There are three types of relationship changes in the table, each labeled with a number. For a relationship change of type 1, if a node locally detects a state transition, the global relationship change can be correctly inferred from that state transition alone. For example, if a node v locally detects a state transition from 0111 to 1111 and the current topological relationship in the network is meet, then v can infer a relationship change from meet to overlap based on Table 3. Similarly to type 1, a relationship change of type 2 can be detected when a node locally detects a state transition. For type 2, however, two possible relationship changes can be inferred from one state transition when the current topological relationship is inside (I), equal (E), or contains (C).
Table 3. Candidate gradual changes of topological relationships inferred from state transitions between node states in group IV (organized by the four possible group IV state transitions, 0111↔1111, 0101↔1101, 1111↔1101, and 0111↔0101; each entry gives the candidate relationship changes for the current relationship, labeled with change type 1–3, with candidate relationships to be checked first marked "∗")
For example, as shown in Table 3, given a state transition from 0111 to 1111 when the topological relationship is inside (I), either a relationship change from I to B or from I to E is possible purely from local detection of the state transition. In this situation, we need to determine which of the possible relationship changes has occurred. Choosing a candidate relationship can be based on an analysis of the set of node states for each topological relationship. As can be observed in Table 2, some topological relationships have a smaller set of node states, which is a subset of the set for other topological relationships. Therefore, it is possible to use the set difference of node states to verify the existence of a topological relationship in the network. As shown in Table 3, candidate relationships are labeled with the symbol "∗", indicating they should be inferred first when a relationship change of type 2 is detected locally from a state transition. An inconsistency is detected if a node's state does not match the candidate relationship according to Table 2. If an inconsistency is detected somewhere in the network, the global topological relationship is updated with the alternative relationship. For a relationship change of type 3, the detection of a state transition is not sufficient to correctly infer a new topological relationship in the network. It cannot be ensured that the global topological relationship has changed to a new one unless the state transition occurs at all nodes with this state in the network. If any node with this state remains unchanged, this state is inconsistent with the newly detected relationship. If an inconsistency is detected for a relationship change of type 3, the global topological relationship remains unchanged. In summary, a relationship change of type 1 can be directly detected from local state transitions in the network. For relationship changes of types 2 and 3, inconsistencies might exist in the network when local state transitions between nodes in group IV are detected. In order to correctly infer which relationship change has occurred, geosensor nodes need to collaboratively perform a consistency check for each relationship newly detected in the network. The following section describes a decentralized algorithm that implements this collaborative task of checking consistency and inferring relationship changes in the network.
5 Algorithm
The paper adopts Santoro's conventions for specifying decentralized algorithms, which execute in parallel on every node in the network [33]. Briefly, the algorithm defines four states¹ (INIT, ACTIVE, IDLE, DONE) for each node; each node runs as an automaton and transitions between the four states when detecting relationship changes in a geosensor network. In each state, nodes can respond to a number of events, including receiving a message (Receiving), an alarm trigger (When), and spontaneous events (Spontaneously). Associated with each event is an action, which contains a finite number of atomic operations that are executed by nodes without interruption until completion. The following subsections explain the details of the decentralized algorithm using this convention.
5.1 Procedures of the Decentralized Algorithm
Algorithm 1 specifies the main part of the decentralized reasoning about gradual changes in topological relationships between two spatial regions, and Algorithm 2 specifies the procedures used in the computational process. Assume that the global topological relationship r is known at the initialization of the geosensor network; it can be computed in the network or at the sink node (line 3 in Algorithm 1) [21]. The decentralized algorithm consists of the following key stages:
• For each sensing round, a node v senses the two regions and broadcasts its sensor values to its one-hop neighbors in the INIT state (lines 5–8). Node v then transitions to the IDLE state, having set the local clock alarm for the next sensing round at t + Δt (line 9).
• Once in the IDLE state, node v can compute its node state based on the sensor values received from its neighbors, and transitions to the ACTIVE state if v is not a node in group I (lines 11–16).
• In the ACTIVE state, node v computes any state transition and infers a relationship change (lines 32–34). If the relationship change is of type 1, v updates the global topological relationship with the new one and broadcasts it to its one-hop neighbors before transitioning to the DONE state (lines 49–52 in Algorithm 2). Otherwise, v stores the new relationship in the checking set, initializes a collaborative consistency-checking process, and transitions to the IDLE state (lines 54–56).
• In the IDLE state, v might receive a "checking" message and perform a one-time consistency check for the new relationship (lines 17–19). If an inconsistency is detected and the relationship change is of type 2, v updates its local knowledge with the new topological relationship and broadcasts the update before transitioning to the DONE state (lines 62–65).
• If an inconsistency is detected and the relationship change is of type 3, v broadcasts the inconsistency to its neighbors (line 67); if neighboring nodes receive this inconsistency in the IDLE state, they remove the new relationship from the checking set (lines 20–23).
¹ Note that the four states defined here are used to illustrate the flow of the algorithm, and are not related to the 16 node states defined in Section 3.
Algorithm 1. The decentralized algorithm for a node v to detect changes

 1: States: S = {INIT, ACTIVE, IDLE, DONE}
 2: Local variables: checking set ← ∅, new relationship, nst = (sA, bA, sB, bB), nst−1
 3: Local knowledge: the topological relationship r
 4: INIT
 5:   Spontaneously
 6:     sA ← sA(v) and sB ← sB(v)              // Sense regions A and B
 7:     Set alarm ← t + Δt                     // Set the local clock for the next sensing time
 8:     Broadcast("initializing", sA, sB) message to nbr(v)
 9:     Become IDLE                            // Transition to the IDLE state
10: IDLE
11:   Receiving("initializing", sA, sB) messages from nbr(v)
12:     nst ← (sA, bA, sB, bB)                 // Compute the node state
13:     if nst belongs to group I then
14:       Become DONE
15:     else
16:       Become ACTIVE
17:   Receiving("checking", r′) messages from nbr(v)
18:     if r′ ∉ checking set then
19:       Check-consistency                    // see Algorithm 2
20:   Receiving("eliminating", r′) messages from nbr(v)
21:     if r′ ∈ checking set then
22:       checking set ← checking set − {r′}   // Remove r′ from the set
23:       Broadcast("eliminating", r′) message to nbr(v)
24:   Receiving("updating", r′) messages from nbr(v)
25:     r ← r′                                 // Update local knowledge with the new relationship r′
26:     Broadcast("updating", r) message to nbr(v)
27:     Become DONE
28:   Receiving("maintaining", r′) messages from nbr(v)
29:     r ← r′
30: ACTIVE
31:   Spontaneously
32:     if nst ≠ nst−1 then                    // Compute node state transitions
33:       if nst and nst−1 belong to group IV then
34:         Infer-relationship-changes         // see Algorithm 2
35:       else
36:         Broadcast("maintaining", r) message to nbr(v)
37:         Become IDLE
38:     else
39:       Become IDLE
40: IDLE, DONE
41:   When(t = alarm)                          // when the clock alarm rings
42:     nst−1 ← nst; nst ← 0
43:     if checking set ≠ ∅ then
44:       r ← new relationship where new relationship ∈ checking set
45:       checking set ← ∅
46:     Become INIT
Algorithm 2. Procedures used by Algorithm 1

47: procedure Infer-relationship-changes
48:   Compute the relationship change based on the current relationship r and the state transition from nst−1 to nst   // See Table 3
49:   if the relationship change is type 1 then
50:     r ← new relationship                   // Update local knowledge with the new relationship
51:     Broadcast("updating", r) message to nbr(v)
52:     Become DONE
53:   else
54:     checking set ← checking set ∪ {new relationship}
55:     Broadcast("checking", new relationship) message to nbr(v)
56:     Become IDLE
57: procedure Check-consistency
58:   if r′ is consistent with nst then
59:     checking set ← checking set ∪ {r′}     // Add r′ to the set
60:     Broadcast("checking", r′) message to nbr(v)
61:   else
62:     if r′ is I, E, or C then               // Indicates a relationship change of type 2
63:       r ← new relationship
64:       Broadcast("updating", r) message to nbr(v)
65:       Become DONE
66:     else
67:       Broadcast("eliminating", r′) message to nbr(v)
• Node v eventually transitions to either the IDLE or the DONE state in each sensing round. In both states, v archives its node state when the clock alarm rings (lines 41–42). For a relationship change of type 2 or 3, if the checking set is not empty, this indicates that the new relationship is consistent with all node states in the network; v updates the topological relationship with the new one and transitions back to the INIT state for the next round of sensing (lines 43–46).
5.2 Computational Analysis
As communication is more expensive than computation in a geosensor network, computational efficiency is measured in terms of communication complexity (the total number of messages). To analyze the communication complexity, first consider the types of messages exchanged in the network. In total, there are five types of messages ("initializing", "maintaining", "checking", "eliminating" and "updating") transmitted in the network. The communication complexity of "initializing" messages is Θ(|V|) (each node transmits exactly one message to exchange sensor data). The number of "maintaining" messages depends on the total number of nodes that detect region evolutions. Because boundary maintenance is a local operation, it is performed only where region evolutions are detected.
It might be expected, in the worst case, that every node in the network is a boundary node, leading to an overall communication complexity for "maintaining" messages of O(|V|). However, in actuality, the number of nodes at the boundary of a region scales in proportion to |V|^(0.5·D), where D ∈ [1, 2) is the fractal dimension of the region [19]. Further, in practice D is typically found to lie in the range 1.2–1.3 for many geographic features [34]. Thus, in the worst case we expect the complexity of the "maintaining" messages to be O(|V|^x), where 0.5 ≤ x < 1.0; on average, however, we expect 0.5 ≤ x ≤ 0.7. For the other message types, the communication complexity depends on both the length of the boundary (which scales as |V|^(0.5·D), D ∈ [1, 2)) and the type of relationship changes detected in the network.
• If a relationship change of type 1 is detected, then only an "updating" message is transmitted once among boundary nodes in the network. The communication complexity is therefore O(|V|^x).
• If a relationship change of type 2 or 3 is detected and the new relationship is consistent with the node states in the network, only "checking" messages will be exchanged by boundary nodes. The communication complexity is O(|V|^x).
• If the new relationship is inconsistent with the node states, "updating" messages are additionally required for a relationship change of type 2; similarly, "eliminating" messages are required for type 3 to notify the detection of an inconsistency in the network. Thus, the worst-case performance is O(2|V|^x) for these three types of relationship changes.
The overall communication complexity of the algorithm is O(3|V|^x + |V|) (0.5 ≤ x < 1.0) for each sensing cycle. Another important measure of efficiency is load balance (the number of messages sent by each node). The algorithm achieves constant per-node communication, O(1), with any node transmitting at most 4 messages in each sensing cycle. Each node in group I only needs to transmit one "initializing" message to compute a node state; depending on the type of relationship change, other nodes might transmit up to 3 more messages to collaboratively infer a relationship change in a geosensor network.
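To give a feel for the |V|^(0.5·D) boundary scaling used above, a quick numeric check (illustrative only):

    # Expected number of boundary nodes as a function of network size and
    # fractal dimension D of the region boundary.
    for n in (400, 800, 1600):
        for D in (1.0, 1.3, 2.0):
            print(n, D, round(n ** (0.5 * D)))
    # e.g. with |V| = 1600 and D = 1.3, only ~121 of the 1600 nodes are
    # boundary nodes, so boundary-local messaging scales sublinearly.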
6 Experiments
The decentralized algorithm was implemented within the agent-based simulation system NetLogo. NetLogo is an ideal platform for empirically testing decentralized spatiotemporal algorithms, as it offers support not only for modeling the behavior of sensor nodes in the network but also for simulating the dynamic geographic phenomena monitored by those nodes. Real-world deployments of networks are beyond the scope of this research: although sensor network technology continues to become cheaper and more easily accessible, large-scale deployments of hundreds or thousands of nodes, as required for testing spatial algorithms, are not yet practical for widespread use and experimentation. In each simulation, geosensor nodes were randomly distributed in a space of 100×100 units.
A communication graph (either a unit disk graph, UDG, or a planar Delaunay triangulation, DT) was used to connect the nodes. In the UDG, nodes communicate with neighboring nodes within a unit communication range; a planar DT can be created based on the UDG. Each experiment comprised 250 simulation runs, each of which ran for 400 time steps. Using the same sequence of region evolutions across all experiments ensured comparability between simulation sets. In each simulation, two spatial regions were generated randomly in the space; a scaling type (expansion or contraction) was randomly selected for a region, which then evolved continuously for 10 steps. At each step, nodes locally sensed the two regions and computed any state transition. The experiments also compared the decentralized algorithm with an in-network aggregation algorithm (TAG). TAG is commonly used for information aggregation and is based on a tree structure [7]. Essentially, TAG aggregates the node states of group IV nodes up to the root of the tree (the sink node), where a relationship change can be computed in a centralized way. The new topological relationship is then sent back to the geosensor nodes along the tree so that they can update their local knowledge. Based on the computational framework proposed in this paper, an improved version of this aggregation algorithm was also implemented (denoted TAG-change). TAG-change detects relationship changes at the sink node and broadcasts the new topological relationship back to the nodes only once a relationship change has been detected.
6.1 Results
Figure 7 compares the three algorithms, with the communication cost of each algorithm measured at each time step for sample data selected from the 250 simulation runs. The x-axis denotes the time steps, labeled by the topological relationship at each time, and the y-axis is the total number of messages transmitted in the network. As shown in Figure 7, the TAG-change algorithm consistently outperforms the TAG algorithm; the spikes in the response curve of the TAG-change algorithm correspond to detections of a relationship change in the network. The decentralized algorithm significantly outperforms TAG, and generally performs better than TAG-change over most time steps. However, in some cases where no relationship change is detected, the decentralized algorithm can consume more messages for inferring relationship changes than TAG-change. Since the three algorithms require the same O(|V|) "initializing" messages for computing node states at each time step, measuring the other types of messages becomes the key factor for comparing their efficiency. Figure 8 gives the overall number of messages for inferring relationship changes, with "initializing" messages excluded. In order to compare scalability, the experiments varied the network density from 400 up to 1200 geosensor nodes. Figure 8 shows that the overall number of messages for the three algorithms (two of them using the UDG) increases roughly linearly with network density. Compared with TAG, both the decentralized algorithm and TAG-change increase more slowly and consume fewer messages, and the overall number of messages for the decentralized algorithm is less than that of TAG-change.
Fig. 7. The number of messages for the three algorithms measured at each time step (x-axis: topological relationship at every ten steps; y-axis: number of messages)
Fig. 8. The overall number of messages (×1000) for inferring relationship changes, excluding the "initializing" messages, against network density (number of nodes), for the decentralized algorithm, TAG-change, and TAG, together with the UDG variants Decentralized-UDG and TAG-change-UDG
At a network density of 400, TAG-change saves a significant amount of communication cost compared to TAG, but consumes twice as many messages as the decentralized algorithm. Overall, the decentralized algorithm outperforms the other two. A power regression analysis of the experimental results for the decentralized algorithm shows that the regression curve achieves a very high goodness of fit (y = 159.03x^0.8328, R² = 0.9597). This supports the theoretical analysis above that the overall communication complexity is of order O(|V|^x) where 0.5 ≤ x < 1.0.
Further, the experiments evaluated and compared the decentralized algorithm and TAG-change using the UDG as the underlying communication graph. The results show that both can successfully infer relationship changes in the network; however, the decentralized algorithm using the UDG still performs better than TAG-change using the UDG. As discussed in Section 5.2, load balance is another important measure of efficiency in a geosensor network. Because both TAG and TAG-change rely on a tree structure for detecting relationship changes in the network, they have the same load balance; thus, only the load balances of the decentralized algorithm and TAG-change were evaluated in the experiments. A sample result for about 600 nodes is shown in Figure 9, where the x-axis indicates the total number of messages sent by a node over the 400 time steps and the y-axis shows the number of nodes sending that number of messages. The node counts for TAG-change are labeled in the figure.
Fig. 9. The load balance for the decentralized algorithm and the TAG-change algorithm (x-axis: number of messages sent by a node; y-axis: number of nodes)
As observed from the figure, the decentralized algorithm exhibits a significant improvement in load balance over the same simulation using the TAG-change algorithm. Half of the nodes using the decentralized algorithm transmit only 400 messages in the simulation, with no nodes transmitting more than 800 messages. By contrast, for the TAG-change algorithm, some nodes transmit over 2000 messages, with a few nodes sending around 4000 messages in the simulation. Similar results were obtained from all tested scenarios.
Discussion. The experimental results illustrate that the decentralized algorithm is more energy efficient than the TAG and TAG-change algorithms for inferring relationship changes in the network. In addition, the decentralized algorithm improves the load balance by attenuating the higher loads over space and time in the network. The conventional TAG algorithm relies on a tree structure, which results in a load imbalance in energy usage in the network.
In the tree-based aggregation approach, nodes near the sink must transmit more messages in order to aggregate spatiotemporal information on behalf of nodes further from the sink. Consequently, the nodes near the sink can deplete their energy more quickly and eventually cause the network to become disconnected. It can also be observed that the UDG always consumes more messages than the DT in terms of the overall number of messages for inferring relationship changes. The reason is that in the UDG, nodes have more neighbors for the same network density and communication range, leading to more boundary nodes participating in the detection of relationship changes. However, computing a planar DT requires nodes to have access to accurate coordinates, a common assumption in the literature but one that is unrealistic for practical network deployments. This research relaxed this assumption and demonstrated that our computational framework can still operate in situations where no accurate location information is available in the network.
7 Conclusions and Future Work
The paper has proposed a representational and computational framework for decentralized reasoning about gradual changes in the topological relationships between two continuously evolving regions. The framework relies on qualitative information, namely node states and state transitions, and enables the detection of global, high-level, qualitative relationship changes from local, low-level state transitions in a geosensor network. The paper has also developed an efficient decentralized algorithm and implemented it in a simulation system. The algorithm relies purely on local information about a node's own state and those of its immediate (one-hop) neighbors, without centralized control. Its computational efficiency was evaluated and compared with two other in-network aggregation algorithms in simulation. The results show that the decentralized algorithm outperforms the other two in terms of overall messages and load balance. The experiments also demonstrate that the computational framework and decentralized algorithm can successfully infer relationship changes without any coordinate information for the nodes in the network. Future work includes several directions. The computational framework needs to be extended to deal with other types of deformations, because each deformation type might yield a different conceptual neighborhood graph [31]. For example, the translation type of deformation enables a gradual change from the overlap to the equal relationship when spatial regions move continuously over space and time. Translation exhibits more complexity and can be regarded as a simultaneous expansion and contraction of a region in the network. The paper assumes that only one region evolves at each time step; relaxing this assumption to consider the dynamism of two spatial regions at the same time would be desirable for some geosensor network applications. Another direction is to consider composite and complex objects when detecting relationship changes. Tracking complex objects and detecting their topological changes have already been studied by other researchers.
Integrating this prior work into the current framework seems necessary to deal with more realistic situations in a geosensor network.

Acknowledgments. The authors acknowledge the comments received from Jan Oliver Wallgrün during a visit to the University of Bremen, Germany, and the Go8/DAAD Scientific Exchange Program for funding this trip. Dr Duckham's research is supported by an ARC Future Fellowship, FT0990531.
References

1. Stefanidis, A., Nittel, S.: GeoSensor Networks. CRC Press, Boca Raton (2005)
2. Egenhofer, M., Franzosa, R.: Point-set topological spatial relations. International Journal of Geographical Information Systems 5, 161–174 (1991)
3. Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connection. In: 3rd International Conference on Knowledge Representation and Reasoning, pp. 165–176. Morgan Kaufmann, San Francisco (1992)
4. Clementini, E., Felice, P.D., van Oosterom, P.: A small set of formal topological relationships suitable for end-user interaction. In: Proceedings of the Third International Sym. on Advances in Spatial Databases, pp. 277–295 (1993)
5. Zhao, F., Guibas, L.J.: Wireless Sensor Networks: An Information Processing Approach. Elsevier, Amsterdam (2004)
6. Estrin, D., Govindan, R., Heidemann, J., Kumar, S.: Next century challenges: scalable coordination in sensor networks. In: Proceedings of the 5th International Conference on Mobile Computing and Networking, pp. 263–270. ACM, New York (1999)
7. Madden, S., Franklin, M., Hellerstein, J., Hong, W.: TAG: a tiny aggregation service for ad-hoc sensor networks. In: 5th Annual Symposium on Operating Systems Design and Implementation (OSDI), pp. 1–16 (2002)
8. Hellerstein, J.M., Hong, C.-M., Madden, S., Stanek, K.: Beyond average: Toward sophisticated sensing with queries. In: Zhao, F., Guibas, L.J. (eds.) IPSN 2003. LNCS, vol. 2634, pp. 63–79. Springer, Heidelberg (2003)
9. Klippel, A., Li, R.: The endpoint hypothesis: A topological-cognitive assessment of geographic scale movement patterns. In: Hornsby, K.S., Claramunt, C., Denis, M., Ligozat, G. (eds.) COSIT 2009. LNCS, vol. 5756, pp. 177–194. Springer, Heidelberg (2009)
10. Guibas, L.J.: Sensing, tracking and reasoning with relations. IEEE Signal Processing Magazine 19(2), 73–85 (2002)
11. Worboys, M.F., Duckham, M.: Monitoring qualitative spatiotemporal change for geosensor networks. International Journal of Geographic Information Science 20(10), 1087–1108 (2006)
12. Farah, C., Zhong, C., Worboys, M., Nittel, S.: Detecting topological change using a wireless sensor network. In: Cova, T.J., Miller, H.J., Beard, K., Frank, A.U., Goodchild, M.F. (eds.) GIScience 2008. LNCS, vol. 5266, pp. 55–69. Springer, Heidelberg (2008)
13. Jiang, J., Worboys, M.: Event-based topology for dynamic planar areal objects. International Journal of Geographical Information Science 23(1), 33–60 (2009)
14. Sadeq, M.J.: In network detection of topological change of region with a wireless sensor network. PhD thesis, The University of Melbourne (2009)
15. Jiang, J., Worboys, M., Nittel, S.: Qualitative change detection using sensor networks based on connectivity information. GeoInformatica, 1–24 (2009) (accepted)
16. Shi, M., Winter, S.: Detecting change in snapshot sequences. In: Fabrikant, S.I., Reichenbacher, T., van Kreveld, M., Schlieder, C. (eds.) GIScience 2010. LNCS, vol. 6292, pp. 219–233. Springer, Heidelberg (2010)
17. Liu, H., Schneider, M.: Tracking continuous topology changes of complex moving regions. In: 26th Annual ACM Symp. on Applied Computing, ACM SAC (2011)
18. Jin, G., Nittel, S.: Efficient tracking of 2D objects with spatiotemporal properties in wireless sensor networks. Distributed and Parallel Databases 29(1), 3–30 (2011)
19. Duckham, M., Nussbaum, D., Sack, J.R., Santoro, N.: Efficient, decentralized computation of the topology of spatial regions. IEEE Transactions on Computers 60 (2011), doi:10.1109/TC.2010.177 (in press)
20. Guan, L.J., Duckham, M.: Decentralized computing of topological relationships between heterogeneous regions. In: Lees, B., Laffan, S. (eds.) Proc. 10th International Conference on GeoComputation, Sydney, Australia (2009)
21. Duckham, M., Jeong, M.H., Li, S., Renz, J.: Decentralized querying of topological relations between regions without using localization. In: Agrawal, A.A.D., Mokbel, M., Zhang, P. (eds.) Proc. 18th ACM SIGSPATIAL GIS, pp. 414–417. ACM, New York (2010)
22. Duckham, M., Stell, J., Vasardani, M., Worboys, M.: Qualitative change to 3-valued regions. In: Fabrikant, S.I., Reichenbacher, T., van Kreveld, M., Schlieder, C. (eds.) GIScience 2010. LNCS, vol. 6292, pp. 249–263. Springer, Heidelberg (2010)
23. Egenhofer, M., Sharma, J., Mark, D.: A critical comparison of the 4-intersection and 9-intersection models for spatial relations: Formal analysis. In: McMaster, R., Armstrong, M. (eds.) Autocarto 11, pp. 1–11 (1993)
24. Egenhofer, M.J., Franzosa, R.D.: On the equivalence of topological relations. International Journal of Geographical Information Systems 9(2), 133–152 (1995)
25. Rosenfeld, A.: Digital topology. The American Mathematical Monthly 86(8), 621–630 (1979)
26. Kong, T.Y., Rosenfeld, A.: Digital topology: introduction and survey. Comput. Vision Graph. Image Process. 48(3), 357–393 (1989)
27. Egenhofer, M.J., Sharma, J.: Topological relations between regions in R2 and Z2. In: Abel, D., Ooi, B.C. (eds.) SSD 1993. LNCS, vol. 692, pp. 316–336. Springer, Heidelberg (1993)
28. Winter, S.: Topological relations between discrete regions. In: Egenhofer, M., Herring, J. (eds.) SSD 1995. LNCS, vol. 951, pp. 310–327. Springer, Heidelberg (1995)
29. Galton, A.: Continuous change in spatial regions. In: Spatial Information Theory: A Theoretical Basis for GIS, pp. 1–13. Springer, Berlin (1997)
30. Galton, A.: Continuous motion in discrete space. In: Principles of Knowledge Representation and Reasoning: Proceedings of the Seventh International Conference, pp. 26–37. Morgan Kaufmann Publishers, San Francisco (2000)
31. Egenhofer, M.: The family of conceptual neighborhood graphs for region-region relations. In: Fabrikant, S., Reichenbacher, T., van Kreveld, M., Schlieder, C. (eds.) GIScience 2010. LNCS, vol. 6292, pp. 42–55. Springer, Heidelberg (2010)
32. Egenhofer, M., Al-Taha, K.: Reasoning about gradual changes of topological relationships. In: Frank, A., Campari, I., Formentini, U. (eds.) GIS 1992. LNCS, vol. 639, pp. 196–219. Springer, Heidelberg (1992)
33. Santoro, N.: Design and Analysis of Distributed Algorithms. Wiley Series on Parallel and Distributed Computing. Wiley-Interscience, Hoboken (2006)
34. Mandelbrot, B.: Fractals, Form, Chance and Dimension. Freeman, San Francisco (1977)
Spatio-temporal Evolution as Bigraph Dynamics

John Stell¹, Géraldine Del Mondo², Remy Thibaud², and Christophe Claramunt²

¹ School of Computing, University of Leeds, U.K.
[email protected]
² Naval Academy Research Institute, Brest, France
{geraldine.del mondo,remy.thibaud,christophe.claramunt}@ecole-navale.fr
Abstract. We present a novel approach to modelling the evolution of spatial entities over time by using bigraphs. We use the links in a bigraph to represent the sharing of a common ancestor and the places in a bigraph to represent spatial nesting as usual. We provide bigraphical reaction rules that are able to model situations such as two crowds of people merging together while still keeping track of the resulting crowd’s historical links. Keywords: spatio-temporal change, bigraphs, filiation.
1 Introduction
The combined modelling of space and time is a well-established aspect of the theory of spatial information [8,12,18,9,3,2]. It also provides particular challenges when dealing with granularity and vagueness [14,4]. Objects can move, cities and countries can retain their identities while changing their boundaries, and new entities can be formed from old ones, as in the redistribution of parcels of land or the more rapid change seen as crowds of demonstrators are divided by police and then re-form and take on new activity. Such examples are recorded in systems having purposes as diverse as tracking the delivery of consumer goods in a postal system, the legal record of land ownership, or the surveillance of crowds of people in public demonstrations. Despite all these, and many more examples that could be mentioned, the formal description of spatio-temporal change in a way that suits the needs of information systems is still at an early stage. In the purely spatial case, certain basic systems of spatial relationships have been found useful; the 9-intersection model [5] has acquired the status of a standard, and systems for qualitative spatial reasoning [1], including the Region-Connection Calculus, have been very widely studied and applied. The spatial relations modelled in such systems include widely accepted notions such as 'overlapping', 'inside but not touching the boundary', and 'disjoint'. In contrast, models of spatio-temporal change, while numerous and containing much valuable work, have not reached any consensus about the atomic concepts they need to provide.
We can imagine spatio-temporal scenarios between regions, such as one moving to encircle another, or two regions moving further apart to allow a third to pass between them. The most basic scenarios of single regions splitting and merging have been rigorously analysed in [9,15], but it is not clear whether more complex behaviours can be treated in a similar way. In order to study such behaviour it is necessary to have a formal framework that is capable of modelling spatio-temporal change without pre-judging the kinds of higher-level events and processes that will be significant. This means that we should base our study on primitive concepts that appear to be essential and which can be combined to exhibit a variety of different behaviours. In this paper we propose that structures known as bigraphs provide what is needed. In addition to drawing the attention of the spatial information theory community to this area, we also introduce a novel way of using bigraphs to model relationships between entities in terms of shared ancestry. Bigraphs were introduced by Milner [11] and are so called as they provide a single set of nodes having two distinct kinds of edges between them. The nodes with one kind of edge form a set of trees which allow the nodes to represent spatial nesting. This can model situations such as a person being in a room which is inside a building. The nodes taken together with the other kind of edge constitute a hypergraph, where one edge may be incident with a set of nodes (not just one or two). The original motivation for bigraphs uses this hypergraph (called the link graph) as a way of modelling communication between the things represented by the nodes. For example, two nodes representing people might be joined by a link representing their participation in a phone call. In another scenario one of the hyperedges could represent a local area network, and the nodes computers connected by means of it. The applicability of bigraphs to spatial information theory has already been noted in [16] and [7]. In [16] Walton and Worboys make extensive use of bigraphs to model image schemas. Their work proposes bigraph reaction rules to model dynamic schemas and uses bigraph composition to model change in level of detail. The spatial relationships modelled in bigraphs are clearly restricted, as even simple overlapping of spatial entities is excluded. However, the interaction of spatial structure and communication even in this simplified case presents challenges to a fully rigorous analysis, and it is appropriate to ensure the simpler setting is fully understood before proceeding to more elaborate models. There has been some work [13] on bigraphs in which a node may be shared between two distinct containing nodes, but we do not make use of this in the present paper. There are a number of reasons why bigraphs deserve to be studied in the context of spatio-temporal change. One is that they have a sound theoretical basis with a catalogue of results that can be used in any situation to which they are applied. Another is that besides the presence of an explicit spatial component they also come with mechanisms to specify change, that is, to specify when and how one bigraph may be modified into another. This is achieved by means of rewrite rules allowing one part of a bigraph to be replaced by another.
The process of rewriting is essentially familiar from simplifications such as replacing an instance of x + 0 by x in an algebraic expression such as (2 + 0)y to end up with 2y. The fact that a bigraph can be written as an algebraic expression means that sequences of spatial changes have algebraic counterparts, allowing these changes to be analysed in a rigorous way. The main novelty to which we draw attention in the present work is the way in which we are able to use the links in a bigraph (the edges in the hypergraph) to model shared ancestry. The idea behind this is explained in terms of relations and hypergraphs in Section 2. Bigraphs are introduced in Section 3 where, as these structures are not widely known in spatial information theory, we provide an expository account of the basic ideas and refer the reader to [11] for more details. In Section 4 we present a scenario of one kind of situation where spatio-temporal modelling is important. Our case study involves crowds of people moving in a city. The ability of bigraphs to model the essential dynamic features of this case study is demonstrated in Section 5, where we give reaction rules for changes in the location and composition of the crowds. Finally, Section 6 provides conclusions and outlines directions for further work.
2 Relations and Summaries

2.1 Filiation
Many formal models proposed for spatio-temporal evolution involve the mathematical concept of a relation between two sets. If the sets are X and Y then a relation from X to Y can be visualized as a set of arrows leading from elements of X to elements of Y. These arrows are subject only to the restriction that given x ∈ X and y ∈ Y there is at most one arrow from x to y. The suitability of this for modelling the most basic features of change is evident if we take X and Y to be sets of entities at two times, the second coming after the first. Considering more times than just two, we can use a sequence of sets. In Figure 1 there are four sets of entities and three relations between them. Each set represents a snapshot of the entities at a particular time, and the relations model links between these entities and the ones present at the previous or next time in the sequence. The nature of the links will depend on the particular scenario being modelled. To give some examples of the possible meaning of such a link we can consider Figure 1, where in relation Q we see that a1 is linked, or related, to both b1 and b2. This situation of one entity at the earlier time being related to two at the later allows many interpretations. These include a parcel of land divided into two, a mother having a child with her own existence continuing, an island being split into two by rising sea levels, a group of animals separating into two groups, a plant producing an offshoot which develops into a separate individual plant, and so on. The example of the mother and child shows that we need not use exactly the same interpretation for every link. The link between a1 (the mother at the earlier time) and b1 (the mother at the later time) can denote the continuing existence of an entity, while the link from a1 to b2 can denote the earlier entity giving rise to a separate entity (the child b2) at the later stage.
Fig. 1. Three relations between four times
We use the term filiation for a link of any kind connecting entities in this way; this topic has been studied further in [3]. The way in which identity continues in objects that change is a long-standing issue in philosophy [6,17]. However, the existence of a filiation link does not necessarily indicate a continuation of identity. A filiation link from a parent to a child could be regarded as the continuing identity of the family, or with equal validity as the creation of a separate personal identity. The choice between these two would depend on the application domain, but it would be additional structure beyond the existence of a filiation link. We do annotate the filiations to show different kinds of behaviour with respect to identity in the case study in Section 4, but in the present work this annotation is not modelled by the operations on bigraphs that we describe. The continuation of identity is important in information systems [8,10]. In [8] Hornsby and Egenhofer study operations for the construction of composite objects based on features of identity, including the creation, continuation and elimination of identity. The incorporation of this type of approach in our use of bigraphs would be an interesting direction for further work.
2.2 Summarizing Evolution
The representation of every known timepoint in the sequence, and the filiation links between every successive pair of times, is the highest level of detail in the model. For many purposes this level of detail can be unnecessarily complex, and a less detailed, or more coarse-grained, view is more appropriate. In the example involving just four times, with sets of entities A, B, C and D illustrated in Figure 1, we might need to summarize the change from the time of A to that of C. The usual way to summarize this change would be to compose the relations Q and R, as in Figure 2. In the summary by relation composition we see that c1 has both a1 and a2 as ancestors. The summary, however, has lost two pieces of information: that c1 and c2 have a common ancestor, and that a1 was linked to two entities between the two times evident in the summary. It is in the nature of a useful summarization technique for information to be lost, but there are practical cases where the fact that two entities shared a common ancestor would be something that it would be useful for a summary to maintain.
Fig. 2. Composite relations
It is possible to define a way of summarizing a composable pair of relations that is different from their composition. The idea is that given relations Q : A → B and R : B → C we can enlarge A to include any entities in B which are not linked to anything in A. In the next definition Q(A) means {b ∈ B | ∃a ∈ A (a Q b)}.

Definition 1 (Cumulative Product). Let Q : A → B and R : B → C be relations, where A and B − Q(A) are disjoint. The cumulative product of Q and R is the relation Q ⊛ R : A ∪ (B − Q(A)) → C, where x (Q ⊛ R) c iff either x ∈ A and x (Q ; R) c, or x ∈ B − Q(A) and x R c.

Examples of this construction are shown in Figure 3. The assumption that A and B − Q(A) are disjoint in the definition may appear restrictive. However, the elements of the sets are not the individuals being modelled in the world; rather they are tokens which can be mapped to the world. This permits distinct tokens to take the same identity, and the issue here can be understood more fully by using an analysis analogous to the idea of support for bigraphs used in [11].
Fig. 3. Examples of the Cumulative Product
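A small sketch (hypothetical; the arrows of Q and R below are assumed for illustration, except that a1 is linked to b1 and b2 as in the text) of relations as sets of pairs, with composition and the cumulative product of Definition 1 (written ⊛ above):

    def compose(Q, R):
        """Q ; R relates a to c iff a Q b and b R c for some b."""
        return {(a, c) for (a, b) in Q for (b2, c) in R if b == b2}

    def cumulative(Q, R, B):
        """Cumulative product: compose Q with R, but keep the elements of B
        with no Q-ancestor as extra domain elements, related directly by R."""
        orphans = B - {b for (_, b) in Q}          # B - Q(A)
        return compose(Q, R) | {(b, c) for (b, c) in R if b in orphans}

    Q = {('a1', 'b1'), ('a1', 'b2'), ('a2', 'b2')}   # assumed arrows
    R = {('b1', 'c1'), ('b2', 'c1'), ('b3', 'c2')}   # assumed arrows
    print(cumulative(Q, R, {'b1', 'b2', 'b3'}))
    # {('a1','c1'), ('a2','c1'), ('b3','c2')}: b3 survives as an ancestor with
    # no own ancestor, which plain composition Q ; R would have discarded.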
Conceptually the cumulative product is quite distinct from composition. The composition of relations describes a process of entities changing from past to present. The cumulative product models the state in the present, looking back.
This suggests the idea of a map which shows the present state of the world but also contains evidence of past history indicating how the present state arose through accumulating changes. For this reason we sometimes refer to accumulation instead of the cumulative product. Note that if relations Q, R, S are composable so that Q ; R ; S is defined then we may form the accumulation (Q R) S but not, in general, the accumulation Q (R S). This is because the enlarged domain of R S by the addition of elements not present in R means that the co-domain of Q may not match the domain of R S. This behaviour is as one would expect given the way that accumulation is inherently directional, building on the past. We can visualize a sequence of relations as follows with relations between times. X0
R1 X1
R2 X2
...
Xn−1
t0
t1
t2
...
tn−1
Rn -
Xn tn
For accumulation, the picture naturally places the relations above the times. The relation Ai describes the entities at time ti and (some of) their past history. A1 = R 1 A2 = A1 R2 . . .An−1 = An−2 Rn−1 ... t0 2.3
t1
t2
...
An = An−1 Rn tn−1
tn
Hypergraphs
A hypergraph can represent a view in which the entities in the present are nodes bearing additional structure (edges) which represent the past state and the way the past has become the present. We explain hypergraphs below, and then show how accumulation is seen as an operation describing the change from one time to the next. A hypergraph is essentially a generalization of the notion of graph in which an edge may be incident with an arbitrary number of nodes and not just one or two. A relation R ⊆ X × Y is really just a hypergraph in disguise: the elements of X being the edges, the elements of Y being the nodes, and x R y holding iff node y is incident with edge x. Definition 2. A Hypergraph H consists of sets VH of vertices and EH of edges (where VH ∩ EH = ∅), and an incidence relation iH : EH → P(VH ).
154
J. Stell et al.
A hypergraph differs from an undirected graph in that an edge may be incident with an arbitrary set of nodes and not just one or two. Note that we do allow edges incident with the empty set of nodes, ∅. A hypergraph with edges E and vertices V is the same as a relation from E to V ; an edge is related to the set of vertices with which it is incident. This is illustrated in Figure 4 for the relations Q and R from Figure 1. In the figure the hyperedges appear as loops enclosing their incident nodes.
a1
Q
a2 a1 b1
b2
b3
a2
R b1 b2 b3
c1
b1
c2
Q
b2
c1
b3
c2
R Fig. 4. Relations Q and R from Figure 1 as hypergraphs
The accumulation of two relations can be described in terms of hypergraphs. Definition 3. Let G and H be hypergraphs where VG = EH . We define the hypergraph G H to have vertices VH , and edges EG ∪ {v ∈ VG | i−1 G (v) = ∅}. The incidence relation j is given by {iH (v) ⊆ VH | v ∈ iG (x)} if x ∈ EG j(x) = iH (x) if x ∈ {v ∈ VG | i−1 G (v) = ∅}
3
Bigraphs: Static Aspects
In the previous section we considered entities subject to change, but without modelling any spatial relationships between these entities. If we introduce spatial structure in addition to the links representing shared ancestry between nodes then we have essentially the bigraphs introduced by Milner [11]. 3.1
Bare Bigraphs
To illustrate the basic features of bigraphs we continue with the relations Q : {a1 , a2 } → {b1 , b2 , b3 } and R : {b1 , b2 , b3 } → {c1 , c2 } used in the earlier examples. These provide us with bare bigraphs which are a simple case of the general notion of a bigraph which has interfaces so that it can be combined with other bigraphs as described in section 3.2 below. If we assume that all the entities involved (a1 , a2 , . . .) are spatially disjoint we arrive at Figure 5. This figure shows the usual means of depicting bigraphs with the spatial entities shown as discs in the plane and the links connecting them drawn as lines attached to the discs. This differs from the more usual
Spatio-temporal Evolution as Bigraph Dynamics
155
way of drawing an edge in a hypergraph as a boundary containing those nodes with which it is incident. We used this edge-as-container vizualization in earlier figures with hypergraphs, but this only works when the nodes do not have a spatial extent.
Bare bigraph for Q bare bigraph for Q R (assuming {b1 , b2 , b3 , c1 , c2 } spatially disjoint)
Bare bigraph with nesting
Fig. 5. Examples of bare bigraphs
The examples in Figure 5 of the bigraphs for the relations Q and Q R are particularly simple in that the nodes are spatially disjoint. In general nodes may be nested with each other, as indicated in the example at the right of Figure 5. The place structure (that is the nesting of nodes) is independent of the link structure (that is the edges of the hypergraph part of the bigraph). This means that although it is significant when nodes are drawn inside other nodes, there is no significance attached to where the links cross the boundaries of nodes The bare bigraph for Q shown in Figure 5 has one edge (a2 ) that is incident with just one node. This has been drawn in the diagram as a link which has been terminated, not linking the node to anything. This follows Milner’s diagrams [11] but in other contexts such edges are often drawn as loops with both ends attached to the incident node. 3.2
Substitution
Bare bigraphs display the key features of linking and placing, but an important aspect of the theory of bigraphs is the way that they can be combined with each other. By means of these combinations, complex bigraphs can be constructed out of simpler components, and there are two kinds of composition which enable this. To define these compositions a bigraph needs to contain not only nodes connected by links and by place connections (nestings) but also to have additional machinery to allow substitution. This means that a bigraph can be inserted into a larger context and it can also act as a context into which more detail is inserted. It may be helpful to give an analogy with simple algebra in which letters stand for numbers. A complicated expression such as (3x + 2y)2 + 2(3x + 2y) + 1 can be built out of the simpler expression z 2 + 2z + 1 by replacing z by (3x + 2y).
156
J. Stell et al.
Informally, the z in z 2 + 2z + 1 acts as a ‘hole’ which can be ‘filled in’ by the expression (3x + 2y). General bigraphs may contain ‘holes’ of two types called sites and inner names into which ‘fillers’ called respectively roots and outer names may be placed. These additional features are illustrated in Figure 6. The ability to compose bigraphs makes them morphisms in a category where the objects (known as interfaces) are pairs m, X where m is essentially the number of place holes and X is a finite set of names. The number m is treated as a finite ordinal, that is a natural number viewed as a set of smaller ordinals m = {0, 1, . . . , m − 1}. x1 outer name
x2
x3 x2
x1
0 G
root site
x3
0
0 y2
inner name outer name root site
1
y1 y2
0
y1
0 0
1
H
Fig. 6. Bigraphs G : 1, {y1 , y2 } → 1, {x1 , x2 , x3 } and H : 2, ∅ → 1, {y1 , y2 } and the composite G ◦ H
Although the operations ◦ on bigraphs and ; on relations are both called ‘composition’ they are unrelated. Relations between spatially nested entities can be modelled by bigraphs and the composition of relations can thus be modelled by an operation on bigraphs. However this operation would only be defined under conditions that would be very different from the conditions under which ◦ is defined and the two operations are quite separate. 3.3
Ports
Bigraphs also provide a set K of types for nodes, called a signature. Each k ∈ K has an arity, which gives the number of ports through which attachments to a link (hyperedge) may be made for nodes of type k. These ports are shown as black discs in the diagrams. For example, in the examples we provide later we use different types of node for buildings, suburbs and crowds. In this setting the arity of a particular type of crowd is the number of instances of ‘original crowds’ whose members are present in it. It should be noted that the formal definition of bigraphs [11, p15] allows a link to be connected to the same node by a number of
Spatio-temporal Evolution as Bigraph Dynamics
157
different ports. This means that the link structure is actually more general than a hypergraph as defined above since each edge may be incident with a multiset (or bag) of nodes and not just a set. 3.4
Tensor Product and Derived Operations
Besides the operation of composition, ◦, bigraphs also support an operation ⊗ known as the tensor product. This is easy to visualize and corresponds to placing bigraphs with disjoint names alongside each other aligned horizontally. Further operations that we use in formulating the rules later in the paper can be expressed in terms of composition and tensor product. These are the parallel product, ( ), nesting (.), and the merge product (|). A full account of these operations would occupy more space than we have available, and [11] should be consulted for details. Briefly, however, G H is similar to the tensor product except that common outer names are shared. The nesting G . H places H inside G and allows the outer names of H to be visible. The merge product G | H merges roots in addition to sharing links as in the parallel product. 3.5
Modelling Parents and Children
We introduced hypergraphs by showing how edges might represent the sharing of common ancestors between entities. When in addition entities have spatial structure limited to nesting a bigraph can represent both the filiation links and the nesting relationships. The original motivation for bigraphs uses the links to model communication of various types. An individual person can be represented by a node with three ports where we can attach links to (1) their mother, (2) their father, (3) their children. From this example it is clear that the link structure is independent of the spatial structure: who a person is related to has no bearing on their location. It should also be noted that although the links have no specified direction, we can make use of the signature to use particular ports in particular ways. By this means we can tell for example that a link from port 3 of node a to port 1 of node b means that a is a child of b. The simple notion of links to represent ancestry can be used also in other situations where there has been some transmission of material, such as one might want to observe in a communication between suspected terrorists in a surveillance operation.
4
Case study
We now present a case study that involves crowds of people that move in and between suburbs of a city and where the crowds can split and merge over time. Figure 7 shows a portion of the city where the action takes place. This example reflects the evolution of groups of people in a city during a demonstration. We
158
J. Stell et al.
Q2
B
e5
e4
e1
e3 A
Q1
e2 Boundary between suburbs
Fig. 7. Two suburbs Q1 and Q2 and their areas and specific buildings. The dashed line shows the boundary between suburbs.
assume that the entities to be modelled are groups of people, and that the identity and filiations of an entity are determined by the people that compose this group. 4.1
Overall Scenario
We consider the following four types of group which vary according to their behaviour and the distinction between these types would be significant in a surveillance operation. Different instances of these types arise for different values of i. • • • •
Ci : Demonstrators Wi : Pedestrians not involved in the demonstration Oi : Observers Gi : Unidentified people
Spatio-temporal Evolution as Bigraph Dynamics X
δ
Y
X
δ
Z
Y
δ
Z (a)
δ
159
δ
(b)
T
X
γ
X
(c)
Fig. 8. Filiations: (a) Derivation: persons from Z and X join and create Y , (b) Derivation: some persons from Z and all persons from X create Y , the rest of the people from Z create T (c) Continuation: X remains the same
We can record the filiation using the technique presented in [3] which distinguishes between derivation and continuation. The case of continuation models the preservation of identity (such as an individual persisting throughout their life), and derivation models a new entity which depends in some manner on an earlier but distinct entity (for example a child could be modelled as a derived entity from each of their parents). We assume that filiation in the present scenario is determined as follows: (1) If one or more people leave a group X and join another group Z, and/or if person(s) from another group Y join a group X, then there are filiation links of type derivation δ between X and Y , and X and Z. (Figure 8(a) and 8(b)) (2) Between two times, if a group X remains the same without any addition/ deletion of people, it is considered in continuation relation γ. (Figure 8(c)) In the course of the demonstration, the different groups move around in the city. We suppose that the largest spatial unit we consider in the city is a suburb, and that there are two of these: Q1 and Q2 (Figure 7). Within suburbs there are areas determined by the street pattern and within some of these areas particular buildings have been identified as significant. • Q1 contains three areas (e1 , e2 and e3 ) and a building A located in e3 . • Q2 contains two areas (e4 and e5 ) and a building B located at e4 . 4.2
Filiation Relations
Groups of people can may combine with each other, and they may divide into pieces. For example, the filiation relations shown in Figure 9 shows that a part of group C2 of demonstrators joins the group C1 to become G1 . Groups are renamed when we consider that there is a change of their identity. Between the four times t1 – t4 there are three relations H1 – H3 . Using the cumulative product as described in section 2 we can compute H1 H2 , and this is illustrated in Figure 10. As the group G4 appears only at time t2 and is not present at t1 the cumulative product is able to record the fact that G5 and G9 have a common ancestor group. If we use the conventional composition H1 ; H2 then this information, which could well be significant in a surveillance application, would not be available.
160
J. Stell et al.
C1 C2 W1 W2
δ δ δ
G1
γ
G1
G2
δ
G5
δ
G3
γ
W2
δ δ
γ
t1
G3 G6
γ
G9
γ
t3
H3
G7 G8 G6
δ
O1 O2
δ δ δ δ
δ
G4 γ H1
G9
δ O2
t2
H2
t4
Fig. 9. Filiation relations H1 – H3 between times t1 – t4 C1
G1
C2
G5
W1
G3
W2
G6
O1
G9
O2 G4 Fig. 10. Cumulative product H1 H2 showing that common ancestry of G9 and G5 is recorded whereas in H1 ; H2 it is forgotten
4.3
Bigraph Modelling
Figure 11 represents the state of the region of study for t1 and t4 with groups defined in Figure 9. This spatial information, togther with the filiations, is translated into the bigraph setting in Figures 12 and 13. In the first of these figures we provide the bigraph for the whole spatial area by presenting it as composite S ◦ K1 . This demonstrates another valuable feature of bigraphs as a spatial modelling tool: their ability to deal with spatial granularity. The bigraph S represents a low level of detail in which the two suburbs are distinguished but nothing is said about what may be found within them. The action of adding this detail and passing to the fully detailed description S ◦ K1 corresponds precisely to the operation of composition for bigraphs.
Spatio-temporal Evolution as Bigraph Dynamics
161
Fig. 11. Location of entities at t1 (left) and t4 (right)
As the change only affects a level of detail more specific than that modelled by S, we are able to show the changes at times t2 – t4 by just showing the bigraph which is composed with S. The three bigraphs we need are given in Figure 13. In these a link between two groups appears if there is a filiation between their ancestors. For example, at time t3 , there is a filiation link between G9 and G5 because each contains some part of G4 . Similarly, the link between G1 and G2 at t2 leads to a link between G1 and G5 at t3 . Here G1 remains the same between these two times, and G2 is only changed by the addition of people. We have derived the bigraphs in Figures 12 and 13 by using the filiation data from Figure 9 and adding hypothetical spatial information. This approach means that we have treated each bigraph as a static snapshot of the situation at a given time, albeit a snapshot that contains some additional information about the ancestry of the groups present. This is certainly useful, but the full power of bigraphs only becomes apparent once we include rules as part of our modelling that specify how a bigraph at one stage may evolve into one at a subsequent stage. The introduction of rules is also significant in that by permitting only changes that are possible by given rules we can enforce integrity constraints in the model and ensure that semantically invalid changes are prohibited, such as moving one suburb inside another. In the next section we show how such rules can be introduced.
5 5.1
Bigraphs: Dynamic Aspects Rewriting and Composition
We have seen that bigraphs represent spatial nesting and links. These links may be given several different interpretations, including channels of communication and records of communication having ocurred in the past. Both the place graph and the link graph can be subject to change, and in general these two features
162
J. Stell et al.
Q1
S
O1
K1
e2
e1
C1
Q2
e4
e3
W1
O2
B
A
e5
C2
W2 Fig. 12. Entire bigraph at t1 is the Composite S ◦ K1
G4
K2
e2
e1
A
W2
e2
e1 G6
G1
B G2
e4
e3
e5 G5
A
G3
e5 O2
G1
G3
K3
e4
e3
B
G9
e1
K4
G9
e2
e4
e3 A
e5 G8
B G6
Fig. 13. Bigraphs K2 , K3 , K4 corresponding to times t2 , t3 , t4
G7
Spatio-temporal Evolution as Bigraph Dynamics
163
can change independently. To understand the mechanism of reaction rules, as they are called in the bigraph context, it may be helpful to consider the basic algebraic idea of rewrite rules. An equation −(−x) = x may be seen as a rewrite rule −(−x) x allowing the left hand side to be replaced by the right hand side. Such a rule can be used in a context larger than the left hand side, allowing for example 42 + (−(−x)) to be replaced by 42 + x. The rule can also be used when some expression is substituted for x, for example allowing −(−(y + 3)) to be rewritten to y + 3. In expressing spatial dynamics with bigraphs a particular kind of change will consist of replacing one bigraph by another, just as we can replace −(−x) by x in the above example. The same kind of change may take place in many different contexts. In bigraphs this corresponds to the fact that if H K then G ◦ H G ◦ K (assuming the composition is defined). Also the same kind of change may be made more specific in many different ways, just as essentially the same change is happening in rewriting −(−(y + 3)) to y + 3 as in rewriting −(−(2y + 2)) to 2y + 2. This situation correponds to the fact that for bigraphs if H K then H ◦ G K ◦ G again assuming the composition is defined. 5.2
Rules for the Case Study
Here we present rules that enable the features of the case study to be modelled. To extract the most significant features of the case-study, we assume that there are just crowds of people without distinguishing different types. This restriction could be lifted by introducing a more elaborate signature, but would not involve any essentially different features of rules.
x1 x2 . . . xm y1 y2 . . . yn
x1 x2 . . . xm
0
0
0
0 P
PQ
Fig. 14. The discrete ions mx and (m + n)xy
For each crowd we are interested in what earlier groupings constitute the crowd. To formulate the rules we need to introduce some additional technicalities that were not necessary to convey the main ideas of the case study in the previous section. Milner [11, p30] uses the term discrete ion for a bigraph having a single node containing the single site of the bigraph, where also there are no inner names and the outer names are linked bijectively to the ports of the single node. Our signature has one type of node for each possible arity, where each port is
164
J. Stell et al.
x1 x2 . . . xm
x1 x2 . . . xm 0
0
1
0
P
0
% P
1
% Enter:
B | mx B . (id1 | mx )
x1 x2 . . . xm
x1 x2 . . . xm 0
% 0
1
P
0
1 P
0 %
Leave:
B . (id1 | mx ) B | mx
Fig. 15. Entering and leaving a building
capable of modelling a particular ancestor crowd. Thus a crowd constituted from three earlier ones needs three ports. This is illustrated in Figure 14 showing a discrete ion of arity m and type m. When the inner names are x1 , x2 , . . . , xm we denote the ion by mx . To model the merging of two crowds, given for example by nodes of types mx and ny , we need to refer to a node having arity m + n with inner names x1 , x2 , . . . , xm , y1 , y2 , . . . , yn . We denote the type of such a node by (m + n)xy . In addition to the crowds, we need to model the buildings, areas and suburbs introduced in the case study. For these we use nodes of arity 0, since we do not model the historical development of these entities. The signature includes types A for areas, B for buildings and S for subsurbs. As a first example, the rules need to permit a crowd to enter, say, a building and to leave the building. This is straightforward, and is illustrated in Figure 15. The idea of entering or leaving a building is already well-known and appears as one of the motivating examples in [11]. However, the rules we present in Figures 16, and 17 represent a novel use of bigraphs in their use of links to model shared ancestry. The two rules shown in Figure 16 allow one crowd to divide into two, and allow two crowds to merge into one. These can be used in cases such as the splitting of G4 into G5 and G9 in our case study.
Spatio-temporal Evolution as Bigraph Dynamics
x1 x2 . . . xm
x1 x2 . . . xm 0
0 0
1
0 P
Split:
P
1
P
mx ◦ join mx | mx
x1 x2 . . . xm y1 y2 . . . yn
x1 x2 . . . xm y1 y2 . . . yn
0
0 0
0
1 P
1
Q Merge:
PQ mx | ny (m+n)xy ◦ join
Fig. 16. Crowd Split and Merge Rules
y1 y2 . . . yn
x1 x2 . . . xm 0
x1 x2 . . . xm
y1 y2 . . . yn
0 1
0
Q Capture:
0
2 P
2 1 P Q
P
mx . (id1 | ny | id1 ) mx . ((ny . (mx | id1 )) | id1 )
x 1 . . . x m z1 . . . zp y 1 . . . y n 0
x1 . . . xm z1 . . . zp y1 . . . yn 0
0
S
Release:
1
2 Q
0 P
1 Q
2 PS
mx . ((ny . (pz | id1 )) | id1 ) (m+p)xz . (id1 | ny | id1 ) Fig. 17. Crowd Capture and Release Rules
165
166
J. Stell et al.
The rules shown in Figure 17 provide additional capabilities. These permit one crowd to surround another but to remain distinct. This could arise when a group of police surrounds a small crowd of deomstrators and forces them to move to another location, keeping them surrounded while moving. Although this behaviour is not illustrated by the case study, we include these rules as evidence of the power of bigraph rules to model more elaborate kinds of change.
6
Conclusions and Further Work
We have given an expository account of the basic features of bigraphs and we have shown how a novel interpretation of the communication links as shared ancestry can be incorporated into models of spatio-temporal change. This interpretation is based on a way of combining relations, the cumulative product, that has advantages over the conventional composition operation. While this product is unlikely to be mathematically novel, we are not aware that it has been used before in the context of monitoring change in applications such as our case study. We have formulated bigraph reaction rules which can be used to model the splitting and merging of crowds of people and we have given further rules that model more elaborate behaviour including one group surrounding another so as to contain it. Further work is necessary to analyse the theoretical properties of particular systems of rules. This could establish what kinds of spatio-temporal change are possible from particular rules. There will be close connections between the behaviour of the split and merge rules for bigraphs and the splitting and merging studied in [9,15]. There are also many possible application problems for spatiotemporal analysis described in the literature cited in the introduction. Further evaluation of the value of bigraphs needs to take place using some of these problems. However, the evidence we have presented here demonstrates that bigraphs have several capabilities that are valuable in the modelling of spatio-temporal change.
References 1. Cohn, A.G., Renz, J.: Qualitative spatial representation and reasoning. In: van Harmelen, F., Lifschitz, V., Porter, B. (eds.) Handbook of Knowledge Representation, pp. 551–596. Elsevier, Amsterdam (2008) 2. Cressie, N., Wikle, C.K.: Statistics for Spatio-Temporal Data. John Wiley & Sons, Inc., Chichester (2011) 3. Del Mondo, G., Stell, J., Claramunt, C., Thibaud, R.: A graph model for spatiotemporal evolution. Journal of Universal Computer Science 16(11), 1452–1477 (2010) 4. Duckham, M., Stell, J., Vasardani, M., Worboys, M.: Qualitative change to 3-valued regions. In: Fabrikant, S., Reichenbacher, T., van Kreveld, M., Schlieder, C. (eds.) GIScience 2010. LNCS, vol. 6292, pp. 249–263. Springer, Heidelberg (2010) 5. Egenhofer, M.J., Franzosa, R.: Point-set topological spatial relations. International Journal of Geographical Information Systems 5, 161–174 (1991)
Spatio-temporal Evolution as Bigraph Dynamics
167
6. Gallois, A.: Occasions of Identity. Clarendon Press, Oxford (1998) 7. Giudice, N.A., Walton, L.A., Worboys, M.: The informatics of indoor and outdoor space: A reasearch agenda. In: Winter, S., Jensen, C.S., Li, K.J. (eds.) Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Indoor Spatial Awareness (ISA 2010), pp. 47–53. ACM Press, New York (2010) 8. Hornsby, K., Egenhofer, M.: Identity-based change: a foundation for spatiotemporal knowledge representation. International Journal of Geographical Information Science 14, 207–224 (2000) 9. Jiang, J., Worboys, M.: Event-based topology for dynamic planar areal objects. International Journal of Geographical Information Science 23(1), 33–60 (2009) 10. Medak, D.: Lifestyles - an algebraic approach to change in identity. In: B¨ ohlen, M.H., Jensen, C.S., Scholl, M.O. (eds.) STDBM 1999. LNCS, vol. 1678, pp. 19–38. Springer, Heidelberg (1999) 11. Milner, R.: The Space and Motion of Communicating Agents. Cambridge University Press, Cambridge (2009) 12. O’Sullivan, D.: Geographical information science: time changes everything. Progress in Human Geography 29(6), 749–756 (2005) 13. Sevegnani, M., Calder, M.: Bigraphs with sharing. Tech. rep., University of Glasgow, Department of Computing Science (2010) 14. Stell, J.: Granularity in change over time. In: Duckham, M., Goodchild, M., Worboys, M. (eds.) Foundations of Geographic Information Science, pp. 95–115. Taylor & Francis, Abington (2003) 15. Stell, J., Worboys, M.: Relations between adjacency trees. Theoretical Computer Science (to appear, 2011), doi:10.1016/j.tcs.2011.04.029 16. Walton, L., Worboys, M.: An algebraic approach to image schemas for geographic space. In: Hornsby, K.S., Claramunt, C., Denis, M., Ligozat, G. (eds.) COSIT 2009. LNCS, vol. 5756, pp. 357–370. Springer, Heidelberg (2009) 17. Williams, C.J.F.: What is Identity? Clarendon Press, Oxford (1989) 18. Worboys, M., Duckham, M.: Monitoring spatiotemporal change for geosensor networks. International Journal of Geographical Information Science 20, 1087–1108 (2006)
On Optimal Arrangements of Binary Sensors Parvin Asadzadeh, Lars Kulik, Egemen Tanin, and Anthony Wirth National ICT Australia (NICTA), Department of Computer Science and Software Engineering, University of Melbourne, Parkville, Victoria 3010, Australia
Abstract. A large range of monitoring applications can benefit from binary sensor networks. Binary sensors can detect the presence or absence of a particular target in their sensing regions. They can be used to partition a monitored area and provide localization functionality. If many of these sensors are deployed to monitor an area, the area is partitioned into sub-regions: each sub-region is characterized by the sensors detecting targets within it. We aim to maximize the number of unique, distinguishable sub-regions. Our goal is an optimal placement of both omni-directional and directional static binary sensors. We compute an upper bound on the number of unique sub-regions, which grows quadratically with respect to the number of sensors. In particular, we propose arrangements of sensors within a monitored area whose number of unique sub-regions is asymptotically equivalent to the upper bound.
1
Introduction
Geo-sensor networks generate large interest from researchers in spatial information science. They are used to detect, monitor and track continuous environmental phenomena such as toxic plumes or oil spills in seas. Although sensor nodes are usually considered to be inexpensive, large deployments still incur significant costs. In addition, even if the number of sensors is small, the cost of the actual deployment of sensors might still be significant. Thus, it is an important strategy to minimize the costs by identifying optimal arrangements of sensor nodes to cover a monitored area. Our work will analyze an important subclass of geo-sensor networks, those that use binary sensors to detect the presence of a phenomenon. Examples of binary sensors include motion sensors that can detect the presence of movement, RFID (Radio Frequency IDentification) readers that can detect the presence of RFID tags [13] and binary chemical sensors that can detect the presence of chemical compounds in their fields. Binary sensors can be divided into omni-directional and directional binary sensors. Omni-directional binary sensors can detect the presence of a phenomenon from any direction within a specific distance; whereas, directional binary sensors have a limited range and can only determine the presence of a phenomenon within a sector. Generally, binary sensors are beneficial for movement tracking. RFID antennas, for example, can be used in design of gesture based user interfaces, in which movement of RFID-tagged user hands are detected to enable natural user interaction with computing devices. Moreover, directional RFID M. Egenhofer et al. (Eds.): COSIT 2011, LNCS 6899, pp. 168–187, 2011. © Springer-Verlag Berlin Heidelberg 2011
On Optimal Arrangements of Binary Sensors
169
antennas can be installed in a museum to track people carrying RFID-enabled devices to enable personalized recommendations of further exhibits. We propose optimal arrangements for both omni-directional and directional binary sensors. The goal of optimal arrangements of binary sensors is to provide the desired accuracy for a large class of applications with reduced cost. A number of approaches have suggested the use of binary sensor networks to track phenomena or targets in an area [11][1][12][3]. In general, each binary sensor can only return information regarding a target’s presence or absence within its sensing region. However, one positive detection of a target greatly confines its possible locations, since a positive detection indicates that the target is in the space defined by the sensing region of that sensor. In binary sensor networks, the results of all sensor detections can be combined to provide a more accurate estimation of the whereabouts of a target at any given time. In such a case, the monitored area is divided into multiple sub-regions so that each sub-region is in the sensing regions of a particular set of sensors. We refer to the technique of using binary sensors to partition a space as space partitioning. Figure 1(a) shows partitioning of a monitored area into eight sub-regions r1 to r8 , using three omni-directional sensors S1 , S2 and S3 . These sub-regions are distinguishable in the sense that each sub-region is covered by sensing regions of a different set of sensors (Figure 1(b)).
Fig. 1. Sample space partitioning using three omni-directional sensors
In most tracking applications, the sensors are scattered randomly with uniform distribution over a two-dimensional planar monitored area. Shrivastava et al. [11] show that for a fixed sensing radius, the accuracy improves linearly with an increasing sensor density. Furthemore, for a fixed number of sensors, the accuracy improves linearly with an increase in the sensing radius because an increase in the sensing radius leads to a finer geometric partition of the field. Space partitioning using binary sensors can also be used to localize stationary targets in many indoor localization applications. For example, an RFID system could be installed in a library to detect misplacement of books [4]. Medication supply rooms in a hospital can also be equipped with binary sensors to provide
170
P. Asadzadeh et al.
efficiency and security by reducing staff time and frustration in finding what they need faster, and eliminating drug lost or misplacement. An important goal is to minimize the number of sensors in the monitored area to reduce cost while providing the required accuracy. Therefore, it is essential to find an optimal arrangement of sensors. However, space partitioning does not guarantee that every two sub-regions can be uniquely identified. In Figure 2, for example, the two sub-regions of r1 and r2 are in sensing region of the same sensor, i.e., S3 . Therefore, when a target is detected by sensor S3 only, it could be in either of the two sub-regions r1 or r2 . However, we cannot determine in which of the two sub-regions the target is located. We call the set of created sub-regions that are all distinguishable from each other, unique sub-regions.
Fig. 2. Sample space partitioning using three omni-directional sensors
Every localization application needs a particular level of accuracy, which determines the resolution by which the monitored area must be partitioned. More precisely, the localization accuracy is limited by the maximum diameter of the created sub-regions. Generally, the number of created sub-regions can be used as an approximate measure to determine the localization accuracy. Since not all sub-regions are distinguihable in target localization, our goal is to maximize the number of unique sub-regions in contrast to the total number of sub-regions. Our work will (i) provide an upper bound on the number of unique subregions a monitored area can be divided into, given a specific number of static binary omni-directional or directional sensors (Sections 5.1 and 6.2). (ii) propose an arrangement of sensors which creates the number of unique sub-regions that is asymptotically equivalent to the calculated upper bound for both omnidirectional and directional sensors (Sections 5.2 and 6.3). In our proofs, we assume that the object position can be represented as a point and any value can be assigned to the range and angles of binary sensors. This outcome gives researchers an insight into how many sub-regions can be created using a specific number of sensors as well as the number of sensors required to achieve a certain accuracy.
2
Related Work
Space partitioning using binary sensors have been successfully deployed in a range of indoor applications. Although there are different variants of binary
On Optimal Arrangements of Binary Sensors
171
sensors, they all sense a target’s presence using a physical phenomenon within limited range. Murakita et al. [9] have developed a human tracking system, in which floor blocks are fitted with binary pressure sensors. In the Active Badge Location System [14], a network of infrared sensors are placed around a building and detect signals from badges worn by people in order to find the region users are currently located. Space partitioning has also been successfully applied in many indoor positioning systems using omni-directional RFID readers as binary sensors. In the study by [5], a table surface is equipped with an array of omni-directional RFID antennas and is hence divided into many distinguishable sub-regions. When an object tagged with multiple RFID tags is placed on the table surface, the location of each individual tag is determined by the sub-region containing the tag. Similarly, Bouet and Pujolle [2] as well as Reza and Geok [10] deploy a grid of RFID reader antennas on the floor and ceiling of a building to track RFID tagged objects within that building. All approaches proposed so far equip the entire monitored area with many low-range omni-directional antennas simply such that the sensing regions of immediate neighbor sensors overlap. However, a comprehensive study that investigates the maximum number of unique sub-regions or finds optimal arrangements of sensors in terms of localization accuracy is still outstanding. Moreover, there is no study of space partitioning using directional sensors, which provide more focused sensor regions than omni-directional sensors. Mehmood et al. [8] employ another variant of space partitioning technique by deploying passive RFID tags in large numbers covering a deployment space. Each RFID tag has an area in which it can be read, which is approximated as a circular disk. In such a deployment, a partition is defined as a non-empty subregion where a given set of tags can be read by an RFID reader. The location of an agent navigating through the deployment space, is approximated by the closest partition. This approach minimizes the number of used tags for an optimal coverage of space by employing the classical circle covering problem [6]. Kershner [6] has shown that the covering for discs of radius r is optimal when they are placed at the vertices of an equilateral triangular graph overlaying the monitored space. This problem is different to our work because we aim to maximize the number of sub-regions, while in the classical circle covering problem the aim is to cover the whole space with the minimum number of circles.
3
Preliminaries
In this paper, we investigate arrangements of both omni-directional and directional binary sensors. Each omni-directional sensor has a circular sensing region, approximated by a circle in two-dimensional space. Directional sensors, on the other hand, recognize not only the target’s maximal range but also a sector within the circular range around it. Examples of such sensors include cameras, infrared sensors and RFID directional antennas. Generally, the sensing region of
172
P. Asadzadeh et al.
Fig. 3. (a) Approximated sensing region of a directional sensor (b) Sensing region of a directional sensor Si within a circular monitored area
a directional sensor can be approximated with a trapezoid in two-dimensional space (Figure 3(a)). In our proofs, we assume that the object position can be represented as a point and any value can be assigned to the range and angles of binary sensors. We assume that all directional sensors are to be installed on the border of a circular monitored area, as shown in Figure 3(b). It is also assumed that the sensing range of each sensor is longer than the monitored area’s diameter and hence, the sensing region of the sensor within the circle is represented by two circle chords (dashed lines in Figure 3(b)). The following notations are used in lemmas and proofs in the remainder of this paper: n the number of sensors. Si the ith sensor. Ci the corresponding circular region of an omni-directional sensor Si . eil and eir at Si ’s position facing the circle’s center, the left and the right edges of directional sensor Si (shown as two chords within the circle as shown in Figure 3(b)). eix a generic term, to refer to either of the two edges of directional sensor Si . The bounding-arcs of Si two arcs of the circle within the sensing region of directional sensor Si (Figure 3(b)). The end points of Si the intersection points of the sensing region of directional sensor Si with the monitored area (Figure 3(b)).
4
Unique Sub-regions
We denote the set of all created sub-regions by SR. We then define C as a function that assigns every sub-region an n-bit code, i.e., C ∶ SR → {0, 1}n , the k th bit of which is set if a target in sub-region ri is detectable by the k th sensor. Figure 4(b) shows the codes assigned to each sub-region in the sample sensor arrangement shown in Figure 4(a). Sub-region r4 , for example, has the code 001 since a target in this sub-region is detectable by S3 but is not detectable by either of S1 or S2 .
On Optimal Arrangements of Binary Sensors
173
Fig. 4. Sub-region codes in a sample arrangement of three sensors
Two sub-regions ri and rj are equivalent if they have the same codes, i.e., ri ∼ rj ⇐⇒ C(ri ) = C(rj ). Given the mentioned equivalence relation among sub-regions, SR can be divided into different equivalence classes as: [ri ] = {rj ∈ SR ∣ rj ∼ ri } For sensor arrangement in Figure 4(a), for example, there are 23 = 8 equivalence classes: {r1 , r2 }, {r3 , r4 }, {r5 , r6 }, {r7 }, {r8 , r9 , r10 }, {r11 , r12 }, {r13 , r14 } and {r15 }. Definition 1. The selection of one sub-region from each equivalence class of SR, arbitrarily, establishes a set of class representatives. We call the sub-regions in the class representatives of SR, unique sub-regions; all remaining sub-regions are called duplicates. For example, we may choose the set of class representatives to be {r1 , r3 , r5 , r7 , r8 , r11 , r13 , r15 } from the arrangement shown in Figure 4(a). The sub-region r8 is then called unique, but r10 is a duplicate. The number of created sub-regions and their uniqueness depends on the shape of the sensor sensing regions as well as the arrangements of the sensors. The main aim of this paper is to determine, for each value of n, the maximum size of the set of class representatives (the maximum number of unique sub-regions), and an arrangement of sensors that would lead to this maximum, for both omni-directional and directional sensors. For every set of n sensors, there are 2n subsets of sensors, which is therefore an upper bound on the number of unique sub-regions. However, for large values of n, the number of unique sub-regions is also limited by the maximum number of sub-regions that can be geometrically created. In Section 5, we propose an arrangement of omni-directional sensors with maximum geometrically possible number of sub-regions, which are all unique. Then, in Section 6 we prove a tighter upper bound on the number of unique sub-regions in directional sensor arrangements and propose an arrangement with number of unique sub-regions asymptotically equivalent to the calculated upper bound.
174
5
P. Asadzadeh et al.
Omni-directional Binary Sensors
5.1
Maximum Number of Sub-regions
Space partitioning using circles – representing the sensing region of omnidirectional binary sensors – has already been investigated in the literature [7, Problem 137.1]. It has been shown that n circles can divide a plane into n2 −n+2 sub-regions, if each pair of circles intersects in two points, and no three circles intersect in the same point. Therefore, n2 −n+2 is the maximum number of sub-regions that can be created geometrically. However, our aim is to find the maximum number of sub-regions that are all distinguishable – the maximum number of unique sub-regions. In this section, we first introduce Algorithm 1 that creates an arrangement of omnidirectional sensors with maximum number of sub-regions, n2 − n + 2. Theorem 1 then proves that all created sub-regions are unique and hence, n2 − n + 2 is also an upper bound on the number of unique sub-regions in a omni-directional binary sensor network. 5.2
Our Proposed Arrangement of Omni-directional Sensors
The following algorithm generates an arrangement of omni-directional sensors that creates maximum number of sub-regions, n2 − n + 2. Input : n omni-directional sensors Output: An arrangement of n omni-directional sensors with maximum possible number of sub-regions 1
Choose X such that X < r, where r is the sensing range of the circular sensors.
2
a ← X × cos π/n
3
Consider an n-sided convex regular polygon, with the side length equal to a.
4
Place the sensors on the vertices of the regular polygon.
Algorithm 1. Our proposed algorithm for construction of an arrangement of n omni-directional sensors
Figure 5 shows arrangements of four and five omni-directional sensors, with sensing regions Ci , generated by Algorithm 1. In Theorem 1, we prove that all created sub-regions using Algorithm 1 are unique. We will use the following notations in the proofs in this section. Si is the ith sensor and Ci is its corresponding circular region. Sensors are indexed from 1 to n, counterclockwise. We denote the boundary of a circle Ci with ∂Ci . Moreover, ∂Cj ⩀ ∂Ci denotes the rightmost intersection point of the boundaries of two circles Ci and Cj . We also define h as n/2, where n is the number of sensors.
On Optimal Arrangements of Binary Sensors
175
Fig. 5. Our proposed arrangement (a) 4 (b) 5 omni-directional sensors
Theorem 1. All created sub-regions in our proposed arrangement generated by Algorithm 1 are unique. Proof. Consider a line L0 passing through the polygon’s center, vc, and ∂Ch ⩀ ∂Ch+2 and a Line L1 passing through vc and ∂Ch+1 ⩀ ∂Ch+2 , as shown in Figure 6.
Fig. 6. A proposed arrangement of 8 omni-directional sensors
L0 intersects h sub-regions by crossing the following circles/circle intersections in the following order: ∂Ch+1 , ∂Ch ⩀ ∂Ch+2 , ∂Ch−1 ⩀ ∂Ch+3 , ..., ∂C2 ⩀ ∂Cn .
(1)
176
P. Asadzadeh et al.
Similarly, L1 crosses h − 1 sub-regions by crossing the following circle intersections in the following order: ∂Ch+1 ⩀ ∂Ch+2 , ∂Ch ⩀ ∂Ch+3 , ∂Ch−1 ⩀ ∂Ch+4 , ..., ∂C2 ⩀ ∂C1 .
(2)
The sub-region r0 is within the sensing regions of all sensors. So, all bits in its code bit vector, C[r0 ], are set. Starting from r0 , moving further on lines L0 and L1 moving from one sub-region to another, we leave the sensing region of a sensor once the line intersects with its sensing region boundary (in the order defined in 1 and 2). Whenever a circle, describing the sensing region of a sensor, is left, its corresponding bit becomes zero in the code bit vector of the newly met sub-region. Consequently, sub-regions r1 to rn have n − 1, n − 2, ... and 1 bits set, respectively and hence, are all unique. Now, consider n − 1 circles, CC1 to CCn−1 (dashed circles in Figure 6), each CCi defined as follows: CCi is a circle centered at vc that intersects with two vertices of sub-region ri . Therefore, by construction, Circles CC1 to CCn−1 pass through the following n − 1 intersection points. ∂Ch+1 ⩀ ∂Ch+2 , ∂Ch ⩀ ∂Ch+2 , ∂Ch ⩀ ∂Ch+3 , ∂Ch−1 ⩀ ∂Ch+3 , ..., ∂C3 ⩀ ∂Cn , ∂C2 ⩀ ∂Cn , ∂C2 ⩀ ∂C1 .
By symmetry, each of circles CC1 to CCn−1 crosses n sub-regions – sub-regions r1 to rn , respectively, and their n−1 counterparts. Sub-regions r1 to rn and their counterparts have n − 1 to 1 bits set in their code bit vectors. Table 2 shows the code bit vectors of r3 and its counterparts - sub-region intersected by CC3 , which all have only three zero bits in their code bit vectors. The pattern of zeros and ones in the code bit vectors of the intersected sub-regions, as shown in Table 2, will be rotated right one bit position at a time; thus, making all of them unique. Therefore, all sub-regions in the proposed arrangement of circular sensor generated by Algorithm 1 are unique. Table 1. Code bit vectors of the sub-regions intersected with CC3 (Figure 6) sub-region no. 1 2 3 . . . r3 111 rm 111 ... 111 ... ... 000 ... 100 ... ... 111 rn 111
6
h h+1 h +2 h+3 ... n−1 0 0 0 1 1 0 0 0 1 1 0 0
n 1 1 1 1 1 1
1 1
1 1
1 1
1 1
1 1 1 1
0 0
1 0
1 1
1 1
1 1 1 1
Directional Binary Sensors
In Section 6.1, we demonstrate that n directional sensors placed on the border of a circular monitored area divide the area into at most 2n2 + 1 sub-regions. Then,
On Optimal Arrangements of Binary Sensors
177
in Section 6.2 we prove an upper bound on the number of unique sub-regions in any arrangement of directional sensors. Finally, Section 6.3 introduces our proposed arrangements of directional sensors. 6.1
Maximum Number of Sub-regions
It is clear that one sensor divides the circle into three sub-regions (Figure 7(a)). Adding any additional sensor such that its sensing region does not intersect with any existing sensing region, introduces two additional sub-regions (r4 and r5 in Figure 7(b)). Therefore, if no pair of n sensor sensing regions intersect within the circle, then the number of sub-regions is 3 + 2(n − 1) = 2n + 1.
Fig. 7. Intersections between sensing regions
Now, assuming that no more than two edges have the same intersection point, each intersection point between two sensing regions generates one new sub-region (r6 in Figure 7(c)). Therefore, we have: nr = 2n + 1 + nm ,
(3)
where nr and nm denote the total number of sub-regions and intersections within the circle, respectively. Since a sensor region can intersect another sensor region at at most four points, the maximum number of intersections is nmax = 4(n−1).n = 2n2 − 2n, where the m 2 division by two accounts for each intersection being counted twice. Thus, by Equation 3, the maximum number of sub-regions will be: nmax = 2n + 1 + nmax = 2n + 1 + 2n2 − 2n = 2n2 + 1 . r m
(4)
Therefore, n directional sensors placed on the border of a circular monitored area divide the area into at most 2n2 + 1 sub-regions. 6.2
An Upper Bound on the Number of Unique Sub-regions
Notations defined in Section 3 are all used in the lemmas and proofs in this section. To maximize the number of created sub-regions, we make the following assumptions:
178
P. Asadzadeh et al.
Assumption 1 Sensor edges do not overlap. Assumption 2 No two edges intersect on the monitored area. Assumption 3 No more than two edges have the same intersection point within the monitored area. Figure 8 (a-c) illustrate a violation of the Assumptions 1-3, respectively.
Fig. 8. Violating assumption 1 (a), 2 (b), and 3 (c)
Definition 2 and Lemmas 1 to 3 are used in the upper bound proof in Theorem 2. We first divide the sub-regions into boundary and inner sub-regions (Definition 2). Lemmas 1 and 2 prove the number of boundary sub-regions and the number of bits set in their codes, respectively. In Lemma 3, we calculate the maximum number of created sub-regions when the codes of the boundary sub-regions are known. The three lemmas are then used to prove an upper bound on the number of unique sub-regions in Theorem 2. Definition 2. Boundary sub-regions and inner sub-regions. We create the multiset of boundary sub-regions, SRb , by traversing the circle’s perimeter clockwise or counterclockwise until the same point is reached and including the intersected sub-regions in SRb . All sub-regions that are not in SRb are called inner subregions. For arrangement in Figure 9(a), for example, SRb = {r1 , r2 , r3 , r4 , r5 , r6 , r7 , r8 , r9 , r10 , r11 , r12 } . SRb is a multiset because a sub-region might be encountered twice in this traversal. For example, in Figure 9(b), sub-region r10 is intersected twice and hence, included twice in SRb = {r1 , r2 , r3 , r4 , r5 , r6 , r7 , r8 , r9 , r10 , r11 , r10 }. Lemma 1. The cardinality of the multiset of boundary sub-regions SRb is 4n. Proof. The sensing region of each sensor intersects the circle’s perimeter at four end points. Therefore, by Assumptions 1 and 2, there is a total of 4n intersection points on the circle’s perimeter, leading to the size of multiset SRb also being 4n.
On Optimal Arrangements of Binary Sensors
179
Fig. 9. Boundary sub-regions
Lemma 2. The number of sub-regions in SRb whose codes have an odd number of bits set equals the number of sub-regions in SRb with an even number of bits set. Proof. By Assumption 1, each pair of neighboring boundary sub-regions have exactly one edge in common. Therefore, each pair of neighboring boundary subregions has only a one-bit difference in their codes. Thus, if a boundary subregion has an odd (even) number of bits set in its code, its clockwise neighbor has an even (odd) number of bits set in its code, which proves the lemma. Lemma 3. In any arrangement of n sensors, n ≥ 2, there are at most (2n2 + n + 1) − (∑ni=1 i.SR ib )/2 sub-regions within the circle, where SR ib is the number of elements of multiset SRb with i bits set in their codes. Proof. When n ≥ 2, for each end point located in the bounding-arcs of Si , there is an intersection between an edge ejx and an edge of Si that is missed ((ejr ,eir ) and (ejl ,eil ) in Figure 10(a) and (ejr ,eil ) and (ejr ,eir ) in Figure 10(b)). Moreover, if no end point of ejx ends up in the bounding-arcs of Si , ejx might still intersect with none of the edges of sensor eix (ejl Figure 10(c)).
Fig. 10. Different positions of edges of sensor Sj with respect to sensor Si
180
P. Asadzadeh et al.
Therefore, if we denote the number of sensor edges that do not intersect with edge eix by m−eix and the number of end points located in the bounding-arcs of Si with ne , then we have: m−eil + m−eir ≥ ne .
(5)
On the other hand, the number of boundary sub-regions covered in the bounding-arcs of Si equals ne + 2, because each sensor has two bounding-arcs and each end point creates a new boundary sub-region within those boundingarcs. These are also the only boundary sub-regions that have a bit set in the ith position of their codes. Therefore, denoting the number of boundary sub-regions ′ ′ 1 with a bit set at the ith position of their codes by SR i= , using Equation 5: b ′ ′
1 m−eil + m−eir + 2 ≥ SR i= . b
(6)
Moreover, the total number of bits set in all boundary sub-regions codes is , SR 1b + 2SR 2b + 3SR 3b + ... = ∑ni=1 i.SR ib , where SR ib is the number of elements of multiset SR b with i bits set in their codes. Therefore, we have: n
n
i=1
i=1
i − − ∑ (meil + meir + 2) ≥ ∑ i.SR b .
(7)
On the other hand, the number of intersections within the circle is: n
(2n2 − 2n) − ∑ ((m−eil + m−eir )/2), i=1
where the first part, 2n2 −2n, is the number of intersections when all edges intersect with one another. The second part, ∑ni=1 ((m−eil + m−eir )/2), is the number of missed intersections; the sum is divided by two, as each possible intersection is counted twice. Therefore, by Equation 7, the maximum number of intersections within the circle is: n i ∑ i.SR b 2n2 − 2n − ( i=1 − n) . (8) 2 Using Equations 8 and 3 (by Assumption 3), the maximum number of partitions is calculated as: 2n + 1 + 2n2 − 2n − (
n
i
n
i
∑i=1 i.SR b ∑ i.SR b − n) = 2n2 + n + 1 − i=1 . 2 2
(9)
Theorem 2. An upper bound on the number of unique sub-regions for any arrangement of n directional sensors is 2n2 − 3n + 2, n ≥ 2. Proof. As in Definition 2, the created sub-regions in any arrangement of sensors are divided into two groups of boundary and inner sub-regions. An upper bound on the number of unique sub-regions can be the maximum value of SR t − SR bd − SR id , where SR t is the total number of sub-regions and SR bd and SR id are
On Optimal Arrangements of Binary Sensors
181
the number of duplicate sub-regions among boundary and inner sub-regions, min respectively. Therefore, SR upper = SR max − SR min t bd − SR id . Assume that the number of elements of multiset SR b with i bits set in their codes is denoted by SR ib . Since there are at most n unique boundary sub-regions with one bit set in their codes and only one unique boundary sub-region with code zero, the minimum number of duplicate boundary sub-regions SR min is bd (SR 1b − n) + (SR0b − 1). To calculate an upper bound, we can also assume that all inner sub-regions are unique, or SR min = 0. Therefore, an upper bound on the id number of unique sub-regions for any arrangement of n directional sensors is: SR upper = SR max − ((SR 1b − n) + (SR 0b − 1)) . t
(10)
Then, by Lemma 3: SR upper = 2n2 + n + 1 − (
n
i
∑i=1 i.SR b ) − ((SR 1b − n) + (SR 0b − 1)) . 2
(11)
By Lemma 1, ∑ni=0 SR ib = 4n . Therefore, we have: SR upper = 2n2 + n + 1 − (
n
i
n ∑i=1 i.SR b ) − ((4n − (∑ SR ib )) − n − 1) . 2 i=2
or, SR upper = 2n2 − 2n + 2 − (
n
i
n
i
n ∑i=1 SR b ∑ (i − 1).SR b ) − ( i=1 ) + (∑ SR ib ) . 2 2 i=2
Using Lemma 1, we have:
SR upper = 2n2 − 2n + 2 −
n i n (4n − SR 0b ) ∑ (i − 1).SR b − ( i=1 ) + (∑ SR ib ) . 2 2 i=2
The upper bound is therefore:
SR upper = 2n2 − 3n + 2 + (
n i (SR 0b + SR 2b ) ∑ (i − 3).SR b − n) − ( i=4 ). 2 2
(12)
From Lemmas 1 and 2, we know that SR 0b + SR 2b ≤ 2n, i.e., the value of ((SR 0b + SR 2b )/2 − n) is not greater than zero. Thus, we conclude that an upper bound on the number of unique sub-regions for any arrangement of n sensors is 2n2 − 3n + 2. To show that the calculated upper bound is tight, we propose, in the next section, an arrangement of sensors whose number of unique sub-regions is asymptotically equivalent to the calculated upper bound.
182
6.3
P. Asadzadeh et al.
Our Proposed Arrangements of Directional Sensors
We have proved an upper bound on the number of unique sub-regions for any arrangement of n sensors, 2n2 − 3n + 2, n ≥ 2. In this section, we provide an algorithm to construct a regular arrangement of n sensors that has 2n2 − 5n + 1 unique sub-regions for even n and 2n2 − 5n + 5 unique sub-regions for odd n. Using a probabilistic method, we will also show that we can in fact generate arrangements that have a greater number of unique sub-regions than the regular arrangement and are very close to the calculated upper bound in Section 6.2. Arrangement I. We describe in Algorithm 2 the construction of our regular arrangement of n directional sensors, Arrangement I. For an even number of sensors, the sensors are placed equidistant around the circle with an angle of 2π/n apart from each other. If n is odd, the sensors are placed 2π/(n + 1) apart, which leaves one position empty. For a given sensor Si , we define its left (right) angle βi (αi ) as the angle between eil (eir ) and the ray pointing from Si to the centre of the circle (see Figure 11(a) for example for sensor S1 ). In any Arrangement I, all αi s and βi s are equal to a given α and β, respectively. Figure 11 shows such arrangements for five and six sensors, computed by Algorithm 2. The sensors are positioned on a circular monitored area and are numbered anticlockwise from 1 to n (see Figure 11).
Input: n sensors, a circular monitored area, small deflection angles ε, ε′
Output: An Arrangement I of n sensors

  α ← π/n + ε
  β ← π/n − (ε + ε′)
  ∀i ∈ {1, . . . , n}: αi ← α
  ∀i ∈ {1, . . . , n}: βi ← β
  if n is even then
    Place the sensors equidistantly, 2π/n apart.
  end
  if n is odd then
    Place the sensors equidistantly, 2π/(n + 1) apart, leaving one position empty.
  end

Algorithm 2. Our proposed algorithm for the construction of an Arrangement I of n directional sensors
We will initially assume that n is even, and we define h as n/2; moreover, arithmetic is modulo n on the domain {1, ..., n}. Given the values of α = π/n + ε and β = π/n − (ε + ε′) for small ε and ε′ as in Algorithm 2, sensor Si's left edge aims slightly to the left of sensor Si+h+1, while its right edge aims slightly to the left of sensor Si+h−1. Lemma 4 is used to compute the number of unique partitions in any Arrangement I.
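As an illustration, a minimal Python sketch of this construction (our code, not the authors'; the orientation convention for the edge deflections is an assumption of the sketch):

import math

def arrangement_one(n, eps=0.01, eps_prime=0.01):
    """Place n directional sensors on the unit circle as in Algorithm 2.

    Returns a list of (position, right_edge_angle, left_edge_angle)
    tuples; edge angles are absolute directions in radians.
    """
    alpha = math.pi / n + eps                # right angle alpha_i
    beta = math.pi / n - (eps + eps_prime)   # left angle beta_i
    slots = n if n % 2 == 0 else n + 1       # odd n leaves one slot empty
    sensors = []
    for i in range(n):
        theta = 2 * math.pi * i / slots      # anticlockwise placement
        pos = (math.cos(theta), math.sin(theta))
        to_centre = theta + math.pi          # ray from S_i to the centre
        # The field of S_i is assumed to be the wedge between its right
        # edge (deflected by alpha to one side of the centre ray) and its
        # left edge (deflected by beta to the other side).
        sensors.append((pos, to_centre - alpha, to_centre + beta))
    return sensors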
Fig. 11. Our proposed arrangements of (a) five and (b) six sensors
Lemma 4. Edge e1r intersects the following edges in the following order: e(h+3)r, e2l, e(h+4)r, e3l, . . . , enr, e(h−1)l, e(h+1)l, e2r, e(h+2)l, e3r, e(h+3)l, . . . , e(h−1)r, e(n−1)l.
Proof. Consider two edges: e1r (from sensor S1 to sensor Sh) and eil (from sensor Si to sensor Si+h+1), for i ∈ {2, 3, ..., n − 1}. By symmetry, if there were no ε and ε′, these two edges would meet at a point that lies on the diameter from sensor S(i+1)/2 to sensor S(i+1)/2+h. Therefore, as i increases, the intersection point of eil with e1r moves further and further counterclockwise, and hence in the order given by i. Note that none of ehl, ehr, enl intersects e1r. By construction, edge eil is almost coincident with e(h+i+1)r, but due to the effect of the ε and ε′ deflections, e(h+i+1)r intersects e1r slightly closer to sensor h + i + 1 than eil does (for i ≠ h). Using Lemma 4, we enumerate the neighboring sub-regions of edge e1r, which are the sub-regions whose boundaries are formed by parts of edge e1r. In Figure 11, for example, the neighboring sub-regions of edge e1r are the sub-regions r1 to r12; sub-region r2 is the start sub-region of sensor S1. Table 2 shows the enumeration of the neighboring sub-regions of edge e1r. This enumeration is analogous for the edges eir: there will be corresponding sub-regions for each edge eir. Each row i in Table 2 shows the code bit vector of the ith neighboring sub-region of edge e1r. When counting the neighboring sub-regions of e2r, e3r, etc., the pattern of zeros and ones in the bit vectors of Table 2 is rotated right one bit position at a time. Thus, we can establish symmetries and identify when sub-regions are not unique. In particular, row 2n − 5 is equivalent to row 2n − 10, row 2n − 4 to row 2n − 8, row 2n − 3 to row 2n − 11, and row 2n − 2 to row 2n − 9. In fact, this process continues, repeating with period four, due to the symmetries, except that rows 2n − 7, 2n − 6, and 4n − 13 have no counterparts and row 4n − 12 is a duplicate of the first row. The number of unique sub-regions among the neighboring sub-regions of edge e1r is then 2n − 5.
Table 2. Enumeration of code bit vectors of neighboring sub-regions of edge e1r

sub-region no. | 1 2 3 4 5 ... h−2 h−1 | h h+1 h+2 h+3 h+4 ... n−1 | n
 1             | 0 0 0 0 0 ...  0   0  | 0  1   1   0   0  ...  0  | 0
 2             | 1 0 0 0 0 ...  0   0  | 0  1   1   0   0  ...  0  | 0
 3             | 0 0 0 0 0 ...  0   0  | 0  1   1   1   0  ...  0  | 0
 4             | 1 0 0 0 0 ...  0   0  | 0  1   1   1   0  ...  0  | 0
 5             | 0 1 0 0 0 ...  0   0  | 0  1   1   1   0  ...  0  | 0
 6             | 1 1 0 0 0 ...  0   0  | 0  1   1   1   0  ...  0  | 0
 ...
 2n−11         | 0 1 1 1 1 ...  1   0  | 0  1   1   1   1  ...  1  | 0
 2n−10         | 1 1 1 1 1 ...  1   0  | 0  1   1   1   1  ...  1  | 0
 2n−9          | 0 1 1 1 1 ...  1   0  | 0  1   1   1   1  ...  1  | 1
 2n−8          | 1 1 1 1 1 ...  1   0  | 0  1   1   1   1  ...  1  | 1
 2n−7          | 0 1 1 1 1 ...  1   1  | 0  1   1   1   1  ...  1  | 1
 2n−6          | 1 1 1 1 1 ...  1   1  | 0  1   1   1   1  ...  1  | 1
 2n−5          | 0 1 1 1 1 ...  1   1  | 0  0   1   1   1  ...  1  | 1
 2n−4          | 1 1 1 1 1 ...  1   1  | 0  0   1   1   1  ...  1  | 1
 2n−3          | 0 0 1 1 1 ...  1   1  | 0  0   1   1   1  ...  1  | 1
 2n−2          | 1 0 1 1 1 ...  1   1  | 0  0   1   1   1  ...  1  | 1
 2n−1          | 0 0 1 1 1 ...  1   1  | 0  0   0   1   1  ...  1  | 1
 2n            | 1 0 1 1 1 ...  1   1  | 0  0   0   1   1  ...  1  | 1
 ...
 4n−17         | 0 0 0 0 0 ...  0   1  | 0  0   0   0   0  ...  1  | 1
 4n−16         | 1 0 0 0 0 ...  0   1  | 0  0   0   0   0  ...  1  | 1
 4n−15         | 0 0 0 0 0 ...  0   0  | 0  0   0   0   0  ...  1  | 1
 4n−14         | 1 0 0 0 0 ...  0   0  | 0  0   0   0   0  ...  1  | 1
 4n−13         | 0 0 0 0 0 ...  0   0  | 0  0   0   0   0  ...  0  | 1
 4n−12         | 1 0 0 0 0 ...  0   0  | 0  0   0   0   0  ...  0  | 1
The sub-region lying in the fields of all sensors is not included in Table 2. Hence, the total number of unique sub-regions is (2n − 5)n + 1 = 2n² − 5n + 1. Our discussion assumed that n is even. If n is odd, we calculate the number of unique sub-regions in the arrangement of size n + 1, namely 2(n + 1)² − 5(n + 1) + 1, because n + 1 is even. We then subtract the neighboring sub-regions of the (n + 1)th sensor's edges: there are 2(n + 1) − 5 intersections on its left edge and 2(n + 1) − 7 intersections on its right edge, each corresponding to one neighboring sub-region of Sn+1's edges. We must also exclude the start sub-region of sensor Sn+1. In summary, the number of unique sub-regions is 2(n + 1)² − 5(n + 1) + 1 − ([2(n + 1) − 5] + [2(n + 1) − 7] + 1) = 2n² − 5n + 5. Arrangement II. This type of sensor arrangement is constructed using Algorithm 2, but with new constraints on the values of α and β, namely α + β < 2π/n and α, β < π/n. We assume again that the sensors are ordered anticlockwise from 1 to n. Therefore, each field of sensor Si covers only sensor Si+n/2. Figure 12 shows a sample Arrangement II for six sensors. For each n from 6 to 20, the number of unique sub-regions is computed by simulation over all values of the angles α and β in steps of 0.1°. The achieved maximum numbers of unique sub-regions for such arrangements (Maximum for Arrangement II) are shown in Table 3 and in Figure 13, where we compare them with the maximum number of unique sub-regions from Arrangement I (Section 6.3) and the upper bound calculated in Section 6.2. The number of unique sub-regions in both Arrangements I and II is asymptotically equivalent to the upper bound.
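The simulation just described can be approximated by a small Monte-Carlo sketch (our code, building on arrangement_one above; the wedge-shaped field model is an assumption of the sketch, and very small sub-regions may be missed by sampling):

import math, random

def in_field(point, sensor):
    """True if point lies in the angular field (wedge) of a sensor
    produced by arrangement_one."""
    (sx, sy), right, left = sensor
    ang = math.atan2(point[1] - sy, point[0] - sx)
    width = (left - right) % (2 * math.pi)   # angular width of the wedge
    return (ang - right) % (2 * math.pi) <= width

def count_codes(sensors, samples=200_000, seed=0):
    """Monte-Carlo estimate of the number of unique sub-region codes."""
    rng = random.Random(seed)
    codes = set()
    for _ in range(samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y > 1:                # keep points inside the disk
            continue
        codes.add(tuple(in_field((x, y), s) for s in sensors))
    return len(codes)

# count_codes(arrangement_one(6)) should approach 2*36 - 5*6 + 1 = 43
# (cf. Table 3), up to sampling error and the sketch's assumptions.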
Fig. 12. Arrangement II for 6 sensors
Table 3. The achieved maximum number of unique sub-regions for n ranging from 6 to 20

  n | Maximum for Arrangement I | Maximum for Arrangement II | Upper Bound
  6 |  43                       |  53                        |  56
  7 |  67                       |  75                        |  79
  8 |  89                       | 102                        | 106
  9 | 121                       | 132                        | 137
 10 | 151                       | 167                        | 174
 11 | 188                       | 205                        | 213
 12 | 229                       | 248                        | 256
 13 | 274                       | 294                        | 301
 14 | 323                       | 344                        | 353
 15 | 376                       | 399                        | 407
 16 | 433                       | 457                        | 466
 17 | 494                       | 518                        | 529
 18 | 559                       | 584                        | 596
 19 | 628                       | 654                        | 667
 20 | 701                       | 727                        | 742
Fig. 13. Upper bound and maximum on the number of unique sub-regions in our proposed arrangements
7 Conclusions and Future Work
Binary sensors can be used to partition an area into unique sub-regions and hence provide localization functionality. We calculated an upper bound on the number of unique sub-regions that a set of omni-directional or directional sensors can achieve. We also proposed regular arrangements for both omni-directional and directional sensors whose numbers of unique sub-regions are asymptotically equivalent to our calculated upper bound. This outcome gives researchers an insight into how many sub-regions can be created using a specific number of sensors, as well as into the number of sensors required to achieve a certain accuracy. Finding a constructive algorithm to generate an arrangement of n sensors whose number of unique sub-regions equals the calculated upper bound is still an open problem for directional sensors. Moreover, since the size of the created sub-regions has an impact on the localization accuracy, we are currently looking at partitioning schemes that lead to sub-regions that are as equally sized as possible, so that we achieve a uniform resolution. Acknowledgement. This work was funded by National ICT Australia (NICTA). NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.
A Hybrid Geometric-Qualitative Spatial Reasoning System and Its Application in GIS
Giorgio De Felice, Paolo Fogliaroni, and Jan Oliver Wallgrün
Universität Bremen, Department of Mathematics and Informatics
Enrique-Schmidt-Str. 5, 28359 Bremen, Germany
{defelice,paolo,wallgruen}@informatik.uni-bremen.de
Abstract. We propose a hybrid geometric-qualitative spatial reasoning system that is able to simultaneously deal with input information that is partially given geometrically and partially qualitatively using spatial relations of different qualitative spatial calculi. The reasoning system combines a geometric reasoning component based on computational geometry methods with a qualitative reasoning component employing relation algebraic reasoning techniques. An egg-yolk representation approach is used to maintain information about objects with underdetermined geometry and also allows for vague objects in the input. In an experimental evaluation we apply the reasoning system to infer geometric information for a set of only qualitatively described objects. The experiments demonstrate that the hybrid reasoning approach produces better results than geometric and qualitative reasoning individually.
1 Introduction
While traditionally (geo)spatial information systems have mainly been concerned with the storage and processing of quantitative geometric information, the need for intuitive interfaces and the advent of volunteered geographic information (VGI) [10] have brought about an increased interest in techniques that allow humans to provide and describe spatial knowledge in a natural and intuitive way, and to exploit existing externalizations of spatial knowledge in verbal or graphical form [6]. Qualitative models of space based on finite sets of spatial relations (e.g., topological relations or direction relations) have been proposed exactly for this purpose: to capture human commonsense notions of space, describe the meaning of spatial prepositions in natural language, and retain those spatial properties that are preserved in graphical externalizations of spatial knowledge such as sketched or schematized maps [2,6,8]. In many potential VGI applications, as well as applications from other spatial domains, qualitative spatial relations and geometric information have to be maintained and exploited together to reach their full potential. Despite this need, relatively little work exists on combining geometric and qualitative spatial information and performing mixed quantitative-qualitative reasoning.
Fig. 1. Introductory example: (a) the known object configuration, with given regions A, B, C and the qualitative description "D is north of B", "D is west of A", "D is inside C"; (b) the resulting approximation for D.
In this paper, we address this gap by developing a hybrid spatial reasoning system that is able to simultaneously deal with input information that is partially given geometrically and partially qualitatively. Qualitative relations can refer both to geometrically defined entities and to other qualitatively described entities. The reasoner is able to deduce new information about the geometry of the involved objects as well as new qualitative relations. This is achieved by interlinking a geometric reasoning component based on computational geometry methods and a qualitative reasoning component based on relation-algebraic operations, exploiting the different strengths of both approaches. As the information conveyed by most qualitative relations, in particular those falling into the categories of projective and metric relations [1], can directly be construed geometrically, the geometric reasoning component provides a common interpretation for combining relations from different qualitative formalisms. As a result, our approach avoids the need to define and analyze combined approaches (cf. [19,14,24]), which becomes increasingly complex the more qualitative formalisms have to be accommodated. A very simple introductory example of the reasoning performed by our system is shown in Fig. 1. The goal is to derive geometric information about an object D, which is only described qualitatively with respect to the other objects A, B, C, for which the geometries are given. The provided geometries and the qualitative information, using topological and cardinal direction relations, are shown in Fig. 1(a), while the result of our hybrid reasoning system is depicted in Fig. 1(b) by the small dark area in the middle, which has to contain D. As we will show later in this paper, more complex examples require both geometric and qualitative reasoning steps to compute the best approximation. The inferred results of our reasoning system can then be used for query answering or for visualization purposes, e.g., by drawing the approximated geometries into a map. Research on qualitative spatial representation and reasoning (QSR) [3,17] has led to a multitude of so-called qualitative spatial calculi for different aspects of space. Prominent examples of such qualitative calculi are the Region Connection Calculus RCC-8 [16] and the 9-Intersection model [5], dealing with topological relations such as the inside relation used in the previous example. Other calculi deal, for instance, with direction, orientation, distance, or size. The qualitative reasoning component of our spatial reasoning system employs relation-algebraic methods based on the operations defined in qualitative calculi. The algebraic closure algorithm [15] is used to deduce new qualitative information from a given set of qualitative relations.
The geometric reasoning component, on the other hand, uses intersection of geometries as the main operation to infer new or more specific information. We employ a vector-based approach [23] to represent geometries using multi-region polygonal structures that allow for representing objects that consist of several components and contain holes. An extension of these structures is needed to be able to represent the infinite acceptance areas corresponding to qualitative relations. Moreover, as the introductory example above illustrates, our hybrid reasoner needs to be able to also deal with objects with underdetermined geometry, derived either from qualitative spatial relations or by spatial reasoning. We therefore adopt the idea of egg-yolk representations [4] (originally introduced to represent vague spatial concepts that do not possess a clear boundary) to maintain minimal and maximal geometric extensions for each object. A benefit of this representation is that the system is also able to deal with vaguely specified objects in the input. Our experiments show that the combined reasoner is able to deduce more specific information than either the qualitative or the geometric approach would have been able to independently. The remainder of this text is organized as follows. Section 2 summarizes the required background knowledge on geometric and qualitative spatial representation and reasoning approaches. In Section 3, we describe the architecture of our hybrid reasoning system and explain and discuss its components in detail. Section 4 describes the implementation of the reasoner and illustrates it with a larger example. Section 5 reports on first experiments performed to evaluate the approach.
2 Geometric and Qualitative Representations of Space

In this section, we briefly summarize the key concepts with regard to geometric and qualitative approaches to represent and reason about space, and introduce the notation used in the remainder of this text.

2.1 Geometric Representation and Reasoning
Computational geometry techniques can be employed to generate new information and perform analyses by manipulating geometric objects that convey quantitative information about the spatial extension of spatial entities, for instance by computing intersections, unions, convex hulls, etc. Vector representations employ parameterized primitive objects such as points, lines, and polygons to represent the extension of spatial objects in different dimensionalities. In this work, we exclusively employ 2D geometric objects embedded in the Euclidean space R². Most of the primitives and operations used are supported by standard geographic information systems. In the following we introduce the notation used for the primitive geometric objects occurring in the text. We denote points in the plane with a single small letter and identify them by their Cartesian coordinates, e.g., p = (x_p, y_p). Lines are represented by two points p1 and p2, and we denote them with a small Greek letter, e.g., λ = (p1, p2).
Fig. 2. Geometric representation of a single multi-region object M = C1, C2, built from points pi = (xi, yi) and line segments Qi = [pi, pi+1].
Line segments consist of all points that lie between p1 and p2 on the line λ = (p1, p2), including p1 and p2 themselves, and are written as λ = [p1, p2], while we use λ⃗ = [p1, p2) for rays, which are oriented line segments with start point p1 and end point p2 but extending beyond p2 into infinity. Polylines are finite sequences of connected line segments and are denoted by capital Greek letters. They are specified either as lists of line segments or as point lists, e.g., Λ = λ1, λ2, . . . , λn or Λ = p1, p2, . . . , pn+1. We use simple polygons to represent closed connected regions in R², adopting the usual conceptualization of a region as a point set. The representing polygons are defined as closed polylines and denoted by a capital letter, e.g., P = p1, p2, . . . , pn, p1. To be able to deal with general spatial entities that may have several disconnected components and holes, we employ multi-region objects (also called multi-polygon objects) as defined in the OGC standard [12]. A complex polygonal object with holes is specified by a list of simple polygons, of which the first polygon represents the outer boundary of the region, while the other polygons describe the non-overlapping holes, e.g., C = P, Q1, . . . , Qn. A multi-region object composed of n disconnected parts is then written as a list of polygonal objects (potentially with holes), e.g., M = C1, C2, . . . , Cn. An example of a multi-region object is depicted in Fig. 2. The main geometric operations we use in this text are the union, intersection, and set difference of multi-region objects, as well as the MBR operation that computes a minimal bounding rectangle for a given object. We use the traditional intersection, union, and set difference symbols ∩, ∪, \ to denote the respective homonymous operations among spatial objects as described in [12]. When applying an operation to two geometric objects of a particular type, we assume that the resulting point set is always returned as an object of the same type, i.e., A ∩ B for two multi-region objects A, B returns a new multi-region object. Given a multi-region object R, the function MBR(R) returns the axis-aligned minimal rectangle S that contains R, as a multi-region object.
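All of these operations are available in off-the-shelf GIS toolkits; as a minimal sketch, using the Python package Shapely (our choice of tooling, not the paper's):

from shapely.geometry import Polygon, MultiPolygon

# A component with a hole: outer boundary first, holes after,
# mirroring C = P, Q1, ..., Qn above.
c1 = Polygon([(0, 0), (8, 0), (8, 8), (0, 8)],
             holes=[[(3, 3), (5, 3), (5, 5), (3, 5)]])
c2 = Polygon([(10, 0), (13, 0), (13, 3)])
m = MultiPolygon([c1, c2])        # multi-region object M = C1, C2

other = Polygon([(6, 6), (12, 6), (12, 12), (6, 12)])
print(m.intersection(other))      # intersection, same type back
print(m.union(other))             # union
print(m.difference(other))        # set difference
print(m.envelope)                 # axis-aligned MBR(M)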
2.2 Qualitative Spatial Representation and Reasoning
QSR aims at capturing human-level concepts of space. A qualitative spatial calculus consists of a set B of base relations over a domain D of spatial objects
and a set of operations that enable elementary reasoning and form the basis for more complex reasoning procedures. The eight topological base relations of the RCC-8 calculus [16], for instance, are illustrated in Fig. 3(a). The spatial relations of a qualitative calculus can be interpreted as constraints restricting the possible geometries the related objects can adopt. By taking unions of relations as constraints, one can express uncertainty. We employ the typical notation in which unions of base relations are denoted as sets: A {DC, EC} B, for instance, stands for the union of the RCC-8 base relations DC and EC holding between A and B. The resulting set of all possible unions of base relations, 2^B, is called the set of general relations of the qualitative calculus. Given a general (binary) relation holding between regions A and B, the unary converse operation (∼) yields the relation holding between B and A; e.g., the converse of {EC, TPP} is {EC, TPPI}. The composition operation (◦) is a binary operation that yields the relation between A and C given the relations holding between A and B and between B and C; e.g., the composition of {EC} and {NTPPI} is {DC}. In addition, the classical set operations (union, intersection, complement) can be applied to the general relations of a qualitative calculus. Given a particular qualitative spatial calculus, a qualitative spatial representation is a set of constraints expressed in a quantifier-free constraint language based on the set R of general relations. It can be seen as a constraint network N = (V, D, C) with variables V = {V1, V2, . . . , Vn} over the domain D whose valuations are constrained by the relations given in the constraint matrix C (Cij gives the constraint relation between Vi and Vj). A spatial constraint network has a solution if one can assign objects from the domain to the variables such that all constraints are satisfied. One important reasoning problem is to decide whether a spatial constraint network is consistent, which means it has a solution. In the case of typical qualitative spatial calculi, the domain is infinite (e.g., the set of points in the plane) and, hence, techniques for solving finite constraint satisfaction problems are not directly applicable. The employed techniques for deciding consistency are based on a procedure called the algebraic closure or path consistency algorithm [15] (running in O(n³) time) that uses the composition (and converse) operation to enforce consistency of all triples of variables Vi, Vj, Vk by performing Cik ← Cik ∩ (Cij ◦ Cjk) until a fixpoint is reached or a resulting relation becomes empty, which indicates inconsistency. In this work, we are not interested in consistency checking but will employ algebraic closure to propagate the input constraints through the network and deduce as much information as possible. In the remainder of this work, we will use three exemplary qualitative spatial calculi dealing with different aspects of space to illustrate the different facets of our hybrid reasoning system. All three calculi deal with extended spatial objects, which makes them particularly useful for GIS applications. The first calculus we use is the already introduced RCC-8 calculus, which describes the topological relations between extended objects in 2D. The other two calculi deal with cardinal directions and visibility relations.
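For concreteness, a compact Python sketch of the algebraic closure propagation described above (our code; relations are frozensets of base-relation names, the composition table comp is assumed to be supplied by the calculus for every ordered pair of base relations, and converses are implicit because the network stores a relation for every ordered pair of variables):

from itertools import product

def compose(r1, r2, comp):
    """Composition of general relations: the union of the compositions
    of their base relations (comp maps (b1, b2) -> frozenset)."""
    out = set()
    for b1, b2 in product(r1, r2):
        out |= comp[(b1, b2)]
    return frozenset(out)

def algebraic_closure(C, comp):
    """Enforce C[i][k] := C[i][k] & (C[i][j] o C[j][k]) until a fixpoint.
    C is an n x n matrix of frozensets; returns False if some relation
    becomes empty, which indicates inconsistency."""
    n = len(C)
    changed = True
    while changed:
        changed = False
        for i, j, k in product(range(n), repeat=3):
            refined = C[i][k] & compose(C[i][j], C[j][k], comp)
            if refined != C[i][k]:
                C[i][k] = refined
                changed = True
                if not refined:
                    return False
    return True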
Fig. 3. Region Connection Calculus RCC-8 and Cardinal Direction Calculus: (a) the eight RCC-8 base relations DC (disconnected), EC (externally connected), PO (partially overlapping), EQ (equal), TPP and TPPI (tangential proper part and its inverse), NTPP and NTPPI (non-tangential proper part and its inverse); (b) the CDC frame of reference, partitioning the plane into the tiles N, NE, E, SE, S, SW, W, NW, and B around the reference object's MBR.
The Cardinal Direction Calculus (CDC) [11,20] distinguishes nine binary cardinal directions for extended objects in the plane. As illustrated in Fig. 3(b), the frame of reference (FoR) for defining these nine directions is derived from the MBR of the reference object, resulting in a partition of the plane into the nine acceptance areas for the relations N, NW, W, etc., and B, which corresponds to the MBR itself. The relation A {N} B, or N(A, B), means that A (completely) falls into the acceptance area of north. Lastly, the Visibility Calculus (VC) [21,7] defines five ternary relations describing if and how an observer object C perceives an observed object A when an object B is acting as an obstacle. The relations take their names from the acceptance areas depicted in Fig. 4. The symbols V and O stand for Visible and Occluded, while PV_R, PV_J, and PV_L represent three different versions of PartiallyVisible relations. The relation A{PV_L}(B, C), for example, means that object A is partly seen from C to the left of B. In both the CDC and the VC, an extended object can overlap several acceptance areas at once. The actual set of base relations therefore consists of binary matrices indicating, for each acceptance area, whether an overlap exists. These relations are also called multi-tile relations, while those directly corresponding to single acceptance areas are called single-tile relations. In this work, we focus on the underlying single-tile relations and their acceptance areas, as the full set of matrix relations is not yet completely supported by the reasoner.
Fig. 4. Frame of reference of the Visibility Calculus for the two reference objects B, C, with acceptance areas V, O, PV_R, PV_J, and PV_L: (a) case with PV_J; (b) case without PV_J.
2.3 Representation of Underdetermined and Vague Objects
The geometric objects defined in Section 2.1 as well as the qualitative models from Section 2.2 are suited for objects with completely determined and crisp boundaries. However, in the geographic domain one often has to deal with vague objects whose boundaries cannot be precisely determined. One way proposed to represent such non-crisp regions is the egg-yolk approach [4]. The representation is based on a pair of concentric crisp regions that provide a minimal extension (the yolk) and a maximal extension (the egg) for a vague object; the real extension of the object then has to lie somewhere between the yolk and the egg. In [4], a generalized calculus for egg-yolk objects based on the RCC-8 model is defined, consisting of 46 base relations. Besides representing vague objects, egg-yolk objects are well suited to represent geometrically underspecified objects, which makes them a good candidate to represent objects with deduced approximate geometry in our reasoner. We define an egg-yolk object R* as a pair of crisp objects (R+, R−), with R+ representing the maximal extension of R* and R− representing the minimal extension. In Fig. 5, an example of an egg-yolk object is depicted. In principle, we allow both R+ and R− to be multi-region objects; however, in the remainder of this text, we will for the most part use objects whose egg and yolk consist of a single component without any holes. If there is no knowledge about the maximal extension of an object, a special representation for the complete plane R² is used. Similarly, no knowledge about the minimal extension is represented by a special representation for the empty set ∅.
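A minimal egg-yolk container, sketched in Python with Shapely geometries (our code; the finite WORLD box stands in for the plane R², and the refine rule anticipates how the geometric reasoner of Section 3.3 combines evidence):

from shapely.geometry import Polygon, box

WORLD = box(-1e6, -1e6, 1e6, 1e6)    # finite stand-in for the plane R^2

class EggYolk:
    """Pair (egg, yolk): maximal extension R+ and minimal extension R-."""
    def __init__(self, egg=WORLD, yolk=Polygon()):
        self.egg, self.yolk = egg, yolk   # Polygon() models the empty set

    def refine(self, other):
        # The object lies inside every known egg and contains every yolk.
        return EggYolk(self.egg.intersection(other.egg),
                       self.yolk.union(other.yolk))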
Fig. 5. Egg-yolk object R* and its maximal and minimal extensions: (a) R*; (b) the egg R+; (c) the yolk R−.
3 A Hybrid Geometric-Qualitative Reasoning System
In this section, we propose a hybrid geometric-qualitative reasoning system that exploits the individual strengths of computational-geometry-based inference, in particular polygon intersection, and of relation-algebraic qualitative reasoning methods. The goal is to be able to perform spatial inferences in mixed settings: some input objects are given geometrically in the form of egg-yolk objects, while others are described using qualitative relations from different qualitative spatial calculi, either with respect to the geometrically defined objects or with respect to other qualitatively described objects.
Fig. 6. Architecture of the hybrid geometric-qualitative reasoning system: a storage layer holding geometric and qualitative spatial data feeds the hybrid reasoner, in which a qualitative reasoning component (optionally backed by an external QSR reasoner) and a geometric reasoning component are mediated by the qualification and quantification procedures.
The central reasoning task is to determine approximations for the geometries of input objects for which no exact geometry is given, as well as to derive more precise qualitative information. Fig. 6 shows how our reasoning approach combines the qualitative and the geometric reasoning component. Two additional components, quantification and qualification, mediate between the geometric and qualitative representations used in the two reasoning components in both directions. The quantification procedure computes the geometric interpretation of a qualitative relation, taking into account the geometries of the reference objects as far as they are known; it is called from within the geometric reasoning component. The qualification procedure derives qualitative spatial relations from the geometries of the involved objects; it is called to translate the output of the geometric reasoning, and the translation then forms the input for the qualitative reasoner. For both translation procedures, approaches existing for crisp objects needed to be extended to adequately deal with egg-yolk representations. By alternately executing the geometric and qualitative reasoning components, the overall reasoner is able to deduce new spatial information about the input objects both on the qualitative side (new or more specific qualitative relations) and on the geometric side (more precise minimal and maximal approximations). This overall reasoning cycle terminates when a fixpoint has been reached where neither more specific geometries nor more specific qualitative relations can be deduced. As also indicated in Fig. 6, the qualitative reasoning and qualification modules can also resort to functionality provided by existing QSR software such as SparQ [22] and GQR [9], for instance to perform fast constraint propagation. In the remainder of this section, we describe the combined qualitative-geometric reasoning cycle in detail, using RCC-8, CDC, and VC as exemplary calculi.
3.1 The Qualitative Reasoning Component
The qualitative reasoning component of the hybrid reasoning system is founded on the algebraic closure algorithm (briefly described in Section 2.2) that applies
composition and permutation operations defined in a calculus until a fixpoint has been reached. Our implementation of the algebraic closure algorithm is general enough to deal with several composition and permutation operations. This allows us to deal not only with standard binary calculi but also with ternary and weak calculi, in which several such operations need to be defined (e.g., the converse operation of the binary case has to be replaced by at least a swap and a shift operation to be able to create all permutations of a relation triple). The function AlgebraicClosure(C, N) in our qualitative reasoner algorithm, given in Alg. 1, simply performs the algebraic closure operation for a calculus C on a constraint network N. The constraint network is assumed to be a multi-calculus network in which the edges are labeled by tuples of relations from different calculi. The AlgebraicClosure procedure picks the right relations from the tuples for the given calculus C and modifies them. The main algorithm loops through all calculi C in a given set of calculi 𝒞. It returns the potentially modified network N.

Algorithm 1. QualitativeReasoner(N)
  for C ∈ 𝒞 do
    AlgebraicClosure(C, N)
  end for
  return N
3.2 Quantifying Qualitative Relations between Egg-Yolk Objects
The quantification component computes a geometric interpretation of a single n-ary relation tuple r(R1*, ..., Rn*), in which R1* is the primary object and R2*, ..., Rn* are the reference objects. The computation is based on the geometric semantics of the relation and has to take into account the egg-yolk geometries of the involved reference objects. The result is a new egg-yolk region that represents geometric information about R1* as implied by this particular relation. Relations from typical projective and metric qualitative spatial calculi can be translated directly into geometric information based on their respective acceptance areas, at least when the reference objects are crisp. However, for many relations, such as N(A, B) in the CDC, the resulting region that describes where A could be located with respect to B is infinite. We therefore employ an extended geometric representation for potentially unbounded regions in egg-yolk objects, which we introduce next. The topological relations of RCC-8 can only partially be interpreted geometrically and therefore require a special treatment in the quantification procedure. Infinite-region representation. As shown in Fig. 7(a), the relation A{N}B together with the given polygon for B implies that A has to lie somewhere in the shaded region, which reaches out to infinity. Vector representations as commonly implemented in GIS do not support the representation and manipulation of such kinds of infinite regions. In principle, such regions can be represented using a half-plane approach [18] in which objects are represented by means of boolean operations between half-planes in R².
Fig. 7. Infinite acceptance areas and the corresponding infinite-region objects: (a) a CDC acceptance area; (b) the CDC infinite-region object; (c) a VC acceptance area; (d) the VC infinite-region object.
However, it turns out that it is sufficient to make a small modification to the simple polygon representation used to describe the components and holes of a multi-region object, adopting the idea of the half-plane representation. We define an infinite-region object IR as a triple (Λ, λ⃗1, λ⃗2), in which the polyline Λ = p1, ..., pn represents the finite boundary of IR, while the two rays λ⃗1 = [p1s, p1e) and λ⃗2 = [p2s, p2e) represent the boundaries of IR that extend to infinity. The starting point of Λ coincides with the starting point of λ⃗1; in the same way, the last point of Λ coincides with the starting point of λ⃗2. An infinite-region object introduces a partition of the space into two infinite regions. The actually represented region is the intersection of the half-plane right of λ⃗1, the half-plane left of λ⃗2, and what can intuitively be seen as the area left of the polyline Λ. The infinite-region object representing the CDC acceptance area of Fig. 7(a) is depicted in Fig. 7(b). Fig. 7(d) shows the infinite-region object representing the acceptance area for the visibility relation Occluded as depicted in Fig. 7(c); it consists of the three components Λ = p1, p2, p3, p4, p5, λ⃗1 = [p1, q), and λ⃗2 = [p5, r). From now on, we assume that the polygons describing the components and holes in the multi-region objects used to represent minimal and maximal extensions in egg-yolk objects are either simple polygons or infinite-region objects as just defined. Quantification algorithm. To compute a geometric description of a region R1* given an n-ary relation r belonging to a calculus C and holding between R1* and the regions R2*, ..., Rn*, we define the function Quantify(r, C, R2*, ..., Rn*). The semantics of r determines whether the quantification process influences either or both of the maximal and the minimal extension of R1*. The topological relation R1*{DC}R2, for instance, between an unknown object R1* and a crisp object R2, defines a constraint for R1+, as the unknown region can be anywhere except where it would overlap with R2; it does, however, not provide any information about the minimal extension R1−. Conversely, for R1*{NTPPI}R2 the opposite holds. Projective and metric qualitative relations typically only provide information about the maximal extension. Alg. 2 shows the general procedure to compute Quantify(r, C, R2*, ..., Rn*). A calculus-dependent function GeometryC(br, R2*, ..., Rn*) is used to retrieve a geometric description of the acceptance area of a base relation br contained in r.
Algorithm 2. Quantify(r, C, R2*, ..., Rn*)
  R1+ ← R², R1− ← ∅, AccArea ← ∅
  for br ∈ r do
    AccArea ← AccArea ∪ GeometryC(br, R2*, ..., Rn*)
  end for
  if affectsEggC(r) then R1+ ← AccArea end if
  if affectsYolkC(r) then R1− ← AccArea end if
  return R1* = (R1+, R1−)
The result is a crisp multi-region object (typically a single infinite-region object) that is then used by the quantification procedure to build an egg-yolk region for R1*. Since r can be a disjunction of base relations, the acceptance area for r is constructed as the union of the acceptance areas for each base relation br ∈ r. Two calculus-dependent predicates, affectsEggC(r) and affectsYolkC(r), direct the assignment of the constructed acceptance area to either the egg or the yolk of the object R1*, which is returned as the result of the procedure. The indeterminacy of the reference objects has to be taken into account when a base relation is quantified in GeometryC. For instance, the acceptance area for a CDC relation with a reference object B* that is an egg-yolk object has to be defined as the union of the acceptance areas for all possible crisp versions of B*. In the following, we show how we derive GeometryC for projective calculi such as CDC and VC. After that we discuss RCC-8. GeometryC for projective relations. The definition of the acceptance areas based on clear geometric criteria provides a well-defined semantics for any projective relation. For instance, the acceptance area A_N(B) for north-of B can, in the crisp case, be defined as the set of all points (p_x, p_y) that satisfy:
(1)
For an egg-yolk object B*, this definition has to be modified to take into account the indeterminacy of the reference object by saying that the acceptance area A*_N is the union of the acceptance areas A_N over all crisp versions of B*:

    A*_N(B*) = ⋃ { A_N(R) | R ⊆ R² ∧ B− ⊆ R ∧ R ⊆ B+ }.                     (2)
This result can be directly adapted to any n-ary projective relation r. However, this definition is based on an infinite number of crisp regions and, hence, does not yield a constructive procedure to build the acceptance area for the relation. What we want instead is to compute the acceptance area (or at least a very good approximation of it) by combining acceptance areas from crisp cases in a suitable way, e.g., by interpreting the egg and yolk of the reference object as crisp objects. The final step of coming up with a constructive definition so far had to be conducted as a case-by-case analysis in which we compared the defining conditions as given in Eq. 1 to the properties that can be derived for the respective crisp acceptance areas A_N(B−), A_N(B+), etc., under the side condition that B− ⊆ B+.
Fig. 8. Acceptance area example for CDC: (a) egg-yolk object B*; (b) A+_N(B+); (c) A_N(B−) ∪ A_NE(B−) ∪ A_NW(B−); (d) A*_N(B*).

Fig. 9. Acceptance area examples for VC and RCC-8: (a) A*_PVJ(B*, C*); (b) A*_V(B*, C*); (c) EC(A*, B*).
As an intermediate step, we also define the acceptance area A+_N(B+), which assumes that B+ describes the maximal extension but also assumes that the yolk is empty. The result is always a formula in which different crisp acceptance areas are combined via intersection and union. In general, to be able to quickly integrate new calculi, an automatic approach to perform this analysis and construct acceptance areas for the non-crisp case is desirable and will be the subject of future research. For the CDC example, A*_N(B*) is defined as:

    A*_N(B*) = A+_N(B+) ∩ (A_N(B−) ∪ A_NE(B−) ∪ A_NW(B−)).                  (3)

Given the egg-yolk object B* in Fig. 8(a), the acceptance area A+_N(B+) is depicted in Fig. 8(b), while the union of A_N(B−), A_NE(B−), and A_NW(B−) is shown in Fig. 8(c). The overall acceptance area resulting from Eq. 3 is depicted in Fig. 8(d). Two similar examples, but for visibility relations between non-crisp reference objects, are depicted in Fig. 9(a) and Fig. 9(b). They are defined as A*_PVJ(B*, C*) = A_PVJ(B−, C+) and A*_V(B*, C*) = A_V(B−, C−), respectively.
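A toy Shapely rendering of Eq. 3 (our simplification: the tiles are built from the MBR-based frame of reference of Fig. 3(b), the unbounded tiles are clipped to a finite window instead of using infinite-region objects, and A+_N is approximated by treating B+ as a crisp reference):

from shapely.geometry import box

W = 1e3                                    # half-width of the clip window

def tiles(B):
    """The northern MBR-based CDC tiles of a crisp region B, clipped."""
    x0, y0, x1, y1 = B.bounds              # (minx, miny, maxx, maxy)
    return {"N":  box(x0, y1, x1, W),
            "NE": box(x1, y1, W, W),
            "NW": box(-W, y1, x0, W)}

def north_acceptance(B_plus, B_minus):
    """Approximation of Eq. 3:
    A*_N(B*) = A+_N(B+) cap (A_N(B-) cup A_NE(B-) cup A_NW(B-))."""
    t = tiles(B_minus)
    lower = t["N"].union(t["NE"]).union(t["NW"])
    return tiles(B_plus)["N"].intersection(lower)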
GeometryC for RCC-8. In contrast to projective calculi, the topological relations of RCC-8 cannot simply be described using the concept of acceptance areas. The implications for the minimal and maximal extensions of an object A* that is known to be in relation r with a reference object B* had to be determined by hand and are summarized in Tab. 1.
Table 1. GeometryC results for RCC-8 relations A* r B*

 r  |  DC    |  EC    | PO | EQ | TPP | NTPP | TPPI | NTPPI
 A+ | R²\B− | R²\B− | R² | B+ | B+  | B+   | R²   | R²
 A− |  ∅     |  ∅     | ∅  | B− | ∅   | ∅    | B−   | B−
The first row contains what can be derived for the maximal extension A+, and the second row the consequences for the minimal extension A−. As an example, Fig. 9(c) shows the constraint for A+ when the relation EC(A*, B*) holds: it is the complete plane but with a hole for B−.
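Table 1 translates directly into a lookup; a Python sketch (our code; PLANE is a finite stand-in for R², and None models the empty set):

def rcc8_quantify(rel, B_plus, B_minus, PLANE):
    """Egg/yolk consequences (A+, A-) of A* rel B* per Table 1."""
    table = {
        "DC":    (PLANE.difference(B_minus), None),
        "EC":    (PLANE.difference(B_minus), None),
        "PO":    (PLANE, None),
        "EQ":    (B_plus, B_minus),
        "TPP":   (B_plus, None),
        "NTPP":  (B_plus, None),
        "TPPI":  (PLANE, B_minus),
        "NTPPI": (PLANE, B_minus),
    }
    return table[rel]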
3.3 The Geometric Reasoning Component
The geometric reasoning component stores egg-yolk approximations for all objects using multi-region objects that potentially contain infinite-region objects. It uses intersection and union procedures for drawing geometric inferences by combining information stemming from different relations, resulting in refined geometric approximations. To be able to adequately deal with infinite-region objects when computing intersections and unions, we had to implement elementary intersection and union operations for two infinite-region objects as well as for an infinite-region object and a simple polygon. We briefly describe our approach to compute the intersection; the union is realized following the same general idea. After that, we present the overall geometric reasoning algorithm. Intersection of infinite-region objects. The intersection is realized by first transforming the infinite-region objects into simple polygons, performing a standard intersection for simple polygons, and finally, if needed, transforming the result back to an infinite-region object. An example of this procedure is depicted in Fig. 10. Given the two infinite-region objects IR1 and IR2 shown in Fig. 10(a), the transformation into simple polygons is done by cropping them with a clipping box big enough to contain all defining points of the objects themselves as well as all intersection points between the rays marking the regions' boundaries. We calculate the intersection points between the rays of IR1 and IR2 and derive a bounding box BB from these points. Lastly, we enlarge BB by a value δ, resulting in the clipping box CB, as Fig. 10(a) shows. We now clip IR1 and IR2 with CB by computing the intersection of the rays with the boundary of CB, resulting in the simple polygons P1 and P2 (see Fig. 10(b)). The intersection of P1 and P2 is then computed. The result will be a multi-region object, which might have more than one component; in the example we only get a single component. We now need to check for each component whether its boundary polygon needs to be transformed back into an infinite-region object. This happens by checking whether the boundary touches the clipping box CB. In this case, the parts touching the boundary are removed, and a new infinite-region object is constructed, consisting of a new polyline and two newly computed rays.
Fig. 10. Intersection of two infinite-region objects IR1 and IR2: (a) clipping box CB derived from the bounding box BB; (b) intersection of the clipped polygons P1 and P2; (c) resulting infinite region.
The final result is shown in Fig. 10(c). If one of the original objects IR1 and IR2 already is a simple polygon, the intersection algorithm works in the same way, except that no clipping is required for this object. Main geometric reasoning algorithm. The pseudocode for the geometric reasoning procedure is given in Alg. 3. It takes the set G containing the current geometric approximations for all involved objects in egg-yolk format and a multi-calculus constraint network N. It continuously loops through all objects Ri* ∈ G, considers all qualitative relations this object is involved in, as provided by the function QualitativeRelations(Ri*, N), and tries to improve the geometric approximation of Ri* based on these relations. To achieve this, the Quantify(r, C, S2*, ..., Sn*) procedure previously defined in Alg. 2 is called. The returned egg-yolk object Q* is used to update Ri* ∈ G. For the maximal extension Ri+, this is done by taking the intersection of the old value of Ri+ and Q+; conversely, for the minimal extension Ri−, the combination is done with the union operator. When the geometric approximation of an object Ri* is refined, this can mean that the approximations of objects Rj* that stand in certain relations with Ri* can now be refined further. Therefore, the algorithm runs the loop through all objects until a fixpoint is reached where no refined approximation has been computed. (We describe only the basic version of the algorithm; optimizations are possible to avoid reconsidering objects and relations for which nothing has changed.) To keep track of this, the boolean flag hasChanged is used, and the old geometry of Ri* is stored in PRi* and later compared to the newly computed approximation. The object set G containing the refined geometries is returned by the reasoning algorithm.

Algorithm 3. GeometricReasoner(G, N)
  hasChanged ← True
  while hasChanged do
    hasChanged ← False
    for Ri* ∈ G do
      PRi* ← Ri*
      QR ← QualitativeRelations(Ri*, N)
      for (r, C, S2*, ..., Sn*) ∈ QR do
        Q* ← Quantify(r, C, S2*, ..., Sn*)
        Ri+ ← Ri+ ∩ Q+
        Ri− ← Ri− ∪ Q−
      end for
      if PRi* ≠ Ri* then hasChanged ← True end if
    end for
  end while
  return G
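In Python, the loop of Alg. 3 can be sketched over the EggYolk objects from Section 2.3 (our code; quantify is assumed to return an EggYolk as in Alg. 2, and the relation bookkeeping is simplified to a dictionary):

def geometric_reasoner(G, relations, quantify):
    """Refine egg-yolk approximations until a fixpoint (cf. Alg. 3).
    G: dict name -> EggYolk; relations: dict name -> list of
    (rel, calculus, reference_names) tuples the object occurs in."""
    changed = True
    while changed:
        changed = False
        for name in list(G):
            obj, before = G[name], (G[name].egg, G[name].yolk)
            for rel, calc, refs in relations.get(name, []):
                q = quantify(rel, calc, [G[r] for r in refs])
                obj = obj.refine(q)       # egg: intersection, yolk: union
            G[name] = obj
            if (obj.egg, obj.yolk) != before:
                changed = True
    return G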
3.4 Qualification from Egg-Yolk Geometries
Qualification is the procedure that determines the qualitative relations holding between the objects in the set G, given by their egg-yolk approximations Ri*, for a set of calculi 𝒞. The pseudocode of the qualifier procedure is given in Alg. 4. Besides G, the function also takes a multi-relation constraint network N, which it modifies by refining the relations in N based on the newly computed relations. The resulting network is then returned as the result of the qualification.
The algorithm makes use of the auxiliary function Permutations(G, n), which calculates all n-ary permutations of the objects contained in the set G. The function refineRelation(N, C, (R1*, ..., Rn*), r) refines the relation between the objects R1*, ..., Rn* in the network N by intersecting the old relation for calculus C with the relation r. It is called with the relation returned by the function QualifyC, which performs the qualification for a given calculus C in a similar way as GeometryC does in the quantification procedure from Section 3.2.
Algorithm 4. Qualify(G, N)
  for C ∈ 𝒞 do
    Perm ← Permutations(G, arity(C))
    for (R1*, ..., Rn*) ∈ Perm do
      refineRelation(N, C, (R1*, ..., Rn*), QualifyC(R1*, ..., Rn*))
    end for
  end for
  return N
The implementation of QualifyC for a projective calculus is given in Alg. 5. It actually uses GeometryC to check, for each base relation of C, whether the maximal extension of the primary object, R1+, intersects the respective acceptance area. The disjunction of all base relations for which this is the case is returned. As mentioned, in this paper we only consider single-tile relations for CDC and VC. While this simplification does not introduce errors in the hybrid reasoner (for our purpose, a multi-tile relation can here be interpreted as a disjunction of single-tile relations, resulting only in an overestimation), we plan to extend the reasoner to also deal with multi-tile relations. RCC-8 again has to be handled separately.
Algorithm 5. QualifyC(R1*, ..., Rn*)   ;; projective case
  disjR ← ∅
  for all base relations br of C do
    AccArea_br ← GeometryC(br, R2*, ..., Rn*)
    if R1+ ∩ AccArea_br ≠ ∅ then disjR ← disjR ∪ br end if
  end for
  return disjR
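The projective case reduces to a short filter in Python (our sketch; calculus.base_relations and geometry_c are assumed interfaces mirroring GeometryC):

def qualify_projective(calculus, primary, references, geometry_c):
    """Disjunction of base relations whose acceptance area meets the
    primary object's maximal extension (cf. Alg. 5)."""
    disj = set()
    for br in calculus.base_relations:
        acc = geometry_c(br, references)
        if primary.egg.intersects(acc):   # R1+ cap AccArea_br non-empty
            disj.add(br)
    return disj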
Its qualification is based on the notion, introduced in [4], that a list of possible disjunctive RCC-8 relations between two egg-yolk regions can be computed from the RCC-8 relations holding between the crisp representations of the eggs and the yolks of both regions, resulting in a 4-tuple of RCC-8 relations. We therefore compute this 4-tuple for the given objects and then use a lookup table to retrieve the disjunction of possible RCC-8 relations, which is then returned.
3.5 The Overall Hybrid Reasoning Algorithm
A basic pseudocode version of the procedure that combines the components described in the previous sections is given in Alg. 6. To keep it simple, it does not contain all possible optimizations for avoiding unnecessary computations. The reasoner takes the set G of egg-yolk objects for all involved objects and a multi-relation constraint network N representing the input information. We assume that the geometries in G have been initialized in accordance with the input information, using egg-yolk objects representing the plane for objects for which no geometric information is given. The network N is also assumed to have been initialized correctly, using the specified qualitative relations where possible and the universal relation (the disjunction of all base relations) everywhere else. At first, the qualitative reasoning component is applied to the input constraint network N, refining as many qualitative relations as possible. The resulting network is then passed on to the geometric reasoner, which produces new geometric approximations (with the support of the quantification component). Afterwards, the approximated regions are analyzed by the qualification procedure to further refine the qualitative constraint network N. This process is reiterated until neither the qualitative nor the geometric reasoning component is able to refine the available information further. Given that the input information is consistent, it is easy to see that the hybrid reasoning is correct in the sense that the computed geometric approximations indeed have to contain the actual object, and that the actual geometric configuration described by the input is a solution of the derived qualitative model: (1) the QualitativeReasoner and Qualify functions can only lead to a refinement of the constraint network N; and (2) the GeometricReasoner can only further refine the geometries derived from the qualitative relations. The overall procedure ends either when the GeometricReasoner cannot refine any geometry and, hence, Qualify cannot discover new qualitative relations, or when the QualitativeReasoner does not refine the constraint network, resulting in an absence of information that would enable the GeometricReasoner to further refine geometries.
Algorithm 6. HybridReasoner(G, N)
  hasChanged ← True
  while hasChanged do
    previousN ← N
    N ← QualitativeReasoner(N)
    G ← GeometricReasoner(G, N)
    N ← Qualify(G, N)
    if N = previousN then hasChanged ← False end if
  end while
  return (G, N)
Fig. 11. Object configuration used in the example and experiments (regions A–I).
Furthermore, the quantification of a relation is based on the concept of acceptance areas in which an object has to be contained, and we make sure that the acceptance area represents an upper approximation for the geometry of the object.
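Putting the pieces together, Alg. 6 becomes a simple alternation loop; a Python sketch over the helpers above (our code; the three callables are assumed to behave like the corresponding procedures and to return their refined results):

import copy

def hybrid_reasoner(G, N, qualitative_reasoner, geometric_reasoner, qualify):
    """Alternate qualitative and geometric refinement until neither
    component refines the constraint network N further (cf. Alg. 6)."""
    changed = True
    while changed:
        previous = copy.deepcopy(N)       # snapshot for the fixpoint test
        N = qualitative_reasoner(N)
        G = geometric_reasoner(G, N)
        N = qualify(G, N)
        changed = (N != previous)
    return G, N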
4 Implementation and Example

The hybrid geometric-qualitative reasoner has been implemented in order to collect first experimental results. The reasoner has been integrated into Quantum GIS [13] (or simply QGIS), a free open-source desktop GIS. Geometric and qualitative spatial data are maintained in a spatial database. The current prototype supports RCC-8 relations and the single-tile relations of CDC and VC.
Fig. 12. First approximations of D, E, F, and I: (a) approximation of D; (b) approximation of E; (c) approximation of F; (d) approximation of I.
Fig. 11 shows the implemented plug-in running in QGIS. The displayed object configuration is used in the remainder of this section for a step-by-step running example. In this example, the reasoner is given geometric definitions for the regions A, B, C, G, H and the following qualitative descriptions: F*{PVJ}(D*, B), E*{PVJ}(D*, B), E*{PVL}(B, C), G{TPP}D*, F*{W}H, F*{N}A, and I*{S}D*. Note that some relations refer to non-geometrically defined objects, for instance in F*{PVJ}(D*, B). In the first pass, the qualitative reasoner infers 9 new base relations and 6 refinements of disjunctive relations, e.g., D*{TPPI}G and D*{PVL, O}(B, C). Feeding the resulting qualitative relations into the geometric reasoner results in a first approximation as depicted in Fig. 12(a)–(d). Note that object D* is the only one having a minimal extension D− defined, by the inferred relation D*{TPPI}G. The qualification procedure is now able to produce 253 new qualitative relations using crisp objects and approximated ones. Since new information was discovered during this cycle, a new hybrid reasoning cycle is required. The qualitative reasoner infers two new base relations, one being D*{O}(B, C), which the GeometricReasoner employs to refine D+ as shown in Fig. 13(a). The link I*{S}D* propagates the refinement to region I* as depicted in Fig. 13(b), while the approximations of the other objects remain unchanged. The qualification process refines some edges of the constraint network, and the whole cycle is run once more. In the last execution, no new information is discovered and the reasoner terminates. Comparing the resulting regions with the original ones shows that the reasoner works as intended, since every approximation contains the original region it represents.
Fig. 13. Final approximations of D and I: (a) approximation of D; (b) approximation of I.
Furthermore, the fact that multiple main cycles were needed to compute the final approximations demonstrates that the hybrid reasoner yields better results than the qualitative and geometric approaches would have been able to produce individually.
5 Experiments
The capabilities of the hybrid reasoner have been evaluated in a first experiment, running the system over problem instances generated from the configuration of objects depicted in Fig. 11. 1100 random input sets were generated by randomly selecting the objects that would be given geometrically and the unknown objects only described using qualitative relations. The provided relations involving the unknown objects as either primary or reference object were also chosen randomly. In the experiment, we were mainly interested in two things: (1) the average percentage of area reduction achieved by the hybrid reasoner compared to the approximation obtained by applying the quantification directly to the input relations, and (2) the average number of cycles of the hybrid reasoning procedure before the algorithm terminated. We recorded these values for two different kinds of tests. In the first test, only one object is qualitatively described, while the number of qualitative relations used to describe the object is varied. In the second test, the number of only qualitatively described objects is varied, while the number of qualitative input relations per object is kept constant at 13. The test results are summarized in Tab. 2 for the first test and in Tab. 3 for the second one.

Table 2. Results for a variable number of qualitative input relations

Input relations (#)      |  4   |  7  | 10   | 13
Avg. area reduction (%)  | 47.5 | 3.7 | 40   | 28.8
Avg. hybrid cycles (#)   |  3   |  3  | 3.03 |  3
A Hybrid Geometric-Qualitative Spatial Reasoning System
207
Table 3. Results for a variable number of qualitatively specified objects Removed Regions (#) 1 2 3 4 5 6 7 8 Avg. area reduction (%) 28,8 36,7 31,2 41 39,3 40,9 49,6 67,8 Avg. hybrid cycles (#) 3 3,02 3,03 3,05 3 2,93 2,85 2,49
in the approximated area by the hybrid reasoner, supporting the validity of the proposed hybrid approach. The fact that the number of reasoning cycles is almost constant at three demonstrates that indeed interlinking the two reasoning components gives better results than using one of the approaches individually as otherwise the computation would have stopped after one cycle.
6
Conclusions
We presented a hybrid reasoning system able to process mixed geometric and qualitative spatial information and derive new geometric and qualitative information from it. The reasoner combines a qualitative reasoning component based on the algebraic closure algorithm with a geometric reasoning component that employs an egg-yolk approach to represent objects with underdetermined geometry. The quantify and qualify functions translate qualitative relations into geometric information and back. Our detailed example as well as the results from a first experiment show that the reasoning system is able to derive more specific information (both on the qualitative and the geometric side) than the qualitative or the geometric component alone. We plan to extend the reasoning system by adding support for multi-tile relations of the currently implemented CDC and VC calculi and by incorporating further calculi, in particular dealing with relative direction and distance. We will also investigate to what extent the manual derivation of the constructive definition of acceptance areas for egg-yolk reference objects can be automated. Once the described extensions are made, we will evaluate the influence of different parameters such as the calculi used, the number of qualitative relations used, and the number of objects specified qualitatively in detail. Acknowledgments. The authors would like to thank Christian Freksa and the anonymous reviewers for valuable comments. Funding by the Deutsche Forschungsgemeinschaft (DFG) under grant IRTG GRK 1498 Semantic Integration of Geospatial Information is gratefully acknowledged.
References 1. Clementini, E., Di Felice, P.: Spatial operators. SIGMOD Record 29, 31–38 (2000) 2. Cohn, A.G., Hazarika, S.M.: Qualitative spatial representation and reasoning: An overview. Fundamenta Informaticae 46(1-2), 1–29 (2001)
208
G. De Felice, P. Fogliaroni, and J.O. Wallgr¨ un
3. Cohn, A.G., Renz, J.: Qualitative spatial reasoning. In: van Harmelen, F., Lifschitz, V., Porter, B. (eds.) Handbook of Knowledge Representation. Elsevier, Amsterdam (2007) 4. Cohn, A., Gotts, N.: The ‘egg-yolk’ representation of regions with indeterminate boundaries. In: Geographical Objects with Undetermined Boundaries, pp. 171–187. Francis Taylor (1996) 5. Egenhofer, M.J.: A formal definition of binary topological relationships. In: Litwin, W., Schek, H.J. (eds.) FODO 1989. LNCS, vol. 367, pp. 457–472. Springer, Heidelberg (1989) 6. Egenhofer, M.J.: Query processing in spatial-query-by-sketch. Journal of Visual Languages and Computing 8(4), 403–424 (1997) 7. Fogliaroni, P., Wallgr¨ un, J.O., Clementini, E., Tarquini, F., Wolter, D.: A qualitative approach to localization and navigation based on visibility information. In: Hornsby, K.S., Claramunt, C., Denis, M., Ligozat, G. (eds.) COSIT 2009. LNCS, vol. 5756, pp. 312–329. Springer, Heidelberg (2009) 8. Freksa, C., Moratz, R., Barkowsky, T.: Schematic maps for robot navigation. In: Habel, C., Brauer, W., Freksa, C., Wender, K.F. (eds.) Spatial Cognition 2000. LNCS (LNAI), vol. 1849, pp. 100–114. Springer, Heidelberg (2000) 9. Gantner, Z., Westphal, M., W¨ olfl, S.: GQR – A fast reasoner for binary qualitative constraint calculi. In: Proceedings of the AAAI 2008 Workshop on Spatial and Temporal Reasoning (2008) 10. Goodchild, M.F.: Citizens as sensors: The world of volunteered geography. GeoJournal 69(4), 211–221 (2007) 11. Goyal, R., Egenhofer, M.: Consistent queries over cardinal directions across different levels of detail. In: Proceedings of the 11th International Workshop on Database and Expert System Applications, pp. 876–880 (2000) 12. Herring, J.: The OpenGIS abstract specification, Topic 1: Feature geometry (ISO 19107 Spatial schema), version 5. In: OGC Document, pp. 01–101 (2001) 13. Hugentobler, M.: Quantum GIS. In: Encyclopedia of GIS, pp. 935–939. Morgan Kaufmann Publishers Inc., San Francisco (2008) 14. Liu, W., Li, S., Renz, J.: Combining RCC-8 with qualitative direction calculi: Algorithms and complexity. In: Proceedings of the 21st International Joint Conference on Artifical Intelligence, pp. 854–859. Morgan Kaufmann Publishers Inc., San Francisco (2009) 15. Mackworth, A.K.: Consistency in networks of relations. Artificial Intelligence 8(1), 99–118 (1977) 16. Randell, D.A., Cui, Z., Cohn, A.: A spatial logic based on regions and connection. In: Principles of Knowledge Representation and Reasoning: Proceedings of the Third International Conference, pp. 165–176. Morgan Kaufmann, San Francisco (1992) 17. Renz, J., Nebel, B.: Qualitative spatial reasoning using constraint calculi. In: Aiello, M., Pratt-Hartmann, I.E., van Benthem, J.F. (eds.) Handbook of Spatial Logics, pp. 161–215. Springer, Heidelberg (2007) 18. Rigaux, P., Scholl, M.O., Voisard, A.: Spatial databases with application to GIS. Morgan Kaufmann, San Francisco (2002) 19. Sharma, J.: Integrated spatial reasoning in geographic information systems: Combining topology and direction. Ph.D. thesis, University of Maine (1996) 20. Skiadopoulos, S., Koubarakis, M.: Composing cardinal direction relations. Artificial Intelligence 152(2), 143–171 (2004)
A Hybrid Geometric-Qualitative Spatial Reasoning System
209
21. Tarquini, F., De Felice, G., Fogliaroni, P., Clementini, E.: A qualitative model for visibility relations. In: Hertzberg, J., Beetz, M., Englert, R. (eds.) KI 2007. LNCS (LNAI), vol. 4667, pp. 510–513. Springer, Heidelberg (2007) 22. Wallgr¨ un, J.O., Frommberger, L., Wolter, D., Dylla, F., Freksa, C.: Qualitative spatial representation and reasoning in the sparQ-toolbox. In: Barkowsky, T., Knauff, M., Ligozat, G., Montello, D. (eds.) Spatial Cognition 2007. LNCS (LNAI), vol. 4387, pp. 39–58. Springer, Heidelberg (2007) 23. Winter, S.: Bridging vector and raster representation in GIS. In: GIS 1998: Proceedings of the 6th ACM International Symposium on Advances in Geographic Information Systems, pp. 57–62. ACM, New York (1998) 24. W¨ olfl, S., Westphal, M.: On combinations of binary qualitative constraint calculi. In: Boutilier, C. (ed.) Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), pp. 967–973 (2009)
CLP(QS): A Declarative Spatial Reasoning Framework Mehul Bhatt, Jae Hee Lee, and Carl Schultz Spatial Cognition Research Center (SFB/TR 8) University of Bremen, Germany {bhatt,jay,cschultz}@informatik.uni-bremen.de
Abstract. We propose CLP(QS), a declarative spatial reasoning framework capable of representing and reasoning about high-level, qualitative spatial knowledge about the world. We systematically formalize and implement the semantics of a range of qualitative spatial calculi using a system of non-linear polynomial equations in the context of a classical constraint logic programming framework. Whereas CLP(QS) is a general framework, we demonstrate its applicability for the domain of Computer Aided Architecture Design. With CLP(QS) serving as a prototype, we position declarative spatial reasoning as a general paradigm open to other formalizations, reinterpretations, and extensions. We argue that the accessibility of qualitative spatial representation and reasoning mechanisms via the medium of high-level, logic-based formalizations is crucial for their utility toward solving real-world problems. Keywords: geometric and qualitative spatial reasoning, constraint logic programming, declarative programming, spatial computing, architecture design.
1
Introduction
Declarative programming is a paradigm concerned with the development of computational models that can solve problems directly from high-level domain specifications consisting of the core logic of computation, without a complete specification of the precise flow of control [37]. It is a model of computation aiming to solve a problem by specifying its ‘what ’, as √ opposed to its ‘how ’. As an example, consider the definition of a square root: { x = y, { y ≥ 0, y 2 = x}}. This definition of a square root does not say anything about the actual computation of square roots; it is a declarative specification of the concept of a square root constituting a rather extreme form of declarativeness: the holy grail of research in declarative programming. Within computer science, or specifically Artificial Intelligence (AI), several declarative computation paradigms, programming languages, and frameworks exist, primarily among them based on the theoretical foundations of Logic Programming (LP) [17, 33], Constraint Logic Programming (CLP) [29] and derivatives such as Abductive Constraint Logic Programming (ACLP) [31], Answer-Set Programming (ASP) [36], and Functional Programming [1]. Knowledge Representation and Reasoning (KR) research in AI is concerned with M. Egenhofer et al. (Eds.): COSIT 2011, LNCS 6899, pp. 210–230, 2011. c Springer-Verlag Berlin Heidelberg 2011
CLP(QS): A Declarative Spatial Reasoning Framework
211
the development of formalisms and systems that can deal with high-level knowledge, ranging from the abstract mathematical, to the ontological, spatial, temporal, action and change driven etc [2, 50]. The main focus of this paper is the integration of one such specialization in the field of Artificial Intelligence, i.e., qualitative spatial representation and reasoning [14], with a declarative approach to problem solving, i.e., constraint logic programming. Qualitative Spatial Representation and Reasoning. Qualitative Spatial Representation and Reasoning (QSR) provides a commonsensical interface to abstract and reason about spatial information. Formal methods in the field of QSR consist of qualitative spatial calculi, which are relational-algebraic systems pertaining to one or more aspects of space such as topology, orientation, direction, size [14].1 The basic tenets in QSR consist of constraint-based reasoning algorithms over an infinite (spatial) domain to solve consistency problems in the context of qualitative spatial calculi. The key idea here is to partition an infinite quantity space into finite disjoint categories, and utilize the special relational properties of such a partitioned space for reasoning purposes. The application of QSR mechanisms is a topic that is gaining significant momentum in the community [10]. One objective, from the viewpoint of these application goals, is the use of qualitative spatial abstraction and reasoning mechanisms that have been developed in QSR in domains involving representing and reasoning with static and dynamic spatial information., e.g., spatial design, geographic information systems, cognitive robotics etc. From a methodological viewpoint, the integration of formal qualitative spatial and temporal techniques within general commonsense reasoning frameworks in KR is a crucial next-step for their applicability in real-world domains [6]. In this paper, we are concerned with representing and reasoning with qualitative abstractions of spatial information within a declarative programming framework. A declarative programming interface to qualitative spatial representation and reasoning techniques serves as a natural way for systems and programmers to seamlessly access specialized spatial reasoning capabilities toward the development of intelligent spatial systems. With this hypothesis, we propose a Declarative Spatial Reasoning framework and demonstrate its applicability for real-world problem solving. Declarative Spatial Reasoning. We propose declarative spatial reasoning as a general paradigm for the integration of specialized qualitative spatial representation and reasoning techniques with declarative programming languages and frameworks such that qualitative spatial, and geometric reasoning capabilities may be directly exploited within state-of-the-art declarative knowledge representation and reasoning frameworks in Artificial Intelligence. A crucial motivation is to be able to declaratively specify and solve real-world problems related to spatial representation and reasoning. From the viewpoint of the main proposal presented in this paper: 1
The mereotopological RCC calculus, which was first proposed within a first-order logical framework, is an exception.
212
M. Bhatt, J.H. Lee, and C. Schultz Declarative spatial reasoning denotes the ability of declarative programming frameworks to handle spatial objects and the spatial relationships amongst them as native entities, e.g., as is possible with concrete domains of Integers, Reals and Inequality relationships. The objective is to enable points, oriented points, directed line-segments, regions, and topological and orientation relationships amongst them as first-class entities within declarative frameworks in AI.
The principal advantage of this mode of computation is that high-level specifications and constraints about the spatial domain of interest may be expressed in a manner that is similar or close to their conceptualization and modeling, e.g., as high-level clauses (i.e., facts, rules), of the declarative semantics of a logic program. Although this paper is categorically focused on spatial abstractions defined in QSR, our general approach lends itself to re-interpretations and extensions with other perspectives such as Visual and Diagrammatic Representations, as well as other cognitively-driven modalities or mental models of space. Basic Approach and Contribution Spatial representation and reasoning problems may be approached in a multitude of representational forms, ranging from qualitative, visual, diagrammatic, purely geometric etc. The underlying thread in any approach is that the spatial domain is special, and similar to other specialized sub-disciplines in AI (e.g., ontological, action and change driven), the development of an intelligent spatial representation and reasoning capability, regardless of its methodological underpinnings, requires its own set of specialized techniques and algorithms. Keeping in mind such specializations, the fundamental question that arises is: By what means may specialized qualitative spatial representation and reasoning techniques be integrated within general knowledge representation and reasoning techniques in Artificial Intelligence?
This question is rather open-ended, and acquires several different interpretations depending on the precise KR technique, and aspects of space, actions, events, and change being considered [6]. In this paper, we focus on one concrete interpretation: How may specialized qualitative spatial representation and reasoning techniques be embedded within the state-of-the-art declarative programming approach of CLP ?
An embedding of this nature (i.e., with methods such as CLP) would imply that spatial representation and reasoning problems may be directly modeled within state-of-the-art KR techniques, and thus be directly usable by the direct benefactors of AI tools and frameworks, in this case, users of CLP. This particular
CLP(QS): A Declarative Spatial Reasoning Framework
213
interpretation, and its prototypical implementation in this paper, is guided by the hypothesis that: The accessibility of generic qualitative spatial abstraction and reasoning techniques, e.g., qualitative spatial calculi, qualification and consistency algorithms, via the interface of declarative programming frameworks such as constraint logic programming is crucial, and presents one model for their applicability toward real-world problem solving.
This paper illustrates its key propositions by developing the declarative spatial reasoning framework CLP(QS), which is an integration of a polynomial-based CLP framework with a qualitative spatial domain QS consisting of point and region based spatial calculi. To achieve the integration, we characterize the spatial domain QS using a system of non-linear polynomial equations in a manner that is consistent with the chosen CLP framework. The applicability of CLP(QS) is demonstrated by way of qualitatively abstracted geometric reasoning problems from the domain of Computer-Aided Architecture Design (CAAD). Since our focus is on real-world problem solving, all exemplified data-sets, i.e., the CAAD models, are sourced and generated from professional design tools, and they conform to industry scales and standards. Organization.The paper is organized as follows: Section 2 provides an overview of declarative spatial reasoning and application-guided motivations thereof. Section 3 presents CLP(QS): included are the qualitative spatial domain QS consisting of positional and topological spatial calculi, a polynomial characterization for QS, and its implementation, and (methodological and empirical) evaluation. Section 4 demonstrates the application potential for CLP(QS) for realworld problem solving in the domain of computer-aided architecture design. In Section 5, we discuss the relationship of our work and contributions with respect to existing research. In Section 6 we conclude and provide research perspectives.
2
What Is Declarative Spatial Reasoning?
Declarative spatial reasoning, in so far as its broad conception as a paradigm is concerned, is intuitively best understood with respect to the kinds of computational (and by implication, representational) challenges that it aims to address. The kinds of fundamental reasoning tasks that may be identified within the purview of declarative spatial reasoning span a wide spectrum, e.g., including reasoning patterns such as spatial property projection, spatial simulation, spatial planning (e.g., for configuration problems), hypothetical reasoning (e.g., for abductive explanation) with spatial information to name a few. Both within and beyond the range of domains identified in this paper, these are reasoning problems that involve an inherent interaction between space, actions, events, and spatial change in the backdrop of domain-specific knowledge and commonsense knowledge about the world [6]. For this paper, we restrict ourselves to the domain of space alone.
214
2.1
M. Bhatt, J.H. Lee, and C. Schultz
Need for a Declarative Interface
One of the principal reasons behind the success of declarative (logic and constraint logic) programming frameworks has been their ability to provide a range, and combination thereof, of constraint and logic-based reasoning abilities in the context of a high-level first-order language that may be used to directly encode a domain of interest.2 For instance, consider the following fragment of Prolog code that recursively computes the transitive closure of a relationship R (e.g., one may imagine this to be a traversal problem): t closure R(X, Y) :- R(X, Y). t closure R(X, Y) :- R(X, Z), t closure R(Z, Y).
Here, the search for the transitive closure t closure R, and term unification (based on syntactic equality) for the variables involved (i.e., X, Y, Z) is built into a logic programming language such as Prolog. The precise semantics of the terms being unified does not acquire any special significance since equality is the only relation that is available for term unification. Constraint logic based extensions make it possible to utilize inequalities and the existence of inequality constraints over Integers, Reals etc. For instance, now consider a rather specialized fragment that computes the set of points that are inside of a 3D solid object (e.g., a sphere). This could be imagined to be a reasoning task in the context of Constructive Solid Geometry (CSG) [27], with the following fragment requiring a constraint solver for quadratic and linear constraints: entity(ball, sphere(center(1,1,1), radius(5)). inSphere(point(X, Y, Z), sphere(center(Cx, Cy, Cz), radius(R))) :(X - Cx)*(X - Cx)+(Y - Cy)*(Y - Cy)+(Z - Cz)*(Z - Cz) <= R*R, R>=0.
What we are aiming at is the general ability to declaratively refer to highlevel statements about real and hypothetical spatial worlds, without the need to specify problems with the level of formalization in the above. Declarative programming frameworks such as CLP are indeed not pre-equipped to deal with spatial reasoning, or general spatial computation capabilities. For instance, a CLP engine cannot understand the semantics of space, and spatial relations such as inside, front-of , or in general, the semantics of relational spatial systems constituted by formal qualitative spatial calculi. By analogy, this is similar to the case where a general logic programming language such as Prolog is not intended to understand the meaning of the predicate ‘likes’ in the statement ‘likes(john, films)’, or complex taxonomic structures formalized therefrom without giving an explicit formalization of the descriptive semantics; such specialized ontological reasoning would instead fall within the purview of Description Logic reasoners. Consider the domain of spatial computing for design [8]: automated computer-assisted architecture design (CAAD) systems require the capability to solve structural and 2
This is applicable to other declarative frameworks such as functional programming. However, this paper will specifically deal with logic and constraint-logic based programming approaches.
CLP(QS): A Declarative Spatial Reasoning Framework
215
functional design requirement consistency problems. Such problems are expressible as spatial constraints —topological, orientational, size— among the domain entities (i.e., regions, line-segments and points) that constitute a CAAD model. The following scenario constitutes a design requirement: Security / Privacy. A typical design requirement may entail that certain parts of the environment may or may not be visible or readily accessible. For instance, it may be desired that the WashRoom is as isolated as possible from other workareas, or that the main entrance area be within the reach of sensing apparatuses such as an in-house Camera.
This constraint may, for instance, be directly encoded at a higher-level of abstraction within a rule-based programming mode3 ; in the following example the operational space denotes the region of space that an object requires to perform its intrinsic function, and the range space denotes the region of space that lies within the scope of a sensory device [9]: secure by(Door, Sensor) :physical geometry(Door, PGeom), operational space(PGeom, OpSpace), range space(Sensor, RgSpace), topology(OpSpace, RgSpace, inside).
The ability to declaratively handle spatial entities, and the topological and orientation relationships amongst them is only part of the story; in Section 2.2, which is to follow, we present a general class of application requirements. 2.2
Key Application Requirements
Application domains that involve spatial information processing typically require the following fundamental capabilities: domain constraints. express (spatial) constraints between domain entities by way of high-level rules, e.g., of the kind typically expressible within a logic programming framework consistency. check for (in)consistency of the rules, involving checking for spatial consistency, by considering the special properties that a domain such as space merits. For instance, this involves using the specialized spatial representation and reasoning mechanisms developed within the QSR community hypothetical reasoning. perform hypothetical reasoning at the qualitative spatial level, involving reasoning about what could be, on the basis of what is. Here, the key requirement is to use the special properties of the spatial relationship space and commonsense knowledge about space to derive those spatial configurations that are physically realizable. In a dynamic context, which is not addressed in this paper, this also translates to the task of scenario and narrative 3
The scenario is further built-up and illustrated in Fig. 4; Section 4.
216
M. Bhatt, J.H. Lee, and C. Schultz
completion, e.g., by spatio-temporal abduction. In conjunction with quantification (see below), this may be used to support a recommendation function in a spatial design context [8]. quantification. to compute quantifications that provide a metric grounding for the hypothesized spatial scenarios. This capability could, for instance, serve as the low-level computational foundation for high-level reasoners capable of abductive reasoning with spatial scenarios, and narratives of spatio-temporal information [7]. In principle, by broadening the interpretation of spatial reasoning, many more specialized spatial reasoning and computing requirements may be identified, e.g., spatio-temporal projection, simulation, planning, abductive explanation, inductive generalization, spatial similarity and matching, spatial data merging and integration [6]. Regardless, it is imperative that such facilities have to be provided in a domain neutral manner.
3
CLP(QS): A Declarative Spatial Reasoning Framework
Constraint Logic Programming (CLP) is a form of constraint programming in which logic programming is extended to include concepts from constraint satisfaction [29]. The CLP framework combines methods in Constraint Programming (CP) with Logic Programming (LP) techniques, thereby providing for a seamless integration of logical methods with algebraic techniques. A constraint logic program is essentially a logic program containing constraints (e.g., over the domain R) in the body of clauses. The difference is that, whereas within a LP problem solving is reduced to syntactic unification and theorem-proving, within CLP the task is to interpret term unification as more than equality by regarding it as a constraint system. Several CLP solvers exist: CLP(R) [30] and Prolog(III) for solving constraints over real numbers, the RISC-CLP(Real) for non-linear real constraints [28], CLP(RL) [46] for first-order formulas over various numeric domains, Abductive CLP [31] and so on.4 3.1
The Qualitative Spatial Domain QS
Qualitative spatial calculi can be classified into two groups: topological and positional calculi. With topological calculi such as the Region Connection Calculus (RCC) [40] and the 9-Intersection Model [22], the primitive entities are spatially extended regions of space, and in the case of the mereotopological RCC system, could possibly even be 4D spatio-temporal histories, e.g., for motionpattern analysis. Alternately, within a dynamic domain involving translational motion, point-based abstractions with orientation calculi suffice. Fig 2(a) is a 2D illustration of the RCC-8 relations. Other spatial calculi include the LR calculus [45], the Oriented-Point Relation Algebra (OPRAm ) [39], the Double-Cross Calculus [24], and the line-segment based Dipole Calculus [43]. 4
As further described in Section 3.2, a solver in the class of RISC-CLP(Real) is relevant from the viewpoint of this paper.
CLP(QS): A Declarative Spatial Reasoning Framework
(a)
217
(b)
Fig. 1. Illustrations of LR relation p1 p2 r p3 and OPRA2 relation p1 2 ∠27 p2
In this paper, we focus on two-dimensional point-based spatial calculi for representing intrinsically and extrinsically referenced orientation information. Spatial calculi such as the Oriented-Point Relation Algebra (OPRAm ), the LR calculus, ST AR calculus, Dipole calculus (DP), Double Cross Calculus (DCC) are applicable in this context. The spatial domain QS consists of: LR, OPRAm , ST AR, DP, DCC, and topological calculi such RCC and the 9-Intersection model restricted to a specific class of 2D polygonal regions.5 Our formalization of the qualitative spatial domain QS relies on the support for non-linear real constraints within CLP [28]. 3.2
A CLP Based Polynomial Characterization for QS
The decidability of the problem of solving first-order polynomial constraints over reals (also known as the Quantifier Elimination (QE) problem) was found by Tarski [47]. After the seminal contribution by Collins [15], QE over reals has been extensively investigated, where the main focus was to cope with the intrinsic doubly exponential complexity of the problem [18]. There are several approaches to this problem, the cylindrical algebraic decomposition algorithm by Collins [15, 16] being the most prominent approach ([38] provides a concise overview on this topic). In what follows we present qualitative spatial relations from LR, OPRAm as polynomial constraints over reals, of which the decision problem is in PSPACE [11] in the number of objects, and RCC as first-order constraints over the LR calculus, of which the worst case complexity for the decision problem is known to be doubly exponential [18]. Thereby, we are not only able to decide the problem, but we are also capable of providing a model of the solution by means of quantification. Throughout the following descriptions all points are two-dimensional points from the Euclidian plane. LR calculus. The domain of the LR calculus [45] is the set of all points in the Euclidian plane. A LR relation describes for three points p1 = (x1 , y1 ), 5
In a number of application domains (e.g., architecture, construction IT, urban planning), the input data for describing regions can be adequately represented by polygons.
218
M. Bhatt, J.H. Lee, and C. Schultz
p2 = (x2 , y2 ), p3 = (x3 , y3 ) the direction of p3 with respect to p1 , where the orientation of p1 is determined by p2 . There are altogether nine LR relations; seven relations for points are depicted in Fig 1(a) : left, right, f ront, start, inbetween, end, back. In Fig 1(a) the Euclidian plane is partitioned by points p1 and p2 , p1 = p2 into seven regions: two half-planes (l, r), two halflines (f , b), two points (s, e), and a line segment (i). These regions determine the relation of the to p1 and p2 . The remaining third point two relations are: double := (p1 , p2 , p3 ) p1 , p2 , p3 ∈ R2 , p1 = p2 , p1 = p3 , triple := {(p1 , p2 , p3 ) | } p1 , p2 , p3 ∈ R2 , p1 = p2 = p3 . By describing the relations using polynomial constraints, we obtain the correspondences (1)–(12), where we introduce a new point p4 for the equivalences (9), (6) and (3), if there is no point p4 , such that p1 p2 r p4 . The polynomial 1 x1 y1 constraints in (1) and (2) come from the determinant of the matrix 11 xx2 yy2 , whose sign determines the relative po3 3 sition of points p1 , p2 , p3 , where −, 0, + mean clockwise orientation, collinear, counterclockwise orientation, respectively. p1 p2 l p3 ≡def x2 y3 + x1 y2 + x3 y1 − y2 x3 − y1 x2 − y3 x1 > 0 p1 p2 r p3 ≡def x2 y3 + x1 y2 + x3 y1 − y2 x3 − y1 x2 − y3 x1 < 0
(1) (2)
p1 p2 b p3 ≡def x2 y3 + x1 y2 + x3 y1 − y2 x3 − y1 x2 − y3 x1 = 0 ∧ p1 p2 r p4 ∧ p4 p1 l p3
(3) (4)
p1 p2 s p3 ≡def x3 = x1 ∧ y3 = y1 ∧ x3 = x2 ∧ y3 = y2 p1 p2 i p3 ≡def x2 y3 + x1 y2 + x3 y1 − y2 x3 − y1 x2 − y3 x1 = 0
(5) (6)
∧ p1 p2 r p4 ∧ p4 p1 r p3 ∧ p4 p2 l p3 p1 p2 e p3 ≡def x3 = x2 ∧ y3 = y2 ∧ x3 = x1 ∧ y3 = y1 p1 p2 f p3 ≡def x2 y3 + x1 y2 + x3 y1 − y2 x3 − y1 x2 − y3 x1 = 0 ∧ p1 p2 r p4 ∧ p4 p2 r p3
(7) (8) (9) (10)
p1 p2 d p3 ≡def x1 = x2 ∧ y1 = y2 ∧ x1 = x3 ∧ y1 = y3
(11)
p1 p2 t p3 ≡def x1 = x2 = x3 ∧ y1 = y2 = y3 ,
(12)
OPRAm calculus. The domain of the OPRAm calculus is the set of all oriented points. An oriented point p is a quadruple (x, y, v, w), x, y, v, w ∈ R, where (x, y) is the location of p, and (v, w) defines the orientation of p by means of the orientation vector op := (v, w) − (x, y). Two orientated points p1 and p2 are equal if their positions and orientations are equal. With m lines passing through p, we can partition the whole plane (without the point itself) equally into 2m open sectors and 2m half-lines, where exactly one distinguished halfline has the same orientation as op . Starting with the distinguished half-line, and going through the sectors and half-lines alternately in the counterclockwise order, we can assign numbers 0 to 4m − 1 to the open sectors and half-lines (See Fig 1(b)). An OPRAm relation is a binary relation which describes for points p1 and p2 their positions relative to each other with respect to the aforementioned
CLP(QS): A Declarative Spatial Reasoning Framework
219
partitioning. This is represented by p1 m ∠ji p2 , where m is as defined before, i is the number of the sector (or half-line) of p1 , in which p2 is located, and j is the number of the sector (or half-line) of p2 , in which p1 is located.6 Then for p1 = (x1 , y1 , v1 , w1 ), p2 = (x2 , y2 , v2 , w2 ), and the rotation map rx (v, w, θ) cos θ − sin θ v := ry (v, w, θ)
sin θ cos θ
w
we can define for i = 0, 2, . . . , m − 4, m − 2: p1 m ∠∗i p2
1
≡def
det ∧
x1 y1 π π 1 rx (v1 ,w1 ,i m ) ry (v1 ,w1 ,i m ) 1 x2 y2
=0
1
x1 y1 1 rx (v1 ,w1 ,(i+ m ) π r v ,w1 ,(i+ m )π 2 m ) y( 1 2 m) 1 x2 y2
det
< 0,
which describe that p2 is in half-line i of p1 , and for i = 1, 3, . . . , m − 3, m − 1: 1 x1 y1 π π p1 m ∠∗i p2 ≡def det 1 rx (v1 ,w1 ,(i−1) m ) ry (v1 ,w1 ,(i−1) m ) >0 1 x2 y2 1 x1 y1 π π ∧ det 1 rx (v1 ,w1 ,(i+1) m ) ry (v1 ,w1 ,(i+1) m ) < 0, 1
x2
y2
which describe that p2 is in sector i of p1 . Then p1 m ∠ji p2
≡
p1 m ∠∗i p2
∧
p2 m ∠∗j p1 ,
and we obtain the desired polynomial constraints. Other point-based calculi. Other point-based calculi like the ST AR calculus, dipole calculus, single-cross, and double-cross calculus can be modeled in a similar way as described previously. Indeed, all of the point-based calculi mentioned in this paper except for ST AR can be reduced to OPRAm [21], which has been formally characterized in this paper. Topological calculi. We model topological relations (Fig. 2(a)) [22, 40] between regions of a restricted class, namely, polygons that can consist of disconnect pieces and can contain holes. Firstly, we define a region R as a simple polygon given by a sequence of 2D points, i.e., R := (p1 , p2 , . . . , pm ), m ≥ 3. Region R is convex, i.e., pi pi+1 l pi+2 , for all i = 1, 2, . . . , m − 1, where we set pm+1 := p1 (See Fig 2(b)). With this setting we can decide the relative position of a point q with respect to a convex region R (i.e., whether the point is inside, in the interior, in the boundary, or outside of R) by using the constraints from the LR calculus. Table 1 shows the correspondences. As any simple polygon can be partitioned into convex polygons in polynomialtime [12], and is therefore a disjunction of convex polygons, our definitions are naturally extended to concave polygons. 6
The original paper [39] also introduces the so-called same relations for two coinciding oriented points, which are differentiated by their orientations. Since these cases are primarily relevant for relation-algebraic reasoning, they are omitted in this paper.
220
M. Bhatt, J.H. Lee, and C. Schultz
(a) RCC-8 Relations
(b) A convex, polygonal region (p1 , p2 , p3 , p4 , p5 )
Fig. 2. Topological Relations
Table 1. The correspondence table
p ∈ R◦
∀i (pi pi+1 l p)
p ∈ ∂R
∃i (pi pi+1 s p ∨ pi pi+1 i p ∨ pi pi+1 e p)
p∈R
C
∃i : pi pi+1 r p
p∈R
p ∈ R◦ or p ∈ ∂R
p∈R∩S
p ∈ R and p ∈ S
p∈R∪S
p ∈ R or p ∈ S
p ∈ R\S
p ∈ R ∩ SC
Table 2. The correspondence table for complex regions R and S R dc S
∀p p ∈ S ⇒ p ∈ RC
R po S
∃p (p ∈ ∂R ∩ ∂S) ∧ ∀p p ∈ R ⇒ p ∈ ∂S ∪ S C
∃p, p , p p ∈ R ∩ S ∧ p ∈ R\S ∧ p ∈ S\R
R tpp S
∃p (p ∈ ∂R ∩ ∂S) ∧ ∀p (p ∈ R ⇒ p ∈ S)
R tppi S
∃p (p ∈ ∂R ∩ ∂S) ∧ ∀p (p ∈ S ⇒ p ∈ R)
R ntpp S
∀p (p ∈ R ⇒ p ∈ S ◦ )
R ntppi S
∀p (p ∈ S ⇒ p ∈ R◦ )
R eq S
∀p (p ∈ R ⇔ p ∈ S)
R ec S
On the basis of this information we can further decide the RCC relations between regions R and S. This is shown in Table 2. Our encoding of the semantics of topological relations, and the restricted class of regions that we admit, suffices for a characterization of topological relations for both the 9-Intersection Model [22] as well as the mereotopological Region Connection Calculus [40].
CLP(QS): A Declarative Spatial Reasoning Framework
(a)
Algebraic closure cannot detect
(b)
221
To be consistent with the CA
the OPRA2 inconsistency that b,c,d
constraints, d must also overlap a
must be colinear, a,c,d must be colin-
while being disjoint from b. Algebraic
ear, but a,b,c,d must not be colinear.
closure cannot detect that this is impossible on a 1D acyclic domain of intervals.
Fig. 3
3.3
Relational Algebraic and Polynomial Characterizations
The algebraic-closure method, which utilizes the relational algebraic structure of qualitative calculi, provides a sound and complete algorithm for calculi like Allen’s Interval Algebra or RCC8. However, the method is not sound for pointbased 2D calculi, including LR or OPRAm as shown in [25, 52], on which our applications are based. By contrast, the quantifier elimination problem as addressed at the beginning of Section 3.2 is decidable and has effective decision methods like the cylindrical decomposition algorithm [15]. In what follows, we address the weaknesses of the relation algebraic approach. These weaknesses call for the quantifier elimination approach underlying our formulation of QS. Failing on atomic networks. For calculi that are not closed under constraints [41], algebraic closure is not able to determine the consistency of atomic networks. For example, using OPRAm [25] let a,b be distinct oriented points directly facing points c,d. We make this inconsistent by placing a,b to the left-rear of c, and d to the front-right of c (see Figure 3(a)),7 opra2 (a,b,1,7), opra2 (a,c,0,3), opra2 (b,c,0,3),
opra2 (c,d,7,3), opra2 (a,d,0,3), opra2 (b,d,0,3).
Algebraic closure cannot detect this inconsistency. Other examples include LR [52] and IN DU [4]. Failing on interpretations. Changing the domain of interpretation in a seemingly trivial manner can dramatically change the effectiveness of algebraic closure. Moreover, an interpretation that is suited to one calculus can be precisely 7
The predicate opram (p1 ,p2 ,i,j) represents the OPRAm relation p1 m ∠ji p2 .
222
M. Bhatt, J.H. Lee, and C. Schultz
Fig. 4. Work-in-progress floorplan of an office (range space is illustration only)
the interpretation that causes algebraic closure to fail in another calculus, making it tricky to freely mix and swap calculi. An example is the containment algebra (CA) [35].8 Algebraic closure cannot infer that a CA cycle of at least four intervals requires at least two spatial dimensions or a cyclic domain (see Figure 3(b)), partialOverlap(a,b), partialOverlap(c,d), disjoint(a,c),
3.4
partialOverlap(b,c), partialOverlap(d,a), disjoint(b,d).
CLP(QS): Implementation Overview
In order to test the basic principles of CLP(QS), we have implemented a prototype through a loose integration (in C++) between SWI-Prolog [51] for high-level reasoning and REDUCE [26] for solving polynomial equations. The REDLOG package [19] of the computer algebra system REDUCE allows quantifier elimination over Reals. Prolog manages the control of query-answering by building REDLOG expressions for LR relations as described in Section 3.2 using specialized predicates such as i LR(P1,P2,P3). Although optimization and benchmarking are not the aims of this paper, it must be noted that the complexity of general polynomial systems is doublyexponential [18].9 As future outlook of this aspect of the work, we are interested in applying constraint optimizations, investigating more computationally efficient relation encodings, and developing a dedicated solver for QS consistency and quantification problems (Section 6).
4
CLP(QS): An Application in Architecture Design
In this section we describe the role of CLP(QS) in the domain of Computer Aided Architectural Design (CAAD). During the process of designing a building, 8
9
In CA, algebraic closure cannot decide consistency when interpreted in the domain of linearly ordered intervals [35]. For example, this could be 1D axis-aligned blocks motivated by the block-algebra for p = 1, or spatial intervals along an acyclic path motivated by Allen’s interval algebra. The trap for developers is that it can decide consistency of atomic networks for less structured domains (i.e. RCC-5 [42]) such as closed disks [20] and the block-algebra for p > 1. Most poor run-times have been experienced with our sample data for topological queries where a number of polygon vertices are ungrounded.
CLP(QS): A Declarative Spatial Reasoning Framework
223
architects routinely analyze an enormous amount of detailed information in order to determine whether certain requirements have been met. These can range from meeting strictly objective safety codes, to eliciting subjective, emotional responses, and typically employ geometric and high-level spatial features. Imagine that an architect is designing the floorplan of an office. Figure 4 illustrates a work-in-progress design. We will now demonstrate how CLP(QS) can be used for checking the consistency of high-level constraints and for quantification. 4.1
Consistency
The structural form corresponding to high-level design constraints (e.g., coming from design guidelines, designer expertise, client requirements) may be conceived and directly translated into a high-level declarative specification. Some examples follow: Safety. Doors should not open onto areas where a person might be located while occupied with some activity. Formally, the operational space of doors should not overlap with the functional space of activity objects such as the washbasin, safety(Door, Object) :operational space(Door, Op), functional space(Object, Fs), not(topology(Op, Fs, overlaps)).
CLP(QS) detects that the current design does not meet this constraint for the washbasin (see Figure 5(a)). The architect changes the opening direction of the door which causes a new problem as detected by CLP(QS) (see Figure 5(b)), noCollidingDoors(Door1, Door2) :operational space(Door1, Op1), operational space(Door2, Op2), not(topology(Op1, Op2, overlaps)).
The architect resolves the problem by sliding the door down the wall (see Figure 5(c)). Security. The architect must position a camera so that people entering the office can be identified. This is formalized using the range space of the camera and the operational space of the entrance door, secure by(Door, Camera) :physical geometry(Door, G), operational space(G, O), range space(Camera, R), topology(O, R, inside).
CLP(QS) confirms that the design currently satisfies this constraint by translating it into the equivalent LR constraint10 10
For clarity we have assumed in the example that the regions are convex and do not have holes. As we described in Section 3.2, regions with holes can also be handled.
224
M. Bhatt, J.H. Lee, and C. Schultz
(a) Wash Sink and Door
(b) Doors
(c) IncorrectSensorPosition
(d) Sensor Repositioning Fig. 5. Design (In)Consistency
ntpp(O,R) ≡ ∀p · inside(p,O) → interior(p,R) O R R ≡ ∀p ( ¬∃ i · r(pO i ,pi+1 ,p) → ∀j · l(pj ,pj+1 ,p) ) .
Privacy. Security cameras must not be able to record inside the adjoining bathroom. That is, cameras must be directed away from the bathroom, privacy(Bathroom, Camera) :oriented point(Camera, C), physical geometry(Bathroom, G), not orientation(C, G, facing).
Facing can be interpreted as the constraint:
facing(C,G) ≡ ∃p ( inside(p,G), ∃i ∈ {15, 0, 1} opra4 (C,p,i,*)).
On the privacy constraint, CLP(QS) detects that the current design is inadequate. The architect attempts to solve the problem by rotating the camera to face away from the bathroom, however, CLP(QS) detects that this causes the security constraint to fail. 4.2
Quantification
We will now get CLP(QS) to suggest a location and orientation of the camera that satisfies the security and privacy constraints. To do this we will specify the location and orientation of the camera as an ungrounded variable, and further constrain its domain of acceptable configurations. Position. The camera must be mounted on one of the perimeter walls of the office,
CLP(QS): A Declarative Spatial Reasoning Framework
225
onPerimeter(Camera, Office) :∃W ( isWall(w), in(W, Office), physical geometry(W, R), oriented point(Camera, P), topology(P, R, boundary)).
This is equivalent to the LR constraint ∃j ( s(wj1 ,wj2 ,p) or i(wj1 ,wj2 ,p) or e(wj1 ,wj2 ,p)) .
where a wall Wj is the line segment (w1j , w2j ). Visibility. A region R is visible from an observation point p (in a room with objects Objs) if there is at least one unobstructed line between the observer and some point within the region, visible(R,p,Objs) :∃p2 ( ∀S ∈ Objs/{R} ( inside(p2,R), not( lineIntersect(p,p2,S)) )).
If region R is a sequence of points, then the predicate for determining whether a line intersects a region is lineIntersect(p1,p2,R) :∃i (pi ,pi+1 ∈ R, lineIntersect(p1,p2,pi ,pi+1 )).
where line intersection is defined using LR relations, lineIntersect(a1,a2,b1,b2) :∃p ( (i(a1,a2,p) or s(a1,a2,p) or f(a1,a2,p) ) and (i(b1,b2,p) or s(b1,b2,p) or f(b1,b2,p) )).
CLP(QS) suggests the location and orientation illustrated in Figure 5(d).
5
Discussion and Related Work
Researchers have investigated high-level modeling and reasoning with spatial knowledge; most direct connections of our work emerge with the works by Almendros-Jim´enez [3], Banerjee and Chandrasekaran [5], Bhatt et al. [9], Escrig and Toledo [23], Kurup and Cassimatis [34], Uribe et al. [49]. Related work also exists in the constraint databases community, although there exist fundamental differences between constraint query languages and CLP [13, 32]. Focussing on the knowledge representation approach, we differentiate our approach and contributions with respect to three main aspects: 1. Use of qualitative spatial calculi, as construed within QSR, whilst preserving their relational semantics (and therefore, where applicable, their cognitive and psycholinguistic underpinnings). This is crucial in domains such as design where constraints are tightly connected to their high-level conceptualization.
226
M. Bhatt, J.H. Lee, and C. Schultz
2. Providing an underlying polynomial characterization for a range of topological and positional spatial calculi, thereby: (a) overcoming several limitations, in a relational algebraic sense, of conventional compositional reasoning with precomputed composition tables (b) providing inherent support for quantification of relational spatial information 3. Enabling spatial entities and qualitative relations as first-class, native objects within the declarative framework of classical constraint logic programming systems, and providing a prototypical implementation and demonstration of its application for real-world problem solving. The declarative approach adopted by Bhatt et al. [9] uses a description logic based reasoning technique for ensuring spatio-terminological consistency. In this work, only topological consistency may be checked for using a terminological system, and it is not possible to solve constraints, quantify, or specify positional constraints. The work by Uribe et al. [49] aims to construct a module that can provide spatial reasoning services in the context of a larger system that is not necessarily restricted to spatial knowledge. Their view of spatial knowledge is grounded in QSR, and: (a) aims at providing question-answering support within a first-order theorem prover, (b) solely relies on spatial reasoning by incorporating composition tables within (the prover). This is different from our approach in two major respects: (1) we utilize constraint logic programming, where problem-solving is based on constraint solving foundations, as opposed to a theorem-proving approach; and (2) our polynomial characterization of the spatial domain QS does not utilize compositional reasoning thereby overcoming the limitations of reasoning with composition tables (as elaborated on in Section 3.3). The work by Escrig and Toledo [23] may also be situated in this category, since here too spatial reasoning (with positional calculi) is performed using composition tables. It is worth noting here that spatial representation is performed using encoding composition tables using constraint handling rules11 framework. Almendros-Jim´enez [3] approach the problem of constraint solving over sets of spatial objects by considering spatial constraints as an instance of CLP. Notwithstanding the fact that Almendros-Jim´enez neither addresses QSR methods, nor utilizes any form of linear or non-linear formalization, there exist interesting potentials when considering the synergy afforded by the contributions of [3] and this paper: their in-depth study on the operational semantics of the CLP solver, i.e., the interaction of the spatial constraint solver with the underlying resolution mechanism in CLP, whilst considering constraints over sets of spatial objects, provides useful insights for a future extensions of our prototypical CLP(QS) framework in a manner such that it may be tightly integrated within state-of-the-art CLP engines (e.g., as a specialized spatial reasoning library for Eclipse). 11
Constraint Handling Rules (CHR) are a special purpose language designed to write and combine constraint systems [44]. CHR have been used to encode a range of constraint handlers (solvers), including domains such as terminological and temporal reasoning.
CLP(QS): A Declarative Spatial Reasoning Framework
227
Banerjee and Chandrasekaran [5] deal with a range of spatial perception and action problems from a diagrammatic reasoning perspective. Whereas they employ similar underlying methods (namely, quantified constraint satisfaction) for solving spatial problems, their approach does not seek a direct integration with CLP or other declarative programming approaches. Furthermore, the equivalent of the spatial domain QS in their work consists of a diagrammatic representation, where in our case, QS is founded on qualitative spatial calculi in QSR. As emphasised in this paper, our perspectives on the spatial domain QS within declarative spatial reasoning are subject to reinterpretations and extensions: from our perspective, we see interesting synergies and possibilities to compare different formalizations for QS possibly encompassing visual and diagrammatic models of space. Methodologically similar to the work of Banerjee and Chandrasekaran [5] in its use of a diagrammatic representations is the work by Kurup and Cassimatis [34]. Within a propositional logic framework, Kurup and Cassimatis [34] approach the spatial reasoning problem by integrating a diagrammatic representation with a DPLL-based backtracking algorithm that is specialized for spatial relations of objects in a grid. This approach is efficient compared to other approaches using SAT solvers or SMT solvers. However, it will not find appropriate solutions in the real world, as spatial reasoning in a grid leads to information loss. Similar to the work by Uribe et al. [49], the approach of Kurup and Cassimatis [34] is also not grounded to the formal semantics of qualitative spatial calculi, but instead, utilizes a diagrammatic representation. The principal motivation of their approach is to overcome the limitations of (diagrammatic) spatial reasoning with propositional satisfiability solvers. In doing so, they show the manner in which the DPLL algorithm augmented with diagrammatic reasoning can be used to make SAT more efficient when reasoning about spatial relations in a grid. Indeed, none of the related works discussed so far consider quantification of relational spatial information, with only the approach of Banerjee and Chandrasekaran [5] being quantification capable. Quantification, in our case, has been identified, implemented, and demonstrated to be a crucial computational requirement within applications.
6
Conclusion and Outlook
We have put forward a case for the accessibility of specialized spatial representation and reasoning mechanisms via the medium of high-level, logic-based formalizations in KR. This entails basic scientific challenges, and is motivated by need to solve specialized real-world problems in spatial representation and reasoning, and applying QSR in such application scenarios. The core contributions of this paper lie in the integration of qualitative spatial representation and reasoning with constraint logic programming. Given the support for polynomial systems within constraint logic programming, our method is directly realizable within state-of-art CLP solvers with support for polynomials. We have demonstrated a prototypical implementation of this approach, and also illustrated its applicability toward practical spatial computing in the domain of computer-aided architecture design.
228
M. Bhatt, J.H. Lee, and C. Schultz
From a theoretical perspective, the general motivating principle of our ongoing research is that any form of high-level spatial reasoning (e.g., spatial projection, explanation etc) will rely on some form of spatial constraint solving capability, in addition to other forms of non-classical inference patterns such as non-monotonic inference, spatial belief revision capabilities etc. Hence, integration, along the lines of CLP(QS), with other declarative frameworks such as Answer-Set Programming, Event Calculus is one line of work presenting interesting challenges. The general motivating principle here is that any form of high-level spatial reasoning (e.g., spatial projection, explanation etc) will rely on some form of spatial constraints solving capability, in addition to other forms of non-classical inference patterns such as non-monotonic inference, spatial belief revision capabilities etc. From an application perspective, we have so far considered a rather limited range of problems solely within the context of architecture design. Given the generality of CLP(QS), there exist several possibilities for studies with other application domains; here, geographic information systems, and cognitive robotics are a prime candidates in our ongoing projects. With this backdrop, we are also extending CLP(QS) in order to include support for reasoning with a high-level, qualitative model for 3D visibility [48]. Acknowledgements. This work has benefitted from past collaborations with Gregory Flanagan, Christian Freksa, Frank Dylla, Jan Oliver Wallg¨ un, and Diedrich Wolter. We gratefully acknowledge the funding and support of the German Research Foundation (DFG), www.dfg.de/ .
The Social Connection in Mental Representations of Space: Explicit and Implicit Evidence

Holly A. Taylor1, Qi Wang1, Stephanie A. Gagnon1,2, Keith B. Maddox1, and Tad T. Brunyé1,2

1 Department of Psychology, Tufts University, 490 Boston Ave., Medford, MA 02155, USA
2 U.S. Army NSRDEC, Cognitive Science Team, RDNS-WS-S, 15 Kansas St, Natick, MA 01760, USA
{holly.taylor,qi.wang,keith.maddox}@tufts.edu, [email protected], [email protected]
http://ase.tufts.edu/spacelab
Abstract. To understand memory of and reasoning about real-world environments, all aspects of the environment, both spatial and non-spatial, need to be considered. Non-spatial information can be either integral to or associated with the spatial information. This paper reviews two lines of research conducted in our lab that explore interactions between spatial information and non-spatial information associated with it (namely social information). Based on the results of numerous studies, we propose that full accounts of spatial cognition about real-world environments should consider non-spatial influences, noting that some phenomena, while seemingly spatial in nature, may have substantive non-spatial influences.

Keywords: spatial categorization, non-spatial categorization, implicit processing, social cognition.
1 Introduction

In the world around us, spatial and non-spatial information is inextricably linked. This point can be illustrated by imagining a tour of a university campus. On your tour, you may note an interesting campus building. In taking note of it, it becomes a landmark to you. What information do you process about this landmark? One aspect is its location, but this is surely not all you remember about it. Most of the features you remember, including its identity, are not inherently spatial. These non-spatial features vary in how "connected" they are to the landmark. Some are relatively fixed in relation to the building, such as its color (yellow), building material (brick), shape (L-shaped), and/or architectural style (Classical Revival). You may also note more fluid features of the building, such as its function (Philosophy Department) or decorations (neon pink flyers announcing an upcoming lecture). Finally, information situationally and transiently associated with the building, such as the race and/or gender of students seen walking out of the building, may capture your attention. Although we experience and remember both spatial and non-spatial information, little research has explicitly
examined their interaction in memory and problem solving. The present paper uses non-spatial information situationally and transiently associated with locations as a starting point in exploring this interaction. We build a case that understanding spatial cognition involves understanding spatial and non-spatial associations. We focus specifically on social information associated with locations. To meet this goal, we first outline cognitive parallels between processing spatial and social information. Next, we discuss methodological adaptations that cross between spatial and social cognition, potentially enhancing the toolbox of both sub-disciplines. With an understanding of the methodologies, we review a number of studies applying these methodologies to explore spatial and social information interactions. Finally, based on these studies' results, we argue that these associations may provide viable explanations for seemingly spatial phenomena. As such, a full account of spatial cognition must consider non-spatial associations.
2 Parallels in Cognitive Processing of Spatial and Social Information

The complexity and multi-faceted nature of spatial and social information make them cognitively interesting in their own right. People categorize both types of information in an attempt to make them cognitively manageable. Information categorization can also introduce costs; it can lead to biases and distortions in memory and in the way the information is used. Some of the complexity of spatial information comes from its hierarchical nature [1-3]. Spatial features, such as those found on your hypothetical campus tour, conform to geometric definitions of point, line, and plane. The campus itself is a plane, the roads and walking paths running through it are lines, and the buildings are points. The conceptualization of locations by their geometrical assignment helps define the environment's hierarchical nature. Spatial categorization takes advantage of this hierarchy. People use line-based features to sub-divide an overall space. Research has made apparent that people use line-based spatial features such as mountain ranges, roads [4], and artificial boundaries [5], and perhaps plane-based locations [6], to group and organize point-based locations [7, 8]. However, organizing spatial information is further complicated by its nested structure. The same information can be geometrically classified at different levels, depending on the specific scale, level of analysis, or zoom [9]. For example, the campus would be considered a point when thinking about the larger city in which it is located, but a plane when trying to find a particular building on campus. Research suggests, however, that for a given scale and boundary availability, people structure their cognitive maps using these categorical grouping processes, and they do so spontaneously [8]. The categorization process can lead to spatial distortions. Stevens and Coupe's [6] now-classic study illustrated how boundaries (e.g., state borders) promote categorical reasoning. Specifically, judgments about the relative location of two cities in different U.S. states relied on knowledge of those states' relative locations. More generally, people perceive locations sharing superordinate membership as more
similar to one another and more distinct from locations not sharing this membership [5, 8, 10, 11]. Perceptions of within-category similarity induce spatial distortions in memory. Locations within a category are remembered as closer together than equally distant locations residing in different categories [e.g., 12, 13-15]. People similarly categorize social targets (people), as seen in the in-group bias [16, 17] and stereotyping [e.g., 18, 19, 20] literatures. The in-group bias suggests preferential treatment for individuals sharing some social identity [e.g., women, Asians; 16, 17]. People use different, relevant cues to categorize social information, including appearance (e.g., skin color) and/or behavior (e.g., aggression). As with spatial information, social categorization also leads to category-based errors. Such errors include failure to differentiate individuals within a social category, notably an out-group category [e.g., 18, 21], and differential similarity judgments within versus across social groups [e.g., 22, 23, 24]. People more often confuse individuals in the same racial category than they do individuals from different categories. Social and spatial information are frequently linked in real-world experiences. It is not uncommon for individuals of the same racial or ethnic group to reside in the same neighborhood. Taking Boston, Massachusetts as an example, one expects to find a greater preponderance of Asian individuals in the Chinatown area, more Italian-Americans living in the North End, and more African-Americans in Dorchester. Indeed, neighborhood names frequently become associated with the dominant racial or ethnic group living there. If we organize our memory of the city using spatial categories and our memory of the people using social categories, how do these different category types interact? Another cognitive connection between spatial and social information is evident in how physical space is harnessed to mentally represent abstract concepts. Two such connections can be seen in spatial representations of affective valence [25] and time [26]. Several studies have found similarities between thinking about space and thinking about time, and in the language used within these two domains [27]. Like time, social information is generally abstract. Ties between social variables and physical space have cultural and ideological roots. Cultures around the world manipulate space at both smaller and larger scales, from house layouts to city planning, to convey social and ideological relationships [e.g., 28, 29-31]. For instance, residential patterns in ancient Mayan cities were arranged in relation to cardinal directions and topography, conveying a cultural map that represents social status. Elite and royal buildings were located northward, culturally associated with the sun's zenith and the celestial realm, and at higher elevations so as to be closer to their ancestors [28, 32]. This has carried over more subtly into today's society with the placement of politically important locations at higher elevations. For example, Capitol Hill in Washington, D.C. is modeled after the most important Roman temple, dedicated to Jupiter Optimus Maximus on Capitoline Hill, the highest hill in Rome [33]. Abstract indicators of social status (e.g., power, wealth, influence, intellect) have been related to physical space. We link the more intangible concepts with varying levels of height through metaphors.
Take phrases such as "she was climbing the ladder of success" or "Alice was taught to dress above her station" [34] or "he thought unskilled labor beneath him" [35] as examples. Through these metaphors, social status becomes associated with vertical space. "Up" is used to represent the abstract
concepts of power, wealth, and intellect, whereas "down" is comparatively inferior. These spatial associations of social status could additionally transfer to other social variables (e.g., race or gender), in line with culturally developed stereotypes [31, 36]. Whether the use of spatial information to represent abstract concepts is intentional or incidental is not known; it could be either or both. We do know that aspects of both spatial and social information appear to be, at least partially, processed implicitly. We further explore possible implicit associations between social and spatial information in the studies described in this paper. In sum, when considered both separately and in combination, spatial and social information engage similar cognitive processes. Both types of information, due to their complexity, are organized and used categorically. Further, because of its more concrete nature, people sometimes recruit the representational power of spatial information for understanding more abstract information, including social relations. For the most part, however, spatial and social information processing has been experimentally explored separately. The remainder of this paper describes the methodological, conceptual, and empirical progress our lab has made in explicitly exploring the interaction of spatial and social information.
3 Cross-Application of Methodology

Methodology often develops within a discipline to address fundamental questions of interest. Increasingly, however, interdisciplinary approaches to research questions have shown success, in part because they bring to bear different methodological approaches. The same can be applied to sub-disciplines of a field. As discussed above, similarities exist in how people process spatial and social information. To capitalize on these similarities and maximize our explorations of how spatial and social information might interact, we have adapted some paradigms more traditionally used in social psychological research for our studies of spatial phenomena. These include the "Who-Said-What" task (a category confusion task) and the Implicit Association Task (IAT). These paradigms, however, can be seen more generally as assessing associations to categories. This broader definition makes them directly applicable to spatial information.

3.1 Category Confusion Tasks

The "Who-Said-What" task is a speaker-statement matching task developed by S. E. Taylor et al. [24]. Participants watch a simulated conversation between individuals representing different social categories (e.g., 3 Black and 3 White males). In other words, the individuals in the conversation differ on one social variable (e.g., race), while other social variables are held constant (e.g., gender). After watching the conversation, participants complete a surprise statement-matching task. In the task, they see a conversation statement and a picture of one of the conversation participants and decide whether the pictured individual said the statement. Typical findings show that participants are more likely to wrongly attribute a statement to someone in the same racial category than to someone from a different racial category. This task can be generally
conceived of as a category confusion paradigm and, consequently, is applicable to other domains, such as spatial information, where categorization appears to occur readily. We call our adapted paradigm the "Who-Works-Where" task. Participants first learn a spatial environment and information associated with each location. To date, we have primarily used this task with learning via a map [15, 37], although future work will examine learning through virtual (VR) navigation. The associated information includes racial information about people associated with the locations. After learning, participants see a location name and a person name or picture and decide whether they associated the person and the location during learning. Because the task generally assesses category use, we adapted it to examine categorical social and spatial associations.

3.2 Implicit Association Task (IAT)

The IAT evaluates the relative effort required in making congruent versus incongruent associations by measuring speeded responses during a category discrimination task. Traditionally, this task has been used in experimental social and personality psychology to identify associations between, for example, gender target categories (male/female) and job-type attribute categories (engineer vs. teacher). If the target categories (male/female) are differentially associated with the attribute categories (engineer/teacher), participants will categorize examples from all four categories faster and more accurately when the associated target and attribute concepts are congruent (e.g., male/engineer) versus incongruent (e.g., female/engineer). Thus, the IAT is thought to measure automatic and implicitly activated subconscious judgments, while minimizing the use of explicit strategies [38, 39]. Like the category confusion tasks, the IAT can be modified for other categorizable information types. Unlike the category confusion tasks, the IAT measures extant, rather than experimentally learned, associations. We adapted the IAT to measure the implicit association either between different aspects of spatial information (cardinal direction and topography) or between social (powerful/average individuals) and location-based information, such as topography (e.g., mountainous/level terrain) [40]. As with classic uses of the IAT, participants progress from a single to a multiple categorization task. They first see instances of one category (e.g., topography, represented by pictures of mountains or flat plains) and button-press to indicate their categorization (mountainous or level). As the task progresses, the two categories of interest are intermixed in successive trials. Further, their assignments to response buttons are either congruent with one another or incongruent. The implicit association is interpreted from the difference in response time and accuracy between congruent and incongruent category assignments.

3.3 Distance Estimation as a Similarity Assessment Task

The two tasks discussed thus far take traditional social cognition methods and apply them to a spatial context. A traditional spatial cognition task, distance estimation, can also be applied to explorations of spatial-social connections. Distance estimation tasks are methodologically simple. Participants simply estimate the distance between two
locations. The task can specify a particular type of spatial distance, for example Euclidean (crow-fly) or route distance [41]. The task can be implemented by having participants give numeric responses on a particular scale (e.g., miles) or scaled responses based on a spatial reference (e.g., mark the distance on a line representing the farthest distance between locations within the environment). The application of distance estimation tasks to spatial-social connections comes from a more general interpretation of the task. Specifically, distance estimation can be viewed as a similarity assessment task [13]. Just as spatial relations are used metaphorically to represent abstract ideas, distance can be conceptualized in both concrete and abstract ways. The fact that people talk about "bridging the ideological gap" or "coming closer to agreement" when discussing ideas illustrates a metaphorical use of distance. "Closer", in this sense, means more similar. We have employed distance estimation in the more traditional spatial cognition approach, but include social variables in analyzing influences on these estimates. This approach has been effective in exploring the spatial-social information interaction.
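To make the shared scoring logic of these assessments concrete, the short Python sketch below works through both analyses on hypothetical data. It is a minimal illustration under assumed data structures; the field names and numbers are ours for illustration and are not taken from the studies reviewed here. It tallies within- versus between-category confusions, as in the Who-Said-What and Who-Works-Where tasks, and compares distance-estimation bias for location pairs that do or do not share a social category.

from statistics import mean

# Category confusion: each error records the social category of the correct
# target and of the target the participant actually chose (hypothetical data).
errors = [
    {"correct_cat": "A", "chosen_cat": "A"},  # within-category confusion
    {"correct_cat": "A", "chosen_cat": "B"},  # between-category confusion
    {"correct_cat": "B", "chosen_cat": "B"},  # within-category confusion
]
within = sum(e["correct_cat"] == e["chosen_cat"] for e in errors)
print(f"within-category errors: {within}, between-category: {len(errors) - within}")
# Categorical encoding predicts more within- than between-category confusions.

# Distance estimation as similarity assessment: bias (estimated minus actual
# distance) for pairs sharing vs. not sharing a social category association.
pairs = [
    {"same_social_cat": True,  "actual": 10.0, "estimated": 8.5},
    {"same_social_cat": True,  "actual": 12.0, "estimated": 10.0},
    {"same_social_cat": False, "actual": 10.0, "estimated": 11.5},
    {"same_social_cat": False, "actual": 12.0, "estimated": 13.0},
]

def mean_bias(same: bool) -> float:
    """Mean signed estimation error for pairs with the given category status."""
    return mean(p["estimated"] - p["actual"]
                for p in pairs if p["same_social_cat"] == same)

print(f"bias, same category:      {mean_bias(True):+.2f}")
print(f"bias, different category: {mean_bias(False):+.2f}")
# A negative bias for same-category pairs mirrors the finding that shared
# category membership makes locations be remembered as closer together.

On data patterned like the findings above, the same-category bias comes out negative (underestimation) and within-category confusions outnumber between-category ones.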
4 Interaction of Spatial and Social Information

Understanding how spatial and social information interact is essential to understanding spatial memory in the real world. Real-world environments generally include both spatial and social information. Walking around your own neighborhood, you see the blue house on the corner and the Asian woman tending a garden in the yard, or you see the old man sitting at the corner bus stop. In our experiences, social and spatial information interact. This co-occurrence would suggest that they interact cognitively as well. However, these associations can also be quite transient. The Asian woman in the blue house may move out and an African-American man may move in. The old man presumably leaves the bus stop when the bus arrives. In other words, these types of associations may be meaningful in the moment, but may not last. Some earlier studies hint that social and spatial information can interact, but not until recently did researchers more systematically investigate this interaction. Earlier studies examined how one's own racial membership and accompanying experiences influenced the cognitive map of his/her own neighborhood [42, 43]. On-going research in our lab is more systematically examining factors affecting spatial and social information interactions in memory [15, 37, 44]. Our approach is multipronged and employs traditional methodology from both spatial and social cognition. The application of the methodology, however, is not traditional and crosses between spatial and social cognition. Our overall approach can be said to combine explicit and implicit assessments. The more explicit approach involves participants learning a novel map that provides information about, but no direct experience with, unfamiliar individuals with different social characteristics. Memory and distance estimation tasks then assess effects of spatial and social information. The implicit approach involves the IAT. Results of multiple studies strongly suggest that even more transient social associations impact spatial cognition of real-world settings in ways important to understand.
4.1 Cross-Over Spatial and Social Influences

Think again about touring a college campus. You most likely noted the locations of some buildings. Further, if you encountered people on this tour, you automatically processed their social identities, such as race and gender [24, 45]. Did this information become cognitively integrated? We know that non-spatial characteristics of an environment are perceived and incorporated into one's cognitive map along with spatial features [46, 47]. When this information has categorical structure, people cluster locations based on their function [10, 48], their physical similarity [49], and the semantic category into which they fall [50]. But, as discussed previously, social information is usually more transiently associated with spatial information than is function or physical similarity. The people associated with a building change more frequently than does the building's function or appearance. In previous and on-going research, we have explored the extent to which social information associated with locations is cognitively integrated. These investigations have used a common methodological approach. Participants study a map of small-town business locations. The map is divided into neighborhoods (spatial categories). During study, participants focus on one location at a time, learning its location and other associated information, including some type of social information about each business proprietor (e.g., race, political affiliation). After studying, participants complete distance estimation and category confusion (Who-Works-Where) tasks. Different instantiations of this methodology focused on making either the spatial or the social category more salient. We predicted that category salience, because it directs attention, would play a role in the interaction between spatial and social information [51]. We manipulated salience in a number of ways, including presentation order, presentation format, the nature of the category, and the correlation of the two categories. Presentation order either grouped locations by neighborhood (spatial category) or by social category. The spatial presentation order first went through all locations in a neighborhood before moving on to the next neighborhood. The social presentation order first presented all locations associated with one social category before going on to the next. Presentation format differences depended on the category. To make the neighborhoods more salient, one study labeled the neighborhoods. To highlight the social categories more, the associated information either used labels (less salient) or pictures (more salient, particularly for race). Another way of manipulating the social category involved instantiating different social categories. Most studies used the salient social category of race, but we also examined a less salient category, political affiliation. The relationship between the categories could be highlighted through their correlation, functionally leading to either socially segregated or socially integrated neighborhoods. Segregated neighborhoods had the majority of locations (8 of 12) associated with one social category. Integrated neighborhoods had an equal number of associations to each social category. This line of work has made clear that spatial and social information interacts and that category salience changes this interaction. Maddox et al. [15] found consistent interactions between racial and spatial categories with distance estimates.
Participants estimated locations from different neighborhoods that shared an
association to the same race as closer together than those that had different race associations. For this spatial task, the interaction did not appear for locations within a neighborhood, suggesting a stronger role of spatial category on the more spatial task. Looking across multiple studies revealed the effects of category salience [52]. The correlation between categories highlighted their relationship. Participants showed sensitivity to this correlation, as suggested by a more pronounced effect of social category on distance estimates with segregated neighborhoods [15]. Further, whether the presentation order focused on the spatial or the social category had an effect, for both the distance estimation and category confusion tasks. Both tasks showed interactions between spatial category, social category, and presentation order in the direction predicted by how presentation order directs attentional focus. Salience based on the category nature also had an effect; the less salient category (political affiliation) did not interact with the spatial category, even though it has cultural associations to spatial areas (e.g., maps of red and blue states during U.S. presidential elections). Finally, we manipulated the salience of race by relating racial information either through labels (e.g., African American, Asian) or through pictures. Results showed that the interaction between social and spatial categories on distance estimates depended on how the racial information was conveyed. Pictures appeared to strengthen the effect of the racial category compared to labels alone. Taken together, the results of this line of work show strongly that people incorporate social information available in an environment into their spatial representation. Further, the more salient the social category is during learning, the more it interacts. In these studies, unlike those discussed next, participants explicitly studied social along with spatial information. The next line of work we discuss examines implicit social-spatial associations that participants have already developed.

4.2 Implicit Spatial-Spatial Associations: An Interesting Phenomenon

Our recent investigations of implicit associations between social and spatial information came about in incremental and programmatic studies attempting to explain an interesting phenomenon, that of a southern bias in route selection [53]. The research progression is integral to our overall exploration of social and spatial information interactions and, therefore, is worthy of explanation. As an overview, the progression moved from identifying a particularly interesting spatial phenomenon to exploring spatial explanations for the phenomenon to exploring alternative and clearly related social and spatial interactions. In the work that identified the southern bias, participants selected the "best" of two equidistant routes. The route choices either went generally eastward or westward, connecting landmarks that lay north-south of one another, or generally northward or southward, connecting landmarks east-west of one another. With east-west route dilemmas, route choice reflected the equidistant nature of the options, with participants selecting the eastward route approximately half of the time. The north-south route dilemmas showed our interesting phenomenon. For these trials, route selection did not reflect the equivalent route distance. Instead, participants showed a consistent bias for south-going routes.
This bias was evident in route selection within fictitious small towns and in estimates of travel times for routes traversing the U.S.,
thus showing generalization across scale. Current work in our lab is exploring whether this bias results from explicit strategy use or an implicit association. People may show this southern bias in route selection because of an association between north and uphill. Some evidence of this association comes from a follow-up rating task with the same routes. Participants rated northern, compared to southern, routes as more scenic and requiring more calories to traverse, both qualities associated with higher elevation. East and west routes did not show differences in scenery or calorie ratings. More direct evidence of an association between north and uphill comes from recent work using the modified IAT [44]. In this case, the IAT compared implicit associations between high and low elevation (mountainous versus level terrain) and cardinal directions (north and south). For the IAT, the congruent pairing had north and mountains sharing one response button and south and level sharing the other. The incongruent mapping required south and mountains to share one response button and north and level to share the other. Participants completed blocks of both mappings (congruent and incongruent) in counterbalanced order. Thus, for these categories, response latencies for image categorizations should reflect the strength of implicit target (north/south) and attribute (mountains/level terrain) associations [38, 39]. We ran two experiments differing only in their representations of cardinal directions, either maps with stars to the north or south or compass roses showing "N" or "S" to designate direction. Results revealed faster categorization response latencies for congruent north/mountain pairings relative to incongruent south/mountain pairings (p < .05). Our findings provide direct evidence for an automatic association between north and higher elevations [54]. This association can then be implicated in spatial heuristics that bias spatial decision making. The source of this association is unclear, however. Different possible sources make sense, including mapping conventions (e.g., north as up) and topography. We are currently examining effects of local topography in a cross-cultural and cross-topographical study. The participants in the original studies showing the south-route bias resided in the northeastern U.S., where topography reinforces the idea that north is uphill. In this U.S. region, people travel north to hike or ski in the mountains. An international collaborative follow-up with other U.S. and European labs [55], however, suggests that local topography (e.g., mountains to the north) does not explain the phenomenon. Labs with mountains prominent in directions other than north (e.g., Sofia, Bulgaria) also find a southern route preference.

4.3 Implicit Social-Spatial Associations: Explaining the Interesting Phenomenon

Our success in using the IAT to explore associations between different aspects of spatial information suggested to us that the IAT could also be used to explore associations between spatial and social information, such as those seen in our map-learning studies [15, 37, 52]. To investigate social-spatial associations, we again employed the IAT [56]. Our adaptations were similar to those exploring the association between topography and
cardinal directions. In this case, we examined associations between social variables (e.g., social power, gender) and topography (mountainous vs. level terrain). As discussed earlier, social power has some association to higher elevations. While some cultures implement this association directly, such as in city layouts [28, 31], the association is less obvious in Western cultures. To explore this association in U.S. participants, we selected powerful people from Time Magazine's list of powerful people, which includes international leaders and executives. Our participants most likely did not explicitly identify the powerful people, because the list involved relatively unfamiliar individuals such as leaders of foreign nations or businesses [although, see 57]. Thus, from the participant's perspective, the operationalization of social power largely involved attire. The stimuli consisted of images of men and women, matched for age and ethnicity, dressed in either suits (powerful) or relatively casual clothing (average). We operationalized spatial topography using images of mountains (high elevation) and level plains (low elevation), matched for climate and amount of visible sky. Our findings provide evidence that participants implicitly associate powerful individuals and high elevation, and, as a corollary, average individuals and level elevation [31]. Data supporting this assertion involved faster response times for congruent (high/powerful, low/average) response mappings relative to incongruent (high/average, low/powerful) response mappings. Further, the strength of this effect could be seen in the lack of either a main or interactive effect of block order (i.e., congruent versus incongruent first, p's > .5). These results support the notion that we ground abstract conceptions of social status in physical space, and that this association implicitly affects other judgments. Further, this suggests that our underlying social and ideological connotations regarding elevation might impact our perception of space in general, which carries important implications for predicting human behavior and theories of spatial and social cognition. We also examined whether this association carries over to stereotype-based, more indirect relationships. In many Western cultures, men are perceived as having higher social status than women. Evidence of this can be seen in higher compensation [58] and promotion [59] rates for men, holding qualification and performance measures constant. Another source of this association could be height; people may associate people's height with topographic elevation. Men, on average, are taller than women. An IAT that pitted categorization by gender against categorization by topography, however, did not show a strong implicit association. Either the relationship between gender and topographic elevation does not exist or it may be too indirect to be captured by the IAT. If the association to gender is mediated through social status, it may not be evident with the IAT, although mediated semantic relationships do show implicit effects, in the form of priming [60]. Other cognitive processes engaged by the IAT, including executive functions [61], may make it insensitive to mediated associations. To sum up, our recent studies show that the IAT can be used to explore implicit associations between spatial and social information, particularly those with a more direct connection.
Important to the broader study of spatial memory and reasoning, the evidence of social-spatial associations strongly suggests they influence spatial cognition and should be considered when investigating real-world spatial behavior.
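To make the congruency analysis concrete, the Python sketch below computes a simple IAT effect: the difference in mean correct-response latency between incongruent and congruent blocks. The trial data and field names are hypothetical, and published IAT analyses often use a more elaborate D-score; this simplified difference score is for illustration only.

from statistics import mean

# Hypothetical trials: block type, response time (ms), and accuracy.
trials = [
    {"block": "congruent",   "rt": 640, "correct": True},
    {"block": "congruent",   "rt": 710, "correct": True},
    {"block": "congruent",   "rt": 980, "correct": False},  # excluded below
    {"block": "incongruent", "rt": 820, "correct": True},
    {"block": "incongruent", "rt": 905, "correct": True},
    {"block": "incongruent", "rt": 760, "correct": True},
]

def mean_rt(block: str) -> float:
    """Mean latency over correct responses in the given block."""
    return mean(t["rt"] for t in trials if t["block"] == block and t["correct"])

effect = mean_rt("incongruent") - mean_rt("congruent")
print(f"IAT congruency effect: {effect:.0f} ms")
# A positive effect (slower incongruent responding) indicates an implicit
# association between the congruently paired categories, e.g., north/mountains
# or powerful/high elevation.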
5 Conclusions: Why Spatial and Social Interactions Matter

Imagine you have just finished the college campus tour you began earlier in this paper. Perhaps on your tour, you passed the building housing the Asian Languages and Literatures Department. Further along the tour you noted the interesting Greek Revival architecture of the Asian Student Affinity House. Finally, as the tour neared its end, you saw a group of Asian students engaged in an animated conversation outside the Student Union. While nothing about these particular locations is inherently Asian, you have now made Asian associations to all of them. How do these associations come to bear when you later think about this campus, perhaps deciding whether you will attend this university or what the best route is to where you parked your car? Our work suggests that these associations influence spatial information retrieved from memory and decisions based on this information. We have outlined two different lines of current research in our lab exploring how social associations to locations affect spatial cognition. The two lines originally had quite different goals. One line explicitly set out to examine the interaction of social and spatial category effects. The second came out of a programmatic approach to explain an interesting route-selection phenomenon. The two research lines approach the question of social-spatial information interaction using different methodologies and assumptions (e.g., intentional versus incidental learning). With these differences in mind, the strong evidence of social and spatial information interactions stands out to an even greater extent. People categorize both spatial and social information, and when the two occur in the same overall context (e.g., learning a map), the category types interact. They interact on tasks that focus specifically on spatial aspects of the map (distance estimation) and on associations about map locations. The interactions can be seen when information is intentionally studied; they are also apparent in extant associations that are implicitly activated. The range of contexts in which they appear means far-reaching implications for spatial problem solving and decision making. These implications are further strengthened by the fact that the particular types of social and spatial associations we have examined are generally transient in nature and experience. An important next step in this work is developing theoretical accounts of spatial and non-spatial information interactions. Given the nascency of research directly exploring such interactions, a full theoretical account is not yet possible. The range of non-spatial associations is both vast and varied. At a basic level, landmarks have both an identity and a location. The identity is non-spatial. Working memory handles identity and location differently [62]. This basic spatial/non-spatial association is not completely understood. We know salience affects these associations, but have not completely outlined what impacts salience. Thus, this line of work, leading to theoretical accounts, is ripe for research and discussion. The present paper also provides support for having methodology cross conceptual and disciplinary boundaries. We outline here the use of traditionally social cognition tasks that, in our research, revealed interesting spatial cognition findings. We also outline using traditionally spatial cognition tasks with which we have found interesting social-spatial information interactions.
Crossing methodologies in this way, and using multiple methodologies, brings variety in methodological assumptions and strength in convergent findings based on them. We suggest that broadening our methodological toolboxes in this way will allow for new research insights.
Although it may sound trite, we cannot escape the fact that spatial behavior is complex. The body of research exploring spatial behavior supports this contention, as does our work, reviewed in this paper. While we would not claim to have brought parsimony to explanations of spatial cognition, our findings make clear that social and spatial information interact. This fact should be considered in broad accounts of spatial behavior.

Acknowledgments. We thank the following students for their data collection assistance: Matt DiGirolamo and Michael Fitzgerald. Funding by the U.S. Army Natick Soldier Research, Development, & Engineering Center (NSRDEC), Cognitive Science division, is appreciated.
References

1. Bilge, A.R., Taylor, H.A.: Where is "here" in nested environments? Spatial updating from different sources. Spat. Cog. Comp. 10(2-3), 157–183 (2010)
2. Wang, R.F., Brockmole, J.R.: Human navigation in nested environments. J. Exp. Psychol. Learn. 29, 398–404 (2003)
3. Wang, R.F., Brockmole, J.R.: Simultaneous spatial updating in nested environments. Psychon. B Rev. 10, 981–986 (2003)
4. McNamara, T.P., Ratcliff, R., McKoon, G.: The mental representation of knowledge acquired from maps. J. Exp. Psychol. Learn. 10, 723–732 (1984)
5. McNamara, T.P.: Mental representation of spatial relations. Cog. Psychol. 18, 87–121 (1986)
6. Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cog. Psychol. 10, 422–437 (1978)
7. Friedman, A., Montello, D.R.: Global-scale location and distance estimates: Common representations and strategies in absolute and relative judgments. J. Exp. Psychol. Learn. 32(3), 333–346 (2006)
8. McNamara, T.P., Hardy, J.K., Hirtle, S.C.: Subjective hierarchies in spatial memory. J. Exp. Psychol. Learn. 15, 211–227 (1989)
9. Lloyd, R.: Spatial cognition: Geographic environments. Kluwer Academic Publishers, Norwell (1997)
10. Hirtle, S.C., Jonides, J.: Evidence of hierarchies in cognitive maps. Mem. Cognition 13(3), 208–217 (1985)
11. Maki, R.H.: Why do categorization effects occur in comparative judgment tasks? Mem. Cognition 10(3), 252–264 (1982)
12. Huttenlocher, J., Hedges, L.V., Duncan, S.: Categories and particulars: Prototype effects in estimating spatial location. Psychol. Rev. 98(3), 352–376 (1991)
13. Montello, D.R., Fabrikant, S.I., Ruocco, M., Middleton, R.S.: Testing the first law of cognitive geography on point-display spatializations. In: Kuhn, W., Worboys, M.F., Timpf, S. (eds.) COSIT 2003. LNCS, vol. 2825, pp. 316–331. Springer, Heidelberg (2003)
14. Carbon, C., Leder, H.: The Wall inside the brain: Overestimation of distances crossing the former Iron Curtain. Psychon. B Rev. 12(4), 746–750 (2005)
15. Maddox, K.B., et al.: Social influences on spatial memory. Mem. Cognition 36(3), 479–494 (2008)
16. Brewer, M.B.: In-group bias in the minimal intergroup situation: A cognitive-motivational analysis. Psychol. Bull. 86, 307–324 (1979)
17. Tajfel, H.: Experiments in intergroup discrimination. Sci. Am. 223, 96–102 (1970)
18. Hamilton, D.L.: Stereotyping and intergroup behavior: Some thoughts on the cognitive approach. In: Hamilton, D.L. (ed.) Cognitive Processes in Stereotyping and Intergroup Behavior, pp. 333–354. Lawrence Erlbaum Associates, Hillsdale (1981)
19. Hamilton, D.L., Sherman, J.W.: Stereotypes. In: Wyer, R.S., Srull, T.K. (eds.) Handbook of Social Cognition, pp. 1–68. Lawrence Erlbaum Associates, Inc., Hillsdale (1994)
20. Sigall, H., Page, R.: Current stereotypes: A little fading, a little faking. J. Pers. Soc. Psychol. 18(2), 247–255 (1971)
21. Taylor, S.E.: A categorization approach to stereotyping. In: Hamilton, D.L. (ed.) Cognitive Processes in Stereotyping and Intergroup Behavior, pp. 83–114. Lawrence Erlbaum Associates, Hillsdale (1981)
22. Klauer, K.C., Wegener, I.: Unraveling social categorization in the "Who said what?" paradigm. J. Pers. Soc. Psychol. 75(5), 1155–1178 (1998)
23. Maddox, K.B., Gray, S.A.: Cognitive representations of Black Americans: Reexploring the role of skin tone. Pers. Soc. Psychol. B 28(2), 250–259 (2002)
24. Taylor, S.E., et al.: Categorical and contextual bases of person memory and stereotyping. J. Pers. Soc. Psychol. 36(7), 778–793 (1978)
25. Brunyé, T.T., et al.: Bad news travels west: Handedness and map granularity influences on remembering locations of valenced events (2011) (submitted)
26. Boroditsky, L.: Metaphoric structuring: Understanding time through spatial metaphors. Cognition 75, 1–28 (2000)
27. Clark, H.H.: Space, time, semantics, and the child. In: Moore, T.E. (ed.) Cognitive Development and the Acquisition of Language, pp. 28–64. Academic Press, New York (1973)
28. Ashmore, W.: Site-planning principles and concepts of directionality among the ancient Maya. Lat. Am. Antiq. 2(3), 199–226 (1991)
29. Bourdieu, P.: The Berber house. In: Douglas, M. (ed.) Rules and Meanings, pp. 98–110. Penguin, Harmondsworth (1973)
30. Hodder, I.: The contextual analysis of symbolic meaning. In: Hodder, I. (ed.) The Archaeology of Contextual Meanings, pp. 1–10. Cambridge University Press, Cambridge (1987)
31. Keating, E.: Spatial conceptualization of social hierarchy in Pohnpei, Micronesia. In: Kuhn, W., Frank, A.U. (eds.) COSIT 1995. LNCS, vol. 988, pp. 463–474. Springer, Heidelberg (1995)
32. Robin, C.: Peopling the past: New perspectives on the ancient Maya. PNAS 98(1), 18–21 (2001)
33. Weatherford, J.: Tribes on the Hill: The U.S. Congress, rituals and realities. Bergin & Garvey, Westport (1985)
34. Taft, J.: Mental hygiene problems of normal adolescence. Ann. Am. Acad. Polit. SS 98, 61–67 (1921)
35. Stevenson, R.A.: The union and Billy Bell. Scribner's 29, 401–409 (1901)
36. Xiao, D., Liu, Y.: Study of cultural impacts on location judgments in Eastern China. In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS, vol. 4736, pp. 20–31. Springer, Heidelberg (2007)
37. Wang, Q., et al.: Seeing the forest or the trees: Categorical effects in map memory based on spatial focus (2011) (submitted)
38. Greenwald, A.G., Farnham, S.D.: Using the implicit association test to measure self-esteem and self-concept. J. Pers. Soc. Psychol. 79(6), 1022–1038 (2000)
39. Greenwald, A.G., et al.: Understanding and using the implicit association test: III. Meta-analyses of predictive validity. J. Pers. Soc. Psychol. 97, 17–41 (2009)
40. Brunyé, T.T., et al.: High and mighty: Implicit associations between physical space and social status (2011) (in prep.)
41. Taylor, H.A., Naylor, S.J., Chechile, N.A.: Goal-specific influences on the representation of spatial perspective. Mem. Cognition 27(2), 309–319 (1999)
42. Ladd, F.C.: Black youths view their environment: Neighborhood maps. Environ. Behav. 2(1), 74–99 (1970)
43. Orleans, P.: Differential cognition of urban residents. In: Science, Engineering and the City, Publication 1498. National Academy of Engineering, Washington, DC (1967)
44. Gagnon, S.A., Brunyé, T.T., Taylor, H.A.: To the north, Alice! Implicit associations between spatial topography and cardinal direction. In: 23rd Annual Convention of the Association for Psychological Science, Washington, DC (2011)
45. Ito, T.A., Urland, G.R.: Race and gender on the brain: Electrophysiological measures of attention to race and gender of multiply categorizable individuals. J. Pers. Soc. Psychol. 85, 616–626 (2003)
46. McNamara, T.P., Halpin, J.A., Hardy, J.K.: The representation and integration in memory of spatial and nonspatial information. Mem. Cognition 20, 519–532 (1992)
47. McNamara, T.P., LeSueur, L.L.: Mental representations of spatial and nonspatial relations. Q. J. Exp. Psychol. 41A(2), 215–233 (1989)
48. Merrill, A.A., Baird, J.C.: Semantic and spatial factors in environmental memory. Mem. Cognition 15(2), 101–108 (1987)
49. Hirtle, S.C., Kallman, H.J.: Memory for the locations of pictures: Evidence for hierarchical clustering. Am. J. Psychol. 101(2), 159–170 (1988)
50. Hirtle, S.C., Mascolo, M.F.: Effect of semantic clustering on the memory of spatial locations. J. Exp. Psychol. Learn. 12(2), 182–189 (1986)
51. Blanz, M.: Accessibility and fit as determinants of the salience of social categorizations. Eur. J. Soc. Psychol. 29(1), 43–74 (1999)
52. Wang, Q., et al.: Social categorizations of space: Explorations of category salience (2011) (in prep.)
53. Brunyé, T.T., et al.: North is up(hill): Route planning heuristics in real-world environments. Mem. Cognition 38(6), 700–712 (2010)
54. Gattis, M.: Spatial Schemas and Abstract Thought. MIT Press, Cambridge (2003)
55. Brunyé, T.T., et al.: Spatial biases in topographically diverse environments: International replication of a southern route preference (2011) (in prep.)
56. Greenwald, A.G., Banaji, M.R.: Implicit social cognition: Attitudes, self-esteem, and stereotypes. Psychol. Rev. 102(1), 4–27 (1995)
57. Rule, N.O., Ambady, N.: The face of success: Inferences from chief executive officers' appearance predict company profits. Psychol. Sci. 19, 109–111 (2008)
58. DesRoches, C.M., et al.: Activities, productivity, and compensation of men and women in the life sciences. Acad. Med. 85(4), 631–639 (2010)
59. Blau, F.D., DeVaro, J.: New evidence on gender differences in promotion rates: An empirical analysis of a sample of new hires. National Bureau of Economic Research (2006)
60. McNamara, T.P.: Depth of spreading activation revisited: Semantic mediated priming occurs in lexical decisions. J. Mem. Lang. 27(5), 545–559 (1988)
61. Mierke, J., Klauer, K.C.: Implicit association measurements with the IAT: Evidence for effects of executive control processes. Exp. Psychol. 48(2), 107–122 (2001)
62. Thomas, A.K., Bonura, B.M., Taylor, H.A.: Age differences in remembering "what" and "where": A comparison of spatial working memory and metacognition in older and younger adults (2011) (in prep.)
Revisiting the Plasticity of Human Spatial Cognition

Linda Abarbanell1, Rachel Montana2, and Peggy Li3

1 Harvard University, Graduate School of Education
2 Princeton University, Department of Psychology
3 Harvard University, Laboratory for Developmental Studies
Abstract. In a recent study by Haun et al. (2011), Dutch-speaking children, who prefer an egocentric (left/right) reference frame when describing spatial relationships, and Hai||om-speaking children, who use a geocentric (north/south) frame, were found to vary in their capacity to memorize small-scale arrays using their language-incongruent system. In two experiments, we reconcile these results with previous findings by Li et al. (2011), which showed that English (egocentric) and Tseltal Mayan (geocentric) speakers can flexibly use both systems. In Experiment 1, attempting to replicate Haun et al., we found that English- but not Tseltal-speaking children could use their language-incongruent system. In Experiment 2, we demonstrate that Tseltal children can use an egocentric system when instructed nonverbally without left/right language. We argue that Haun et al.'s results are due to the Hai||om children's lack of understanding of left/right instructions and that task constraints determine which system is easier to use.

Keywords: linguistic relativity, frames of reference, Tseltal Mayan.
1 Introduction

Recent years have seen a resurgence of work on the linguistic relativity hypothesis, the idea that the language we speak can have a profound effect on the nonlinguistic concepts we form (see Bowerman & Levinson, 2001; Gentner & Goldin-Meadow, 2003; Gumperz & Levinson, 1996; see also Whorf (1956) for the original formulation of the hypothesis and Gleitman & Papafragou (2005) for a review of the recent literature). One of the more controversial areas of current investigation concerns the perspectives, or frames of reference, speakers use to talk about locations and directions. In English, speakers tend to adopt the perspective of a (typically egocentric) viewer (e.g., "The cup to the right of the pitcher"), while speakers of other languages use fixed aspects of the (geocentric) environment. For example, in Tseltal Mayan (Chiapas, Mexico), speakers use the uphill/downhill slope of their terrain and other salient landmarks ("The cup to the downhill of the pitcher"), while body part terms like 'left' and 'right' are not typically projected to regions outside the body (Brown, 2006; Brown & Levinson, 1993a, 1992; cf. Abarbanell, 2007). While a third, object-based system arguably cuts across both ("The cup at the mouth of the pitcher") (see Terrill & Burenhult, 2008), most of the literature has focused on how the habitual use of an egocentric versus a geocentric perspective can
affect how spatial representations are interpreted, stored, and retrieved across modalities, resulting in greater facility using language-congruent cognitive strategies and a dispreference for language-incongruent ones. This dispreference, in turn, has been argued to yield practice effects, making it difficult for speakers to use the reference frame that is less frequently used in their language on non-linguistic spatial tasks, such as memorizing a small-scale spatial array (Brown & Levinson, 1993b; Levinson, 1996, 2003; Majid, Bowerman, Kita, Haun & Levinson, 2004; Pederson, Danziger, Wilkins, Levinson, Kita & Senft, 1998). For example, in the animals-in-a-row task, participants are shown a row of toy animals facing in a given direction (e.g., left/north). They are then asked to recreate the "same" array after turning 90° or 180° to face a second table. Studies with over 20 language groups revealed a robust and striking correlation: speakers of languages like English that habitually use an egocentric reference frame rotated the animals along with their body, while speakers of languages like Tseltal held them constant with the environment (see Figure 1).
Fig. 1. Animals-in-a-row task, showing (a) egocentric and (b) geocentric solutions
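To make the scoring of this task explicit, the Python sketch below classifies a reproduced array as egocentric or geocentric from the studied facing direction, the reproduced facing direction, and the participant's turn, all coded in a room-fixed (allocentric) frame. The coding scheme and function names are ours for illustration; they are not taken from the studies discussed here.

HEADINGS = ["N", "E", "S", "W"]  # clockwise order, 90 degrees apart

def rotate(heading: str, degrees: int) -> str:
    """Rotate an allocentric heading clockwise by a multiple of 90 degrees."""
    i = HEADINGS.index(heading)
    return HEADINGS[(i + degrees // 90) % 4]

def classify(studied: str, reproduced: str, turn_deg: int) -> str:
    """Classify a reproduced array direction as egocentric or geocentric."""
    if reproduced == studied:
        return "geocentric"  # held constant with the environment
    if reproduced == rotate(studied, turn_deg):
        return "egocentric"  # rotated along with the body
    return "other"

# Animals studied facing north; participant turns 180 degrees to face table 2.
print(classify(studied="N", reproduced="N", turn_deg=180))  # geocentric
print(classify(studied="N", reproduced="S", turn_deg=180))  # egocentric

For a 90° or 180° turn the two solutions come apart, which is what makes the task diagnostic of the reference frame a participant recruits.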
Not all researchers, however, accept these results as demonstrating that language use shapes the availability of frames of reference in everyday spatial reasoning, as argued by proponents of linguistic relativity; critics take issue in particular with open-ended tasks for which there are at least two correct solutions (Li & Gleitman, 2002; Li, Abarbanell, Gleitman & Papafragou, 2011; Newcombe & Huttenlocher, 2000; Pinker, 2007). In such cases, speakers must make pragmatic inferences about the desired or expected response. That is, how one's linguistic community customarily speaks about or responds to inquiries about locations and directions may affect how speakers interpret what appropriately counts as the "same" array (Li & Gleitman, 2002). To adjudicate between the "linguistic relativity" and "pragmatic inference" hypotheses, Li et al. (2011) and Abarbanell (2010) tested English- and Tseltal-speaking adults and children on non-open-ended tasks that required participants to respond in a certain way, using either egocentric or geocentric coordinates. When the expectations were made clear, under matched conditions, they found that the two language groups performed similarly: contrary to the predictions of Levinson (2003) and colleagues, both English (egocentric) and Tseltal (geocentric) speakers showed a
pattern of equivalent or enhanced performance on egocentric as compared with geocentric tasks. 1.1 Contradictions in the Literature: Haun et al. A recent study by Haun, Rapold, Janzen, and Levinson (2011), however, found seemingly contradictory results. They compared Dutch speakers in the Netherlands, who primarily use an egocentric reference frame, with speakers of ≠Akhoe Hai||om, a Khoisan language spoken in Northern Namibia that primarily uses a geocentric reference frame, on both open- and non-open-ended versions of the animals-in-a-row task. Seven- to eleven-year-old children were tested to control for educational differences in adult community members across language groups. The children were shown an array of animals at one table and then walked to a second table where they faced 90° from their original orientation and were asked to recreate the same array. The two tables were separated by a school building which ostensibly could have been used to orient the animals (e.g., "The animals are facing the building"), to probe the children's tendency to use a salient local landmark versus a larger-scale directional system. The children were tested on a block of simple trials, using three animals in a row, followed by a block of more difficult trials involving a six-object array, to test the possibility that arrangements which are harder to encode linguistically will cause participants to fall back on a more 'innate', or language-independent, strategy. Using videotaped instructions, the children were simply told to "rebuild the array". In line with previous findings from open-ended tasks, both groups of children aligned the animals in accordance with their language-congruent reference frame: the Dutch-speaking children were exclusively egocentric while the Hai||om were overwhelmingly geocentric. Next, the children received a third block of hard trials in which they were instructed to rebuild the array using the opposite system to the one they had just used (e.g., children who had aligned the animals egocentrically on the open-ended blocks were instructed to align them geocentrically and vice-versa). In contrast to the flexibility of the English and Tseltal speakers tested by Abarbanell and colleagues, the Dutch and Hai||om showed little capacity for switching to their language-incongruent system. Children, however, and sometimes even adults, exhibit difficulty in switching to a different way of solving a problem after being accustomed to one way of solving it (Cepeda, Kramer & Gonzales de Sather, 2001; Luchins, 1942; Yerys & Munakata, 2006). In a follow-up study, to control for the possibility of such a perseveration effect, a new group of Hai||om-speaking children were given two blocks of instructed trials, one with instructions to recreate the array egocentrically (e.g., "place the rightmost objects back on the right-hand side of the array") and the other geocentrically (e.g., "place the western objects back on the western side of the array"), again using videotaped instructions, with the order of the two blocks counterbalanced across participants. Once again, the Hai||om-speaking children performed significantly better in the geocentric than the egocentric condition, struggling to use their non-dominant egocentric system regardless of block order. These findings seemingly contradict those of Li et al. (2011) and Abarbanell (2010) in two ways. First, Li and colleagues found that when tasks were non-open-ended, Tseltal and English speakers often performed similarly. Second, across several
tasks, they found that recalling the arrangements of spatial arrays from an egocentric perspective, the perspective in which one takes in information about the world, can sometimes be easier than from a geocentric perspective. Thus, Tseltal speakers, who, like Hai||om speakers, do not predominantly use left/right language, often performed better on tasks requiring the egocentric solution than tasks requiring the geocentric solution. There are, however, several different ways to reconcile these seemingly discrepant results, including the fact that these were different comparison populations and there were many procedural differences. Li and colleagues tested different spatial tasks (card choice and route tracing), and participants took a 180° rather than a 90° turn. Finally, the way in which Li et al. indicated which response they expected from participants did not involve spatial language, while Haun et al.'s instructions did. The possibility of task-specific and population-specific effects makes it crucial to try to replicate seemingly divergent results across different comparison groups that share the same frame of reference preference in order to disentangle the possible contribution of language from that of other cultural, environmental, and task effects. 1.2 Replication and Left/Right Comprehension Difficulty in Children We will return to these points in the General Discussion. For now, we turn to a possible alternative explanation for Haun et al.'s results. One valid concern is whether or not the Hai||om children actually understood the egocentric verbal instructions. Given that even English-speaking children have a difficult time acquiring the full extent of "left" and "right" use in a culture that more frequently uses these terms (Piaget, 1928; Rigal, 1994, 1996), there is cause to be wary of the assumption that Hai||om-speaking children would understand the verbal instructions with "left" and "right". While the Hai||om use front/back terms projectively to describe spatial relationships as well as metaphorically to describe relationships in other domains (e.g., younger people 'follow behind' their elders), they reportedly do not do the same with left/right. Rather, left/right language remains relatively dispreferred and undeveloped, e.g., in discussions about routes or object manipulations, especially as compared to languages like English or Dutch (Widlok, 2007: 272). Without this type of left/right language readily available in the input, it is possible that the Hai||om-speaking children, whose left/right comprehension was not assessed by the experimenters, had not acquired the full meaning of "left" and "right" and did not understand the instructions for the egocentric block. They therefore naturally performed better on the geocentric block, for which they understood the instructions. Given that alternative interpretations exist for Haun et al.'s results, it is presently unclear whether their results truly contradict Li et al.'s. In order to try to reconcile the Haun et al. (2011) and Li et al. (2011) results, we turned to Tseltal and English speakers, asking whether Haun and colleagues' results are replicable with this comparison population, and, if so, whether they could be attributable to a difficulty with left/right language. In Experiment 1, we first replicated Haun et al.'s study and verified whether the Tseltal-speaking children indeed understand left/right language.
In Experiment 2, we again tested children in non-open-ended versions of the task, but with instructions that do not make use of left/right language, following the methodology from Li et al.
2 Experiment 1: Haun et al. Replication 2.1 Methods Participants. Thirty-nine English-speaking children between the ages of seven and ten were recruited from the Cambridge, MA area through town email lists, afterschool programs, and at the Boston Museum of Science. The children were randomly assigned to one of three conditions: an open-ended condition (N=14, mean age 7.93 years; SD = .83), an Ego-to-Geo condition (N=13, mean age 8.31; SD = 1.03), and a Geo-to-Ego condition (N=12, mean age 7.92; SD = .67). The children received a small prize for their participation. Forty-six Tseltal-speaking children were recruited from the community of Tenejapa, Chiapas, Mexico, through a research assistant who was from the community, and were also randomly assigned to the three conditions: open-ended (N=14, mean age 9.21; SD = .89), Ego-to-Geo (N=16, mean age 8.06, SD = .68), and Geo-to-Ego (N=16, mean age 8.94, SD = 1.00). Testing was conducted in a local house in Tenejapa and participants were compensated with 20 pesos (~2 USD). All instructions were administered in Tseltal by the research assistant, a native Tseltal speaker who was bilingual in Spanish. Setup. Haun et al. varied whether participants were run outdoors or indoors and found no difference in the results; we therefore did not manipulate this factor. All of our participants were tested indoors. Two square tables were placed approximately 1.5 meters apart, with a barrier in between so that when seated at one table, participants could not see the other table. The arrays were created from a set of eight toy farm figurines (a cow, horse, pig, sheep, bale of hay, boy, girl, and cart). Following Haun et al., we included easy and hard trials. The easy trials were lines of three animals, all facing the same direction. The hard trials used six figurines in two rows of three, with the figurines varying in their facing orientation (either right or left). The easy trials could be easily described linguistically (e.g., "cow, sheep, pig walking right") while hard trials would require longer descriptions ("The pig is facing left towards the cow that is facing right…"). Therefore, any language-specific strategies driven by sub-vocalizing would be expected to yield to a universal default strategy on the more difficult trials (Haun et al., 2011). For both trial types, participants had to choose the correct figurines from the collection of eight. The test arrays were predetermined by a computer program that randomly selected the figurines and their facing orientations (see the sketch below). Procedures. Participants were randomly assigned to one of the three conditions. In all conditions, the children were seated at Table 1, where they were given as much time as they wanted to memorize the array. They then moved to Table 2 where they sat at a 90° rotation from their original orientation. In the open-ended condition participants were simply instructed to "place the animals in the same way" at the second table. The children received three easy followed by three hard trials. The two non-open-ended conditions, Ego-to-Geo and Geo-to-Ego, were counterbalanced conditions in which participants were instructed to provide a specific solution. In the Ego-to-Geo condition, participants were instructed
to replicate the arrays using an egocentric solution for the first block of trials, followed by geocentric instructions for the second block, and vice-versa for the Geo-to-Ego condition. Each block consisted of three easy followed by three hard trials. In all conditions, the participants were introduced to the task of recreating an array with three practice trials, using just the first table. For each practice trial the experimenter made an array of three animals aligned in a row to either all face left or right, and then asked the child to make the same array on the same table. For the first practice trial, the original array remained in view of the child. For the second and third practice trials, the experimenter covered up the array after the children indicated that they were ready. They were then asked to make the same array from memory. These memory trials served as a quick check for whether participants in all conditions were comparable in their ability to recreate arrays from memory. In the non-open-ended conditions, the children received two additional training trials in which they walked to the second table before recreating the array. On these trials, the children received instructions that familiarized them with the expected solution, either egocentric or geocentric according to block. To induce the use of an egocentric reference frame, the children were shown which animal was to their left and right (e.g., "The horse is on this side towards your left. It is to the left"/"Ja' te cawayo ay ta izquierda a'w-u'un. Ay ta izquierda"). We used the Spanish terms to express the ideas of "left" (izquierda) and "right" (derecha) to Tseltal speakers, although all other instructions were given in Tseltal. All of the Tseltal-speaking children in the sample were currently attending school, where teachers report explicitly teaching and using a left/right system in Spanish. In contrast, left/right terms in Tseltal are rarely explicitly taught in the home according to parental reports, and teachers do not expect the children to necessarily know the words (Abarbanell, 2010). For the geocentric trials, the children were introduced to geocentric terms at the first table. The English-speaking children were shown which animal was to the north/closer to the northern wall, and which animal was to the south/closer to the southern wall. North/south was used with the English speakers as the children were likely to be more familiar with these than with east/west terms. Children typically learn these terms when learning about maps in school, which are oriented with north at the top, and "north" and "south" refer to salient locations on the globe while "east" and "west" do not. Further, north/south terms are predicted to be more frequent in discourse directed to children than east/west (e.g., "Santa lives at the North Pole"). The Tseltal-speaking children were shown which animal was towards where the sun rises (ta slok'ib k'aal)/towards the sunrise wall (ta slok'ib k'aal pajk'), and which animal was towards where the sun sets (ta smalib k'aal)/towards the sunset wall (ta smalib k'aal pajk'). For the Tseltal children, although uphill/downhill has been identified as the most salient directional axis for Tseltal speakers (Brown, 2006; Brown & Levinson, 1993a), more recent data suggest that the dominant axis, as well as the exact quadrants specified by different directional terms, varies for different regions within the municipality of Tenejapa.
For example, Abarbanell (2010) found that Tseltal-speaking adults from the same region as the children tested here frequently used sunrise/sunset terms on language elicitation tasks, in contrast to the near-absence of any uphill/downhill language, an absence that was also noted by Polian (2010). We therefore opted to use the sunrise/sunset terms.
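As noted in the Setup, the test arrays were predetermined by a computer program that randomly selected the figurines and their facing orientations. A minimal sketch of such a generator, under our own assumptions about the trial structure (the authors' program is not published):

    import random

    FIGURINES = ["cow", "horse", "pig", "sheep", "hay bale", "boy", "girl", "cart"]

    def make_trial(hard=False):
        # Easy trials: three figurines in a row, all facing the same way.
        # Hard trials: six figurines in two rows of three, orientations mixed.
        n = 6 if hard else 3
        items = random.sample(FIGURINES, n)
        if hard:
            facings = [random.choice(["left", "right"]) for _ in items]
        else:
            facings = [random.choice(["left", "right"])] * n
        return list(zip(items, facings))   # position-ordered (figurine, facing) pairs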
At the second table, children in both language groups were asked to raise their left and right hands in the egocentric block, and to point in the appropriate directions in the geocentric block, and then to tell the experimenter which animal was located at each hand/in each direction. They were then instructed to line up the animals. If the children aligned the animals using a different reference frame from what was expected, the experimenter guided the participant in aligning the animals correctly, reminding the children where the animals went in relation to their left/right or to the environment. No such explicit feedback was given on the three easy or three hard test trials that followed this initial training period. At the end of the experiment, the Tseltal-speaking children were tested on their left/right comprehension. A doll (the boy or girl figurine from the farm animal set) was placed in the center of the table with a different animal figurine at each left/right side, and the children were asked to identify which animal was to the doll's left and right. The Tseltal third-person possessive prefix (-s) was affixed to the Spanish left/right terms so that it was grammatically unambiguous that the doll's and not the child's left/right was intended ("Binti chambalam ay ta s-derecha/s-izquierda te muñeca?"/"Which animal is to the doll's right/left?"). The children received four such trials, two for each side, blocked by side and counterbalanced across children. For each side, there was one trial with all of the figurines facing away from the participant followed by one trial with all figurines facing toward the participant. Scoring. For each trial, the children received one point for each figurine placed in the correct position and one point for each figurine placed in the correct orientation, for a total of six possible points for the easy trials and twelve points for the hard trials. The orientation score was calculated in two ways, with the max of these two taken as the final orientation score. In the first way, each figurine was scored for correct orientation regardless of the position in which it appeared. In the second, the orientation of the figurine in each position was scored regardless of whether the figurine was the correct one. These two ways took into consideration the possibility that the children might have correctly remembered the figurines' orientations but swapped their positions, or the possibility that the children misremembered a figurine (e.g., boy instead of girl) but remembered the orientation of the figurine at that position. The percentage correct for each trial was then computed by dividing the number of points earned by the total possible number of points. 2.2 Results Open-ended condition. Participants in the open-ended condition were classified as either egocentric, geocentric, object-centric, or untypable based on their general response strategy (at least two out of three trials each for the easy and hard trials aligned with the same reference frame). The percent correct was then calculated for each child, according to their preferred reference frame. The distribution confirmed that the children's preferences aligned with the dominant frame of reference in their language (see Table 1). We note that none of the children opted to use the divider between the tables as a reference point. Using the divider meant recreating an array in which the line of figurines that was closest to the divider at the first table remained closest to the divider at the second table.
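The scoring scheme described above can be stated compactly in code. The sketch below assumes each trial's response and target are position-ordered lists of (figurine, orientation) pairs, matching the generator sketched earlier; it illustrates the published scoring rules and is not the authors' own script:

    def score_trial(response, target):
        # One point per figurine placed in the correct position.
        position_pts = sum(r[0] == t[0] for r, t in zip(response, target))
        # Orientation, way 1: each figurine's orientation, wherever it was placed.
        resp_orient = dict(response)
        way1 = sum(resp_orient.get(fig) == orient for fig, orient in target)
        # Orientation, way 2: the orientation at each position, whatever figurine sits there.
        way2 = sum(r[1] == t[1] for r, t in zip(response, target))
        # The max of the two ways is taken as the orientation score (here, per trial).
        points = position_pts + max(way1, way2)
        return points / (2 * len(target))   # proportion of the 6 (easy) or 12 (hard) points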
Table 1. Distribution of dominant response strategies in English (N=14) and Tseltal (N=14) speakers for the open-ended condition. The mean percentage correct and standard deviation are given for consistent (egocentric and geocentric) response strategies only.
           Egocentric          Geocentric          Object-centric   Untypable
           Response            Response            Response         Response
English    N=12                N=2                 N=0              N=0
           Easy: 86.11 (14.51) Easy: 75.00 (8.49)
           Hard: 71.69 (10.69) Hard: 70.83 (16.90)
Tseltal    N=1                 N=11                N=0              N=2
           Easy: 61.11         Easy: 76.77 (18.56)
           Hard: 55.56         Hard: 56.82 (14.13)
Non-open-ended conditions. For the two non-open-ended conditions, a 2 (difficulty: easy, hard) x 2 (trial type: egocentric, geocentric) x 2 (block order: Ego-to-Geo, Geo-to-Ego) x 2 (language: English, Tseltal) ANOVA yielded main effects of difficulty (F(1,53) = 114.80, p < .001, ηp2 = .68), trial type (F(1,53) = 56.91, p < .001, ηp2 = .52), and language (F(1,53) = 22.83, p < .001, ηp2 = .30). The effect of block order was not significant (p = .86). Figure 2 depicts these main effects, collapsing across block order. Overall, participants did better on the easy (74.37% correct) than the hard trials (44.99% correct), better on the geocentric (74.07% correct) than the egocentric trials (45.29% correct), and the English speakers (70.14% correct) did better than the Tseltal speakers (51.51%).
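The reported analysis is a 2 x 2 x 2 x 2 mixed-design ANOVA with two within-subject factors (difficulty, trial type) and two between-subject factors (block order, language). As a rough sketch of how such data could be analyzed (not the authors' analysis), a linear mixed model with a random intercept per child approximates this design; scores.csv and its column names are hypothetical:

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per child x trial-type x difficulty cell mean; columns:
    # subject, language, block_order, trial_type, difficulty, pct_correct
    df = pd.read_csv("scores.csv")   # hypothetical long-format data file

    model = smf.mixedlm(
        "pct_correct ~ difficulty * trial_type * block_order * language",
        data=df, groups=df["subject"])   # random intercept per child
    print(model.fit().summary())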
Fig. 2. Percentage correct by trial type and difficulty, collapsing across blocked conditions (Ego-to-Geo, Geo-to-Ego), comparing the English (N=25) and Tseltal speakers (N=32)
The ANOVA yielded three significant interactions, two of which involved language. First, the effect of language varied by difficulty (F(1,53) = 5.01, p = .03, ηp2 = .09), with the two groups diverging more on the easy (English: 88.33%, Tseltal: 63.45%; F(1,56) = 31.56, p < .001, ηp2 = .37) than the hard trials (English: 51.94%, Tseltal: 39.56%; F(1,56) = 5.96, p = .02, ηp2 = .10). Second, the language by trial type interaction was significant (F(1,53) = 20.26, p < .001, ηp2 = .28). However, this interaction was not due to the English-speaking children performing better on the egocentric than the geocentric trials and vice versa for the Tseltal-speaking children. In fact, both language groups performed better on the geocentric than the egocentric trials (English: 75.39% vs. 64.89%, F(1,24) = 4.46, p = .05, ηp2 = .16; Tseltal: 73.05% vs. 29.97%, F(1,31) = 71.05, p < .001, ηp2 = .70). An analysis of the simple effects showed that the language by trial type interaction was driven primarily by the comparatively lower performance of Tseltal-speaking children on the egocentric trials (English: 64.89%, Tseltal: 29.97%; F(1,56) = 27.22, p < .001, ηp2 = .33) but not the geocentric trials (English: 75.39%, Tseltal: 73.05%; F(1,56) = .48, p = .49, ηp2 = .01). Lastly, there was a three-way interaction between trial type, block order, and difficulty (F(1,53) = 16.20, p < .001, ηp2 = .23). As Figure 3 indicates, the order in which the trials were given (Ego-to-Geo, Geo-to-Ego) affected the hard, but not the easy trials. For the hard egocentric trials, the group tested on the egocentric block first performed better than the group tested on the geocentric block first (Ego-to-Geo: 42.05% correct vs. Geo-to-Ego: 20.39% correct). For the hard geocentric trials, the group tested on the geocentric block first performed better than the group tested on the egocentric block first (Geo-to-Ego: 65.48% vs. Ego-to-Geo: 51.92%). This finding suggests that the children had a difficult time switching from the response pattern of their first block to the requested reference frame of their second block.
Fig. 3. Percentage correct by trial type and difficulty, collapsing across language group (English, Tseltal), comparing Ego-to-Geo (N=29) and Geo-to-Ego (N=28)
Results from the left/right comprehension tests for the Tseltal-speaking children were broken down according to whether the children participated in the open-ended condition or the instructed conditions. The children in the instructed conditions received explicit input regarding how to label their left and right hands and the animals next to their hands during the previous task, while the children in the open-ended condition did not. Therefore, if children were unfamiliar with the words, this left/right input could potentially benefit the children in the instructed conditions and not the open-ended condition. The comprehension responses were scored in two ways, in consideration of the fact that for half of the trials the doll faced the same direction as the child and for half of the trials the doll faced the opposite direction. Although the third-person possessive prefix in the test question specified animals to the doll's left or right, computing another person's left and right is relatively more difficult than computing one's own. Several studies with English-speaking children reveal that at the earliest stages of learning "left" and "right" children do not consider the left and right sides of others and apply their own left and right to interpret left-right commands (e.g., "choose the animal on the doll's left side"; see Li, Shusterman, & McNaughton, under review). It is possible that Tseltal-speaking children would also ignore the doll's facing direction. The percentages correct were therefore scored once in consideration of the doll's facing direction and once ignoring the doll's facing direction. In this latter case, the percentages correct were determined by the perspective of the child and not of the doll. See Table 2 for the results. As the results indicate, children's performance hovered around chance. Only the Geo-to-Ego condition seemed to be above chance when the left-right comprehension score was coded from the perspective of the child, which could be due to the fact that they had just completed the egocentric block prior to being tested on left-right comprehension and hence had some recollection as to how the experimenter applied the words to describe the animals.
Table 2. Percent correct on left/right language comprehension tasks scored two ways and broken down by condition. The p-values reflect t-test comparisons against chance (.50).
Condition             Doll's Facing Direction Considered   Doll's Facing Direction Ignored
Open-ended (n = 14)   66.07%, p = .07                      51.79%, p = .75
Ego-to-Geo (n = 16)   59.38%, p = .30                      62.50%, p = .10
Geo-to-Ego (n = 13)   55.77%, p = .27                      75.00%, p = .02*
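The comparisons against chance in Table 2 are one-sample t-tests against .50. A minimal sketch with made-up per-child proportions (for illustration only; the real data are not reproduced here):

    from scipy import stats

    # Hypothetical per-child proportions correct on the four left/right trials.
    scores = [0.25, 0.50, 0.75, 0.50, 1.00, 0.50, 0.75,
              0.25, 0.50, 0.75, 0.50, 0.25, 0.75, 0.50]
    t, p = stats.ttest_1samp(scores, popmean=0.5)
    print(f"t = {t:.2f}, p = {p:.3f}")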
2.3 Discussion Our findings confirm that there is a strong correlation between habits of language use and response preference on open-ended tasks (Brown & Levinson, 1993b; Pederson et al., 1998; Haun et al., 2011). On the instructed conditions, however, the English-speaking children were able to successfully switch to their non-preferred system while the Tseltal-speaking children were not. Further, both groups of children performed better on the geocentric than the egocentric trials. These findings contrast with those of Li et al. (2011) and Abarbanell (2010), who found a similar flexibility across both language groups and an egocentric advantage on some tasks. The performance of the English-speaking children also contrasts with the inflexibility of the Dutch speakers tested by Haun et al. (2011). What could account for these divergent results? Close comparison of the procedures across studies yields at least two explanations. One important difference concerns the 90° turn we used here versus the 180° turn used by Li et al., together with the relatively close distance between tables we used here compared with the arrangement in Haun et al. Studies in the spatial cognition literature, mainly testing speakers of languages that make use of left/right language, show that we mentally update the relationship between objects and how they look from our new perspective as we move, thereby maintaining a stable representation of the objects with respect to the environment (Simons & Wang, 1998). The representation of the objects with respect to the environment differs from and can compete with the representation of the objects as we initially viewed them. Some studies involving detecting changes in an array of objects show that under some circumstances our ability to detect changes is better when the display matches the geocentric than the egocentric view. This ability to update, however, is mediated by both the distance travelled and the degrees of displacement: the greater the distance and the degrees of displacement, the harder it is to update how objects look in the new position. Furthermore, with a 180° rotation, participants can no longer see the part of the room where they initially viewed the array, making it easier for them to use their own bodies as a point of reference. In our study, although the children could no longer view the first table, they could still see the side of the room where they initially memorized the array, likely strengthening the geocentric representation. Conversely, although Haun et al. also used a 90° rotation, the children walked around an entire building, causing a drastic change in environment. We would therefore predict that the Haun et al. task would favor an egocentric response, similar to the tasks in Li et al. Why, then, would the Hai||om-speaking children have difficulty with the egocentric condition of their task? Further, why would only the English-speaking children show flexibility regardless of rotation? One possibility, which we consider next, concerns the use of verbal versus nonverbal instructions. In Li et al., participants were shown the expected response, egocentric or geocentric, with the use of nonverbal prompts (e.g., by carrying a covered array to the second table so that it either rotated with their bodies or was held constant with the environment, and then uncovering the array to check their response). They were not required to decode spatial language that they rarely, if ever, use.
Left/right systems, in particular, are notoriously difficult to acquire and ambiguous in use (Piaget, 1928; Rigal, 1994, 1996). Children who are only marginally versed in the use of such a system, like the Tseltal and Hai||om, would
likely have a hard time understanding the egocentric instructions – even more so if they are administered via a video recording, as in Haun et al. As the experimenters did not test the children's left/right language comprehension, we have no way of assessing their understanding of these terms. Similarly, the Dutch-speaking children were told, e.g., to "place the western objects back on the western side of the array". While we could find no studies on the acquisition of these terms in egocentric reference languages, as already noted, there is reason to believe that east/west terms may be less salient and less frequent in discourse directed to children than north/south (e.g., "We're going on vacation on the North Sea"). Data from Tsotsil speakers in Zinacantan, Chiapas, Mexico, a Mayan language closely related to Tseltal that uses a similar uphill/downhill system, documents an asymmetry in children's acquisition of these terms based on the social salience of different directions and locations. The children begin using the term olon 'downhill' while pointing randomly, only later affixing it to the correct direction, and still later acquiring the full upland/downland contrast (de León, 1994). It is therefore plausible that Dutch-speaking children, who do not habitually use a cardinal direction system, would show a similar asymmetry in their understanding of these terms. Further, as our data indicate, the children showed a perseveration effect on the hard trials: those who received the egocentric trials first did better on the hard egocentric trials than those who received the geocentric trials first, and vice-versa for the hard geocentric trials. Since Haun et al. did not test the Dutch-speaking children using counterbalanced instructed trials, it is possible that perseveration affected their performance on the geocentric trials, all of which were hard, in addition to any difficulty interpreting east/west geocentric instructions.
3 Experiment 2: Eliminating Left/Right Language To eliminate any difficulty presented by the use of left/right language, we tested a new group of Tseltal-speaking children using non-verbal means to convey the required reference frame, following the procedures developed by Li et al. (2011). Rather than telling the children to use the "left" and "right" sides of their bodies, the animals were arranged at Table 1 on a square cardboard tray, which was then covered prior to having the children carry the tray to the second table. In the egocentric condition, the children carried the tray so that it rotated with their body as they turned 90° at the second table, while in the geocentric condition the children set the tray down at the second table prior to rotation (see Figure 4). In Experiment 2, children were always tested first on the egocentric trials before the geocentric trials, since we were primarily interested in addressing whether Tseltal-speaking children would be able to produce the egocentric response using this nonverbal version of the task. We contrasted this group of children's performance with Tseltal-speaking children's performance in the Ego-to-Geo condition of Experiment 1. We did not test English-speaking children since they already did well on the egocentric trials in Experiment 1. We reasoned that if comprehending the left-right instructions was a problem for Tseltal children in Experiment 1, performance on the egocentric trials should be higher in Experiment 2 relative to Experiment 1 while performance on the geocentric trials should remain roughly the same across the two
experiments. However, if Haun et al. are correct that children cannot flexibly produce the language-incongruent response, Tseltal-speaking children should perform similarly across Experiments 1 and 2, with equally poor performance on the egocentric trials and equally better performance on the geocentric trials. 3.1 Methods Participants and Setup. Sixteen Tseltal-speaking children who had not been tested previously (mean age = 7.88 years; SD = .62) were recruited from the same population and were tested under the same conditions as in Experiment 1. Procedure. The children were given three practice trials at the first table, identical to those in Experiment 1. They were next given two training trials using both tables to familiarize them with the expected solution to the task. For the Egocentric condition, with the help of a native Tseltal-speaking research assistant, the children were told that they would use the two sides of their body, without the use of any left/right terms (using the term jejch for 'side'). The research assistant then touched the child on each shoulder/arm in turn while naming the animal on the indicated side. The children were then asked to raise each hand and name the animal located at that hand, and to raise the hand towards which the animals were looking. For the Geocentric condition, the children were asked to point toward sunrise (slok'ib k'aal) and sunset (smalib k'aal). They were then shown which animal was located towards each direction, indicating the wall of the room as well as the (sunrise/sunset) direction. The children were then asked to repeat which animal was located towards sunrise and sunset and to name the direction towards which the animals were looking. When the children indicated that they were ready, they were instructed to carry the (uncovered) tray to the second table. In the Egocentric condition, the children held the tray as they turned at the second table to face 90° from their original orientation. In the Geocentric condition, the children set the tray down at Table 2 prior to turning so that the orientation of the tray remained constant with the environment (see Figure 4). The tray was then covered and the children were asked to recreate the array after first naming the animals that were located in the appropriate directions (at each hand, towards sunrise/sunset). The cover was then lifted so the children could see if their response was correct. They were prompted to correct any errors. The test trials were identical to the training trials, except that the tray was covered prior to walking to the second table. The children were prompted to verbally identify the animals located at each hand/towards sunrise/sunset for the first test trial only. The children received three easy and three hard trials, following the same protocol as in Experiment 1. For each trial, after the children's response at the second table was recorded, the cover was lifted from the tray and the children were prompted to visually check and then correct their answers. In this respect, the procedure deviated from Experiment 1, where the children could not visually check their answers and were not prompted to make corrections. Following the main task, the children were tested on their comprehension of left/right terms using a doll facing away and towards them as in the previous experiment. In the same manner, they were also tested for their knowledge of
sunrise/sunset terms, with the order (left/right, sunrise/sunset) blocked and counterbalanced across participants. Following both of these tests, the children were tested for their ability to identify left and right on their own bodies using simple commands (e.g., “raise your left arm”, “move your right leg”/”toya te a-k’ab ta izquierda”, “tija te a’w-akan ta derecha”). The children were given eight such trials, one for each arm/leg while facing towards and then away from the experimenter.
Fig. 4. Setup for Experiment 2 for the (a) Egocentric versus the (b) Geocentric trials
3.2 Results and Discussion The percent correct on the test trials was compared with the performance of the Tseltal-speaking children in the Ego-to-Geo condition from Experiment 1 (see Figure 5 for the comparison). A 2 (difficulty: easy, hard) x 2 (trial type: egocentric, geocentric) x 2 (experiment: Exp. 1 Ego-to-Geo, Exp. 2 Tray) ANOVA yielded main effects of difficulty (F(1,30) = 31.56, p < .001, ηp2 = .51), trial type (F(1,30) = 29.46, p < .001, ηp2 = .50), and experiment (F(1,30) = 14.05, p = .001, ηp2 = .32). Overall, participants showed better performance on the easy (72.83% correct) than the hard trials (53.86% correct), on the geocentric (75.39% correct) than the egocentric trials (51.30% correct), and in Experiment 2 (73.44% correct) than Experiment 1 (53.26% correct). The ANOVA yielded two significant interactions. The trial type by difficulty interaction (F(1,30) = 6.88, p = .01, ηp2 = .19) revealed a bigger difference between the easy and hard trials for the geocentric trials (Easy: 88.37% vs. Hard: 62.41%; F(1,31) = 54.43, p < .001, ηp2 = .64) than the egocentric trials (Easy: 57.29% vs. Hard: 45.31%; F(1,31) = 6.04, p = .02, ηp2 = .16). Crucially, the trial type by experiment interaction was significant (F(1,30) = 12.06, p = .002, ηp2 = .29), confirming that the improvement from Experiment 1 to Experiment 2 held only for the egocentric trials (Exp. 1: 33.51% vs. Exp. 2: 69.10%, F(1,31) = 15.85, p < .001, ηp2 = .35) and not the geocentric trials (Exp. 1: 73.00% vs. Exp. 2: 77.78%, F(1,31) = 1.31, p = .26, ηp2 = .04). As in the open-ended condition in Experiment 1, the language assessment revealed that these children had not fully acquired the left/right terms. They were at chance at identifying the animal to the left/right of the doll, both when it faced away from and towards them. While they were above chance as a group at identifying their own
left/right hands and legs, performance was not at ceiling (75.0%). In contrast, their knowledge of sunrise/sunset terms was more robust (see Table 3). In sum, when given clear instructions that did not involve the use of left/right terms, the Tseltal-speaking children showed improved performance on the egocentric trials, while still showing an overall advantage for solving the task geocentrically.
Table 3. Percent correct on left/right (LR) and sunrise/sunset (EW) language comprehension tasks for the tray condition (N=16), showing comparison against chance (.50)
LR, Doll's Facing Direction Considered:   59.38%, p = .16
LR, Doll's Facing Direction Ignored:      56.25%, p = .41
LR of Self (Body Parts):                  75.00%, p = .006**
EW:                                       82.81%, p < .001***
Fig. 5. Percentage correct by trial type and difficulty for the Tseltal-speakers only, comparing the Ego-to-Geo (Experiment 1, N=16) and Tray (Experiment 2, N=16) conditions
4 General Discussion The goal of these studies was to reconcile contradictions in the literature concerning the cognitive flexibility of different language groups for memorizing small-scale spatial arrays using different coordinate systems (Haun et al., 2011; Li et al., 2011). The present results both confirm and diverge from previous findings, offering an important point of comparison. Our open-ended task replicated what appears to be a robust correlation between linguistic and nonlinguistic preferences on tasks that have more than one correct solution. Our non-open-ended tasks, however, concurred with the findings of Li et al. by demonstrating cognitive flexibility across language groups. Although Tseltal-speaking children had difficulty on the egocentric trials when
instructed using left/right terms, their performance improved significantly with more explicit nonverbal instructions. These results support an alternative explanation for the inflexibility of the Hai||om children, whose left/right comprehension was not assessed by Haun et al.: language affects speakers' ability to comprehend verbal instructions. Our results diverged from Li et al., however, in one important respect. The 90° rotation and proximity between tables appeared to trigger mental updating of a stable array, resulting in better performance on the geocentric vs. egocentric trials for both Tseltal and English speakers, despite the fact that the latter preferred the egocentric response on the open-ended task. This contrasts with the egocentric advantage observed among both language groups by Li et al., despite Tseltal speakers' preference for the geocentric solution on open-ended tasks (Brown & Levinson, 1993b). This dissociation between preference and performance supports the argument that pragmatic inferences are responsible for speakers' preferences on open-ended tasks: Language may influence how speakers interpret what constitutes "sameness", but this tells us little about how they reason about spatial relationships in their day-to-day lives (Li & Gleitman, 2002; Li et al., 2011). Moreover, these results suggest that task constraints rather than language determine which system is easier to use in any given context. Supporting this, we note that egocentric advantages have been observed among geocentric language speakers on tasks involving motion paths (Li et al., 2011; Mishra, Dasen & Niraula, 2003; Wassmann & Dasen, 1998; Senft, 2001). The relationship between language and thought in this domain does not appear to be one-to-one, as argued by Haun et al.; rather, there is converging evidence of a dissociation between linguistic and nonlinguistic spatial representations. The present results predict that Hai||om and Dutch speakers will show a similar flexibility if given explicit nonverbal instruction on the Haun et al. task, with the distance between tables perhaps yielding an egocentric advantage across language groups. It is possible, however, that task-specific constraints interacted with language to produce the observed pattern of results. While it was relatively easy for the English-speaking children to switch to a geocentric response on a task that encourages mental updating of an environmentally stable array, it may be inherently more difficult to do so across drastic environmental shifts, unless one's language or culture requires a constant attunement to one's environment. It is therefore possible that the Dutch-speaking children would continue to have difficulty on the geocentric trials of the Haun et al. task, even if given clearer instructions. Likewise, the flexibility of the Tseltal-speaking children on the egocentric trials of our task, despite their preference for using a geocentric reference frame on open-ended tasks (Brown & Levinson, 1993b), predicts a similar flexibility for the Hai||om if given clear instructions – especially on a task that should encourage an egocentric response. However, it is possible that cultural and environmental factors specific to the Hai||om make a geocentric response both more salient and more entrenched for this particular language group. The Hai||om are a semi-nomadic hunter-gatherer group whose survival depends on their attunement to subtle variations in soil type and vegetation.
In addition to a linguistically strong east/west (sunrise/sunset) axis, speakers use a system of landscape terms along the north/south axis, with the direction specified by each term shifting as one moves along it. These landscape-based terms are "ubiquitous" in Hai||om
discourse, which is characterized by a "continuous flow of 'topographical gossip'" (Widlok, 2008: 364-5; see also Widlok, 1996). Groups of people are named for the landscape they regularly inhabit (e.g., "people of the hard ground"), fusing landscape and people into what Widlok terms a "land-cum-people" terminology (2008: 366). It would not be surprising, then, if the Hai||om showed a stronger geocentric tendency than the Tseltal, whose agrarian lifestyle does not involve quite the same fusion between people and landscape, and where the visual saliency of the uphill/downhill slope may not demand the same level of mental updating as a contingent landscape system. It is possible, then, that the Tseltal speakers would not show as strong a performance on geocentric trials as their Hai||om peers in the Haun et al. task, while the Hai||om may have difficulty suppressing a geocentric advantage on our task. If such culture-specific effects are found, it would argue that it is not the semantic system per se that drives performance; rather, variation may arise across language groups that share the same semantic typology, depending on other cultural and environmental factors as well as the pragmatics with which each system is used. Such variations, however, are likely to be gradient rather than absolute, reflecting the ease or difficulty with which inherent task constraints that favor the use of one system versus another can be overcome. Haun et al. concluded that the habitual use of a particular frame of reference in one's language shapes speakers' competence as well as their preferences in the domain of memory for small-scale spatial arrays. We argue that these conclusions are at best premature. Converging evidence suggests that linguistic and nonlinguistic spatial representations are largely independent, with task constraints determining which system is most readily employed. Different language groups may vary in the ease with which they are able to overcome task-specific constraints; however, if further task manipulations uncover variability across language groups that share the same semantic typology, this will confirm that language is but one factor among many that may shape the margins of what is otherwise a largely universal cognitive domain. Acknowledgments. We thank the community of Tenejapa, Chiapas, Mexico, especially Antonieta Lopez Santiz for her assistance in translating and running the task, the family Lopez Santiz, and all of the participants in Tenejapa and the Cambridge, MA area. We also thank Susan Carey and Jesse Snedeker for invaluable discussions and Ruthe Foushee for helping to code the data. Funding was provided by a Wenner-Gren Post-Ph.D. Research Grant to L.A., and a Harvard College Research Program and Dean's Summer Research Award to R.M. Part of this work was carried out by R.M. in fulfillment of an undergraduate thesis supervised by P.L.
References 1. Abarbanell, L.: Words and Worlds: Spatial Language and Thought among the Tseltal Maya. Unpublished doctoral dissertation, Harvard Graduate School of Education, Cambridge (2010) 2. Abarbanell, L.: Linguistic Flexibility in Frame of Reference Use among Adult Tseltal (Mayan) Speakers. Paper Presented at the Annual Meeting of the Linguistic Society of America, Anaheim, CA (2007)
3. Bowerman, M., Levinson, S.C.: Language Acquisition and Conceptual Development. Cambridge University Press, Cambridge (2001) 4. Brown, P.: A Sketch of the Grammar of Space in Tzeltal. In: Levinson, S.C., Wilkins, D. (eds.) Grammars of Space, pp. 230–272. Cambridge University Press, Cambridge (2006) 5. Brown, P., Levinson, S.C.: 'Left' and 'Right' in Tenejapa: Investigating a Linguistic and Conceptual Gap. Z. Phon. Sprachwiss. Kommunforsch. (ZPSK) 45(6), 590–611 (1992) 6. Brown, P., Levinson, S.C.: 'Uphill' and 'Downhill' in Tzeltal. Journal of Linguistic Anthropology 3(1), 46–74 (1993a) 7. Brown, P., Levinson, S.C.: Linguistic and Nonlinguistic Coding of Spatial Arrays: Explorations in Mayan Cognition. Cognitive Anthropology Research Group, Max Planck Institute for Psycholinguistics (1993b) 8. Cepeda, N.J., Kramer, A.F., Gonzales de Sather, J.M.C.: Changes in Executive Control Across the Life-span: Examination of Task Switching Performance. Developmental Psychology 37, 715–730 (2001) 9. de León, L.: Exploration in the Acquisition of Geocentric Location by Tzotzil Children. Linguistics 32, 857–884 (1994) 10. Gentner, D., Goldin-Meadow, S. (eds.): Language in Mind: Advances in the Study of Language and Thought. MIT Press, Cambridge (2003) 11. Gleitman, L., Papafragou, A.: Language and Thought. In: Holyoak, K.J., Morrison, R.G. (eds.) Cambridge Handbook of Thinking and Reasoning. Cambridge University Press, Cambridge (2005) 12. Gumperz, J.J., Levinson, S.C. (eds.): Rethinking Linguistic Relativity. Cambridge University Press, Cambridge (1996) 13. Haun, D.B.M., Rapold, C.J., Janzen, G., Levinson, S.C.: Plasticity of Human Spatial Cognition: Spatial Language and Cognition Covary across Cultures. Cognition 119(1), 70–80 (2011) 14. Levinson, S.C.: Frames of Reference and Molyneux's Question: Cross-Linguistic Evidence. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M.F. (eds.) Language and Space, pp. 109–169. MIT Press, Cambridge (1996) 15. Levinson, S.C.: Space in Language and Cognition. Cambridge University Press, Cambridge (2003) 16. Li, P., Gleitman, L.: Turning the Tables: Language and Spatial Reasoning. Cognition 83(3), 265–294 (2002) 17. Li, P., Abarbanell, L., Gleitman, L., Papafragou, A.: Spatial Reasoning in Tenejapan Mayans. Cognition 120(1), 33–53 (2011) 18. Luchins, A.S.: Mechanization in Problem Solving: The Effect of 'Einstellung'. Psychological Monographs 54(6), whole no. 248. American Psychological Association, Evanston (1942) 19. Majid, A., Bowerman, M., Kita, S., Haun, D.B.M., Levinson, S.C.: Can Language Restructure Cognition? The Case for Space. Trends in Cognitive Sciences 8(3), 108–114 (2004) 20. Mishra, R.C., Dasen, P.R., Niraula, S.: Ecology, Language, and Performance on Spatial Cognitive Tasks. International Journal of Psychology 38, 366–383 (2003) 21. Newcombe, N.S., Huttenlocher, J.: Making Space: The Development of Spatial Representation and Reasoning. MIT Press, Cambridge (2000) 22. Pederson, E., Danziger, E., Wilkins, D., Levinson, S.C., Kita, S., Senft, G.: Semantic Typology and Spatial Conceptualization. Language 74(3), 557–589 (1998) 23. Piaget, J.: Judgment and Reasoning in the Child. Routledge, London (1928) 24. Pinker, S.: The Stuff of Thought. Penguin Group, New York (2007)
25. Polian, G.: New Insights on Spatial Frames of Reference in Tseltal. Paper Presented at the Annual Meeting of the Society for the Study of Indigenous Languages of the Americas, Baltimore, MD (2010) 26. Rigal, R.: Right-Left Orientation: Development of Correct Use of Right and Left. Perceptual and Motor Skills 79, 1259–1278 (1994) 27. Rigal, R.: Right-Left Orientation, Mental Rotation, and Perspective-taking: When can Children Imagine What People See from Their Own Viewpoint? Perceptual and Motor Skills 83(3), 831–843 (1996) 28. Senft, G.: Frames of Spatial Reference in Kilivila. Studies in Language 25(3), 521–555 (2001) 29. Li, P., Shusterman, A., McNaughton, A.: The Right Way to Learn 'Right' and 'Left' (under review) 30. Simons, D.J., Wang, R.F.: Perceiving Real World Viewpoint Changes. Psychological Science 9, 315–320 (1998) 31. Terrill, A., Burenhult, N.: Orientation as a Strategy for Spatial Reference. Studies in Language 32(1), 93–136 (2008) 32. Wassmann, J., Dasen, P.: Balinese Spatial Orientation: Some Empirical Evidence for Moderate Linguistic Relativity. The Journal of the Royal Anthropological Institute 4(1), 689–711 (1998) 33. Whorf, B.L.: Language, Thought and Reality: Selected Writings of Benjamin Lee Whorf. Carroll, J.B. (ed.). MIT Press, Cambridge (1956) 34. Widlok, T.: Topographical Gossip and the Indexicality of Hai||om Environmental Knowledge (Working Paper No. 37). Cognitive Anthropology Research Group, Max Planck Institute for Psycholinguistics, Nijmegen (1996) 35. Widlok, T.: Conducting Cognitive Tasks and Interpreting the Results: The Case of Spatial Inference Tasks. In: Wassmann, J., Stockhaus, K. (eds.) Experiencing New Worlds, pp. 258–280. Berghahn Books, Oxford (2007) 36. Yerys, B.E., Munakata, Y.: When Labels Hurt but Novelty Helps: Children's Perseveration and Flexibility in a Card-sorting Task. Child Development 77(6), 1589–1607 (2006)
Linguistic and Cultural Universality of the Concept of Sense-of-Direction Daniel R. Montello and Danqing Xiao Department of Geography University of California, Santa Barbara Santa Barbara, California 93106 USA {montello,danqing}@geog.ucsb.edu
Abstract. We analyze self-reported sense-of-direction in samples of people from Santa Barbara, Freiburg, Saarbrücken, Tokyo, and Beijing. The Santa Barbara Sense-of-Direction Scale (SBSOD) by Hegarty and colleagues primarily assesses survey spatial abilities in directly-experienced environments. It was translated into German, Japanese, and Mandarin Chinese. Results suggest sense-of-direction is a unitary and meaningful concept across the five samples and four languages. In four of the samples, males report significantly better sense-of-direction than do females. Some variations are found across the five samples in overall level of sense-of-direction and in response patterns across the 15 items. Because it is strongly related to the survey spatial thinking that primarily underlies sense-of-direction, and because it can be counted in a relatively straightforward manner, we specifically examine thinking in terms of cardinal directions as a component of sense-of-direction, including conducting a count of cardinal-direction words from Internet corpora in the four languages. We find support for sense-of-direction as a coherent concept across the four languages and for the translated scales as useful tools to measure individual differences in sense-of-direction. We also consider linguistic/cultural variations in sense-of-direction, especially with respect to variations in physical environments. Keywords: Sense-of-direction, cross-linguistic, individual differences, spatial corpora.
1 Introduction Psychologists, educators, and others have attempted to characterize and measure spatial abilities (spatial intelligence) for well over a century, and over a hundred such tests have been devised and made publicly available [1, 2]. The great majority of these psychometric tests have attempted to measure spatial abilities with small, flat pictures that depict shapes, arrows, mazes, matrices of dots, and the like. Factor analytic studies have suggested that pictorial spatial abilities can be classified into two or three major factors, variously labeled by such terms as spatial visualization, spatial orientation, and spatial perception. However, this apparent diversity aside, conceptual and empirical considerations have inspired several researchers to question the adequacy of psychometric pictorial tests for describing and measuring spatial thinking
abilities across the full range of everyday and specialized real-world tasks that people perform [3, 4, 5, 6]. Perhaps the spatial task that has been of greatest interest to researchers of human spatial cognition, including psychologists, geographers, linguists, neuroscientists, and information scientists, is navigation, particularly wayfinding. Wayfinding involves thinking and decision-making with information about environmental spaces, particularly distal places and features, and often requires combining orientation to one's local surroundings with orientation to these distal places and features [7]. Wayfinding tasks include route choice and planning, pointing to nonvisible places, and creative navigation (such as shortcutting). Directly acquired environmental spatial information is learned as part of perceptual-motor experiences moving about actual environments, whether built or natural; such environments include buildings, neighborhoods, parks, and cities. Although it is plausible that pictorial spatial abilities would predict wayfinding abilities in directly-experienced environmental spaces, several attempts to demonstrate this have failed or been modestly successful at best [8, 9, 10]. More successful at predicting performance on environmental wayfinding tasks has been self-reported sense-of-direction (SOD). Kozlowski and Bryant [11] found that answers to a single 7- or 9-point rating scale item "How good is your sense of direction?" predicted error in pointing to cardinal directions, nonvisible campus landmarks, and locations within a large public building, with correlations around .5. Subsequently, others found similar results with single-item [12, 13] and multi-item assessments [9, 14]. These researchers also found that self-report scales, like actual environmental performance measures, correlated only weakly with traditional psychometric spatial tests. 1.1 Santa Barbara Sense-of-Direction Scale (SBSOD) Hegarty and her colleagues [15] at the University of California, Santa Barbara, developed a self-report instrument they named the Santa Barbara Sense-of-Direction Scale (SBSOD). Their express purpose was to develop a predictive test of environmental spatial skills, including wayfinding and learning the layouts of new environments. The scale includes 15 items in which respondents express their degree of agreement or disagreement (i.e., Likert-scale format) on a 7-point scale with statements about environmental spatial abilities, activities, and preferences. Besides "My 'sense of direction' is very good," items refer to giving directions, judging distances, thinking in terms of cardinal directions, enjoyment of reading maps, getting lost, and so on. A total score is typically expressed as an average of the answers to the 15 items, with positively-worded items reverse scored so that a high score (close to 7.0) reflects a good self-reported sense of direction, and a low score (close to 1.0) reflects a poor self-reported sense of direction. The scale is reprinted in Appendix A. Details of the development, reliability testing, and validation of the SBSOD are found in [15]. Hegarty et al. reported the scale to have an internal consistency reliability of .88, and a test-retest reliability of .91 based on a retesting interval of 40 days. They also reported several content validation studies. One showed that the SBSOD correlated more highly with respondents' errors in pointing to campus landmarks than in pointing blindfolded to objects in the room where they were
266
D.R. Montello and D. Xiao
standing. Another showed good correlations with blindfolded pointing to the origin of a walked route. Another showed stronger correlations with a directly experienced environment than one learned from a video or desktop virtual environment. A final study showed higher correlations with estimating directions between campus landmarks than with estimating directions between U.S. cities on a map. And correlations were just as high in this last study whether respondents filled out the scale before or after the criterion task, suggesting that sense of direction is a somewhat enduring trait and not just based on self-perceptions of recent performance; see [16]. The authors interpreted their findings to indicate that SOD primarily taps into environmental-scale spatial knowledge directly acquired via perceptual-motor experience in real environments. It does not predict map-acquired knowledge or that acquired in a desktop virtual environment as well, but it is likely to predict performance fairly well in more immersive virtual environments. Evidence also suggests that respondents interpret the phrase “sense of direction” somewhat literally, to refer especially to the ability to orient oneself so that a current or imagined heading in a real or imagined environment is coordinated with allocentric spatial knowledge, such as the locations of landmarks or absolute reference systems, including cardinal directions [17]. SOD does not correlate as strongly with performance on distance estimation or map sketching, although its predictive power there is likely to exceed .3 or .4, e.g., [18]. So it is fairly clear that SOD is about the ability to perform survey spatial tasks, not route tasks; see also [12, 13, 19, 20]. 1.2 Linguistic and Cultural Similarities and Differences in Self-reported SOD The original Santa Barbara SOD scale is written in American English. A persistent question of interest to researchers in spatial cognition concerns the degree to which spatial cognition is similar or different across different languages and different cultures [21, 22, 23, 24, 25]. If we grant that sense-of-direction is a coherent and meaningful concept in American English, we can ask whether it is a coherent and meaningful concept in other languages and cultures. This addresses theoretical questions about the cultural variability or universality of spatial cognition, such as questions about the innateness of spatial cognition, the role of language in spatial cognition, the role of culturally specific experiences other than language in spatial cognition, the role of the natural and built environment in spatial cognition, and so on. But it also addresses the practical issue of whether translations of the Santa Barbara SOD would be useful instruments for measuring individual differences in what the Santa Barbara version apparently measures fairly reliably. In this paper, we investigate how well and how the concept of sense-of-direction translates into some other languages. This also allows us to evaluate prospects for translating the Santa Barbara SOD Scale into useful measures of sense-of-direction in other languages. It would not be an exaggeration to say that a concern about the proper theoretical relationship between language and non-linguistic cognition, or even whether a concept like “non-linguistic cognition” is coherent, looms as a key issue throughout the history of philosophy, as well as the social and behavioral sciences [26, 27, 28, 29, 30, 31]. 
Within the past decade or two, some of the most important work on these issues has centered on spatial language and spatial cognition [23, 32, 33]. In particular, most of the cross-linguistic and cross-cultural research of the last decade and a half has focused on systems (often called "frames") of reference used by different languages, and how this might affect spatial thinking. Several different conceptual frameworks for understanding spatial reference systems in cognition have been proposed over the last several decades (even earlier in the case of Piaget's work); e.g., [34, 35, 36]. A framework proposed by Levinson [37], specifically in the context of spatial language, has proven very influential. The framework overlaps with previous frameworks, of course, although it concerns only directional and not distance relations (such as "next to"). It includes three types of reference systems: absolute, relative, and intrinsic.

Absolute linguistic systems code direction with respect to stable and global origins and axes that could be based on macroscopic features like mountain chains or celestial bodies. They also include systems based on the abstract coordinates of technologically developed societies, such as the latitude-longitude graticule, which Levinson does not explicitly discuss. Cardinal directions are a common example of an absolute system found in many languages. Relative linguistic systems code direction with respect to a person's heading or facing direction. They are dynamic as people change heading, and they rarely if ever apply to orientation statements over very large areas of space. Left and right are a common example of relative systems in many languages. Intrinsic linguistic systems code direction with respect to the heading of an asymmetric environmental feature; such a system is stable insofar as the feature does not move, but it also tends to apply only to orientation statements over relatively small areas around the origin feature. The front of a building would be an example of an intrinsic system, as long as "front" referred to the fixed side of the building with a feature like the entrance.

Notable work by Levinson and his colleagues, summarized in [38] and [39], has led these researchers to propound a fairly strong form of linguistic relativity (the Whorfian hypothesis) in the domain of language involving spatial reference systems. They have studied groups of hunter-gatherers and simple agriculturalists who usually or always employ absolute reference systems, no matter the scale of space involved. That is, a modest number of language communities apparently rarely if ever employ relative reference. Native speakers of, for example, Guugu Yimithirr and Tzeltal use absolute reference to indicate directions, and according to these researchers, think about spatial directions with an absolute system, whether the spatial tasks are explicitly linguistic or not. Thus, "language can play a central role in the restructuring of human cognition" [39] and "language influences how people think, memorize, and reason about spatial relations and directions" [38].

Of course, cultural differences need not be due to language differences. Any variable that differs systematically with cultural-group membership could potentially cause differences in spatial thinking, including differences in spatial language itself. One candidate example of culturally correlated variation that has frequently been hypothesized to influence spatial cognition and language is systematic differences in environmental activities and experiences. Indeed, such differences might explain cultural variation, sex-related variation, or even individual variation.
Potential pertinent examples of such activities include independent travel away from home, map use, particular types of toy play, and video gaming [24, 40, 41, 42, 43, 44]. A second candidate that has frequently been hypothesized to influence spatial cognition and language is systematic differences in environmental structure and layout [45]. Specific examples that have been proposed include variations in the presence of vista-obstructing features [46], the presence of macroscopic features nearby (oceans, mountain chains) [47], and the presence of grid (rectilinear) versus radial versus irregular (organic) road layouts [48]. Notice that these are not entirely independent; the presence of mountains can provide a global reference, obstruct vistas, and impose curvilinear roads.

There are good reasons to believe that these environmental variables influence cognition and language. Brown [49] compiled data from 127 globally distributed languages and reported variations in the preferred spatial reference system used in each language. His explanation for these variations was that if most members of a society live within circumscribed territories, geographic features, especially global features such as big mountains, will be used when references to direction in the environment are made. Likewise, maintaining orientation in grid road systems is easier than in oblique or curved ones [50, 51]. Data reported by [52] and [53] on cardinal-direction usage in American English suggest that such terms are more common in regions of the U.S. that are less mountainous; at least in most areas west of the Appalachians, this also means that road and property layouts are more rectilinear. To the degree that features like these differ systematically in the environments of different cultural groups, one might expect some corresponding variation in spatial cognition and language [54].
2 Study 1: Cross-Linguistic/Nationality Comparison of the SBSOD

In this study, we compare responses to versions of the SBSOD that were translated into German, Japanese, and Mandarin Chinese. We evaluate the similarities and differences in responses to the scale across datasets consisting of responses by native speakers of each language.

2.1 Methods

2.1.1 Respondents

Comparisons in this paper are based on five separate datasets, collected from respondents who were university students in one of four countries (Table 1). The respondents were speakers of one of four different languages: English, German, Japanese, or Mandarin Chinese. Although some multilingualism undoubtedly exists among the respondents, we believe it to be modest, and that nearly all respondents filled out the SBSOD in their native language.

Table 1. Respondent characteristics in the five datasets

City          | Language of SOD Scale | Number of Respondents (Sex) | Mean Age in Years (Range)
Santa Barbara | English               | 106 (48 F, 56 M, 2 ?)       | 19.9 (18-37)
Freiburg      | German                | 90 (49 F, 41 M)             | 21.4 (19-31)
Saarbrücken   | German                | 115 (93 F, 22 M)            | 22.0 (19-40)
Tokyo         | Japanese              | 137 (86 F, 51 M)            | 21.7 (18-58)
Beijing       | Mandarin              | 102 (29 F, 73 M)            | 18.5 (18-23)
At the time of filling out the scale, respondents were residing in Santa Barbara (CA), U.S.; Freiburg, Germany; Saarbrücken, Germany; Tokyo, Japan; and Beijing, China. All datasets consisted primarily of young adults of typical college age; the average age of the respondents in each dataset was between 18 and 22 years. Two respondents (one from Santa Barbara, one from Freiburg) were removed from the datasets because they did not respond to five or more items on the SOD scale.

2.1.2 Survey Instruments

The original SBSOD developed by [15] and written in American English consists of 15 Likert-scale items. Each item asks respondents to rate their degree of agreement with a statement about their abilities or preferences at environmental spatial skills or tasks that relate to sense-of-direction. Ratings are on a 7-point scale, with 1 being 'strongly agree' and 7 being 'strongly disagree.' As shown in Appendix A, seven items are stated positively (agreeing with the item reflects a good sense-of-direction) and eight are stated negatively.

The English-language SBSOD was translated into German, Japanese, and Mandarin Chinese, in each case by a native speaker of the target language who was also a competent speaker of English as a second language. Translators attempted to translate the entire instructions and the content of each item as accurately as possible. The German version of the scale, known as the "Freiburg SBSOD," was translated by S. Münzer and C. Hölscher; for comparison and discussion, and recognizing the many German speakers in the COSIT community, the German version is presented in Appendix B. The Japanese version, known as the "Tokyo SBSOD," was translated by T. Ishikawa. The Mandarin version, known as the "Beijing SBSOD," was translated by the second author. All versions are available from the authors of the present manuscript.

2.1.3 Procedure

All data collection efforts were conducted by native speakers of the scale's language. Each version of the scale presented the items to respondents in the same fixed order as in the original SBSOD. Respondents in Santa Barbara and Beijing completed the scale in a single group session during an introductory geography class (the respondents were mostly not geography students, however). Respondents in Tokyo and the German cities were recruited and filled out the scale individually. All respondents read the instructions and a consent statement before completing the scale.

2.2 Results and Discussion

2.2.1 Overall Response Pattern

As is customary with the SBSOD, we present the data in this paper by reverse-scoring positively stated items so that a high score (close to 7.0) indicates a good sense-of-direction. Table 2 presents the means and standard deviations of responses to each item in all five datasets.
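To make the scoring rule concrete, the following is a minimal Python sketch of reverse scoring and averaging as just described. It is our illustration, not code distributed with the scale; the set of positively stated items is read off Appendix A.

```python
# Illustrative SBSOD scoring (not the authors' code). Ratings use the
# published response format: 1 = 'strongly agree' ... 7 = 'strongly disagree'.
# Positively stated items (per Appendix A) are reverse scored so that a
# high final score (close to 7.0) reflects good self-reported SOD.
POSITIVE_ITEMS = {1, 3, 4, 5, 7, 9, 14}  # the seven positively stated items

def sbsod_score(responses):
    """responses: dict mapping item number (1-15) to a rating in 1..7."""
    adjusted = [8 - rating if item in POSITIVE_ITEMS else rating
                for item, rating in responses.items()]
    return sum(adjusted) / len(adjusted)  # mean over the 15 items
```

On this convention, a respondent who strongly agrees with every positively stated item and strongly disagrees with every negatively stated one scores 7.0.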
Table 2. Means and standard deviations of responses to each item in all five datasets, reverse-scored so that higher scores reflect better sense-of-direction

Item                     | Santa Barbara, USA | Freiburg, Germany | Saarbrücken, Germany | Tokyo, Japan | Beijing, China
1 giving directions      | 4.8a (1.6)  | 4.4a (1.4)  | 4.4a (1.5)  | 3.5b (1.5) | 4.4a (1.9)
2 memory for things      | 4.3ab (1.9) | 4.5a (1.9)  | 4.7a (1.8)  | 3.3c (1.7) | 3.8bc (2.1)
3 judging distances      | 4.4a (1.6)  | 3.9ab (1.6) | 3.3b (1.8)  | 3.4b (1.7) | 4.1a (1.7)
4 sense of direction     | 4.8a (1.9)  | 4.5ab (1.8) | 4.2ab (1.8) | 3.3c (1.7) | 4.0b (2.1)
5 cardinal directions    | 3.6a (2.3)  | 2.6b (1.5)  | 2.0b (1.6)  | 2.4b (1.8) | 3.9a (2.6)
6 easily get lost        | 4.0a (1.6)  | 4.5a (1.7)  | 4.1a (1.9)  | 3.9a (1.9) | 4.0a (2.1)
7 enjoy reading maps     | 4.2a (2.0)  | 4.6b (1.9)  | 3.8b (2.0)  | 4.0b (2.1) | 5.6b (2.0)
8 understand directions  | 5.1a (1.6)  | 4.9ab (1.4) | 5.0ab (1.4) | 3.6ab (1.8)| 4.8c (2.0)
9 good reading maps      | 5.0a (1.6)  | 4.9ab (1.6) | 4.4b (1.8)  | 3.7c (1.8) | 5.2a (1.8)
10 remember routes       | 4.1a (2.1)  | 3.6a (2.0)  | 3.4a (2.0)  | 3.5a (1.9) | 3.5a (2.0)
11 giving directions     | 4.0ab (1.7) | 4.3ab (1.8) | 3.9ab (1.9) | 3.6b (1.6) | 4.5a (1.8)
12 important to know     | 6.0a (1.4)  | 5.2bc (1.7) | 5.3bc (1.6) | 5.1c (1.5) | 5.7ab (1.8)
13 someone navigate      | 4.4a (2.1)  | 4.6a (2.0)  | 4.2a (2.2)  | 4.0a (2.0) | 4.0a (2.1)
14 remember routes       | 5.0a (1.7)  | 4.4ab (1.9) | 4.3b (1.9)  | 3.3c (1.6) | 4.4ab (1.9)
15 good mental map       | 5.3a (1.8)  | 4.5b (1.8)  | 4.6b (1.8)  | 3.7c (1.6) | 4.5b (2.0)
Overall Mean             | 4.6a (1.1)  | 4.4ab (1.0) | 4.1b (1.1)  | 3.6c (1.1) | 4.4ab (1.0)

Within each row, cities with mean scores that do not share a superscript (a, b, c) are significantly different from each other. For example, for item #1, the means for Santa Barbara, Freiburg, Saarbrücken, and Beijing all have only an 'a' superscript and do not significantly differ from each other. Tokyo's mean has only a 'b' superscript and is significantly less than the means of each of the other datasets.
A multivariate approach to repeated measures was used to test the significance of differences in responses to the 15 items across the five datasets. The scores significantly differed as a function of dataset, F(4, 541) = 16.18, p < .0001. Post-hoc comparisons suggest that respondents from Santa Barbara, Freiburg, and Beijing form a group rating themselves best; respondents from Freiburg, Saarbrücken, and Beijing form a group rating themselves intermediately (i.e., Freiburg and Beijing ratings were poorer than Santa Barbara but not significantly, and they were better than Saarbrücken but not significantly); and respondents from Tokyo form a group rating themselves most poorly. However, the pattern of overall mean differences interacted with the 15 items, F(56, 2056) = 6.14, p < .0001. Simple-effects tests revealed that all items differed significantly across the datasets except items #6 (I very easily get lost in a new city), #10 (I don't remember routes very well while riding as a passenger in a car), and #13 (I usually let someone else do the navigational planning for long trips). Table 2 uses superscripts to indicate which datasets did or did not differ significantly for a particular item.
In sum, mean SOD scores varied significantly across the language/ethnicity datasets, and more importantly, these differences varied in different ways across the 15 items of the scale. These results are quite ambiguous in and of themselves. We carried out several more analyses in an attempt to better understand the nature of similarities and differences in self-reported sense-of-direction across the datasets.

2.2.2 Sex Differences

Males commonly report having a better sense-of-direction than do females; e.g., see [12, 55]. We examined potential sex differences in the five datasets. As Table 1 shows, the proportion of female and male respondents is quite variable across the five datasets, so we did not perform an omnibus significance test involving dataset, items, and sex as factors. Instead, we compared the average scores of females to those of males within each dataset. As Table 3 shows, males reported significantly better sense-of-direction in all datasets except Freiburg, where the means were virtually identical for the sexes and did not significantly differ.

Table 3. Mean SOD scores for female and male respondents in each dataset

City          | Female Scores (n) | Male Scores (n)
Santa Barbara | 4.3 (48)          | 4.9 (56)*
Freiburg      | 4.4 (49)          | 4.3 (41)
Saarbrücken   | 4.0 (93)          | 4.6 (22)*
Tokyo         | 3.3 (86)          | 4.2 (51)*
Beijing       | 4.0 (29)          | 4.6 (73)*

*Means differ significantly at the p < .05 level or better.
Mean scores were better for males than for females, except in the Freiburg dataset. In the other four datasets, the mean difference favoring males is .6 of a scale point for Santa Barbara, .6 for Saarbrücken, .9 for Tokyo, and .6 for Beijing. The recurring pattern of similar sex-related differences across the datasets supports the idea that sense-of-direction is a coherent concept in German, Japanese, and Chinese. We have no explanation for the lack of sex differences in the Freiburg dataset; it cannot be anything linguistic or ethnic per se, as the Saarbrücken dataset shows sex differences like the other three datasets.

The fact that the proportion of the two sexes is so variable across the five datasets (i.e., dataset and sex are confounded), and the fact that males rate themselves as having a better sense-of-direction in most of the datasets, suggest that we should exercise caution in interpreting comparisons across the datasets. The poor overall scores for the Tokyo SBSOD can partially be accounted for by the large proportion of female respondents in that dataset. In fact, not only does the Tokyo dataset include a large proportion of female respondents, but the females in that dataset reported an exceptionally poor overall SOD, considerably poorer than males in the Tokyo data, as well as poorer than female and male respondents in the other four datasets. The Saarbrücken dataset has an even higher proportion of female respondents than the Tokyo dataset does, and its overall mean SOD score was second poorest after the Tokyo dataset.
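The within-dataset comparisons in Table 3 are simple two-group mean comparisons. Below is a sketch of one way to run them, assuming per-respondent score arrays that are not distributed with the paper, and assuming a Welch two-sample t-test (the paper does not state which test produced the asterisks):

```python
import numpy as np
from scipy import stats

def compare_sexes(female_scores, male_scores, alpha=0.05):
    """Compare mean SOD scores of female vs. male respondents in one dataset.

    female_scores, male_scores: 1-D arrays of per-respondent SBSOD means.
    Uses Welch's t-test (no equal-variance assumption)."""
    t, p = stats.ttest_ind(female_scores, male_scores, equal_var=False)
    return {
        "female_mean": float(np.mean(female_scores)),
        "male_mean": float(np.mean(male_scores)),
        "t": float(t),
        "p": float(p),
        "significant": p < alpha,  # would correspond to an asterisk in Table 3
    }
```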
To help further disentangle dataset differences due to language, ethnicity, or culture from sex differences, we repeated our comparisons across the datasets, but separately within each sex. In fact, scores differed significantly for both sexes as a function of dataset and the interaction of dataset by item: for female respondents, dataset F(4, 298) = 11.96, p < .0001, and dataset by items F(56, 1110.8) = 4.02, p < .0001; for male respondents, dataset F(4, 236) = 3.89, p < .01, and dataset by items F(56, 869.6) = 2.95, p < .0001. We do not pursue further analysis of the pattern of item differences across datasets for each sex, but the point is clear: the dataset differences in overall SOD we find can only partially be attributed to a difference in the proportion of the two sexes in each dataset.

2.2.3 Factorial Structure and Internal Consistency

Factor analysis was used to examine the factorial structure of each dataset (i.e., the analyzed matrix had item communalities on the diagonal, not 1.0 as in PCA). Eigenvalues express the variance accounted for by each factor; at a minimum, useful factors should have eigenvalues > 1.0, as otherwise a multivariate factor does not even improve on the descriptive power of one of the original items from the scale. As Table 4 shows, the first factor extracted in each dataset accounts for a clear majority of the variance making up the factors extracted (i.e., this is not the total variance of the original items): 79% for Santa Barbara, 73% for Freiburg, 76% for Saarbrücken, 82% for Tokyo, and 68% for Beijing. For each dataset, the next largest factor is largely irrelevant, with eigenvalues less than 1.0.

Table 4. Exploring the unidimensionality of the SOD scale in each of the five datasets via factor analysis and internal consistency

City          | Cases(1) | Eigenvalue of First Factor Extracted | Eigenvalue of Second Factor Extracted | Internal Consistency(2)
Santa Barbara | 106      | 5.30 | 0.69 | .88
Freiburg      | 89       | 4.60 | 0.85 | .84
Saarbrücken   | 114      | 5.26 | 0.89 | .87
Tokyo         | 137      | 5.92 | 0.79 | .89
Beijing       | 101      | 3.81 | 0.78 | .80

(1) The number of respondents who provided responses to all 15 items in the scale.
(2) Cronbach's α
Examining the pattern matrix of factor loadings for each item reveals that in all datasets, all 15 items loaded positively on the first factor. In fact, a strong majority of items in each dataset had loadings greater than .40 on the first factor, and only the Tokyo and Beijing datasets had even one item with a loading < .20. Furthermore, the pattern matrices for the first factor were rather similar across datasets. In each dataset, item #4 had the strongest or nearly the strongest loading on the first factor of all the items, and it was the highest-loading item on the Freiburg, Saarbrücken, and Beijing scales. Conversely, item #2 had the weakest loading on the first factor in all datasets except Beijing, where it had the third weakest loading. To compare the loading patterns across the datasets more systematically, we correlated the loadings of each dataset with those of the Santa Barbara dataset, using items as the unit of analysis and correlating the 15 pairs of loadings from the datasets taken two at a time (thus, N = 15 for each correlation). The pattern of loadings across the 15 items for the Santa Barbara dataset correlated substantially with the pattern for each of the other datasets: .49 with Freiburg, .62 with Saarbrücken, .61 with Tokyo, and .57 with Beijing.

Finally, we examined the unidimensionality of the various datasets by examining the internal reliability of each. Table 4 shows that all five datasets have similarly high internal consistency, as reflected in Cronbach's α values between .80 and .90. Thus, all language versions of self-reported SOD, administered in all places, have good internal consistency. By itself, this suggests relatively strong unidimensionality.

Our examination of the factorial structure of the five datasets provides relatively strong evidence in favor of the coherence of sense-of-direction and the likely feasibility of translated versions of the Santa Barbara SOD as useful tools in German, Japanese, and Mandarin Chinese. In each case, the scale has one very strong factor and no second factor of appreciable magnitude that would normally be considered in need of interpretation by guidelines common in factor analysis. The pattern of factor loadings for each of the non-English datasets is rather similar to the pattern for the original Santa Barbara scale, in each case sharing around 25% or more variance. All five datasets have internal consistency of .80 or better. The SOD scale measures mostly one thing in each dataset.
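The two quantities behind Table 4 and the loading-pattern comparison are standard; below is a brief sketch of both, under the assumption of a complete (no missing responses) respondents-by-items matrix of reverse-scored ratings:

```python
import numpy as np

def cronbach_alpha(data):
    """Cronbach's alpha for an (n_respondents x n_items) array of
    reverse-scored item responses with no missing values."""
    k = data.shape[1]                              # number of items (15)
    item_variances = data.var(axis=0, ddof=1)      # per-item variance
    total_variance = data.sum(axis=1).var(ddof=1)  # variance of sum scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def loading_pattern_similarity(loadings_a, loadings_b):
    """Correlation between two datasets' 15 first-factor loadings,
    with items as the unit of analysis (N = 15)."""
    return np.corrcoef(loadings_a, loadings_b)[0, 1]
```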
3 Study 2: Cross-Linguistic Comparison of Cardinal Direction Words from Internet Corpora

As we discussed in the Introduction, linguistic differences in word use, particularly in the reference system used with spatial expressions, have been proposed to underlie differences in spatial thinking style and ability. Here, we present an exploratory attempt to test this idea by comparing word counts from Internet corpora developed by linguists with scores on the SOD scale. We chose to look specifically at the use of cardinal-direction words in the corpora because that directly addresses the use of an absolute reference system (an abstract allocentric system [35]) in a way that can be identified and counted fairly unambiguously in corpora. Also, the ability to think about space in absolute terms clearly relates to SOD overall, and, in fact, the SBSOD contains a specific item (#5) that concerns thinking in terms of cardinal directions. Although it would certainly be informative to perform broader and more general analyses of corpora for spatial terms, the other items in the SBSOD do not lend themselves to obvious word counts and do not obviously relate to spatial reference systems, which are widely recognized as fundamental to spatial language. Here, we examine the Internet corpora for all four languages in our datasets above: English, German, Japanese, and Mandarin Chinese.

3.1 Methods

3.1.1 Corpora

To gain further insight into the relationship between self-reported sense-of-direction and the way languages differ in expressing spatial concepts, we analyzed linguistic Internet corpora created by the "WaCky" project. The project uses a Web crawler to find web sites that contain particular words or phrases in different knowledge domains. The corpora are subjected to various filtering procedures, including the removal of very small and very large pages, duplicate pages, and pornographic sites (which use long strings of machine-generated text to fool search engines). Details about the creation of the corpora are found in [56]. We used frequency counts of single words from the corpora of Internet word forms in English, German, Japanese, and simplified Chinese from the Centre for Translation Studies at the University of Leeds, found at http://corpus.leeds.ac.uk/list.html.

3.1.2 Procedure

We counted the frequency of words that explicitly referred to cardinal directions in the four language corpora. Lowercase and capitalized versions of words were counted separately, as long as they occurred with a frequency of at least 5 per one million words (for context, "relaxing," "pulmonary," and "sang" appear with a frequency of 5.00 in English). Counted words in English included North (with the greatest frequency of cardinal-direction words in English, 158.44 per one million), South, West, East; north, south, west, east; Northern, Southern, Western, Eastern; northern, southern, western, eastern; Northwest, Northeast, Southwest, Southeast; and Midwest (other relevant terms did not occur with enough frequency). Although it might be considered more appropriate to exclude cardinal-direction words used as region names (such as "Midwest"), we realized that a count of single words in English would always include parts of region names that we could not identify and exclude (such as "North America"). Thus, we included region names in all four languages (such as "Osteuropa" in German). For the Japanese corpus, we included both kana and kanji versions of cardinal-direction words. However, we excluded single-letter references (N, S, E, W) in all four languages because we thought it would be too ambiguous whether they actually referred to cardinal directions or to something else, like other names. In fact, we suspect that Japanese, in particular, may include substantial single-letter references to cardinal directions in its corpus.

3.2 Results and Discussion

Table 5 gives the frequency counts of cardinal-direction words from the Internet corpora in the four languages we examined. These are broken down by variations of north, south, east, west, and combination words like northeast. The Mandarin Chinese count was the highest at over 2,000 per million words, more than twice the count of almost 1,000 in English, the second highest. The Japanese count was lower at about 700, and the German count was lower still at about 500.

Table 5. Frequency counts per one million words of cardinal-direction words in English, German, Japanese, and Mandarin Chinese(1)

Language | North  | South  | East   | West   | Combined | Total
English  | 262.20 | 254.22 | 169.77 | 228.66 | 28.61    | 943.46
German   | 73.48  | 87.02  | 158.07 | 165.98 | 15.23    | 499.78
Japanese | 193.82 | 64.17  | 295.07 | 99.84  | 38.53    | 691.43
Mandarin | 760.26 | 425.98 | 381.61 | 462.42 | 121.91   | 2152.18

(1) All versions of a given term were counted, e.g., North, north, Northern, and northern in English; Norden, Nord, nördlichen, nördlich in German.
These word counts compare rather closely to the pattern of responses to item #5 of the SOD scale, which concerns whether a person thinks about his or her environment in terms of cardinal directions (reported in Table 2). Respondents from the Beijing dataset reported the strongest agreement with item #5, and the Mandarin corpus included the largest count of cardinal-direction words, by far. Respondents from the Santa Barbara dataset reported the second strongest agreement with item #5, and the English corpus included the second largest count of cardinal-direction words. Respondents from the Freiburg, Saarbrücken, and Tokyo datasets reported the weakest agreement with item #5 (significantly weaker than Beijing and Santa Barbara), and the German and Japanese corpora included the smallest counts of cardinal-direction words.
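A sketch of the counting procedure for the English list follows. The line format of the Leeds frequency files ("rank frequency-per-million word") is an assumption on our part, as is restricting the word set to the forms enumerated in Sect. 3.1.2; neither the exact file layout nor the full multilingual word lists are given in the paper.

```python
# Hypothetical re-implementation of the Study 2 count for English.
# Assumes each line of the frequency list reads: "<rank> <freq-per-million> <word>".
CARDINAL_EN = {
    "North", "South", "East", "West", "north", "south", "east", "west",
    "Northern", "Southern", "Western", "Eastern",
    "northern", "southern", "western", "eastern",
    "Northwest", "Northeast", "Southwest", "Southeast", "Midwest",
}

def cardinal_direction_count(path, word_set=CARDINAL_EN, threshold=5.0):
    total = 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip malformed or header lines
            freq, word = float(fields[1]), fields[2]
            # Count a form only if it occurs at least `threshold` times
            # per million words, as in the procedure described above.
            if word in word_set and freq >= threshold:
                total += freq
    return total  # summed occurrences per one million words
```

The same routine would apply to the German, Japanese, and Chinese lists with the corresponding word sets (for Japanese, both kana and kanji forms).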
4 General Discussion

Our studies are most consistent with the conclusion that sense-of-direction is a unitary and meaningful concept across the four languages we compared: its original American English, German, Japanese, and Mandarin Chinese. We also believe that the Santa Barbara Scale is useful as a predictive tool in all four languages, or at the least, that it deserves further investigation in these and other languages as a potentially useful tool. The strongest evidence to support these conclusions is the clear unidimensional factor structure and high internal consistency we find in all five datasets for the 15-item scale. The correlation of the factor loadings for each of the four non-English datasets with the original Santa Barbara data also supports these ideas. All items load positively on the first factor in all five datasets. In particular, item #4, which specifically mentions "sense of direction," loads the highest of all items in three of the datasets, and second highest in the other two. Similarly, item #2 has a very weak loading on the first factor in all five datasets. We also find support for the likely similarity of meaning of self-reported sense-of-direction across the datasets in the recurring pattern of sex-related differences (favoring males) across all but one of the datasets.

Of course, although it is justified to say that the scale measures primarily one thing in each dataset, we cannot say for sure that this one thing is exactly the same in each dataset. That is, we cannot say confidently that respondents in the different datasets understand sense-of-direction in exactly the same way. In fact, we doubt they do. We would need to conduct content-validation studies in the different language groups, similar to those reported by [12], in order to establish that more convincingly. Even so, from what researchers have come to understand about the way natural language encodes meaning, we do not believe it is possible for the terms used in the original Santa Barbara Scale to be translated perfectly. This strikes us as a useful and interesting area for further research.

Even if we accept the unitary nature of sense-of-direction across languages and cultures, and grant that its meaning is largely similar, we do not find that respondents in the five datasets have equivalent levels of sense-of-direction. Overall mean SOD varies significantly across the five datasets, with the Santa Barbara respondents reporting the best SOD, significantly better than the Saarbrücken and Tokyo respondents. The Tokyo respondents report the poorest SOD, significantly worse than all four of the other sets of respondents. This can partially be explained by the fact that the Tokyo dataset includes a large proportion of female respondents, and female respondents report poorer SOD than male respondents in all but one of the datasets. But even when the datasets are compared within the two sexes, we find significant differences among the datasets overall and in interaction with the different items.

Taken together, our data analyses support the idea that sense-of-direction is a very similar but non-identical concept across the American English, German, Japanese, and Mandarin Chinese languages/cultures. They also suggest that members of different language/culture groups may in fact differ somewhat in their sense-of-direction. Of course, in the absence of validation testing, we recognize that cultural differences in modesty (self-assertion) could contribute to overall differences in mean reported sense-of-direction across respondent groups. Indeed, such a claim would be an attractive explanation for sex-related differences found within a culture, except that in the case of American respondents, evidence reported in studies such as [12, 15, 55] shows that the better SOD scores typically reported on average by males actually correspond to better performance on various survey-style spatial tasks. In any case, it bears emphasizing that cultural differences in modesty would only help explain overall mean differences, and not patterns of internal consistency, factor structure, or other aspects of our data.

If the analyses do in fact provide evidence for measurable differences in sense-of-direction across the five datasets, and not just differences in reporting style, can we address why this might be so? As we reviewed in the Introduction, most of the literature on cultural differences in spatial cognition from the last couple of decades has focused on the use of reference systems in different languages. Of course, all four languages we examine use all three systems (absolute, relative, intrinsic), and we presume they are similar in varying the use of different systems according to factors like the scale of space, indoor versus outdoor, and so on; e.g., [33]. Thus, our results provide no particular evaluation of the claims of Levinson and his colleagues [37, 38]. They do increase our interest in attempting to assess self-reported sense-of-direction among some of the language groups studied by Levinson et al., although we anticipate that adequate translation would be much more difficult with people from much less technologically developed lifestyles. Levinson et al. have reported observations of impressively good performance by their participants on tasks such as maintaining orientation over long-distance travel, pointing to non-visible landmarks, and so on. This implies to us that these participants would rate themselves as better on a self-report sense-of-direction scale than our American, German, Japanese, or Chinese respondents. But given the relative nature of psychometric judgments, how much would Levinson et al.'s participants adjust their internal scales of what constitutes good or poor ability and preference to take account of their overall much better levels? One could also take Levinson et al.'s interpretations to suggest that sense-of-direction does not vary much within the groups they study, but is uniformly excellent across individuals.
This is very definitely not what we and other researchers find within our samples: performance on the wayfinding and learning tasks that influence assessments of sense-of-direction varies greatly among individuals, even within samples of respondents who are fairly uniformly educated and otherwise able. For some of us, this makes the results that Levinson et al. report that much more astonishing.
We do find evidence in our analysis of Internet corpora for the suggestion that respondents who speak languages referring more frequently to cardinal directions are more likely to self-report that they think in terms of cardinal directions. We also cited a couple of other studies that provide evidence for regional differences in the use of cardinal directions specifically within the U.S. [52, 53]. We are thus curious about regional comparisons within the U.S. on the SBSOD. We do recognize that the cardinal directions are just one example of the forms that absolute systems of reference can take. In many parts of the world where cultures have less modern technological sophistication (compasses, globally accurate maps, GPS units), absolute systems are often based on local environmental features like mountains or oceans that are so large that they function as global references over large portions of home territory, if not all of it.

Besides the choice of linguistic terms, as we reviewed above, people living in different parts of the world live in residential environments that are not identical. We increasingly find merit in the idea that living in flat, hilly, or mountainous terrain; living near or far from large water bodies; or living in regions with or without rectilinear roads and property lines (including whether they are aligned with cardinal directions or not) influences the ease of using different reference systems. More specifically, we think that the use of absolute systems (and the particular absolute system that is used) does depend to a considerable degree on environmental factors like these. We note that among the settings of our five datasets, Santa Barbara (and the U.S. more generally) and Beijing are most clearly organized into grid patterns of city layout. Furthermore, we accept that when meaning is habitually expressed in natural language in particular ways, people become accustomed to thinking in that way; they can think in that manner more quickly and more competently. This is an endorsement of a version of linguistic relativity, but not necessarily a version that locates the start of the causal nexus in the language choices themselves. In other words, we would not call this a linguistic difference per se, and we are not sure whether it would appropriately be called a cultural difference. This theoretical story suggests that speakers of any language from any culture might eventually come to change their spatial thought and language when moving to new environments differing in relevant properties. Research on changes in spatial cognition with changes in residential environment thus intrigues us. Also intriguing is the possibility that the role of factors like habitual language use and residential environment may vary systematically across individuals within language/cultural groups (or across homogeneous language/culture groups based on sex, age, expertise, and so on).

Finally, the analyses we report here highlight shortcomings of the Santa Barbara Sense-of-Direction Scale in two ways. First, the scale is not necessarily the best possible self-report scale, even in American English. For instance, if we were to revise the scale, we would not include items such as #2. It does not contribute to the scale as a whole as well as one would want, either empirically or conceptually; i.e., "having a poor memory for where you left things" is probably not strongly part of sense-of-direction when the things in question are keys or sunglasses.
We also accept Sholl et al.'s [17] findings that a single-item question about sense-of-direction (i.e., our item #4) often works almost as well as the 15-item scale. Second, we appreciate that sense-of-direction is most clearly a measure of survey spatial abilities in environmental spaces. Researchers are interested in other aspects of spatial cognitive skills and preferences, including reasoning with pictures and objects, reasoning about routes (which can be done with or without survey reasoning), reasoning with cartographic maps and other geo-information systems, giving and interpreting verbal directions, and more. Aside from continued attempts to develop valid and reliable ways to measure these other skills and preferences, we find value in continued research on cross-linguistic, cross-cultural, and cross-environmental similarities and differences.

Acknowledgements. We thank Christoph Hölscher, Stefan Münzer, and Toru Ishikawa for generously sharing their datasets with us. Thanks to Yu Liu, who helped the second author collect the Beijing SBSOD data, and to Stefan Gries, who helped us find the corpora needed for the word frequency comparison. We also acknowledge the helpful observations of three anonymous reviewers of an earlier version of this manuscript. Finally, we appreciate the valuable and stimulating discussions of these and other spatial-cognitive issues with Mary Hegarty and the other members of SCRAM.
References

1. Carroll, J.B.: Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge University Press, Cambridge (1993)
2. Eliot, J.: Models of Psychological Space: Psychometric, Developmental, and Experimental Approaches. Springer, New York (1987)
3. Allen, G.L., Kirasic, K.C., Dobson, S.H., Long, R.G., Beck, S.: Predicting Environmental Learning from Spatial Abilities: An Indirect Route. Intell. 22, 327–355 (1996)
4. Bryant, K.J.: Geographical/Spatial Orientation Ability Within Real-World and Simulated Large-Scale Environments. Multivar. Beh. Res. 26, 109–136 (1991)
5. Eliot, J.: A Classification of Object and Environmental Spatial Tests. Perc. Motor Skills 59, 171–174 (1984)
6. Evans, G.W.: Environmental Cognition. Psych. Bull. 88, 259–287 (1980)
7. Montello, D.R.: Navigation. In: Shah, P., Miyake, A. (eds.) The Cambridge Handbook of Visuospatial Thinking, pp. 257–294. Cambridge University Press, Cambridge (2005)
8. Hegarty, M., Montello, D.R., Richardson, A.E., Ishikawa, T., Lovelace, K.: Spatial Abilities at Different Scales: Individual Differences in Aptitude-Test Performance and Spatial-Layout Learning. Intell. 34, 151–176 (2006)
9. Lorenz, C.A., Neisser, U.: Ecological and Psychometric Dimensions of Spatial Ability. Report #10, Emory Cognition Project, Emory University, Atlanta, GA (July 1986)
10. Pearson, J.L., Ialongo, N.S.: The Relationship Between Spatial Ability and Environmental Knowledge. J. Envir. Psych. 6, 299–304 (1986)
11. Kozlowski, L.T., Bryant, K.J.: Sense of Direction, Spatial Orientation, and Cognitive Maps. J. Exper. Psych.: Hum. Perc. Perf. 3, 590–598 (1977)
12. Prestopnik, J.L., Roskos-Ewoldsen, B.: The Relations Among Wayfinding Strategy Use, Sense of Direction, Sex, Familiarity, and Wayfinding Ability. J. Envir. Psych. 20, 177–191 (2000)
13. Sholl, M.J.: The Relationship Between Sense of Direction and Mental Geographic Updating. Intell. 12, 299–314 (1988)
14. Bryant, K.J.: Personality Correlates of Sense of Direction and Geographical Orientation. J. Person. Soc. Psych. 43, 1318–1324 (1982)
15. Hegarty, M., Richardson, A.E., Montello, D.R., Lovelace, K., Subbiah, I.: Development of a Self-Report Measure of Environmental Spatial Ability. Intell. 30, 425–447 (2002)
16. Cornell, E.H., Sorenson, A., Mio, T.: Human Sense of Direction and Wayfinding. Ann. Assoc. Amer. Geog. 93, 399–425 (2003)
17. Sholl, M.J., Kenny, R.J., DellaPorta, K.A.: Allocentric-Heading Recall and Its Relation to Self-Reported Sense-of-Direction. J. Exper. Psych.: Learn. Mem. Cog. 32, 516–533 (2006)
18. Ishikawa, T., Montello, D.R.: Spatial Knowledge Acquisition from Direct Experience in the Environment: Individual Differences in the Development of Metric Knowledge and the Integration of Separately Learned Places. Cog. Psych. 52, 93–129 (2006)
19. Lawton, C.A.: Strategies for Indoor Wayfinding: The Role of Orientation. J. Envir. Psych. 16, 137–145 (1996)
20. Malinowski, J.C., Gillespie, W.T.: Individual Differences in Performance on a Large-Scale, Real-World Wayfinding Task. J. Envir. Psych. 21, 73–82 (2001)
21. Dehaene, S., Izard, V., Pica, P., Spelke, E.: Core Knowledge of Geometry in an Amazonian Indigene Group. Science 311, 381–384 (2006)
22. Haviland, J.B., Levinson, S.C.: Special Issue: Spatial Conceptualization in Mayan Languages. Ling. 32 (1994)
23. Munnich, E., Landau, B., Dosher, B.A.: Spatial Language and Spatial Representation: A Cross-Linguistic Comparison. Cog. 81, 171–207 (2001)
24. Munroe, R.H., Munroe, R.L., Brasher, A.: Precursors of Spatial Ability: A Longitudinal Study Among the Logoli of Kenya. J. Soc. Psych. 125, 23–33 (1985)
25. Xiao, D., Liu, Y.: Study of Cultural Impacts on Location Judgments in Eastern China. In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS, vol. 4736, pp. 20–31. Springer, Heidelberg (2007)
26. Boroditsky, L.: Linguistic Relativity. In: Nadel, L. (ed.) Encyclopedia of Cognitive Science, pp. 917–921. MacMillan Press, London (2003)
27. Gumperz, J.J., Levinson, S.C. (eds.): Rethinking Linguistic Relativity. Cambridge University Press, Cambridge (1996)
28. Heider, E.R.: Universals in Color Naming and Memory. J. Exper. Psych. 93, 10–20 (1972)
29. Kay, P., Kempton, W.: What Is the Sapir-Whorf Hypothesis? Amer. Anth. 86, 65–79 (1984)
30. Lakoff, G.: Women, Fire, and Dangerous Things. University of Chicago Press, Chicago (1987)
31. Pinker, S.: The Language Instinct. William Morrow and Company, New York (1994)
32. Levinson, S.C., Kita, S., Haun, D.B.M., Rasch, B.H.: Returning the Tables: Language Affects Spatial Reasoning. Cog. 84, 155–188 (2002)
33. Li, P., Gleitman, L.: Turning the Tables: Language and Spatial Reasoning. Cog. 83, 265–294 (2002)
34. Habel, C., Werner, S.: Special Issue on "Spatial Reference Systems". Spat. Cog. Comp. 1 (1999)
35. Hart, R.A., Moore, G.T.: The Development of Spatial Cognition: A Review. In: Downs, R.M., Stea, D. (eds.) Image and Environment, pp. 246–288. Aldine, Chicago (1973)
36. Klatzky, R.: Allocentric and Egocentric Spatial Representations: Definitions, Distinctions, and Interconnections. In: Freksa, C., Habel, C., Wender, K. (eds.) Spatial Cognition 2000. LNCS (LNAI), vol. 1849, pp. 1–17. Springer, Heidelberg (1998)
37. Levinson, S.C.: Frames of Reference and Molyneux's Question: Crosslinguistic Evidence. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M.F. (eds.) Language and Space, pp. 109–169. The MIT Press, Cambridge (1996)
38. Levinson, S.C.: Space in Language and Cognition: Explorations in Cognitive Diversity. Cambridge University Press, Cambridge (2003)
39. Majid, A., Bowerman, M., Kita, S., Haun, D.B.M., Levinson, S.C.: Can Language Restructure Cognition? The Case for Space. Trends Cog. Sci. 8, 108–114 (2004)
40. Matthews, M.H.: Gender, Home Range and Environmental Cognition. Transactions of the Inst. Brit. Geog. 12, 43–56 (1987)
41. Newcombe, N., Dubas, J.S.: A Longitudinal Study of Predictors of Spatial Ability in Adolescent Females. Child Devel. 63, 37–46 (1992)
42. Tracy, D.M.: Toy-Playing Behavior, Sex-Role Orientation, Spatial Ability, and Science Achievement. J. Res. Sci. Teach. 27, 637–649 (1990)
43. Uttal, D.H.: Seeing the Big Picture: Map Use and the Development of Spatial Cognition. Devel. Sci. 3, 247–264 (2000)
44. Waller, D.: Individual Differences in Spatial Learning from Computer-Simulated Environments. J. Exper. Psych.: App. 6, 307–321 (2000)
45. Rapoport, A.: Environmental Cognition in Cross-Cultural Perspective. In: Moore, G.T., Golledge, R.G. (eds.) Environmental Knowing, pp. 220–234. Dowden, Hutchinson & Ross, Stroudsburg, PA (1976)
46. Turnbull, C.M.: Some Observations Regarding the Experiences and Behavior of the BaMbuti Pygmies. Amer. J. Psych. 74, 304–308 (1961)
47. Lynch, K.: The Image of the City. MIT Press, Cambridge (1960)
48. Freundschuh, S.M.: The Effect of the Pattern of the Environment on Spatial Knowledge Acquisition. In: Mark, D.M., Frank, A.U. (eds.) Cognitive and Linguistic Aspects of Geographic Space, pp. 167–183. Kluwer Academic Publishers, Dordrecht (1991)
49. Brown, C.H.: Where Do Cardinal Direction Terms Come from? Anthrop. Ling. 25, 121–161 (1983)
50. Davies, C., Pederson, E.: Grid Patterns and Cultural Expectations in Urban Wayfinding. In: Montello, D.R. (ed.) COSIT 2001. LNCS, vol. 2205, pp. 400–414. Springer, Heidelberg (2001)
51. Montello, D.R.: Spatial Orientation and the Angularity of Urban Routes: A Field Study. Envir. Beh. 23, 47–69 (1991)
52. Lawton, C.A.: Gender and Regional Differences in Spatial Referents Used in Direction Giving. Sex Roles 44, 321–337 (2001)
53. Xu, S., Jaiswal, A., Zhang, X., Klippel, A., Mitra, P., MacEachren, A.M.: From Data Collection to Analysis: Exploring Regional Linguistic Variation in Route Directions by Spatially-Stratified Web Sampling. In: Ross, R.J., Hois, J., Kelleher, J. (eds.) Computational Models of Spatial Language Interpretation (CoSLI) Workshop at Spatial Cognition 2010, pp. 49–52. Mt. Hood, Oregon (2010)
54. Montello, D.R.: How Significant Are Cultural Differences in Spatial Cognition? In: Kuhn, W., Frank, A.U. (eds.) COSIT 1995. LNCS, vol. 988, pp. 485–500. Springer, Heidelberg (1995)
55. Montello, D.R., Lovelace, K.L., Golledge, R.G., Self, C.M.: Sex-Related Differences and Similarities in Geographic and Environmental Spatial Abilities. Ann. Assoc. Amer. Geog. 89, 515–534 (1999)
56. Baroni, M., Bernardini, S., Ferraresi, A., Zanchetta, E.: The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Lang. Resourc. Eval. 43, 209–226 (2009)
Appendix A. Instructions and items from the original English-language Santa Barbara Sense-of-Direction Scale, presented in this order to all respondents.

This questionnaire consists of several statements about your spatial and navigational abilities, preferences, and experiences. After each statement, you should circle a number to indicate your level of agreement with the statement. Circle "1" if you strongly agree that the statement applies to you, "7" if you strongly disagree, or some number in between if your agreement is intermediate. Circle "4" if you neither agree nor disagree.

Each item is rated by circling a number on a 7-point Likert scale (provided after each item in the original scale): strongly agree 1 2 3 4 5 6 7 strongly disagree

1. I am very good at giving directions.
2. I have a poor memory for where I left things.
3. I am very good at judging distances.
4. My "sense of direction" is very good.
5. I tend to think of my environment in terms of cardinal directions (N, S, E, W).
6. I very easily get lost in a new city.
7. I enjoy reading maps.
8. I have trouble understanding directions.
9. I am very good at reading maps.
10. I don't remember routes very well while riding as a passenger in a car.
11. I don't enjoy giving directions.
12. It's not important to me to know where I am.
13. I usually let someone else do the navigational planning for long trips.
14. I can usually remember a new route after I have traveled it only once.
15. I don't have a very good "mental map" of my environment.
Appendix B. Instructions and items from the German-language Freiburg Santa Barbara Sense-of-Direction Scale, presented in this order to all respondents.

Dieser Fragebogen besteht aus verschiedenen Aussagen über Ihre räumlichen Fähigkeiten, Vorlieben und Erfahrungen sowie Ihre Fähigkeiten, Vorlieben und Erfahrungen beim Finden von Wegen. Nach jeder Aussage sollen Sie einen Kreis um diejenige Zahl ziehen, die den Grad Ihrer Zustimmung mit dieser Aussage am besten ausdrückt. Markieren Sie die „1“, wenn Sie stark zustimmen, dass diese Aussage für Sie zutrifft, markieren Sie „7“, wenn Sie dies stark ablehnen, oder markieren Sie eine Zahl dazwischen, wenn Ihre Zustimmung dazwischen liegt. Markieren Sie die „4“, wenn Sie weder zustimmen noch ablehnen.

Each item is rated by circling a number on a 7-point Likert scale (provided after each item in the original scale): stimme stark zu 1 2 3 4 5 6 7 lehne stark ab

1. Ich bin sehr gut im Geben von Wegbeschreibungen.
2. Ich kann mir nur schlecht merken, wo ich Dinge liegen gelassen habe.
3. Ich bin sehr gut im Schätzen von Entfernungen.
4. Mein „Orientierungssinn“ ist sehr gut.
5. Wenn ich über meine Umgebung nachdenke, verwende ich meist die vier Himmelsrichtungen (N, S, O, W).
6. In einer neuen Stadt verlaufe ich mich sehr leicht.
7. Landkarten lesen macht mir Spaß.
8. Ich habe Probleme, Wegbeschreibungen zu verstehen.
9. Ich bin sehr gut im Kartenlesen.
10. Als Beifahrer im Auto erinnere ich mich nicht sehr gut an die gefahrenen Strecken.
11. Ich gebe nicht gerne Wegbeschreibungen.
12. Für mich ist es nicht wichtig, zu wissen wo ich bin.
13. Normalerweise überlasse ich anderen die Wegeplanung für längere Fahrten.
14. In der Regel kann ich mich an einen neuen Weg erinnern, wenn ich ihn lediglich einmal zurückgelegt habe.
15. Ich habe keine sehr gute „innere Karte“ meiner Umgebung.
Towards a Formalization of Social Spaces for Socially Aware Robots

Felix Lindner and Carola Eschenbach

Knowledge and Language Processing, Department of Informatics, University of Hamburg
Vogt-Kölln-Straße 30, 22527 Hamburg
{lindner,eschenbach}@informatik.uni-hamburg.de
Abstract. This article presents a taxonomy of social spaces distinguishing five basic types: personal space, activity space, affordance space, territory, and penetrated space. The respective space-constituting situations and the mereotopological structure of each social space type are specified. We show how permissions for actions of agents in social spaces can be modeled using the situation calculus. Specifications of social spaces and permissions form the foundation for socially aware action planning.
1 Introduction
Human-robot interaction (HRI) is concerned with the design of interaction between robots and humans, and thereby with fostering the acceptability of robots in everyday situations, e.g., at home, at work, in museums, or in hospitals. In this respect, social robots are defined as "autonomous or semi-autonomous robots that interact and communicate with humans by following the behavioral norms expected by the people with whom the robot is intended to interact" [1]. A recent long-term study shows that the behavior of state-of-the-art service robots used in hospitals does not always meet people's expectations [15]. One insight gained from this study is that humans are sensitive to the spatial behavior of robots: humans feel offended by robots cutting in on their paths, standing in the way, or approaching inappropriately close. In particular, as robots are situated in physical space, socially adequate behavior inevitably involves socially adequate spatial behavior, which demands the capability to reason about social restrictions regarding the performance of actions in spatial regions carrying social meaning.

To design spatial behaviors for robots that match people's expectations, we model human social spatial behavior as it is phenomenologically described in the literature on human social interaction. Drawing from ethnological studies [6,7] and from existing approaches to conceptualizing spacing in the social sciences [13], we propose qualitative formalizations of social spaces. These formal specifications can be directly exploited for socially aware symbolic action planning in robotics. Our approach complements empirical findings and quantitative data collected during numerous HRI experiments, e.g., distances and orientations kept in human-robot encounters [8,22].

This article is structured as follows: Section 2 outlines existing types of social spaces described in the social sciences literature on human-human interaction and discusses their relevance for HRI. In Sect. 3, a taxonomy of social spaces is formally specified. Based on these formalizations, an axiomatization of permissions with regard to the social adequacy of actions in social spaces, using the situation calculus [20], is proposed in Sect. 4. We introduce a simple action planner making use of the knowledge about social spaces and their respective social restrictions.
2 Social Spaces
Löw [13] defines social spaces as relational arrangements of living beings and social objects at places. In the constitution of social spaces, two sub-processes are involved: spacing is the process of acting that leads to a specific arrangement of objects and living beings, and synthesis is the process of integrating, perceiving, and interpreting spatial constellations as social entities. From an HRI point of view, the relevance of social spaces is twofold: On the one hand, robots are physically situated and, as they act in the environment, they produce social spaces. This social space production should take place in a socially adequate manner. On the other hand, a service robot will be faced with social spaces constituted by humans. The robot then should know about the social meaning of social regions to adapt its behavior accordingly in order to match people's expectations. In this article, both roles of social spaces are described. However, our specification of socially aware spatial behavior focuses on socially adequate action sequences in the face of social spaces constituted by other agents, thereby neglecting the spaces produced by the robot while acting. In particular, we consider entering and parking in social regions: our robot ought not to enter regions it is not allowed to enter, and it ought to avoid blocking certain social regions by parking there or by putting objects in such regions. Social spaces constitute a spatial layer that can be described in topological terms using regions as the basic spatial entities. Although the depictions in this section inevitably contain geometric representations, we do not discuss the geometry of social spaces, as a high diversity of factors determines the shape and size of social space geometries. The geometric grounding of social spaces varies with time, such that at certain times different social spaces can overlap or influence each other's geometric extension. Nevertheless, the internal topological structure of social spaces is temporally invariant. Therefore, topological notions are useful for describing the time-invariant spatial properties of the structure of social spaces. Specifically, topological specifications support reasoning about action permissions. In the following, we distinguish five types of social spaces: personal space, activity space, affordance space, territory, and penetrated space.
2.1 Personal Space
In his seminal work [7], Hall describes personal space as an invisible ellipsoid-shaped space surrounding each human. The personal space consists of four basic regions: the intimate region, the personal region, the social region, and the public region. Each of these four regions is subdivided into a corresponding near region and a far region (see Fig. 1, which depicts eight regions in total). Every such region carries a specific social meaning depending on the type of interaction and on properties like gender, age, cultural background, and social status of the interactants. According to Hall, the intimate region is reserved for lovers and close friends. In this region, people touch and hug each other, whereas communication via other modalities (especially speech) is unusual. The far intimate region affords verbal communication in whispering mode. Visual perception in the intimate regions is distorted, and people normally feel uncomfortable if someone intrudes into their intimate region without permission. People in public who stand in the near personal region are perceived as a social unit and thus signal their intimate relationship, or their withness, as Goffman [6] puts it. In some cultures, people tend to interpret the intrusion of strangers into the near personal region as a severe violation. The far personal region keeps people at arm's length. In this region, dialogs with friends take place.
Fig. 1. The eight regions of Hall's Personal Space
The near social region is used for conversation in public and with non-friends. People jointly working together position themselves in the near social region. The far social region can usually be intruded by other people without annoying the claimant of the personal space. People positioned in the far social region are perceived as not belonging together. People in the public region are normally ignored, but there are numerous formal settings in which interaction in the public region takes place [11]: lectures to students, a speech, a concert, or a performance in a theater. The voice then must be raised, and the tempo and phrasing must be adapted. In Hall's conception, the public region is not limited in its extent. For our purpose, we think of the
public region as a limited region, just like the other personal space regions. Thus, a personal space can be contained in a room or a building. Personal space is relevant to social robots because robots should place themselves appropriately with regard to the current task [22] and the social relationship between the robot and the human (i.e., the roles played in a given interactional context). A talking robot should place itself in a region appropriate for talking; a robot passing by should not enter regions that are too intimate.
2.2 Activity Space
A notion of social spaces that are constituted via action is introduced by Ostermann and Timpf [16] under the term activity footprints. Activity footprints are used for the analysis of space appropriation in public parks: the activity of playing football constitutes the space needed for playing football, the activity of running constitutes the space needed for running, etc. The authors further discuss relations of compatibility between different types of activity footprints, i.e., whether it is adequate to have different types of activity footprints overlap (e.g., kicking a ball around in the park is antagonistic to reading a book; thus, overlaps of footprints of these kinds yield social conflicts). Activity space resembles the concept of activity footprints; however, not every aspect of an activity footprint counts as an activity space, as discussed in subsequent sections. Moreover, activity spaces are formalized on a qualitative level (rather than quantitatively, as in the case of activity footprints), and they contain representations of the respective space-constituting situations. Another type of activity space has been described by Kendon [9]. Kendon's model of the so-called F-formation results from studies conducted in human-human interactional settings with a particular focus on the spatial patterns people produce while having conversations. F-formations consist of three sub-regions: o-space, p-space, and r-space (see Fig. 2). The o-space is the region in which the interaction actually takes place (subsequently denoted as the transactional region), the p-space is the region where the interactants are located (the agent region), and the r-space separates the setting from the environment (the buffer region). In our conception, every activity space yields a corresponding spatial structure.

Fig. 2. Kendon's F-Formation of the Face-to-Face type
A robot should be aware of activity spaces, so that robot placement and motion do not conflict with activity spaces produced by other social agents.
2.3 Affordance Space
Affordance space is related to the concept of affordances as potential activities the environment provides to agents [5]. Galton [4] investigates and formalizes affordances with respect to space. He thereby explicitly notes the modal characteristics of affordances: a situation affords an action to an agent if and only if it is possible for the agent to perform the action in this situation. In the context of robotics, Raubal and Moratz [19] introduce the notion of social-institutional affordances. Although it might be physically possible for an agent to perform an action in a particular situation, it might be socially unacceptable. Notably, the representations of affordances introduced in [19] support reasoning about affordances of different individual agents. This is particularly relevant in HRI, because agents with very different capabilities (humans and robots) have to coordinate action. In analogy to the notion of affordances as potential activities, affordance spaces are potential activity spaces. Consequently, affordance spaces are composed of a potential agent region, a potential transactional region, and a potential buffer region. A notion of potential activity spaces has been introduced by Ostermann and Timpf [16]. However, the internal structure of affordance spaces has not yet been discussed, and a formal analysis of the implications affordance spaces have for acting and interacting social agents is lacking. For the purpose of planning social behavior, the conceptual distinction between activity spaces and affordance spaces is crucial. For instance, crossing an affordance space is generally unproblematic compared to crossing an activity space. To give an example, consider the affordance space constituted by the pressability affordance of a light switch. The spatial extension of this space depends on the abilities and the body shape of the agent. For a human with normal stature and motor abilities, the affordance space is as big as needed for the person to stand in front of the switch and reach for it with her arm. For a robot without any means to press a light switch, consequently, there is no such affordance space. But even though a light switch might not afford pressing to a specific robot, to act in a socially aware manner the robot should know that to other agents the light switch does afford pressing. Thus, the robot should not block affordance spaces for a longer period of time, for instance, by parking there or by placing objects in them.
2.4 Territory
In biology, the term territory denotes an area claimed by an animal or by a group of animals defending it against competitors. Most territorial animals mark the borders of the territory so that competing animals perceive the area as being already occupied by some other individual. In social sciences, the concept of territory is adapted to explain spatial behavior of humans owning, claiming, and defending space (e.g., [6,11]).
Territories can be as big as a country or as small as a cup. Lawson [11] distinguishes national territory, city territory, and family territory. There is always a notion of possession and exclusive usage connected to a territory. On many occasions, signs tell people whether they are allowed to enter a particular territory (e.g., "authorized personnel only"). Furthermore, territories can be prohibited for some species (e.g., "dogs have to stay outside"), and behavior might be restricted for those located inside a territory (e.g., "no smoking"). Goffman [6] distinguishes three kinds of markers for territories: boundary markers explicitly mark the territorial boundary (e.g., the armrest of a chair, the walls of a room, or the bar used at checkout counters to separate the articles of one customer from the next), central markers are located in the center region of a territory (e.g., a towel on the beach signaling occupancy), and ear markers are labels that signal the possession of an object or of a portion of space (e.g., a door-bell nameplate). Territories pose different requirements on the robot's behavior (e.g., territories in a person's home vs. territories in a hospital vs. territories in a supermarket). Unauthorized intrusion into territories usually leads to social violations.
2.5 Penetrated Space
There are circumstances in which side effects of activities, such as noise, odor, dust, or light, make activities appear socially inadequate. The activity footprints of Ostermann and Timpf [16] consider this aspect: the activity footprint constituted by the activity of barbecuing extends over a wide area, as fumes spread far away from the actual location of the barbecue, depending on where the wind blows. Keeping with this example, in the terminology developed in this article, we distinguish the activity space of barbecuing (i.e., the space actually needed for barbecuing) from the penetrated space (i.e., the olfactory-penetrated space) of barbecuing. The center of a penetrated space is the initiator situation of the penetration (i.e., the agents and/or objects producing the penetration). Usually, there is no claimant for such a penetrated space (i.e., nobody would claim to be the owner of a penetrated space). Nevertheless, there can be agents who are to some extent responsible for its existence. Penetrated spaces usually do not call for special permissions in order for an agent to act in them. This constitutes another argument for the separation of activity space and penetrated space for the purpose of reasoning about behavior, although, of course, most penetrated spaces co-occur with activity spaces. A social robot might have to reason about the adaptation of its behavior due to penetrated spaces. For instance, in noise-penetrated spaces, a robot might have to change its location to be recognized by the other interactants [14]. The actions of the robot in many cases produce penetrated spaces of which the robot should be aware; e.g., the robot should not vacuum-clean offices in which people are currently working.
3 Formalizing Social Spaces
This section provides formal specifications of the five types of social spaces regarding their constitution by agents and activities and their spatial structure, which is described using mereotopological concepts.
3.1 Spatial Framework
Our specifications make use of relations provided by several mereotopological frameworks, such as the Region Connection Calculus (RCC) [18]. The following mereotopological relations, which can hold between regions of arbitrary shape, are needed:

P(r, r′) — r is part of r′
TPP(r, r′) — r is a tangential proper part of r′
NTPP(r, r′) — r is a nontangential proper part of r′
EC(r, r′) — r is externally connected to r′
O(r, r′) — r overlaps r′
As social spaces constitute dynamic spatial structures that may move relative to physical space, the following mereotopological specifications of social spaces can also be understood in the sense of Donnelly's theory of relative places [3]. Taking this view, social space regions form location complexes that maintain an invariant internal topological structure but can coincide with different locations at different times. Apart from the relations mentioned, the function sum(r, r′) is needed to refer to the region that is the sum of r and r′ [18]. In addition, the relation surrounds (SR) is defined as a special case of external connectedness (D1): a region r surrounds a region r′ iff r is externally connected to r′ and every region r″ externally connected to r′ overlaps r.

(D1) SR(r, r′) ≡def EC(r, r′) ∧ ∀r″ [EC(r″, r′) ⊃ O(r″, r)]
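Under a closed-world reading over a finite set of region facts, (D1) can be checked mechanically. The following Prolog sketch is an illustration only, not part of the formalization above: the region names and facts are invented, and in a realistic fact base EC and O would have to be closed under symmetry.

    % invented example facts: a margin region surrounding a center region
    ec(margin, center).
    ec(probe, center).     % another region touching the center ...
    o(probe, margin).      % ... which indeed overlaps the margin, as (D1) demands
    o(R, R).               % overlap is reflexive

    % sr(R, R1): R surrounds R1, i.e., (D1) with negation-as-failure
    % standing in for the universal quantifier (call with ground arguments)
    sr(R, R1) :-
        ec(R, R1),
        \+ ( ec(R2, R1), \+ o(R2, R) ).

The query ?- sr(margin, center). succeeds; it would fail as soon as some region externally connected to the center did not overlap the margin.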
To state that the part relation holds between an agent or an object and a region, we write PO. This notation is an abbreviation for stating that the region occupied by an agent (first argument) is part of another region (second argument); cf. [4] for a similar treatment. Both personal spaces and penetrated spaces exhibit a gradual structure, with a center of high intimacy or intensity fading towards the periphery. For this reason, we adopt a qualitative approach to modeling graded structures based on bundles of regions, suggested by Kulik and colleagues [10]. We demonstrate in Sect. 4 that the specification of permissions regarding territories and activity spaces is supported by region bundles as well. As far as is necessary to understand the gradual model, we replicate here the axiomatization of region bundles proposed in [10].
A region bundle is constituted by one or more regions. The relation Contains relates a region bundle to each constituting region. The extensionality assumption (ARB1) states that different region bundles contain different regions (SI1 in [10]).

(ARB1) ∀b, b′ [[RegionBundle(b) ∧ RegionBundle(b′)] ⊃ [∀r [Contains(b, r) ≡ Contains(b′, r)] ⊃ b = b′]]
A region bundle b defines a reflexive and transitive relation ⪰b of centrality for its regions (D2): region r is at least as central as region r′ relative to bundle b (r ⪰b r′) iff r and r′ are both contained in b and r is part of r′ (D4 in [10]).

(D2) r ⪰b r′ ≡def Contains(b, r) ∧ Contains(b, r′) ∧ P(r, r′)
Axiom (ARB2) states that this order is total for the bundle regions, i.e., for every two bundle regions r and r′, one is at least as central as the other (SB2 in [10]).

(ARB2) ∀r, r′, b [[Contains(b, r) ∧ Contains(b, r′)] ⊃ [(r ⪰b r′) ∨ (r′ ⪰b r)]]
Based on a region bundle, arbitrary regions can be compared with respect to the gradient structure defined by the region bundle. One possibility (taken here) is to compare two regions with respect to the most central bundle region they overlap. An ordering for arbitrary regions can therefore be defined as (D3): a region r is at least as central as a region r′ with regard to a region bundle b iff every bundle region r″ that overlaps r′ also overlaps r (D7 in [10]). (D4) and (D5) introduce additional notation for the maximal symmetric and asymmetric subrelations of this partial order.

(D3) r ≥b r′ ≡def ∀r″ [Contains(b, r″) ⊃ [O(r′, r″) ⊃ O(r, r″)]]
(D4) r =b r′ ≡def [(r ≥b r′) ∧ (r′ ≥b r)]
(D5) r >b r′ ≡def [(r ≥b r′) ∧ ¬(r′ ≥b r)]
As a consequence (T1), any two regions r and r′ can be compared with respect to every region bundle b (T5 in [10]).

(T1) ∀r, r′, b [(r ≥b r′) ∨ (r′ ≥b r)]
A notion of betweenness of regions with regard to a region bundle is defined by (D6): a region r is said to be located between two regions r′ and r″ w.r.t. a region bundle b iff r′ is more central than r and r is more central than r″ (or the same with r′ and r″ exchanging roles).

(D6) Btw(b, r′, r, r″) ≡def [[(r′ >b r) ∧ (r >b r″)] ∨ [(r″ >b r) ∧ (r >b r′)]]
To give an example, in Fig. 3 several regions are located relative to a region bundle b consisting of three ellipsoidal regions. The regions R1 to R5 are located within this gradient structure. With the relations defined above, it holds that R4 >b R2 =b R3 >b R1 >b R5, and Btw(b, R5, R3, R4). The formalization of region bundles does not restrict the number of regions defining a region bundle. Region bundles can contain an infinite or a finite number of regions. Even one or two regions can be sufficient for a region bundle.
Fig. 3. Regions located relative to a region bundle
3.2 Modeling Penetrated Space
A penetrated space sp consists of a physical situation p producing it and a group of agents ag (possibly consisting of just one agent) that is responsible for p (APS1).

(APS1) ∀sp [PenetratedSpace(sp) ⊃ ∃p, ag [PhysicalSituation(p) ∧ Constitutes(p, sp) ∧ AgentGroup(ag) ∧ hasResponsibleAgents(p, ag)]]

Usually, the physical situation produces a penetration of space such that the intensity of penetration is highest in the vicinity of the space-initiating situation and diffuses towards the periphery. This gradient of different degrees of penetration is modeled by associating a bundle of regions b (APS2).

(APS2) ∀sp [PenetratedSpace(sp) ⊃ ∃b [RegionBundle(b) ∧ hasGradient(sp, b)]]

The shape of the regions is a consequence of the underlying physical processes that produce the penetrated space. It is not restricted by the model in any way.
3.3 Modeling Personal Space
At the social level, Hall's personal space is represented as a social space having a constituting human. Every personal space sp is constituted by a human h, and every human constitutes a personal space:

(AHPS1) ∀sp [PersonalSpace(sp) ⊃ ∃h [Human(h) ∧ Constitutes(h, sp)]]
(AHPS2) ∀h [Human(h) ⊃ ∃sp [PersonalSpace(sp) ∧ Constitutes(h, sp)]]

On the spatial dimension, a personal space constitutes a region bundle representing the intimacy gradient which underlies the personal space regions discussed in Sect. 2.1. Within this region bundle, the elementary ring-like regions of a personal space (intimate region, personal region, etc.) can be embedded, such that the intimacy relation between these regions matches Hall's model of personal space (see Fig. 1). Depending on the level of granularity, there are four or eight such elementary personal space regions. For the sake of simplicity, but without loss of generality, we
consider the four-region version. Hence, a personal space constitutes an intimate region, a personal region, a social region, and a public region (AHPS3).

(AHPS3) ∀sp [PersonalSpace(sp) ⊃ [∃r [IntimateRegion(sp, r)] ∧ ∃r′ [PersonalRegion(sp, r′)] ∧ ∃r″ [SocialRegion(sp, r″)] ∧ ∃r‴ [PublicRegion(sp, r‴)]]]
The intimacy gradient of the personal space corresponds to a region bundle related to the personal space (AHPS4). The intimacy gradient conforms to the specification of the personal space regions, as the intimate region and the regions derived by successively accumulating the other elementary personal space regions are all regions that constitute the region bundle (AHPS5). Consequently, the intimate region is more intimate than the personal region, which is more intimate than the social region, which is more intimate than the public region (T2).

(AHPS4) ∀sp [PersonalSpace(sp) ⊃ ∃b [RegionBundle(b) ∧ hasIntimacyGradient(sp, b)]]

(AHPS5) ∀sp, b, r, r′, r″, r‴ [[PersonalSpace(sp) ∧ hasIntimacyGradient(sp, b) ∧ IntimateRegion(sp, r) ∧ PersonalRegion(sp, r′) ∧ SocialRegion(sp, r″) ∧ PublicRegion(sp, r‴)] ⊃ [Contains(b, r) ∧ Contains(b, sum(r, r′)) ∧ Contains(b, sum(sum(r, r′), r″)) ∧ Contains(b, sum(sum(sum(r, r′), r″), r‴))]]

(T2) ∀sp, b, r, r′, r″, r‴ [[PersonalSpace(sp) ∧ hasIntimacyGradient(sp, b) ∧ IntimateRegion(sp, r) ∧ PersonalRegion(sp, r′) ∧ SocialRegion(sp, r″) ∧ PublicRegion(sp, r‴)] ⊃ [(r >b r′) ∧ (r′ >b r″) ∧ (r″ >b r‴)]]
In the following, the region denoted by “≥ personal” is the region that is the sum of the intimate and the personal region, i.e., the maximal region that is at least as intimate as the personal region. Similarly, the sum of the social region and the public region “≤ social” is the maximal region that is at most as intimate as the social region. The elementary personal space regions together with the six sum regions depicted in Fig. 4 will be referred to as personal space regions. Finally, it can be stated that the human h constituting the personal space sp is always located in the intimate region: (AHPS6)
∀h, sp, r [[Human(h) ∧ PersonalSpace(sp) ∧ Constitutes(h, sp) ∧ IntimateRegion(sp, r)] ⊃ PO(h, r)]
Fig. 4. The topology of personal space
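As a usage example, the intimacy gradient required by (AHPS4) and (AHPS5) can be instantiated in the style of the Prolog sketch from Sect. 3.1; the constant names below (up_to_personal, etc., for the sum regions) are invented for illustration.

    % bundle g: the intimate region plus the successively accumulated sums (AHPS5)
    contains(g, intimate).
    contains(g, up_to_personal).   % sum(intimate, personal)
    contains(g, up_to_social).     % sum(sum(intimate, personal), social)
    contains(g, up_to_public).     % sum of all four elementary regions
    % each elementary region overlaps exactly the sums it is part of
    o(intimate, intimate).
    o(intimate, up_to_personal). o(intimate, up_to_social). o(intimate, up_to_public).
    o(personal, up_to_personal). o(personal, up_to_social). o(personal, up_to_public).
    o(social,   up_to_social).   o(social,   up_to_public).
    o(public,   up_to_public).

Against these facts, more_central(g, intimate, personal), more_central(g, personal, social), and more_central(g, social, public) all succeed, which is exactly the ordering stated by theorem (T2).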
3.4 Modeling Activity Space and Affordance Space
Activity spaces are constituted by activities performed by groups of agents (AACS1). At the topological level, an activity space is characterized by three designated regions: the agent region, the transactional region, and the buffer region (AACS2). The group of agents performing an activity constituting an activity space is located in the agent region that belongs to that activity space (AACS3).

(AACS1) ∀sp [ActivitySpace(sp) ⊃ ∃ag, ac [Activity(ac) ∧ AgentGroup(ag) ∧ Constitutes(ac, sp) ∧ performs(ag, ac)]]

(AACS2) ∀sp [ActivitySpace(sp) ⊃ [∃r [AgentRegion(sp, r)] ∧ ∃r′ [TransactionalRegion(sp, r′)] ∧ ∃r″ [BufferRegion(sp, r″)]]]

(AACS3) ∀sp, ag, ac, r [[ActivitySpace(sp) ∧ AgentGroup(ag) ∧ Activity(ac) ∧ performs(ag, ac) ∧ Constitutes(ac, sp) ∧ AgentRegion(sp, r)] ⊃ PO(ag, r)]

The model of affordance spaces is quite similar. This is straightforward, because affordance spaces are potential activity spaces. Affordance spaces are constituted by affordances (AAFS1); as this discussion focuses on the spatial effects of affordances, we remain silent as to what an affordance is. In analogy to activity spaces, an affordance space has a potential agent region, a potential transactional region, and a potential buffer region (AAFS2).
(AAFS1) ∀sp [AffordanceSpace(sp) ⊃ ∃af [Affordance(af) ∧ Constitutes(af, sp)]]

(AAFS2) ∀sp [AffordanceSpace(sp) ⊃ [∃r [PotentialAgentRegion(sp, r)] ∧ ∃r′ [PotentialTransactionalRegion(sp, r′)] ∧ ∃r″ [PotentialBufferRegion(sp, r″)]]]
Figure 5(a) depicts the topology of activity space regions and of affordance space regions. The region labeled A is the (potential) agent region, the region labeled T is the (potential) transactional region, and the region labeled B is the (potential) buffer region. The sum of the regions A and T establishes the (potential) core region AT. ATB is the whole activity/affordance space region, being the sum of AT and B. The model so far is very generic. More detailed topological descriptions spelling out subregions of the various activity or affordance spaces, as well as the geometric shape, depend on the type of activity or affordance generating it.
3.5 Modeling Territory
Territories are constituted by claims asserted by groups of agents (AT1). Topologically, territories consist of a center region and a margin region (AT2). (AT1)
(AT1) ∀sp [Territory(sp) ⊃ ∃c, ag [Claim(c) ∧ Constitutes(c, sp) ∧ AgentGroup(ag) ∧ hasClaimant(c, ag)]]

(AT2) ∀sp [Territory(sp) ⊃ [∃r [CenterRegion(sp, r)] ∧ ∃r′ [MarginRegion(sp, r′)]]]
The topological structure of territories is depicted in Fig. 5(b). The three regions identified are referred to as the territory space regions.

Fig. 5. The topological specifications of both activity spaces and of affordance spaces (a), and of territories (b)

The topological structure of territories can be found as a building block in the other topologies of the previously mentioned social spaces (i.e., an inner region surrounded by another region). However, the social role of the regions depends on the type of social space. Because no agent is required to be present for a territory to be constituted, no general statement about the topological situation with regard to the location of agents can be made. But the claim asserted by the agent group can be made explicit by markers: boundary markers are located in the margin region, whereas central markers are located in the center region; ear markers can be located in the sum of these regions (cf. Sect. 2.4). However, markers are a time-variant feature, as the location and type of markers can change during a territory's life span.
4 Reasoning about Permissions in Social Spaces
The course of action of a social agent is constrained by the presence of social spaces. For instance, a social agent needs authorization to enter a territory, and it should ask for permission to cross the more intimate regions of a personal space. Reasoning about normative behavior in social spaces requires knowledge about the social adequacy of actions, i.e., knowing which kind of behavior is permissible and how to acquire permissions if necessary. Our goal is to specify an agent that has the capability to reason about the appropriateness of actions. Therefore, an axiomatization is introduced as a basis for reasoning about permissions with regard to the performance of actions in social spaces. This axiomatization entails the social rules by which we want a socially aware robot to think and act. In the following, the situation calculus with Reiter's solution to the frame problem [20] is used for specifying actions that affect permissions. A basic action theory in the situation calculus contains, among others, a set of first-order sentences that describe the initial situation S0, and a set of successor state axioms describing the effects of actions. A binary function do(a, s) denotes the successor situation to s that results from performing the action a in s. The scenario considered here is that a social robot perceives a situation and reasons about which regions it is allowed to enter or to park in. Therefore, the spatial actions of entering a region r, enter(r), and parking in a region r, park(r), are introduced. Crossing, as a third type of spatial action, occurs if the agent enters a region and subsequently enters another region without having parked. Thus, crossing is considered a composite action. To acquire a permission to enter or to park in a social region, two signalling actions are specified: signal(enter(r)) and signal(park(r)). The possibility of gaining permission for actions in regions demonstrates that social spaces are, to a planner, more than just obstacles to be avoided. Social restrictions ought not to be violated, but they can be, and if so, the robot should signal the violation appropriately in order to appear more transparent and socially aware to the human interactants. Humans act in a similar way, for instance, when they duck down and speed up while walking in front of the screen in a cinema. In the remainder of this section, we specify requirements a simplified domain theory for socially aware path planning should fulfill and provide corresponding successor state axioms.
4.1 Preliminaries
The model is based on some simplifying assumptions. First, it is always permissible to signal any action (APSig). We add this assumption to the axioms of our action theory, as nothing more needs to be said about the permissibility of signalling actions. The other two assumptions are formulated as requirements and will be derivable as theorems of the following specification of spatial actions. First, we require that permissions that hold in some situation s cannot be withdrawn in a subsequent situation do(a′, s) (RP1). Second, if the robot appropriately signals its intention to perform an action a, then it acquires the permission to do so (RP2).

(APSig) ∀s, a [Permissible(signal(a), s)]
(RP1) ∀s, a, a′ [Permissible(a, s) ⊃ Permissible(a, do(a′, s))]
(RP2) ∀s, a [Permissible(a, do(signal(a), s))]
The last assumption simplifies the social actions of permission acquisition enormously. Signal actions are placeholders for action sequences that acquire permissions, such as dialogs. In realistic settings, attempts to gain permissions through dialogs can also fail. However, dialogs are not easily modeled in the situation calculus and are beyond the scope of this paper. Another strategy for a robot might be to wait until the situation changes. For instance, if an activity space constituted by people having a chat in a hallway blocks the path, a robot might also decide to wait until the chat is over. Since the change of social spaces over time is not yet modeled, this kind of reasoning is currently out of scope as well. Thus, another important assumption of our model is that the spatial layout is stable, i.e., we take a snapshot view. While the robot perceives, reasons, and acts, no changes occur in the environment.
4.2 Gradual Structure for Permissions
There are two ways to propagate action permissions. Hall's characterization of personal spaces already expresses that the gradual structure of the personal space regions fits a gradual structure regarding the permissions to enter these regions. Correspondingly, the intimacy gradient is exploited in the following for the transfer of permissions between those regions (AHPS7). When the agent holds the permission to enter a personal space region, it is also allowed to enter the regions that lie between (regarding the intimacy gradient) that region and its current location. For instance, if the robot is located in the public region and has the permission to enter the personal region, then it also has the permission to enter the social region, but not necessarily the intimate region. Conversely, if the robot is located in the intimate region of a personal space and has the permission to enter the social region, then it also has the permission to enter the personal region, but not necessarily the public region. Both directions are relevant for social behavior: on the one hand, the robot should not violate personal spaces. On the other hand, it could also be the case
that the robot has the obligation to stay near its owner and thus not to move beyond a certain personal space region, e.g., to preserve the withness (see Sect. 2.1). Using betweenness, requirement (RP3) covers both cases: if a robot located in the social space region rloc of a social space sp has the permission to enter a region r of sp, then the robot is also allowed to enter all social space regions r′ in sp which are between rloc and r.³

(AHPS7) ∀sp [PersonalSpace(sp) ⊃ ∀b [hasPermissionGradient(sp, b) ≡ hasIntimacyGradient(sp, b)]]

(RP3) ∀sp, b, r, rloc, s [[SocialSpace(sp) ∧ SocialSpaceRegion(sp, r) ∧ hasPermissionGradient(sp, b) ∧ Permissible(enter(r), s) ∧ CurrentLocation(sp, rloc, s)] ⊃ ∀r′ [[SocialSpaceRegion(sp, r′) ∧ Btw(b, r, r′, rloc)] ⊃ Permissible(enter(r′), s)]]

³ The current location of the robot is defined relative to location-complexes of social spaces. For instance, the robot is located in the social region of a particular personal space and in the agent region of a particular activity space at the same time iff, at that time, both social space regions partially coincide (cf. [3]) with the robot.
A similar gradual structure can be found for activity spaces and territories. In these cases, there is a core region and a boundary region, such that the permission to enter the core implies the permission to enter the boundary. Correspondingly, we associate a gradual permission structure with activity spaces and territories. We provide axioms (AT3) and (AT4) for the case of territories (one can impose corresponding axioms for activity spaces). (AT4) states the correspondence between the social regions and the permission gradient for the case of territories, i.e., the permission gradient contains the center region and the sum of the center region and the margin region. Specific activity space types and territory types might make it necessary to define finer-grained permission gradients. Therefore, axioms (AT3) and (AT4) are open for additional regions to be contained in the respective region bundles.

(AT3) ∀sp [Territory(sp) ⊃ ∃b [RegionBundle(b) ∧ hasPermissionGradient(sp, b)]]

(AT4) ∀sp, b, r, r′ [[Territory(sp) ∧ hasPermissionGradient(sp, b) ∧ CenterRegion(sp, r) ∧ MarginRegion(sp, r′)] ⊃ [Contains(b, r) ∧ Contains(b, sum(r, r′))]]
The second type of permission propagation exploits the fact that actions can presuppose other actions, e.g., parking in a region presupposes entering that region. Therefore, we assume that the permission to park in a region generally entails the permission to enter that region (RP4). As a consequence, if our robot intends to enter a region in order to park there, it is sufficient to signal parking.
(RP4) ∀sp, r, s [[SocialSpace(sp) ∧ SocialSpaceRegion(sp, r) ∧ Permissible(park(r), s)] ⊃ Permissible(enter(r), s)]
The formulae labeled (RP3) and (RP4) express requirements that should be met by the specification of the initial situation S0 and by situations derived from S0 by any sequence of actions. As a complete specification of S0 requires detailed modelling of the social spaces present and the permissions initially granted, we restrict the further discussion to formulating successor state axioms that guarantee that permissions are propagated along action sequences respecting (RP3) and (RP4).
4.3 Successor State Axioms
Successor state axioms describe how the world changes due to the actions that have been performed. While planning action sequences, the robot simulates how the world might change. Because we assume a static environment, change is limited to the location of the robot and to permissions. As the situation evolves, reasoning about the appropriateness of actions in social spaces should remain guaranteed. Therefore, the axiomatization should meet two requirements: First, the requirements (RP3) and (RP4) are to be preserved over time. Second, the robot should be able to systematically plan signal actions to acquire permissions (currently, only the acquisition, but not the withdrawal, of permissions is considered). The successor state axiom (APSS1) states that after performing an action a in situation s, the robot has the permission to park in region r iff it had the permission before (according to (RP1)) or if a is signalling its intention to park in r (according to (RP2)).

(APSS1)
∀s, sp, r, a [[SocialSpace(sp) ∧ SocialSpaceRegion(sp, r)] ⊃ [Permissible(park(r), do(a, s)) ≡ [Permissible(park(r), s) ∨ a = signal(park(r))]]]
The successor state axiom for entering is more complex, because it must also preserve permission propagation, i.e., that less restrictive actions are allowed to be performed if the permission to perform the more restrictive actions is granted. To meet this requirement, betweenness is used again. After performing action a in s, the robot, being located in region rloc, has the permission to enter the social space region r iff one of three conditions holds: first, the robot already had the permission to perform a in situation s; or, second, the robot signals its intention to enter the social space region r (or park there); or, lastly, the robot signals its intention to enter a social space region r′ (or park there) which is located such that r is between rloc and r′ (APSS2). The last condition presupposes that the social space under consideration has a permission gradient.
(APSS2) ∀s, sp, r, a [[SocialSpace(sp) ∧ SocialSpaceRegion(sp, r)] ⊃ [Permissible(enter(r), do(a, s)) ≡ [Permissible(enter(r), s) ∨ [a = signal(enter(r)) ∨ a = signal(park(r))] ∨ ∃b, r′, rloc [hasPermissionGradient(sp, b) ∧ SocialSpaceRegion(sp, r′) ∧ CurrentLocation(sp, rloc, s) ∧ [a = signal(enter(r′)) ∨ a = signal(park(r′))] ∧ Btw(b, rloc, r, r′)]]]]
4.4 A Simple Golog Planner
The situation-calculus-based model developed so far builds the foundation for generating socially acceptable action sequences that can be executed by an artificial agent. An implementation of a simple action planner is provided in Listing 1.1. It is written in Golog [12], a programming language that directly uses situation calculus domain theories. The program in Listing 1.1 generates courses of action leading the robot from its current location to a destination without ruthlessly violating social restrictions. An action sequence can be described as a sequence of transitions between self-connected regions [18] having a homogeneous social meaning, enriched with signalling actions. Definition (D7) makes clear what is meant by the notion of regions having a homogeneous social meaning: a region r is called socially homogeneous (SH) iff every region r′ that is part of r is overlapped by exactly the same social space regions as r.⁴

(D7) SH(r) ≡def ∀r′ [P(r′, r) ⊃ ∀r″ [SocialSpaceRegion(r″) ⊃ [O(r″, r′) ≡ O(r″, r)]]]

⁴ Because we take a snapshot view throughout this article, we use the time-independent relation O. If one considers that location-complexes of social spaces can move, we suggest substituting O with Donnelly's relation PCOIN (cf. [3]).
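Definition (D7) can be checked in the same closed-world style as the earlier sketches; the following Prolog fragment is an illustration only and assumes finite part/2 and o/2 facts together with the unary reading social_space_region/1 used in (D7).

    % sh(R): R is socially homogeneous in the sense of (D7)
    sh(R) :-
        forall(( part(P, R), social_space_region(SR) ),
               ( o(SR, P) -> o(SR, R) ; \+ o(SR, R) )).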
Before an action (i.e., entering or parking) is added to the plan, it is checked whether the robot has the permission to perform the action in the target region under consideration. As the target region could be overlapped by arbitrarily many social spaces, the planner has to determine whether it is permissible to perform the intended action with respect to all overlapping social space regions. Therefore, if there is a social space region overlapping the target region in which the action at hand is not permissible, the planner selects an adequate signal (due to our simplifying assumptions, every signal action results in the acquisition of the respective permission). Selecting a proper signal is nontrivial. For instance, being located in the public region of a personal space and intending to park in the social region, the robot could
signal the park action with respect to the social region, but also with respect to the personal region, or with respect to the intimate region. It is also possible to ask for parking directly or, alternatively, to ask for entering first and then to ask for parking afterwards. After all permissions have been successfully acquired, the intended action is added to the plan.

proc(simpleSociallyAwarePathPlanner(r),
  if(CurrentRegion(r),
     acquirePermissionAndPerformAction(park, r),
     /* else */
     pi(r', ?(CurrentRegion(r')) :
        pi(r'', ?(EC(r', r'') & SH(r'')) :
           acquirePermissionAndPerformAction(enter, r'') :
           simpleSociallyAwarePathPlanner(r)))))

proc(acquirePermissionAndPerformAction(actionType, r),
  if(overlappingSocialRegionImpermissible(actionType, r),
     pi(r', ?(SocialSpaceRegion(r') & O(r, r') &
              impermissible(actionType, r')) :
        selectAndPerformSignal(actionType, r') :
        acquirePermissionAndPerformAction(actionType, r)),
     /* else */
     perform(actionType, r)))

Listing 1.1. Simple Golog Planner
To give an example, we consider a museum tour-guide robot whose task is to provide information to visitors. Figure 6 depicts an example situation such a robot could encounter. First, there is a personal space constituted by a visitor viewing a painting. Second, there is an activity space that is constituted by the visitor's activity of viewing the painting. It is composed of an agent region, in which the visitor is located; the transactional region is spanned, according to the field of view, between the visitor and the painting. Third, there are two affordance spaces, constituted by the affordance of viewing the painting (by other visitors) and the affordance of talking to the visitor, respectively. Buffer regions are skipped for simplicity. By default, it is not permissible to enter activity space regions. So, if the planner generates a path through the transactional region of an activity space (as in Fig. 6), the plan should also contain actions that compensate for this violation. Therefore, a socially acceptable action sequence generated by the procedure in Listing 1.1 is to cross the affordance space regions of the first affordance space, then to signal the entrance into the transactional region of the activity space, to cross it, and finally to move into the goal region.
Fig. 6. A social-space aware action sequence
[enter(r2); enter(r3); ...; signal(enter(r6)); enter(r6); enter(r7); enter(r8)]
5 Related Work
The work by Sisbot and colleagues [21] considers two social constraints for planning trajectories: safety and visibility. Their approach is based on a weighted occupancy map. Motivated by Hall's notion of personal space, grid cells near a person receive higher weights, as do cells behind a person, to penalize locations with poor visibility. The planning process is specified as an optimization searching for a trajectory along the grid cells that minimizes the weight of the overall trajectory. Our focus is on the constitution of social spaces and on the permissions attached to them. This involves avoiding personal space intrusion where it is inadequate, but also allowing entry into a personal space where this is appropriate. Cirillo and colleagues [2] integrate social constraints into a symbolic action planner using temporal logics to avoid socially unacceptable actions. However, they do not discuss the spatial dimension of social behavior and do not consider the concept of social space as a means to constrain actions in human-robot encounters. Pommerening and colleagues [17] integrate a qualitative spatial calculus with Golog to realize the spatial coordination of agent-controlled vehicles. The agents are to act according to normative right-of-way rules as written in the law code in order to avoid collisions. While the rules considered by Pommerening and colleagues must be followed to derive crash-free behaviors, the rules related to social spaces ought to be followed but might be neglected in cases of urgency or danger. Nevertheless, the communication structure between the different components of their simulation system combining Golog and qualitative spatial reasoning might be reusable for a similar simulation system for socially aware behavior.
6 Conclusions
The spatial behavior of service robots that are meant to interact with humans should match people's expectations. We think that the concept of social spaces provides a promising basis for the analysis of people's expectations, as well as for the modeling and implementation of socially aware robots. On the one hand, robots are faced with social spaces produced by other agents, most notably humans. On the other hand, they are actively involved in social space production as they act in a physical environment. The taxonomy of social spaces introduced in this article provides a generic framework with which a wide range of social spatial situations can be modeled. Based on qualitative representations of social spaces and on knowledge about action permissions, an artificial agent can systematically reason about the social adequacy of spatial actions and about the acquisition of permissions. This can be exploited for the generation of socially adequate courses of action. Future research will deal with an analysis of the temporal characteristics of social spaces in order to cope with the fact that location-complexes of social spaces can move relative to physical space. We will also explore the geometric embedding of social spaces in concrete situations, where we expect a high diversity of factors to determine the respective geometries. Finally, deontic ought-to-do reasoning about spatial actions with respect to social spaces will be a matter of deeper investigation.

Acknowledgments. We thank our reviewers for their thoughtful and helpful comments. Many important points were highlighted that will shape our ongoing work. We thank Christopher Habel for inspiring discussions on the topic of this article.
References

1. Bartneck, C., Forlizzi, J.: A design-centred framework for social human-robot interaction. In: Proceedings of RO-MAN 2004, pp. 591–594 (2004)
2. Cirillo, M., Karlsson, L., Saffiotti, A.: A human-aware robot task planner. In: Proceedings of ICAPS 2009, pp. 58–65 (2009)
3. Donnelly, M.: Relative places. Applied Ontology 1, 55–75 (2005)
4. Galton, A.: The formalities of affordance. In: ECAI 2010: Proceedings of the Workshop on Spatio-Temporal Dynamics, pp. 1–6 (2010)
5. Gibson, J.J.: The theory of affordances. In: Shaw, R.E., Bransford, J. (eds.) Perceiving, Acting and Knowing: Toward an Ecological Psychology, pp. 67–82. Erlbaum, Hillsdale (1977)
6. Goffman, E.: Relations in Public – Microstudies of the Public Order. Transaction Publishers, New Brunswick (2010) (originally published in 1971 by Basic Books, New York)
7. Hall, E.T.: The Hidden Dimension, Man's Use of Space in Public and Private. The Bodley Head, London (1966)
8. Hüttenrauch, H., Severinson-Eklundh, K., Green, A., Topp, E.A.: Investigating spatial relationships in human-robot interaction. In: Proceedings of IROS 2006, pp. 5052–5059 (2006)
9. Kendon, A.: Conducting Interaction: Patterns of Behavior and Focused Encounters. Cambridge University Press, Cambridge (1990)
10. Kulik, L., Eschenbach, C., Habel, C., Schmidtke, H.R.: A graded approach to directions between extended objects. In: Egenhofer, M.J., Mark, D.M. (eds.) GIScience 2002. LNCS, vol. 2478, pp. 119–131. Springer, Heidelberg (2002)
11. Lawson, B.: The Language of Space. Architectural Press, Oxford (2001)
12. Levesque, H., Reiter, R., Lespérance, Y., Lin, F., Scherl, R.: GOLOG: A logic programming language for dynamic domains. Journal of Logic Programming 31, 59–84 (1997)
13. Löw, M.: Raumsoziologie. Suhrkamp, Frankfurt am Main (2001)
14. Martinson, E., Brock, D.: Improving human-robot interaction through adaptation to the auditory scene. In: Proceedings of HRI 2007, pp. 113–120 (2007)
15. Mutlu, B., Forlizzi, J.: Robots in organizations: Workflow, social, and environmental factors in human-robot interaction. In: Proceedings of HRI 2008, pp. 287–294 (2008)
16. Ostermann, F., Timpf, S.: Modelling space appropriation in public parks. In: Proceedings of the 10th AGILE International Conference on Geographic Information Science (2007)
17. Pommerening, F., Wölfl, S., Westphal, M.: Right-of-way rules as use case for integrating GOLOG and qualitative reasoning. In: Mertsching, B., Hund, M., Aziz, Z. (eds.) KI 2009. LNCS, vol. 5803, pp. 468–475. Springer, Heidelberg (2009)
18. Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connections. In: Proceedings of KR 1992, pp. 165–176 (1992)
19. Raubal, M., Moratz, R.: A functional model for affordance-based agents. In: Rome, E., Hertzberg, J., Dorffner, G. (eds.) Towards Affordance-Based Robot Control, pp. 91–105. Springer, Heidelberg (2008)
20. Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, Cambridge (2001)
21. Sisbot, E.A., Marin-Urias, L.F., Alami, R., Simeon, T.: A human aware mobile robot motion planner. IEEE Transactions on Robotics 23(5), 874–883 (2007)
22. Walters, M.L., Dautenhahn, K., te Boekhorst, R., Koay, K.L., Syrdal, D.S., Nehaniv, C.L.: An empirical framework for human-robot proxemics. In: AISB 2009: Proceedings of the Symposium on New Frontiers in Human-Robot Interaction, pp. 144–149 (2009)
Finite Relativist Geometry Grounded in Perceptual Operations

Simon Scheider and Werner Kuhn

University of Münster, Institute for Geoinformatics, Weseler Straße 253, D-48151 Münster, Germany
[email protected]
http://musil.uni-muenster.de/
Abstract. Formal geometry is a fundamental tool for showing how relevant metric qualities, such as depths, lengths, and volumes, as well as location concepts, such as points, can be constructed from experience. The ontological challenge of information grounding lies in the choice of concepts to consider as primitive, vs. those to be constructed. It also lies in accounting for the relativity and finiteness of experiential space. The grounding approach proposed here constructs geometrical concepts from primitives of the human attentional apparatus for guiding attention and performing perceptual operations. This apparatus enables humans to take attentional steps in their perceived vista environment and to perform geometric comparisons. We account for the relativity of experienced space by constructing locations relative to a reference frame of perceived pointlike features. The paper discusses perceptual operations and the idea of point-like features, and introduces a constructive calculus that reflects the generation of domains of geometric comparison from the perspective of an observer. The calculus is then used to construct a model and to motivate an axiomatization of absolute geometry in a finite relativist flavour. Keywords: constructive Euclidean geometry, relativist geometry, information grounding, operational semantics.
1 Introduction
How should the multitude of spatial concepts underlying spatial data be interpreted in terms of experience? The philosophical grounding problem [12] gains practical relevance if we ask ourselves what kind of observations a certain data set refers to [36,16]. This question has recently led scientists to regard sensors in a wider sense, including human observers, as a means to ground the semantic web [14]. From a practical viewpoint, it often remains unclear how geometrical attributes like widths, heights, depths and directions were (or could have been) practically obtained. A waterbody gives rise to many possible water depths if the underlying reference operations remain hidden [30,37]. Likewise, a given location
can be conceived in many different ways: relative to diverse egocentric or allocentric reference frames [25], as well as in terms of geographic coordinates. Since locations and geometric attributes are among the major categories underlying spatial data, we are interested in the kinds of inter-subjective space experiences they originate from, similar to [23]. Geometry, like all traditional mathematics, evolved from concrete experiences and problems, stated by the Greeks or earlier. In order to achieve greater generality, modern-age mathematicians drove their discipline away from these experiences by means of abstraction and domain closure. For example, in arithmetic, the experiential basis of natural numbers in counting soon became extended in order to incorporate infinity, rational, real, and complex numbers. Just as arithmetic abandoned explicit counting operations in order to close the numbers with respect to arithmetic operations, geometry closed its experiential domain of measurement by assuming infinities of points. From the perspective of information grounding, the undisputed merits of domain closure and abstraction make it sometimes difficult to see what the roots of geometric information are. These roots of information play an important role in all kinds of semantic reference systems [16]. In particular, spatial reference systems are established by geodesists in terms of observed directions, angles and lengths related to physically real, not abstract, phenomena. Thus, the kind of geometry performed by a geodesist is different from mathematical geometry in that it is constructive and finite instead of abstract and infinite¹. This remains true even if calculations are performed on discrete approximations of real number fields, as they are in computers. Axiomatic geometries commonly begin with abstractions but do not account for how they are obtained. They populate their universe of discourse with abstract points, spheres, lines and planes, even though an infinitely small point is a mental fiction [48]. Furthermore, from a practical viewpoint, there is no way of determining an absolute point in space and time². This relativity of space was recognized already by Leibniz [18], but it manifests itself also in spatial cognition research. Different egocentric and allocentric frames of reference serve to construct different levels of space apprehension: starting from the space around the body [47], we arrive at navigation space by reconstruction from memory [25]. It therefore seems inappropriate to base a theory about grounding spatial information on the assumption of absolute or abstract space. In this paper, we propose to conceive of geometry in terms of perceptual operations, namely perceptual predications on foci of attention, as first suggested in [37]. Foci of attention are atomic (but finite) moments in which an observer's attention is focused on some spot in his vista environment. Our predications are inter-subjectively available operations for comparing foci³. The current paper adds to this idea by accounting for relativism and constructive finitism of experiential geometry (Sect. 2). We provide perceptual justification for our choice of primitives and argue that the identity of locations needs to be constructed based on length and direction comparisons taken with reference to some anchor frame consisting of point-like features, such as a particular end of a rod (Sect. 3). We then introduce an operational calculus that allows us to generate an appropriate finite model. Its rules can be used to motivate axioms of finite relativist geometry.

¹ For similar reasons, Habel [11] has proposed that cognitively adequate temporal reference systems should be finite with a so-called density in intensio. A similar idea stands behind Aristotle's notion of potential infinities [1]. In our view, potentiality can only mean that repeatable operations are available, following [20].
² Even if we use a spatial reference system, this system is logically anchored in (and therefore presupposes the identity of) concrete places. Such an anchor place is a necessary part of a geodetic datum for a mathematical ellipsoid representing the earth surface.
³ The theoretical basis of this idea is developed at some length in [36].
2 Constructive and Axiomatic Geometries
What kinds of mathematical entities should be presumed in order to account for experiential geometry? Axiomatic geometry is not a one-way street. Since Euclid's Elements, a number of formalizations of Euclidean geometry have been proposed, with different assumptions about the admitted objects and relations. For example, Hilbert [13] presupposed lines and points, whereas Tarski presumed only points and two kinds of relations on them [44]. Pointless geometries [43,7], on the contrary, presume a mereology of solids or spheres in order to define regions [3]. The apparent flexibility of taking concepts as primitive seems to be an inevitable characteristic of mental fictions [48] and logical reifications [34]. From a grounding perspective, however, the choice of primitives needs to be driven not primarily by mathematical elegance, but by human perceptual competence. Similar to Greek mathematics, Suppes' proposal [42] uses constructive finite formalisms in order to deal with applied problems. In his formalism, geometric figures are explicitly constructed by a finite series of steps from a small point basis. The operations he proposes are the doubling and bisecting of line segments, which allow, for example, the construction of parallelograms (Fig. 1). Note that this grounding approach differs from finite geometry in mathematics [21]. The interest is not in finite models of axiomatizations (such as affine planes of finite order), but in describing how geometric figures and properties (of a Euclidean flavour) can be constructed in finite sequences. This is not feasible in standard formalizations of Euclidean geometry, since these require infinite models due to some of their axioms. Infinity axioms4 take the form of universal-existential sentences, i.e., ∀x1 ... ∃y1 ... Φ(x1, ..., y1, ...). They allow one to express recursions of existence claims, and thus to populate the domain of interpretation infinitely. Such axioms abound in geometry and force their models to be infinite. If we take Tarski's elementary axiomatization of Euclidean geometry [44], then we find that 4 of the 13 axioms are of such a form (or translatable into such a form, see [45]). For example, the axiom schema of continuity (Axiom 13) requires a boundary point for every two predicates that divide a ray into two halves. This is essentially the idea of a Dedekind cut, and thus requires the continuum of cardinality 2^ℵ0.

Footnote 3: The theoretical basis of this idea is developed at some length in [36].
Footnote 4: By this notion, we vaguely refer to the axiomatic causes of infinity in a theory. These may resemble the axiom of infinity of ZF set theory, which enforces a set containing successors for all its elements. We are as yet unsure how to make our notion more precise, since universal-existential form is only a necessary criterion. However, we give examples of infinity axioms in the following.
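To make the universal-existential character of such axioms explicit, the continuity schema can be transcribed roughly as follows (our rendering, following the presentation in [45]; α and β stand for first-order formulas and B for betweenness):

\[ \exists a\,\forall x\,\forall y\,\big(\alpha(x)\wedge\beta(y)\rightarrow B(a,x,y)\big)\;\rightarrow\;\exists b\,\forall x\,\forall y\,\big(\alpha(x)\wedge\beta(y)\rightarrow B(x,b,y)\big) \]

Once the free variables of an instance are universally closed, each instance is a universal-existential sentence asserting the existence of the boundary point b.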
Fig. 1. Suppes' constructive geometry can be used to construct parallelograms. Bisecting the segment γ0, β0 yields the new point a1, and doubling the segment α0, a1 yields a2. Compare [42].
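The constructive flavour of Fig. 1 can be made concrete in a few lines of code. The following is a minimal sketch (our own illustration in Python; the coordinates are freely chosen, whereas Suppes' formalism itself is coordinate-free):

```python
# Suppes-style finite construction: doubling and bisecting of segments
# as the only operations (our own coordinate illustration).
def bisect(p, q):
    """Midpoint of the segment pq."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def double(p, q):
    """Extend pq beyond q to the point r such that q is the midpoint of pr."""
    return (2 * q[0] - p[0], 2 * q[1] - p[1])

alpha0, beta0, gamma0 = (0.0, 0.0), (2.0, 0.0), (1.0, 2.0)
a1 = bisect(gamma0, beta0)     # bisecting gamma0, beta0 yields a1
a2 = double(alpha0, a1)        # doubling alpha0, a1 yields a2

# a2 completes the parallelogram alpha0, beta0, a2, gamma0:
assert (a2[0] - beta0[0], a2[1] - beta0[1]) == \
       (gamma0[0] - alpha0[0], gamma0[1] - alpha0[1])
```

The point basis here has three elements, and the whole figure is reached in exactly two constructive steps.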
Fig. 2. Tarski’s Axiom of Segment Construction. Compare [45].
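The axiom depicted in Fig. 2 can be transcribed as follows (our rendering of the formulation given in [45], with B for betweenness and ≡ for congruence of segments):

\[ \forall q\,\forall a\,\forall b\,\forall c\;\exists x\;\big(B(q,a,x)\wedge ax\equiv bc\big) \]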
But even if we dispense with Axiom 13, it is provable that models still need to be isomorphic to vector spaces over ordered fields [44], and these are infinite at the cardinality level ℵ0. The reasons for this are the three remaining infinity axioms: the axioms of Pasch and Euclid and the Axiom of Segment Construction. The latter, for example, requires, for any existing pair of points b, c and any given line (denoted by another pair of points q, a), the existence of a pair of points a, x on that line which is congruent to b, c (compare Fig. 2). This requires infinity by itself: there is now a new pair of points on the line, for example q, x in Fig. 2. If we apply the axiom again to this pair and the line pair q, a, it requires a new pair a, x∗ congruent to q, x, and so forth. Something equivalent is also enforced by the axiom of Pasch. In a constructive geometry, infinity axioms like the Axiom of Segment Construction need to be replaced by explicit constructions. These can be expressed in first-order logic (FOL) by a finite list of existential quantifications that state the existence of each constructed point5. Another possibility is to describe the underlying operations explicitly, not only their results, in the spirit of Piaget's logic [27]. This can be done in terms of a constructive calculus [19], such as those used in intuitionistic logic or algebra. Both approaches are desirable and can be combined: in Sect. 4, we will use a constructive calculus in order to motivate a certain FOL axiomatization.

Footnote 5: This approach was taken in [36], and is based on Quine's proposal to express existence by the use of the existential quantifier [34].
3 Human Attention and Perceptual Operations
What kinds of human cognitive operations can serve as means of geometric construction? In [38] and [36], we have argued that the human attentional apparatus, through which human attention is anchored in pre-conceptual Gestalt mechanisms, can serve as the operational basis for semantic grounding6. The idea is that humans acquire certain pre-conceptual mechanisms in order to precompute Gestalts [15] in their perceived near-body environment. Gestalts serve as anchors of attention, i.e. they allow referencing, and they enable one to predicate the presence of surfaces and other things (perceptual predication) without drawing on conceptual reasoning. The mechanisms may involve conscious parts and may be learned, e.g. in the sense of learning to play tennis: while the performance must be guided and learned consciously, the complex sensory-motor details are internalised. The arguments for this view were recently advanced by Pylyshyn [31] based on empirical findings in object-based attention [39]. He argued that without a pre-conceptual reference mechanism, human cognition would end up in a regress cycle of meaningless concepts. That this human attentional apparatus is at the same time the basis for the inter-subjectivity of language was recently argued by Tomasello [46] and is a central idea behind Quine's observation sentences [33]. According to Langacker [17] and Talmy, guided attention and Gestalt presence account for the meaning of language as such. We refer to the cited literature and to [38] and [36] for a deeper discussion.
3.1 Focusing Human Attention
We agree with von Glasersfeld7 that there has to be some "pulsing" mechanism that produces discrete mental entities at the very lowest level of conscious perception8. We posit the identity of a moment in which a human being focuses attention on a certain signal from the near-body environment. This signal may be a precomputed Gestalt, i.e., a structure pre-conceptually synthesized from visual, tactile, proprioceptive and other inputs, without the observer necessarily being aware of it. Like any phenomenon, a Gestalt can enter consciousness only via attentional moments. The domain of foci of attention is considered a root for the other domains of consciousness, as it is the only one that can be directly coordinated across observers by the mechanism of joint attention [46]. It is also considered to be finite, and therefore discrete, because human memory is bounded. One focus of attention may be distinguished from another because they come at different discrete time pulses. Perceiving time in its simplest form therefore means to perceive the temporal order, denoted by ≤T, of foci of attention. The pulsing attention does not have to be focused on another signal. If it is, it produces a flow of conscious experience, which we call perceptual predication. Predication simply means that the human observer detects and stores the presence of Gestalts at one or several foci of attention. Mental operations are then available to construct higher-level entities from this material flow of consciousness.

Footnote 6: A similar suggestion was made by Marchetti [22] and called 'attentional semantics'.
Footnote 7: Ernst von Glasersfeld developed a 'pulse' model for the mental construction of unities, pluralities and number; see, for example, Chapter 9 in [10] or [9].
Footnote 8: Although the question of whether conscious perception is discrete or not is still open, there is much psychophysical evidence for its discreteness [49].
3.2 Identification of Point-Like Features
It is essential to understand that the perception of surfaces plays a central role in many other kinds of perceptual operations that can be performed. In this spirit, Gibson [8] granted surfaces a central position in his ontology of the environment. We argue here for surface-based perceptual predication. Observers identify prominent parts of their environment, such as relative parts of bodies, openings, or the free space in front of them, with respect to some already identified reference surfaces. These were called features in the DOLCE ontology [24]. They have their own criterion of identity, but existentially depend on an identifiable object, which is their "host". Perceivable features of a cup, for example, are its handle but also its opening. The opening of a cup would not exist without the cup, but it is not a part of the cup. A feature of a building is the opening of its entrance. Further examples of features are the corner of a table or the peak of a mountain. We propose to call these latter examples point-like features, because they give rise to concentric sphere Gestalts that correspond to the mathematical fiction of a point. Features are an important class of perceivable entities in their own right, even though they depend on host surfaces. Studies in Gestalt psychology provide evidence for some sort of visual "hidden structure" that may account for this phenomenon. Rudolf Arnheim [2] studied the visual perception of balance, shape and form. He noticed that the perceived balance of black dots drawn into a square (Fig. 3) depends on how they are placed relative to the hidden field of visual tension shown in Fig. 4, which emerges relative to the square. Note that this field is not part of the square drawing. It rather depicts how black dots in the square are "dragged" towards its centers by a field of visual force. Arnheim assumes that this Gestalt mechanism accounts for the apparent human ability to detect whether a black dot is slightly off-center, without consciously comparing directions and lengths. There is also recent evidence for a neurological mechanism underlying the intuitive sense of a location [4]. Burgess and others studied neurons in mammals, e.g. rats, that identify relative allocentric places (called place cells). These cells fire in response to other cells (called boundary vector cells) that detect surfaces at a certain allocentric direction and distance (see Fig. 5). Allocentric means that the firing of all these cells is independent of an egocentric reference frame, but depends on external landmark objects and surfaces [4]. There are even place cells configured in a grid-like manner [4]. Point-like features of this kind may therefore be called "proto-locations". They can be considered preliminary stages of spatial reference frames. The space of kinesthetic coordination of our body relies on many similar allocentric and egocentric mechanisms ([35], Chapter 3). We denote the predications of arbitrary point-like features by PF(x, y)9. The relation expresses that in the two moments of attention x and y, an observer focused on the same point-like feature.

Fig. 3. A dot placed off-center into a square [2]
Fig. 4. The hidden field of visual force exerted on a dot placed into a square [2]
Fig. 5. Place cell firing (white dots) of a rat tracked in a box (left). Principle of boundary vector cells (right). Adapted from Burgess [4], see text for details.
3.3 Identification of Directions and Lengths
Since the operations of doubling or bisecting line segments [42] seem too restricted to capture a constructive observed geometry in near-body space, we will not directly use Suppes' proposal, even though we stick to his approach. Instead, we will develop a constructive modification of Tarski's axiomatization [44], because it is possible to interpret his primitives [45] in an intuitive way. We proposed in [37] that humans experience the geometrical and topological structures of their environment by performing and comparing attentional steps. An attentional step is the actual movement of attention from a focus x to a focus y. Humans perceive the length and direction of steps because they are able to compare steps of equal length and of equal direction. And thereby, we assume, they are able to observe and measure the lengths of arbitrary things in their environment.
Footnote 9: Alternatively, one may want to differentiate different kinds of point-like features. One may also add another parameter for pointing at the reference surface of a feature; see [36].
Fig. 6. Equal length and linear order for steps
We have suggested [36,37] that there are (at least) two Gestalt mechanisms available for geometric predication. One is a mechanism for comparing distances between pairs of foci. It can be conceived as the result of constructing a straight stick or some imagined straight Gestalt and being able to match its ends with two pairs of foci. For example, physically, we may align a stick with some object and move it around to match it with some arbitrary foci of attention. We do exactly this when we use a non-collapsible compass. Note that the operation of comparing steps may be different from the one for performing steps10. The observation predicate xy =L uz (compare Fig. 6) asserts that foci x and y and foci u and z could be matched in this way11. Another Gestalt mechanism allows for perceiving whether three foci of attention are ordered along a line12. OnL(x, z, y) means that a focus z is on a line between x and y or co-located with either of them (compare Fig. 6). Note that OnL implies collinearity and betweenness13. It may be the result of comparing a focus of attention with two others by detecting whether or not it lies on an imagined line through them.
3.4 Identification of Locations
In distinction to common axiomatizations of point geometry, such as [44], the behavior of these observation predicates needs to be described not on the level of their domain and range, i.e., on the level of foci of attention, but with respect to constructed locations. The "points" of an experiential geometry are locations, not foci of attention. They exist only relative to a frame of reference14 and certain comparison operations, such as the ones introduced above. This distinction is ontologically essential, since a given attentional focus can be used to identify different locations with respect to different frames of reference. For example, if you sit in a train and focus two times on the apex of a table in front of you, then, at both moments, you are focusing on the same point with respect to the table, but on two different points with respect to a frame of reference located outside the train and at rest relative to the landscape. A reference frame is not only necessary to fix the measurement units and directions of an observed geometry. Together with the basic comparison operations discussed above, it actually establishes a geometry, with all its points and all its laws, in the first place15. As A.S. Eddington argued, we must recognize "that all our knowledge of space rests on the behaviour of material measuring scales", and not on some pre-experiential absolute space [6]. To illustrate this argument, suppose you are sitting again in this train with some measuring tape at your disposal. Focusing on the table in front of you, you can jump with your attention from one of its ends x to the other y and back to the first one x′. You will thereby notice that the length of the table has remained equal, i.e., xy =L yx′, and therefore length comparison with reference to this frame and the tape is seemingly symmetric and suited for Euclidean geometry. But what if you choose a reference frame consisting of the table edges and a tree rushing past the window? If you jump with your attention from this tree to the table edge and back, the symmetry of length is not preserved. So the choice of the frame of reference influences the formal properties of your geometry. Similarly, if you choose to make length comparisons with a rubber band rather than a tape, symmetry properties may be preserved in the second case, but not in the first one (compare also [6]). So it is the choice of reference frame and comparison operations together that constitutes an experiential geometry16. A Euclidean-like geometry in perceived space can only be established by choosing a stable reference frame of four reidentifiable points for three dimensions. Note that these four points must not only be reidentifiable by a single observer: if the geometry is to be shared among people, the points also need to be indicated to others. From all we said above, this means they need to be chosen on the basis of shared Gestalts external to the geometrical system. We therefore propose that reference points may be based on point-like features, such as one identifiable corner of a perceived table.

Footnote 10: We assume here that some operation for performing steps generates foci of attention, while some operation of comparing them generates geometric relations between them.
Footnote 11: This predicate was called 'congruence' by Tarski in [44].
Footnote 12: Whether these foci were generated in a sequence of steps or not is considered irrelevant here.
Footnote 13: This predicate was called 'betweenness' by Tarski in [44].
Footnote 14: We use the term frame here not in the sense of a formal reference system [16], but in the sense of perceivable point-like features one can refer to.
Footnote 15: This idea of relative space was proposed already by Leibniz [18] and Poincaré [28].
Footnote 16: But nevertheless, these choices do not yet determine it, as Poincaré argued [28]. Geometry is likewise affected by Quine's empirical indeterminacy [32], in the sense that, given a reference frame and comparison operators, there is more than one way of building a geometry.
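The frame dependence of symmetry in the train example above can also be played through numerically. The following toy sketch (our own illustration; the speed, the positions, and the assumption that each attention jump takes one time unit are freely invented) makes the point:

```python
# Frame-dependent length comparison in the train example (toy model).
V = 10.0                       # speed of the landscape relative to the train

def tree_position(t):
    """Position of the tree in train coordinates at time t."""
    return 50.0 - V * t

TABLE_EDGE = 0.0               # fixed in the train frame

d_out = abs(tree_position(0) - TABLE_EDGE)    # jump tree -> table edge, t = 0
d_back = abs(tree_position(1) - TABLE_EDGE)   # jump back, t = 1
assert d_out != d_back         # length comparison is not symmetric here
```

With a frame fixed to the train (e.g. the table edges), the same pair of jumps would return equal lengths; the asymmetry arises only from the choice of frame.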
4 An Operational Model of Constructive Relative Geometry
Our goal is to show how observations expressed by the predicates OnL and =L, as well as relative locations, may be constructed by a human observer. Our main argument in Sect. 2 was that in a constructive finite geometry, infinity axioms need to be substituted by explicit constructions. But how can an explicit construction be described in a formal way? We argued that one possibility is to describe the underlying operations explicitly in terms of an operational calculus. This means describing experiential geometry by way of the operations that may generate it. In the next subsection, we will discuss the notion of an operational calculus known from intuitionism, and suggest how it may be reused to carry out finite constructions as well as geometric inference by concatenating constructive and inference calculi. We also show that the locations or "points" of a geometry need not be presumed but can be constructed relative to a reference frame.
4.1 Operational Calculi, Inference and Explicit Construction
We suggest using a form of Paul Lorenzen's operational calculus [19] to describe an operational model of the perceptual operations discussed in the last section. Note that a calculus, which is formal, should not be confused with the actual human operations it describes17. Furthermore, it only reflects our own preliminary ideas, which may need revision in the future18. An operational calculus (not to be confused with the infinitesimal calculus) is a basic mathematical tool of formal construction based on rules. Its most prominent application areas are formal inference in logic, where the validity of a sentence is proved by generating it from other sentences using certain rules of deduction, and the formation of well-formed sentences from syntactic atoms. In intuitionism, constructive calculi are used in a more fundamental way, namely as a tool of constructive justification for logic as such, based on the idea of inductive proofs. Starting with Brouwer, Heyting and Kolmogorov, intuitionists have clarified the meaning of logical constants as well as inference rules based on operating with certain calculi19. Lorenzen's general conception of a calculus applies not only to logic and inference, but also to mathematical object construction. For example, the mathematical idea of infinity can be reduced to potential infinity if we conceive it in terms of a calculus [20]. The flexibility of a calculus allows us to do both: to construct objects as well as to infer facts about them. A calculus is, according to Lorenzen [19], a description of a procedure for generating symbols ("Figuren") from given symbols. The given symbols are written down at the beginning (A). New symbols are generated using a set of rules20 (R) that can be iteratively applied to symbols. The rules have free variables standing for symbols to be substituted for them. For example, the following is a calculus for the construction of natural numbers:

Knat (primitive atom: 1; object variable: x):
(A) 1
(R) x −→ x1

Footnote 17: In particular, we do not claim here that cognitive human operations are formal symbol manipulations, as claimed in [26].
Footnote 18: In particular, an important further development will be to attempt a calculus without predicates at all (other than equality), i.e., an algebra of perceptual operations.
Footnote 19: Lorenzen's "protologic" [19] can be seen as an early attempt to give logical constants and deduction rules a proof-theoretic semantics, similar to Prawitz [29] and Dummett [5] (see [40]).
Footnote 20: These correspond to axioms and theorems in axiomatic theories.
Note that the arrow above is not a logical implication but denotes a rule, i.e. the permission to write down an instance of the symbols at the end of the arrow for every substitution of the variables with objects. We call these variables object variables and denote them by lower-case letters x, y, z, .... They stand for objects constructible in the calculus, in this case for numbers. A rule can have more than one input or output figure, separated by commas: I, I, ... −→ O, O, .... At the beginning of this calculus, there only exists the primitive atom 1. If we iteratively apply rule R starting from the atom, we can generate a series of strings of primitive atoms, e.g., R(1) = 11, R(R(1)) = 111, .... These strings of primitive atoms are called objects, whereas strings of primitive atoms with object variables, e.g. x1 in the rule above, are called object formulas. We generate new objects by inserting objects into object formulas. We call a set of symbols generated in this way a derivation. Note that derived objects can always be ordered according to their production sequence. We not only need to generate objects, but also to realize relations among them. Predications are strings of objects and relation symbols generated according to further rules. Formulas are strings that additionally contain object variables and are used in these rules. We generate predications by substituting objects into formulas.

Knat+ (primitive atoms: 1, +, =; object variables: x, y, z):
(A)  1 + 1 = 11
(R1) x + 1 = y −→ x1 + 1 = y1
(R2) x + y = z −→ x + y1 = z1
For example, arithmetic can be constructed by rules R1 and R2, which allow one to derive the predication 11 + 11 = 1111 by substituting objects into the formula x + y = z. Note that, in contrast to Tarskian formal semantics, operative figures do not have an interpretation in a domain. The distinction between predicates and objects is therefore not based on such an interpretation. Let us now return to our initial question: what is an explicit construction? Our calculus Knat+ obviously contains closure rules. Our constructed arithmetic set would be infinite if every constructible object were actually constructed, but this is impossible as a matter of fact.
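As a concrete illustration, here is a minimal executable sketch (our own Python encoding; the paper itself works with symbol strings, and the names below are freely chosen) of Knat+, performing one explicit, finite derivation of 11 + 11 = 1111:

```python
# K_nat+: numerals are strings of the primitive atom "1"; a triple
# (x, y, z) encodes the predication "x + y = z".
AXIOM = ("1", "1", "11")                  # (A): 1 + 1 = 11

def rule_r1(pred):
    """(R1): from x + 1 = z derive x1 + 1 = z1."""
    x, y, z = pred
    assert y == "1"                       # R1 only applies to x + 1 = z
    return (x + "1", "1", z + "1")

def rule_r2(pred):
    """(R2): from x + y = z derive x + y1 = z1."""
    x, y, z = pred
    return (x, y + "1", z + "1")

# One particular derivation, i.e. an explicit construction:
step1 = rule_r1(AXIOM)                    # 11 + 1 = 111
step2 = rule_r2(step1)                    # 11 + 11 = 1111
assert step2 == ("11", "11", "1111")
```

The derivation consists of finitely many rule applications; derivability of further predications is never appealed to.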
We suggest therefore that an explicit construction is not a calculus, but involves a particular application of a calculus in a finite number of steps, i.e., a particular derivation. This requires that the existence of constructed entities be conceived in terms of derivation, not derivability. In an intuitionist sense, however, existence ∃, as well as the other logical constants, is based on derivability in a calculus. Here, ∃x just means that an object x with certain properties can be derived [19]. Similarly, the inference of rules, called admissibility by Lorenzen [40], is based on derivability: a rule is called admissible if it does not increase the set of derivable figures. For example, a rule is admissible if the head of the rule can be derived from the condition by concatenating already admissible rules (deduction)21. Analogously, negation is understood in terms of underivability: ¬A means that the predication A is underivable in the calculus. This can be expressed by a particular admissible rule: ¬A is defined as the rule A −→ ⊥, where ⊥ is an underivable symbol in the respective calculus [19]22. Using this definition we can also infer negations, for example by reductio ad absurdum (R.A.A.): suppose we have already derived ¬A and B −→ A, where B is the hypothesis to be refuted. It is then easy to infer ¬B: since by condition B −→ A and by definition A −→ ⊥, we have B −→ ⊥ by deduction, which just means ¬B. We intend to use an operative calculus not only to do logical inference, but also to generate initial finite models. Since the former is based on derivability while the latter is based on actual derivation, we propose to distinguish calculi according to their purpose, i.e. between inferential and constructive calculi. A constructive calculus is just an auxiliary mechanism to construct a finite domain. Non-existence means that an object actually has not been constructed in a particular derivation, which encompasses but is broader than underivability. This interpretation is useful since it can reflect a particular observation process: observation is a product of an observer taking perceptual decisions and directing his attention at phenomena present in his field of view. Thereby, he does not do everything he may be able to do, and we are only interested in his observed facts. In an inferential calculus, instead, just as in intuitionistic logic, objects exist and predications are true if and only if they are derivable in it. True facts are either given (i.e., observed) or derivable from the given, and all others are considered false. If the constant ⊥ has been introduced into the calculus, negations can be inferred by showing underivability from observed facts23. Regarding human perception, such a calculus has its analogue in Gestalt completion mechanisms [15], which account for large parts of unconscious automatic human reasoning.
Footnote 21: There are still further inference principles, e.g. inversion; see [19].
Footnote 22: If this rule is admissible and ⊥ is underivable, then A must be underivable, too.
Footnote 23: Another possibility is that the observer may predicate negative statements directly, i.e. in terms of the observed absence of a phenomenon in the field of view. In order to account for the existence of occluded phenomena, observed absence thereby does not imply non-existence.
In order to generate a model of finite relativist geometry, we therefore propose to concatenate operational calculi in the following way: in Sect. 4.2, we generate the domain of foci of attention in a constructive calculus, in which object derivation (not: derivability) corresponds to an observer's generation of a focus, and which allows "positive" initial predications as observed facts. In such a calculus, contradiction can never occur, since every generated statement is a positive assertion of some observed phenomenon. In order to infer expected Euclidean properties from these observed facts, we then apply an inferential calculus in the standard intuitionist sense to the finite number of facts generated in the former calculus. This relational closure calculus contains closure rules (Sect. 4.3) which largely correspond to geometric implications. The set of predications derivable in this way is finite, because the set of relations on a finite domain is bound to be finite. Note that the input to the inferential calculus consists of those and only those positive facts generated in one particular derivation of the constructive calculus. Furthermore, closure rules may have negations in their conditions but not in their heads. Thus the resulting model is assured to be finite and does not contain contradictions. We are aware that this procedure does not yet ensure that every model derivable in this way is one of the intended finite relativist geometry. The goal of this paper is rather to construct one such model, and to use it to demonstrate the consistency of the constructive inference rules.
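The proposed concatenation can be pictured as a small program: run one finite constructive derivation, then apply closure rules to its facts until a fixpoint is reached. The following sketch (our own Python rendering; the function name close and the tuple encoding are our assumptions, and the example rule is the symmetry of =L, which the paper derives rather than postulates) shows the inferential half; termination is guaranteed because the set of relations over a finite domain is finite:

```python
# Relational closure: apply closure rules exhaustively to a finite
# set of observed facts.
def close(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact in rule(facts):
                if fact not in facts:
                    facts.add(fact)
                    changed = True
    return facts

# Example closure rule: ('eqL', x, y, u, v) encodes the figure xy =L uv.
def symmetry_eqL(facts):
    return [('eqL', f[3], f[4], f[1], f[2]) for f in facts if f[0] == 'eqL']

observed = {('eqL', 'a0', 'a1', 'a1', 'a2')}      # one observed fact
closed = close(observed, [symmetry_eqL])
assert ('eqL', 'a1', 'a2', 'a0', 'a1') in closed
```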
4.2 Initial Attentional Construction
In the following, we will use the letters a, b, c, d, ..., t to denote objects and x, y, z, v, w, u to denote object variables. For convenience, we will use the following abbreviation for object figures: let o be an object atom, with a0 ≡ o, and let xy stand for any object constructed by concatenation; then xy+1 ≡ xy o. For example, a1 ≡ a0 o ≡ oo. Thus, two object figures with different increments are always unequal. Now consider the following calculus for generating attentional steps and their observed interrelations:

Kattention (primitive atoms: o, OnL, =L, PF):
(A)     a0
(Rstep) xy −→ xy+1
(RL)    xy, xw, xu, xv −→ xy xw =L xu xv
(ROnL)  xy, xw, xu −→ OnL(xy, xu, xw)
(RPF)   xy, xw −→ PF(xy, xw)
The beginning of this calculus consists of a single focus of attention, a0. The first rule generates a new attentional moment starting from any attentional moment, i.e., it takes a step. It generates foci of attention by incrementing the subscripts of their symbols. All other rules do not generate foci of attention, but relations among them. The second rule generates a predication stating that xu, xv has the same distance as xy, xw. The third rule generates a predication stating that xu lies on the line between xy and xw. The fourth rule associates two foci on the same point-like feature. Taking an attentional step stands for the act of referencing, i.e. for moving the attentional focus in the near-body environment. It generates our domain of interpretation. The other operations represent perceptual predications, i.e. operations that compare and associate foci of attention with each other. In order to take an attentional step with specific properties, we have to construct it by concatenating operations, i.e. by first taking a step and then generating the required relations by comparison. This reflects the idea that predication and referencing are two different kinds of operations, even though in practice they may happen almost synchronously. In this calculus, we can generate the following sequence of foci of attention, constructing an equilateral triangle (compare Fig. 7). We first make a step from the beginning to any other focus. Then we add another step such that the pairs consisting of the new focus and the two previous ones are congruent to the first step.

Fig. 7. Construction of a triangle in terms of attentional moments. Time is indicated by the horizontal third dimension in this figure.
Fig. 8. A reference frame for a 3-dimensional Cartesian coordinate system. The figure depicts its 3-dimensional spatial projection without time.
DerivationTriangle:
(1) a1              | Rstep(a0)
(2) a2              | Rstep(a1)
(3) a0a1 =L a1a2    | RL(a0, a1, a1, a2)
(4) a0a1 =L a0a2    | RL(a0, a1, a0, a2)
This is basically a description of what we do when we construct a triangle with a compass, where a2 lies in the intersection of two circles centered on a0 and a1 with radius a0a1. Note that this procedure generates a nondegenerate triangle only if these foci do not coincide (and are not on a line). This means we first need to construct the notion of locational coincidence, which exists only relative to some frame of reference. A spatial reference frame consists of three point-like features standing perpendicular to each other on a common origin, which is also a point-like feature. For better readability, we write focus names as depicted in Fig. 8 and not with proper increments.
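The following sketch (our own Python encoding; class and method names are freely chosen) replays DerivationTriangle mechanically: the step rule generates foci of attention, while the comparison rule only records observed relations over foci that already exist:

```python
# The constructive calculus K_attention, replayed in code.
class Derivation:
    def __init__(self):
        self.foci = ["a0"]                  # the beginning (A)
        self.eq_len = []                    # figures  xy =L uv

    def step(self):
        """R_step: generate a new focus of attention."""
        focus = "a" + str(len(self.foci))   # increment the subscript
        self.foci.append(focus)
        return focus

    def eq(self, x, y, u, v):
        """R_L: predicate that the pair x, y has the same length as u, v."""
        assert {x, y, u, v} <= set(self.foci)
        self.eq_len.append((x, y, u, v))

d = Derivation()
a1 = d.step()                               # (1)
a2 = d.step()                               # (2)
d.eq("a0", a1, a1, a2)                      # (3)  a0a1 =L a1a2
d.eq("a0", a1, "a0", a2)                    # (4)  a0a1 =L a0a2
```

Nothing in the derivation ever asserts the existence of an unconstructed focus; the fact lists grow only by explicit rule applications.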
DerivationReferenceFrame:
(1)  s               | Rstep(a∗)
(2)  PF(s, s)        | RPF(s, s)
(3)  a1              | Rstep(s)
(4)  PF(a1, a1)      | RPF(a1, a1)
(5)  a∗s =L sa1      | RL(a∗, s, s, a1)
(6)  OnL(a∗, s, a1)  | ROnL(a∗, s, a1)
(7)  a2              | Rstep(a1)
(8)  PF(a2, a2)      | RPF(a2, a2)
(9)  a∗s =L sa2      | RL(a∗, s, s, a2)
(10) a3              | Rstep(a2)
(11) PF(a3, a3)      | RPF(a3, a3)
(12) a∗s =L sa3      | RL(a∗, s, s, a3)
(13) a∗a2 =L a2a3    | RL(a∗, a2, a2, a3)
(14) a∗a2 =L a1a3    | RL(a∗, a2, a1, a3)
(15) a∗a2 =L a1a2    | RL(a∗, a2, a1, a2)
In this construction, s denotes a focus on the origin and ai denotes one of three foci on perpendicular unit vectors in this reference system (compare Fig. 8). Focus a∗ is on an auxiliary point needed to assert orthogonality. Orthogonality is assured by the condition that the distances of the foci ai to each other are all congruent to a∗a2, and by the fact that a∗ is not on the same point-like feature as s24. This latter negative fact is inferred in the inference calculus of Sect. 4.3. The construction of this reference frame assures that our model has at least three spatial dimensions. It therefore directly corresponds to Tarski's lower-dimension axiom for 3D (compare [45]). Note also that a primitive way of time perception (≤T) is given by the derivation order.
4.3 Relational Closure Calculus
The following inferential calculus is intended to close the domain of relations with respect to perceptual predications in such a way that it reflects our expectations about experienced geometry. The rules that have to be introduced largely correspond to geometric axioms in a FOL theory, such as that of Tarski [41]. Yet our calculus assures finite constructibility and accounts for the relativity of locations.
Footnote 24: Orthogonality can then be proved along the following lines (compare Fig. 8): a∗, s, a1 lie on distinct point-like features of a line. Thus the angle a∗, s, a2 must be supplementary to the angle a1, s, a2. Since the segment a∗a2 is congruent to a1a2, and the angle sides must also be congruent by construction, the triangles a∗, s, a2 and a1, s, a2 must be congruent, too. Thus the supplementary angles must be congruent. The intended result now follows from the fact that congruent supplementary angles are always right angles.
The beginning of this calculus consists of all and only those objects and facts generated by the reference-frame derivation of the initial attentional calculus in the last section25. The calculus exclusively contains rules to add new relation tuples based on existing ones (relational closure rules), so the object domain remains equal. We will use inference to show which Euclidean properties are entailed by this calculus. We will also show that relative geometry, which is built on foci of attention, not locations, behaves neutrally with respect to foci on the same location, as expected.

Reference Frame Rules. As argued in Sect. 3.4, a spatial reference frame consists of point-like features, not foci of attention, and can be used to define locations relative to it. For this purpose, it needs to retain the intrinsic geometric properties constructed in the last section through time. This can be expressed by requiring the same configuration for arbitrary foci that lie on the same point-like features (we use ⋀ for iterating over inputs and outputs). For a rule that can be used in both directions, we write ⇌. We first abbreviate the fact that foci lie on the same reference frame:

Rule 1 (DRef): PF(s, xs), ⋀_{i=1..3} PF(ai, xi) ⇌ RefFrame(xs, x1, x2, x3)

Rule 2 (RRef): RefFrame(xs, x1, x2, x3) −→ OnL(a∗, xs, x1), ⋀_{i=1..3} a∗s =L xs xi, ⋀_{1≤i<j≤3} xi xj =L a1 a2

Rule 3 (Rfix): RefFrame(xs, x1, x2, x3) ⇌ xs s =L ss, ⋀_{i=1..3} xi ai =L ai ai
This last rule says that foci on the same point-like feature of the frame have zero distance from each other. This assures that the reference frame is immovable with respect to the observer. It is later used to prove that the four point-like features correspond to four locations. Note that from rule 1 and the reflexive PF facts in the reference-frame construction (steps (2), (4), (8) and (11) of DerivationReferenceFrame), it follows immediately that RefFrame(s, a1, a2, a3).

Rules about Congruence. Consider the following closure rules for =L:

Rule 4 (Reflexivity=L): x, y −→ xy =L xy, xx =L yy
Rule 5 (Connectivity=L): xy =L zu, xy =L vw −→ zu =L vw
Rule 6 (Identity=L): xy =L zz −→ x =Ref y

These rules seem to comply with our intuition about length comparison: the distance between two foci is always equal to itself, and if a length equals two other lengths, then those are equal, too. The last rule says that if the distance between two foci equals the distance of a focus to itself, then the two foci are on the same location, denoted by x =Ref y.

Footnote 25: The object constants s, a∗, a1, a2, a3 denote the generated foci of attention.

Fig. 9. The location of a focus relative to the frame is fixed by its distance to the point-like features of the frame

The following rules can immediately be proven to follow:
Derived rule 1 (Symmetry=L): xy =L zu −→ zu =L xy
Proof. xy =L zu by condition and xy =L xy by rule 4. By rule 5, zu =L xy.

Derived rule 2 (Transitivity=L): xy =L uv, uv =L vz −→ xy =L vz
Proof. Applying derived rule 1 to the first input above directly yields the input of rule 5.

In order to express that our model has no more than three spatial dimensions, we additionally require that the relative location of two foci of attention x, y is fixed once their distances to the chosen four point-like features of the reference frame are equal (Fig. 9). This is done by the following rules, which define location equivalence relative to the frame introduced above:

Rule 7 (Dlocus): RefFrame(xs, x1, x2, x3), xxs =L yxs, ⋀_{i=1..3} xxi =L yxi −→ x =Ref y
Rule 8 (R3D): x =Ref x′, y =Ref y′ −→ xy =L y′x′

Location equivalence x =Ref x′ simply means that x and x′ are bound to have the same distances to every other location in focus (y and y′). To put it another way: length comparison behaves neutrally with respect to foci on the same location. Note that the last rule has y′ and x′ reversed, which can be used to prove that the order of foci is irrelevant. This is because equidistant steps are reversible: it is always possible to return to the same locus by taking a step forward and then a step back of the same length.

Derived rule 3 (Reflexivity=Ref): x −→ x =Ref x
Proof. By rule 4, we have xa =L xa, where a ∈ {s, a1, a2, a3}. By rule 7, therefore x =Ref x.

Derived rule 4 (Symmetry=Ref): x =Ref y −→ y =Ref x
Proof. By derived rule 1 and the input, we have ya =L xa with a ∈ {s, a1, a2, a3}. So by rule 7, y =Ref x.

Derived rule 5 (Reversibility=L): x, y −→ xy =L yx
Proof. By derived rule 3, x =Ref x and y =Ref y. By rule 8, xy =L yx.

The challenge is now to show that from these rules a general locus-neutral geometry on foci can be obtained. The following rules generalize these results over location-equivalent foci:

Derived rule 6 (Locus neutrality 1): x =Ref x′, y =Ref y′ −→ xy =L x′y′
Proof. By rule 8 and derived rule 3, we have xy =L yx and y′x′ =L x′y′. By rule 8 and the input, it follows also that xy =L y′x′. By derived rule 2, then xy =L x′y′.

Derived rule 7: x =Ref y −→ xx =L xy
Proof. By derived rule 3, we have x =Ref x. By the input, also x =Ref y, and so, by derived rule 6, xx =L xy.

Derived rule 8 (Locus neutrality 2): x =Ref x′, y =Ref y′ −→ xx′ =L yy′
Proof. By derived rule 7 and the input, we have xx =L xx′ and yy =L yy′. By rule 4, we also have xx =L yy, and by derived rule 2, xx′ =L yy′.

In the following, for convenience of reading, if we use object variables u, u′ with a prime, we implicitly include u =Ref u′ among the conditions of the respective rule:

Derived rule 9 (General connectivity): xy =L zu, x′y′ =L vw −→ z′u′ =L v′w′
Proof. xy =L zu by condition. Since also zu =L z′u′ by locus neutrality 1, we get z′u′ =L xy by connectivity and symmetry. With x′y′ =L vw by condition and vw =L v′w′ by locus neutrality 1, we get x′y′ =L v′w′ by transitivity. Using xy =L x′y′, we get the required result by transitivity.

Taking a step of zero length, i.e., stepping on the spot, leads to the same locus:

Derived rule 10 (Locus identity): xy =L zz′ ⇌ x =Ref y
Proof. From right to left: x =Ref y and z =Ref z′ by condition. By locus neutrality 2, the result immediately follows. From left to right: z =Ref z′ by condition, so zz =L zz′ by derived rule 7. By condition, also xy =L zz′, and so xy =L zz by connectivity. By rule 6, we get x =Ref y.

If we substitute location equivalence =Ref by the identity sign26 =, then derived rule 10 corresponds to Tarski's identity axiom of congruence [41,45], rule R3D to Tarski's first congruence axiom [41,45], and derived rule 9 to congruence Axiom 2 in [41,45]. We have already mentioned that the lower-dimension axiom [41,45] is captured by the construction of the reference frame itself. Note that locus identity is a biconditional instead of a simple implication as in [41]. This directly assures that steps of zero length are always congruent to each other, without any need to draw on the Axiom of Segment Construction.
Footnote 26: But note that = in our theory does not mean the same as =Ref, because foci are not locations.
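The congruence rules can be run mechanically on a toy fact set. In the following self-contained sketch (our own encoding; a tuple (x, y, u, v) stands for xy =L uv), rules 4 and 6 suffice to derive a locational coincidence from a single observed zero-length figure:

```python
foci = ["x", "y", "z"]
eq_len = {("x", "y", "z", "z")}          # observed: xy =L zz

# Rule 4 (Reflexivity=L): xy =L xy and xx =L yy for all foci
for a in foci:
    for b in foci:
        eq_len.add((a, b, a, b))
        eq_len.add((a, a, b, b))

# Rule 6 (Identity=L): xy =L zz  -->  x =Ref y
eq_ref = {(p, q) for (p, q, r, s) in eq_len if r == s}

assert ("x", "y") in eq_ref              # from the zero-length observation
assert ("z", "z") in eq_ref              # reflexivity of =Ref also follows
```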
Rules about Point-like Features. Now consider the following rules, which basically generate the symmetric-transitive closure of PF:

Rule 9 (ReflexivityPF): PF(x, y) −→ PF(x, x), PF(y, y)
Rule 10 (TransSymPF): PF(x, y), PF(y, z) −→ PF(x, z), PF(z, x)

We can now prove that all foci on the same point-like feature of the chosen reference frame (which form an equivalence class) are also on the same location. This means the point-like features of the frame must range among the locations:

Derived rule 11: PF(x, s), PF(y, s) −→ x =Ref y
Proof. By rule 1, we have RefFrame(x, a1, a2, a3) and RefFrame(y, a1, a2, a3). By rule 2, we have a∗s =L xai and a∗s =L yai for all i ∈ {1, 2, 3}. By transitivity, yai =L xai, and by rules 3 and 6, xs =L ys. This satisfies the input of rule 7, and so x =Ref y.

Rules about Collinearity and Order. The following rules characterize the collinearity relation OnL. They are quite numerous compared to [41], because we have to compensate for the loss of the axioms of Pasch and of Segment Construction (compare [41]). First, OnL also has an identity rule: if we focus our attention on a spot lying between two foci located at the same locus, then this spot must be at the very same locus. The reflexivity rule assures that OnL applies to the degenerate case where two points coincide, and symmetry captures the comprehensible fact that points on a line can be ordered in two ways. The other rules assure that OnL-triples with two points in common are ordered on a line, as one would expect.

Rule 11 (IdentityOnL): OnL(x, y, x′), x =Ref x′ −→ x =Ref y
Rule 12 (ReflexivityOnL): y =Ref y′, x −→ OnL(x, y, y′)
Rule 13 (SymmetryOnL): OnL(x, y, z) −→ OnL(z, y, x)
Rule 14 (InnerTransOnL): OnL(x, y, z), OnL(y, u, z) −→ OnL(x, y, u), OnL(x, u, z)

Similarly, it is now possible to generalize these results over location-equivalent foci27:

Derived rule 12 (Locus neutrality 3): OnL(x, y, z) −→ OnL(x′, y′, z′)
Proof. OnL(x, y, z) holds by condition, and OnL(y, z′, z) by reflexivity and symmetry. By rule 14, we obtain OnL(x, y, z′), and reversing the result by symmetry yields OnL(z′, y, x). Together with OnL(y, y′, x), which holds by reflexivity and symmetry, we get OnL(z′, y′, x) by rule 14. And again, since OnL(y′, x′, x) holds by reflexivity and symmetry, we obtain OnL(z′, y′, x′) by rule 14, and so OnL(x′, y′, z′) by symmetry.
Footnote 27: Remember that the condition x =Ref x′ is abbreviated by using primed variables x, x′.
Derived rule 13 (General Inner Transitivity): OnL(x, y, z), OnL(y′, u, z′) −→ OnL(x′, y′, u′), OnL(x′, u′, z′)
Proof. Follows immediately from applying rule 14 and derived rule 12.

Derived rule 14: OnL(x, y, z), OnL(x′, z′, v) −→ OnL(y′, z′, v′), OnL(x′, y′, v′)
Proof. If we convert the condition by symmetry, we get the condition of derived rule 13, and thus OnL(v′, z′, y′), whose symmetrical conversion yields the first required result. Similarly for the second result.

If we substitute =Ref by =, then rules 11, 12 and 13, as well as derived rules 13 and 14, correspond to essential axioms or theorems in [41]. The rules corresponding to theorems cannot be derived as in [41] because of the loss of the infinity axioms.

Subtractivity of Lengths. Now we need to add rules governing the interrelation of the two geometrical observation predicates. These are essential in order to describe something similar to a Euclidean space. It turns out that we can use variants of Tarski's five-segment axiom in order to obtain a finite version of absolute geometry. The so-called (inner) five-segment axiom allows us to express length summations as well as to characterize angles. As shown in Fig. 10, the rule states that in a certain configuration of four segments, the length of a fifth segment is fixed:

Rule 15 (Inner5Seg): IFS(x, y, z, u; x∗, y∗, z∗, u∗) −→ yu =L y∗u∗,

where IFS(x, y, z, u; x∗, y∗, z∗, u∗) abbreviates OnL(x, y, z), OnL(x∗, y∗, z∗), xz =L x∗z∗, yz =L y∗z∗, xu =L x∗u∗, zu =L z∗u∗.

For example, using this rule we can prove a subtractivity property of lengths: subtracting congruent segments from congruent segments yields congruent segments:

Derived rule 15 (Subtractivity): OnL(x, y, z), OnL(x∗, y∗, z∗), xz =L x∗z∗, yz =L y∗z∗ −→ xy =L x∗y∗
Proof. Apply rule Inner5Seg to the condition, taking x for u.
Fig. 10. The (Inner) Five-Segment Axiom. The length of segment yu is fixed once x, y, z, u exhibit a five-segment configuration. Source: [45].
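Although the paper's geometry is coordinate-free, the inner five-segment configuration of Fig. 10 can be sanity-checked in Euclidean coordinates (our own numerical illustration; the particular points and the rigid motion are freely chosen):

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# y lies between x and z on a line; u lies off the line.
x, y, z, u = (0, 0), (1, 0), (3, 0), (2, 2)

def move(p):
    """A rigid motion: rotate by 90 degrees, then translate."""
    return (-p[1] + 5, p[0] + 1)

xs, ys, zs, us = map(move, (x, y, z, u))  # the starred configuration

# The IFS premises hold by construction; the conclusion yu =L y*u*:
assert math.isclose(dist(y, u), dist(ys, us))
```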
Inferring and Using Negations. The facts inferred so far do not incorporate any negative statements. We show, lastly, how to derive negative statements from observed positive facts in an intuitionist manner, and how to use these as inputs to further rules that have negations in their conditions. We can infer that our reference frame needs to be non-degenerate by proving that two foci of the reference frame do not coincide:

¬ a∗ =Ref s    (16)
Proof. We first prove the underivability of PF(a∗, s) in our inference calculus. By rules 9 and 10, point-like features are equivalence classes of foci, so PF(a∗, s) would be the case if and only if a∗ were in the same equivalence class that contains s. The latter is just {s}, by reference-frame derivation step (2) above. Since there are no other inference rules by which PF(a∗, s) could be derived, the rule PF(a∗, s) −→ ⊥ is admissible. Now we prove the rest by R.A.A. By derived rule 7, we have s =Ref a∗ −→ a∗s =L ss. Applying reflexivity rule 4 to all ai, we can use rule 3 (inverse direction) to derive s =Ref a∗ −→ RefFrame(a∗, a1, a2, a3), and rule 1 (inverse direction) to obtain s =Ref a∗ −→ PF(a∗, s). With PF(a∗, s) −→ ⊥, we obtain s =Ref a∗ −→ ⊥, and this just means ¬ a∗ =Ref s by definition.

It can now be proved by contradiction in a similar way that the point-like features of the frame must not coincide either, e.g. for the pair xs, x1:

Derived rule 16 (Non-degeneracy): RefFrame(xs, x1, x2, x3) −→ ¬ xs =Ref x1
Proof. From the condition and from rule RRef, we know that a∗s =L xs x1. Now suppose xs =Ref x1 were derivable by inference. Then, by derived rule 10 (from left to right), this would mean a∗ =Ref s. But this would contradict the already derived statement ¬ a∗ =Ref s above.

Once negation is introduced, we may add rules that have negations in their conditions (negative closure rules). For example, one rule is still missing in order to derive that OnL-triples with two loci in common are ordered on the same line (compare [41]):

Rule 16 (OuterTransOnL): OnL(x, y, z), OnL(y′, z′, u), ¬ y =Ref z −→ OnL(x′, y′, u′), OnL(x′, z′, u′)

We can then also add a variant of the five-segment axiom with a negative condition in order to assure the additivity of segments [41].
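The intuitionist reading of negation can also be made concrete: relative to one particular derivation, a predication counts as false exactly when it is absent from the closure of the observed facts. The following self-contained sketch (our own encoding) closes the PF facts of DerivationReferenceFrame under rules 9 and 10 and confirms that PF(a∗, s) remains underivable:

```python
# Only reflexive PF facts were observed in the derivation.
pf = {("s", "s"), ("a1", "a1"), ("a2", "a2"), ("a3", "a3")}

changed = True
while changed:                 # symmetric-transitive closure (rules 9, 10)
    changed = False
    new = set()
    for (x, y) in pf:
        new |= {(x, x), (y, y), (y, x)}
        for (y2, z) in pf:
            if y2 == y:
                new.add((x, z))
    if not new <= pf:
        pf |= new
        changed = True

assert ("a*", "s") not in pf   # PF(a*, s) is underivable: not PF(a*, s)
```

Here the closure is trivial because only reflexive facts were observed, which is precisely why the negation needed for non-degeneracy is licensed.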
5 Conclusion
We have argued for certain perceptual operations that can be used to ground geometry experientially. The human attentional apparatus allows for the referencing and predication of geometrically relevant Gestalt phenomena in the vista environment. In particular, it allows for detecting whether one focus of attention precedes another one (primitive perception of time), whether attention focuses on the same point-like feature (PF), whether a given pair of foci is congruent to another pair (=L), and whether a focus points between two others (OnL). We argued further that, in order to construct a geometry in the usual relativist sense based on these human competences, we need to identify a frame of reference made of point-like features, with respect to which relative locations, i.e. points, can be identified by arbitrary observers. We have used a constructive calculus to generate such a frame of reference. A further inferential calculus with closure rules allowed us to introduce location equivalence =Ref with respect to this frame, and to derive much of the intuitively expected behavior of the geometric notions. In particular, it allows one to derive and use negations in an intuitionist sense. The former calculus was only used to generate a particular initial construction, which corresponds to conscious attentional selection and perceptual predication. The latter accounts for geometric inference, understood here as a kind of automatic Gestalt completion. In this way, we constructed a reference frame that corresponds to the 3D lower-dimension axiom, and captured its geometric properties by a set of inference rules (rules 1-17) that correspond to axioms and definitions of a FOL geometry. Such a theory is a finite relativist variant of absolute geometry, which does not have the parallel postulate, but allows us to define angles, lengths and projections with their usual properties (compare [41], and also [36]), and furthermore to define locations relative to the reference frame. A simple model of this theory can be generated by exhaustively applying the inference calculus to the initial finite construction. Regarding the two geometric calculi, it would be desirable to have a consistent set of rules ensuring that every derivation is a model of finite relativist geometry. This is not yet the case. It is, for example, possible to construct an initial space of more than three dimensions, which would not comply with R3D. Regarding completeness, the calculus lacks a rule that corresponds to the disjunctive outer connectivity Axiom 18 in [45], since intuitionist disjunctions require one of the disjuncts to be derivable [19]. In general, it is open which of the remaining Euclidean axioms in [45] (e.g. the parallel postulate) can be given such a constructive interpretation.

Acknowledgements. This work was funded by the European Commission (ICT-FP7-249120 ENVISION project) and the German Research Foundation (Semantic Reference Systems project, DFG KU 1368/4-2). It also owes a lot to its anonymous reviewers.
References 1. Aristotle: Physics. Hardie, R. P., Gaye, R. K.(eds.) University of Adelaide, Adelaide (2007), http://ebooks.adelaide.edu.au/a/aristotle/physics/ 2. Arnheim, R.: Art and visual perception. University of California Press, Berkeley (2004) (50th anniversary printing edn.) 3. Bennett, B.: A categorical axiomatisation of region-based geometry. Fundamenta Informaticae 46(1-2), 145–158 (2001)
326
S. Scheider and W. Kuhn
4. Burgess, N.: Spatial cognition and the brain. Ann. N. Y. Acad. Sci. 1124, 77–97 (2008) 5. Dummett, M.: The logical basis of metaphysics. Duckworth, London (1991) 6. Eddington, A.: What is geometry? In: Smart, J. (ed.) Problems of Space and Time, pp. 163–177. The Macmillan Company, New York (1964) 7. Gerla, G., Volpe, R.: Geometry without points. The American Mathematical Monthly 92(10), 707–711 (1985) 8. Gibson, J.: The ecological approach to visual perception. Houghton Mifflin, Boston (1979) 9. v. Glasersfeld, E.: An attentional model for the conceptual construction of units and number. Journal for Research in Mathematics Education 12(2), 83–94 (1981) 10. v. Glasersfeld, E.: Radical Constructivism: A Way of Knowing and Learning. The Falmer Press, London (1995) 11. Habel, C.: Discreteness, finiteness, and the structure of topological spaces. In: Topological foundations of cognitive science. Papers from the workshop at the FISI-CS, Buffalo, NY. Graduiertenkolleg Kognitionswissenschaft (Report 37), pp. 81–90. Universit¨ at Hamburg, Hamburg (1994) 12. Harnad, S.: The symbol grounding problem. Physica D 42, 335–346 (1990) 13. Hilbert, D.: Grundlagen der Geometrie, 12th edn. (1977) 14. Janowicz, K., Compton, M.: The stimulus-sensor-observation ontology design pattern and its integration into the semantic sensor network ontology. In: Proceedings of the 3rd International Workshop on Semantic Sensor Networks, SSN 2010 (2010) 15. K¨ ohler, W.: Gestalt psychology. An introduction to new concepts in modern psychology. Liveright, New York (1992) 16. Kuhn, W.: Semantic reference systems. Int. J. Geogr. Inf. Science 17(5), 405–409 (2003) 17. Langacker, R.: Nouns and verbs. Language 63(1), 53–94 (1987) 18. Leibniz, G., Clarke, S.: The Leibniz-Clarke correspondence. Manchester University Press, Manchester (1956) 19. Lorenzen, P.: Einf¨ uhrung in die operative Logik und Mathematik. Springer, Berlin (1955) 20. Lorenzen, P.: Das aktual-unendliche in der mathematik. Philosophia Naturalis 4(1), 3–11 (1957) 21. Malkevitch, J.: Finite geometries? (2006), http://www.ams.org/featurecolumn/ archive/finitegeometries.html 22. Marchetti, G.: A presentation of attentional semantics. Cognitive Processsing 7(3), 163–194 (2006) 23. Mark, D., Frank, A.: Experiential and formal models of geographic space. Environment and Planning B 23, 3–24 (1996) 24. Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A.: Wonderweb deliverable d18: Ontology library, Trento, Italy (2003) 25. Montello, D.R.: Scale and multiple psychologies of space. In: Campari, I., Frank, A.U. (eds.) COSIT 1993. LNCS, vol. 716, pp. 312–321. Springer, Heidelberg (1993) 26. Newell, A., Simon, H.: Computer science as empirical inquiry: Symbols and search. Commun. ACM 19(3), 113–126 (1976) 27. Piaget, J.: Genetic epistemology. Woodbridge lecture no. 8, 1st edn. Columbia University Press, New York (1970) 28. Poincar´e, H.: Science and hypothesis. Dover Publ., N.Y (1952) 29. Prawitz, D.: Ideas and results in proof theory. In: Fenstad, J. (ed.) Proc. 2nd Scandinavian Logic Symposium, pp. 237–309. North-Holland, Amsterdam (1971)
Finite Relativist Geometry Grounded in Perceptual Operations
327
30. Probst, F., Espeter, M.: Spatial dimensionality as classification criterion for qualities. In: Bennett, B., Fellbaum, C. (eds.) Formal Ontology in Information Systems: Proceedings of the Fourth International Conference: FOIS 2006. Frontiers in Artificial Intelligence and Applications, vol. 150, pp. 77–88. IOS Press, Amsterdam (2006) 31. Pylyshyn, Z.: Things and places. How the mind connects with the world. The MIT Press, Cambridge (2007) 32. Quine, W.: Two dogmas of empiricism. The Philosophical Review 60, 20–43 (1951) 33. Quine, W.: The roots of reference. Open Court Publishing, La Salle (1974) 34. Quine, W.: On what there is. In: From a Logical Point of View. 9 LogicoPhilosophical Essays, 2nd edn. Harvard University Press, Cambridge (1980) 35. Rizzolatti, G., Sinigaglia, C.: Mirrors in the brain: How our minds share actions and emotions. Oxford University Press, Oxford (2008) 36. Scheider, S.: Grounding geographic information in perceptual operations. Ph.D. thesis, University of M¨ unster (2011) 37. Scheider, S., Janowicz, K., Kuhn, W.: Grounding geographic categories in the meaningful environment. In: Hornsby, K., Claramunt, C., Denis, M., Ligozat, G. (eds.) COSIT 2009. LNCS, vol. 5756, pp. 69–87. Springer, Heidelberg (2009) 38. Scheider, S., Probst, F., Janowicz, K.: Constructing bodies and their qualities from observations. In: Proc. of the Sixth International Conference on Formal Ontology in Information Systems (FOIS 2010), pp. 131–144. IOS Press, Amsterdam (2010) 39. Scholl, B.: Objects and attention: The state of the art. Cognition 80, 1–46 (2001) 40. Schroeder-Heister, P.: Lorenzen’s operative justification of intuitionistic logic. In: van Atten, M., Boldini, P., Bourdeau, M., Heinzmann, G. (eds.) One Hundred Years of Intuitionism (1907-2007), Birkh¨ auser, Basel (2008) 41. Schwabh¨ auser, W., Szmielew, W., Tarski, A.: Metamathematische Methoden in der Geometrie, Teil I: Ein axiomatischer Aufbau der euklidischen Geometrie. Springer, Berlin (1983) 42. Suppes, P.: Finitism in geometry. Erkenntnis 54, 133–144 (2001) 43. Tarski, A.: Foundations of the geometry of solids. In: Tarski, A., Woodger, J. (eds.) Logic, Semantics, Metamathematics: Papers from 1923 to 1938, pp. 24–30. Clarendon Press, Oxford (1956) 44. Tarski, A.: What is elementary geometry. In: Henkin, P.S.L., Tarski, A. (eds.) The Axiomatic Method. With Special Reference to Geometry and Physics, pp. 16–29. North-Holland Publishing, Amsterdam (1959) 45. Tarski, A., Givant, S.: Tarski’s system of geometry. The Bulletin of Symbolic Logic 5(2), 175–214 (1999) 46. Tomasello, M.: The cultural origins of human cognition. Harvard University Press, Cambridge (1999) 47. Tversky, B.: Structures of mental spaces: How people think about space. Environment and Behaviour 35(1), 66–80 (2003) 48. Vaihinger, H.: Die Philosophie des Als Ob. System der theoretischen, praktischen und religi¨ osen Fiktionen der Menschheit auf Grund eines idealistischen Positivismus. VDM Verlag Dr. M¨ uller, Saarbr¨ ucken (2007) 49. VanRullen, R., Koch, C.: Is perception discrete or continuous? Trends in Cognitive Sciences 7(5), 207–213 (2003)
Linking Spatial Haptic Perception to Linguistic Representations: Assisting Utterances for Tactile-Map Explorations

Kris Lohmann, Carola Eschenbach, and Christopher Habel

Department for Informatics, University of Hamburg, Vogt-Kölln-Str. 30, 22527 Hamburg
{lohmann,eschenbach,habel}@informatik.uni-hamburg.de
Abstract. Assisting utterances are helpful for blind and visually impaired map users exploring tactile maps. Virtual tactile maps explorable by haptic human-computer interfaces form the basis for multimodal presentations including automatically generated assisting utterances. This paper presents first empirical results regarding the types of utterances suitable for assisting the acquisition of survey knowledge on the basis of virtual tactile maps. The structure of the internal knowledge base, which has to support a connection between dynamic exploration movements and natural language, is presented. An example illustrates the approach and shows its practicability.

Keywords: Virtual audio-tactile map, haptic human-computer interaction, representation for natural-language generation, assisted map reading, spatial knowledge acquisition.
1 Introduction

1.1 Motivation
Humans make use of spatial knowledge in everyday tasks, such as finding their way to the next bakery, planning a route through a shopping mall, and communicating about their environment with other people. Whereas sighted people can easily fall back on maps when solving a task that requires additional spatial knowledge, blind and visually impaired people rarely have access to appropriate external representations of the environment. To overcome this restriction, physically realized tactile maps that can serve as substitutes for visual maps [31] have been developed over the last decades. Appropriate tactile maps increase the autonomy of blind and visually impaired people, in particular by supporting route planning and wayfinding in complex urban environments without the assistance of a sighted guide [2]. Nevertheless, compared to visual maps,
The research reported in this paper has been partially supported by DFG (German Science Foundation) in IRTG 1247 'Cross-modal Interaction in Natural and Artificial Cognitive Systems' (CINACS). We thank the anonymous reviewers for their highly useful comments.
current tactile maps have major drawbacks in terms of the speed, accuracy, and efficiency of the knowledge-acquisition process, resulting from the highly sequential character of tactile map reading (see [17] on the sequentiality of tactile recognition). For integrating the stream of sensory impressions over time, additional information given in another modality, such as speech, is useful [34]. The approach of Verbally-Assisting Virtual-Environment Tactile Maps (VAVETaM) aims at generating coherent natural-language assistance triggered by the user's hand movements while exploring tactile maps. These maps are presented using a haptic human-computer interaction device, the Sensable PHANToM Omni. The device and virtual tactile maps are discussed in Sect. 1.2. Maps of urban areas that include streets, squares, and potential landmarks, used to gain survey knowledge, provide the research scenario for the approach. This paper focuses on spatial and conceptual representations suitable for generating verbal utterances that augment the spatial information the user gets from exploration movements. In particular, the proposed representation formalisms bridge between the layer describing the content of haptic perception and the layer constructing the content to be verbalized. Figure 1 shows how the components discussed in this paper interact with other components of the VAVETaM approach. The MEP Observer component categorizes the user's movements semantically as map exploratory procedures (MEPs). The representation of the user's movements and of what they focus on is a central issue for the generation task (see Sect. 3). Note that the computational process of detecting these movements is outside the scope of this paper. In Sect. 4, we discuss how we determined a set of plausible assisting utterances to be generated by studying assisting utterances of human agents, and how the resulting message classes were experimentally evaluated. One of the components we focus on in the following, called Virtual-Environment Tactile Map (VETM), is in charge of representing the knowledge needed for making the virtual tactile map accessible via the haptic device as well as for generating the verbal assisting utterances. VETM incorporates the knowledge needed (1) for presenting the map in a way that can be explored using the haptic device, (2) for
Fig. 1. Structure of the components needed for the generation of verbal assistance
interpreting the movements the user performs, and (3) for talking about the map (generating verbalizations). To fulfill the requirements of these tasks, VETM includes two representational layers, a spatial-geometric layer and a propositional layer. The representation formalism and the content of the propositional layer of the VETM are discussed in detail in Sect. 5. For solving the what-to-say task, the Generation of Verbal Assistance (GVA) component accesses the propositional layer of the VETM component to select and subsequently process information appropriate for verbalization. This selection depends on the current haptic focus. GVA processing leads to a preverbal message [14], i.e., a semantic representation suitable for verbalization, to be realized by the Formulator and Articulator components of a speech production system. We provide an example of the process of selecting information for a preverbal message in Sect. 6. Refer to [16] and [8] for a more detailed discussion of the components outlined.

1.2 The Presentation of Tactile Maps in Virtual Environments
Virtual tactile maps are virtual counterparts of traditional physical tactile maps. They are tactilely perceivable using the PHANToM Omni device (see Fig. 2). These maps resemble physical tactile maps in their structure. However, map objects, such as streets and buildings, are depicted as concavities in a horizontal plane. Line-following and exploring the shape of geometric figures proved to be much easier when the objects are concave rather than raised (the latter is standard for traditional tactile maps). Figure 2(b) shows a visualization of a cross-section through a map model. To explore the map, users move the handle of the haptic device, which can be moved in all three spatial dimensions. The user's haptic sensation is realized by applying force feedback, for example, when the virtual map surface is touched, and enables feeling the virtual tactile map. Performing exploration
Fig. 2. (a) The Sensable PHANToM Omni haptic interface (b) a visualization of a cross-section through a virtual tactile map
movements, the user perceives the horizontal plane, i.e., the base of the map, and the engraved concavities, which depict map elements. The position of a distinguished point of the device (at the joint of the handle), named the 'interface point', is continually recorded by sensors of the haptic device. Thus, the current position of exploration, the ongoing exploration movement, and the exploration history can be observed and analyzed by VAVETaM.
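To make this concrete, the following minimal sketch (in Python) shows how such an exploration history could be recorded. The HapticDevice wrapper and its get_position method are hypothetical stand-ins, since the actual device is accessed through vendor-specific libraries, and the sampling rate is an assumption:

    import time

    class ExplorationRecorder:
        """Records the interface-point stream for later analysis (sketch)."""

        def __init__(self, device, rate_hz=100):
            self.device = device        # hypothetical wrapper around the haptic device
            self.interval = 1.0 / rate_hz
            self.history = []           # list of (timestamp, x, y, z) samples

        def sample(self):
            # get_position() is a hypothetical call returning the current
            # 3D position of the interface point.
            x, y, z = self.device.get_position()
            self.history.append((time.time(), x, y, z))

        def record(self, duration_s):
            end = time.time() + duration_s
            while time.time() < end:
                self.sample()
                time.sleep(self.interval)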
2 Multimodal Spatial Knowledge Acquisition and Assistance Scenarios

2.1 Acquisition of Route and Survey Knowledge
Humans’ spatial knowledge of their environment is highly differentiated. A wellestablished distinction is that between route knowledge and survey knowledge [26]. Route knowledge is often characterized as procedural knowledge on how to get from a specific point A to a specific point B. It includes information about decision points at which actions such as turning have to be performed [35,32]. Route knowledge does not include information about non-local spatial relationships between entities, such as knowing that a landmark A is north of a landmark B (nevertheless, spatial relations can possibly be inferred from detailed route knowledge). In contrast, survey knowledge of an environment has an overview-like, pictorial character (see [20,33], on viewpoint (in)dependency of spatial representations). Survey knowledge is very important for acting autonomously in the environment, e.g., for navigation and route planning. For navigating agents, it is essential to have access to survey knowledge if corrections from a planned route have to be made, for example, because a part of the planned route is not accessible, or to choose short-cuts. Furthermore, a route-planning agent relies on survey knowledge in order to plan a route appropriate for a task, taking into account individual needs and interests. Knowledge about the environment can be acquired in different modes1 , in particular, (i) by exploring and perceiving the environment (direct experience) and (ii) by using external representations such as (tactile) maps, GIS systems, written travel guides, or route instructions. Both, experience-based and representation-based acquisition can be supported by instructions or descriptions from a human or an artificial agent. Modes of acquisition have an impact on the spatial mental models, i.e., the internal representations of the environment [29], that are build up during acquisition. For the acquisition of survey knowledge, Lobben distinguishes between the modes of environmental mapping and survey mapping [15]. Environmental mapping is an experience-based process of building up a spatial mental model by exposure to an environment during exploration. 1
1 In this paper we use the term 'mode' in regard to acquisition processes and the term 'modality' (see below) with respect to representations.
Survey mapping is characterized as the process of generating a spatial mental model by the use of maps and is a case of representation-based acquisition. Acquiring spatial knowledge using tactile maps is a specific case of survey mapping, which is worth the effort since access to survey knowledge is important for the autonomous navigation of blind and visually impaired people. On the one hand, survey knowledge supports autonomous and independent acting in the environment. On the other hand, inferring survey knowledge through environmental mapping is especially challenging without vision, since without visual perception it is harder to obtain information (such as distal landmarks [36]) that enables the generation of suitable environmental or absolute frames of reference. Empirical studies have shown the superiority of tactile maps over direct experience for the acquisition of spatial mental models for the navigation of blind and visually impaired people (see [30] for an overview). For example, Espinosa et al. [2] asked blind subjects to follow a route in an unfamiliar environment. The participants learned the route either by environmental mapping, by learning it from a tactile map, or by a combination of both. The groups using a tactile map outperformed the other group in their navigation performance when tested afterwards. Ungar [30] reports the general ability of blind children to understand tactile maps and, when using appropriate strategies, to benefit from them.

2.2 Assisted Acquisition of Spatial Knowledge: The Case of Verbally Assisted Tactile Maps
Acquisition of spatial knowledge is often performed in cooperative processes, in which external representations such as maps or language play a central role. For example, a first-year student exploring the campus can get assistance in building up spatial knowledge about the new environment by asking senior students for directions. Beyond instructing by pointing gestures, the latter can use different types of external representations: for example, they can draw freehand maps, they can annotate printed maps with text and drawings, they can give verbal instructions, and, last but not least, they can combine these instructional modalities. Whereas in direction giving, regardless of the modalities used, the participants in the joint action hold different roles, namely the role of the knowledge-acquiring agent and the role of the assisting agent, there also exist symmetric cooperative acquisition processes, such as joint planning of a route using a map, in which both participants play the same role. In tactile-map reading using VAVETaM, which is intended to reduce the difficulties in processing traditional tactile maps (see Sect. 1.1), the human user holds the role of the knowledge acquirer and is assisted by automatically generated verbal descriptions. In particular, these descriptions are generated with the goal of supporting the process of acquiring survey knowledge by free exploration of tactile maps. In gaining an overview of an environment (in contrast to finding and learning a specific route), the map-reading agents should be free to decide the order in which they explore objects and areas of the map; in other words, assisted map
exploration for building up survey knowledge should be a case of free exploration. Thus, the knowledge acquirer and the assisting agent, the two participants in the cooperative activity of assisted map reading, behave differently regarding taking the initiative. Typically, the user takes the initiative by exploring objects represented in the tactile map, and the assistant reacts to the user's actions by generating instructive verbal utterances. The assisting agent has to consider the current action of tactile exploration as well as the history of the user's exploration and of the assisting utterances generated before. The approach to augmenting tactile maps by sound that we discuss in this paper is to give verbal descriptions similar to those a human assistant would give in a comparable situation.
Multimodal Tactile Maps: Related Work
Multimodal interfaces providing visually impaired people access to spatial information are currently in the focus of research. Golledge and colleagues [4] extensive overview of the technologies used, argues in particular for the relevance of tactile interfaces and the use of auditory information (sound and speech). Jacobson [10] discusses multimodal systems for substituting vision-based systems. In conclusion, Jacobson states that haptic and auditory interfaces have potential to efficiently convey information to blind and visually impaired users. During the last two decades, the combination of tactile maps and acoustically presented information, e.g., sonified cues or pre-recorded phrases identifying objects presented on the map, has emerged as a promising approach towards multimodal improvement of map usage for visually impaired people. Some approaches have been developed that rely on substituting haptic feedback completely by sound. ‘KnowWhere’ is a system developed by Krueger and Gilden [12]. In this system, sound is used as substitute for tactile perception. Jacobson developed a similar approach in which areas of a touch pad are overlaid with sound [9]. Parente and Bishop’s BATS system uses spatial sound, which appears to originate from a point in three-dimensional space combined with synthesized speech labels expressing the names of objects [22]. The system can be controlled by consumer-grade tactile devices such as mice, trackballs, and force-feedback joysticks. Other approaches rely on a combination of traditional physical tactile maps and other hardware. Parkes [23] presents a multi-purpose system that is based on a touchpad on which previously produced traditional tactile maps can be placed. Miele and colleagues [19], Zeng and Weber [37], and Wang and colleagues [34] developed—based on different types of tactile interfaces (e.g., touch pads and large-scale Braille displays)—similar systems, which provide additional information using speech. The last-mentioned system is evaluated by six blind participants. De Felice and colleagues [1] developed an application using a similar device as we do. Objects can be associated with text that is converted to speech using synthesis. A comparable system, including automatic generation of the maps using image data, is discussed by Kostopoulos and colleagues [11]. Magnusson and Rassmus-Gr¨ ohn [18] describe a PHANToM-based system that incorporates dynamic objects such as bicycles and cars. They report that most
users were able to interact with the complex virtual environment. However, the authors also report users' demands for more schematic, map-like representations. The approaches mentioned above give audio information when certain objects are touched, either automatically or as a reaction to an explicit request by the user. They provide sound output by using direct links between tactilely activated entities and pre-stored phrases or pre-recorded sounds. Giudice and colleagues [3] report on a learning setting contrasting with the ones discussed above. In their setting, verbal descriptions of a virtual indoor environment are given by a virtual verbal display, a nonvisual interface that provides verbal directions determined by the location and orientation of the agent. Their experimental results give strong evidence that context-sensitive verbal directions contribute to the development of survey knowledge. Akin to Giudice and colleagues, the approach we present here focuses on situation-based generation of verbal output. The motivation of our work is to develop a system that gives assisting utterances similar to those a human assistant could give. This includes describing focused parts of a map in an appropriate, helpful manner. To be able to do so, the user's movements are semantically categorized to control natural language generation. Thus, our approach has to be able to take the exploration and verbal-assistance history into account. In doing so, unnecessary repetitions can be avoided and the assisting utterances can be constructed in a linguistically appropriate manner, e.g., by using anaphora.
3 Haptic Focus and Perceptual Categories
When artificial agents generate situated verbalizations in dynamic scenarios, they have to organize the input into semantic categories [25]. In the discussed setting, the assisting agent needs access to a representation of the actions of tactile exploration that includes such a categorization. The haptic device provides a stream of position information for the interface point. The task of the MEP Observer component is to process this information. In the following, the requirements for a representation of the user's exploration actions are discussed. To support the production of assisting utterances, this representation includes information about the way the map is explored and, if available, the object(s) in the haptic focus. For general haptic perception, Lederman and Klatzky describe a set of exploratory procedures (EPs) [13]. These exploratory procedures were developed for haptic perception categories such as identifying the shape of a three-dimensional object or identifying the temperature of an object. However, more elaborate procedures have to be identified to serve the current purpose of categorizing the user's actions during map exploration. The specialized categories are called map exploratory procedures (MEPs) [16,8]. They result from video analyses of visualizations of explorations of virtual tactile maps. The five MEPs are listed in Table 1. They are mainly distinguished by the type of object explored with the procedure. The term 'track' is a general term for structures enabling locomotion, such as streets. Track objects on the map are explored by line following. By
observing this characteristic motion, the track-exploration category can be identified. This category of tactile exploration is named trackMEP. In addition to information about the exploration category, the assisting agent needs information about the objects in the haptic focus. Correspondingly, a trackMEP is specified by the specific track segment that is explored (cf. Sect. 5.1). The optional specification of the current speed of motion is useful for generating utterances referring to the user's exact position (e.g., 'Now, the town hall is to your left.'). If the movement is too fast, such utterances should be suppressed. In addition to track segments, potential landmarks are subjects of exploration. Their exploration is performed by a map exploratory procedure named landmarkMEP. As with the trackMEP, to be appropriate input for a language generation component, the landmarkMEP is specified with the map object in the haptic focus. In contrast to the trackMEP, there is no need to include further information such as the speed of the movement. The third exploration category of interest is the region-exploration category. For the regionMEP, the specification of the region in focus is optional, since some map regions might not be represented on the propositional layer of the VETM that stores the information needed for verbalization (e.g., the upper left area of the map). A frequent and important exploration category identified in the video analyses is the exploration of the frame of the map; this category is named frameMEP. It is specified with an identification of the frame segment touched. The stopMEP indicates that nothing is happening at the moment, rather than providing information about a map object in the haptic focus. As the user interacts with the map at one specific point, no more than one map exploratory procedure can be performed at a time. As a result, the dynamic input to the Generation of Verbal Assistance component is a sequentially ordered stream of map exploratory procedures and their optional specifications (see Table 1).

Table 1. Types of map exploratory procedures (MEPs) and their specification

MEP Type       Required Specification            Optional Specification
trackMEP       track-segment(s) identification   movement speed
landmarkMEP    landmark identification           -
regionMEP      -                                 region identification
frameMEP       frame-segment identification      -
stopMEP        -                                 -
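A possible typed encoding of Table 1, e.g., as input events for the generation component, is sketched below; the class and field names are our own illustration, not the actual VAVETaM implementation:

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class MEP:
        """Base class for map exploratory procedure events."""

    @dataclass
    class TrackMEP(MEP):
        track_segments: List[str]                 # required specification
        movement_speed: Optional[float] = None    # optional specification

    @dataclass
    class LandmarkMEP(MEP):
        landmark: str                             # required specification

    @dataclass
    class RegionMEP(MEP):
        region: Optional[str] = None              # optional: region may be unrepresented

    @dataclass
    class FrameMEP(MEP):
        frame_segment: str                        # required specification

    @dataclass
    class StopMEP(MEP):
        pass                                      # no specification needed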
4 Determination and Evaluation of a Set of Assisting Utterances

4.1 Message Classes
An important part of developing a language-based system is to determine the information that is to be expressed by the system. As an empirical basis, a corpus
of assisting utterances produced by humans in the role of assistants was collected to identify the types of utterances. In this study, a blindfolded map explorer perceived the map using the haptic interface. The assisting humans saw a visualization of the interface point moving on a visualization of the tactile map. The maps had a frame to prevent the map user from unintentionally leaving the map. On the maps, streets, buildings, and trees were present. The utterances of the assisting humans were recorded and manually transcribed. An analysis was performed to determine frequently occurring informational categories, in the following called 'message classes' (see [24] for the term 'message'). Messages of the classes (1.1)–(1.3) were frequently uttered for every kind of object that was touched, including the frame of the map. Additionally, when tracks (i.e., streets) and potential landmark objects (i.e., trees or buildings) were touched, messages of the class (1.4) were given.

(1.1) Definite identification of the object (e.g., names).
(1.2) Messages expressing the spatial relation between objects (e.g., 'left of', 'between').
(1.3) Messages expressing information about qualitative distance between objects ('nearby' or 'next to').
(1.4) Information about extreme positions in the map ('leftmost', 'rightmost', . . . ).

Messages of the following classes were given only for track objects.

(2.1) Information about the extension of the track, i.e., information about what determines its ends.
(2.2) Information about junctions between tracks.
(2.3) Information about geometric relations, such as parallelism of tracks.

When track objects and potential landmarks were touched, messages of the classes (3.1) and (3.2) were sometimes given. In the maps used for the collection of the corpus, squares with names were also present; consequently, for potential landmarks the additional message class (3.2) occurred.

(3.1) Information about the location in the map, expressed in sentences such as 'The town hall is in the upper part of the map.'
(3.2) Topological containment relations with regions represented on the map, expressed in sentences such as 'The town hall is on the city square.'

For regions on the maps, some messages of classes (4.1) and (4.2) occurred.

(4.1) Information about the extension or borders of regions.
(4.2) Information about the shape of regions.
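For later reference, the message classes can be collected in a simple enumeration; the constant names below are our own shorthand for the classes identified in the corpus:

    from enum import Enum

    class MessageClass(Enum):
        IDENTIFICATION = "1.1"        # definite identification (e.g., names)
        SPATIAL_RELATION = "1.2"      # 'left of', 'between', ...
        QUALITATIVE_DISTANCE = "1.3"  # 'nearby', 'next to'
        EXTREME_POSITION = "1.4"      # 'leftmost', 'rightmost', ...
        TRACK_EXTENSION = "2.1"       # what determines the ends of a track
        JUNCTION = "2.2"              # junctions between tracks
        GEOMETRIC_RELATION = "2.3"    # e.g., parallelism of tracks
        LOCATION_IN_MAP = "3.1"       # 'in the upper part of the map'
        CONTAINMENT = "3.2"           # 'on the city square'
        REGION_EXTENSION = "4.1"      # extension or borders of regions
        REGION_SHAPE = "4.2"          # shape of regions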
Assisting humans generally did not distinguish between the real-world objects represented by the geometric objects on the map and the geometric objects themselves. All assistances were formulated in terms of the real-world objects, the intended objects of the verbalization (cf. [27,28]). The knowledge representation reflects this fact: on the propositional layer of the VETM that is used for the generation of verbal assistance, the objects on the map are represented exclusively in real-world terms.

4.2 Empirical Evaluation of the Message Classes
In a recently performed experiment, we investigated whether the message classes identified in the corpus study support the spatial knowledge-acquisition process. The experiment and its results are briefly outlined in this section.

Participants. Participants were 24 sighted, blindfolded speakers of German (14 males, average age 24.7 years, SD = 3.3). The choice of a setting with blindfolded subjects rather than blind or visually impaired subjects was made because the exploration task we currently focus on will be performed by users with basic competence in reading maps. Since blind people, in particular congenitally blind or early blind people, are often not trained in map reading, teaching map concepts and map-reading abilities would be an independent acquisition task to be done in a preceding phase.2 Thus, blindfolded people were chosen for determining the message classes and their use in verbal assistance. A study with late blind people, i.e., blind people with prior competence in reading maps, or trained blind people is in preparation.

Material and Method. Subjects were asked to explore virtual tactile maps under two conditions: (a) supported by simple assistance and (b) supported by extended assistance. The pre-recorded assisting utterances were given in German and started by the experimenter using a custom-built interface. The participants were not explicitly told that they did not interact with an automatic system; therefore, the experiment can be described as wizard-of-Oz-like. In the simple-assistance condition, information about the names of objects in the haptic focus was given, including only utterances of the class (1.1), such as (1a).

(1a) This is Abbey Road.

In the extended-assistance condition, further information was given. To keep the amount of messages manageable, only the most prominent messages in the corpus were included. Messages of the classes (1.3), (3.2), (4.1), and (4.2) were excluded (there are no regions with proper names on the maps used). In the experiment we used a fixed set of projective terms ('above', 'right above', 'right', 'right below', . . . ) exploiting an extrinsic frame of reference
2 Using verbal assistance in an instruction system to teach map-reading competence is planned as an additional application of VAVETaM.
aligned regarding the left-right dimension with the user's intrinsic frame of reference. The direction of the movement during exploration did not play any role in the choice of projective terms.3 The terms worked well in preliminary studies and are the most frequent variant used by human assisting agents,4 even though the tactile map was presented in the orientation of a horizontal plane. Thus, the front-back dimension of the user's intrinsic frame of reference was aligned with the up-down dimension of the fixed frame of reference. However, corresponding uses of the chosen terms can be found in descriptions of the spatial relations among text and figures on a sheet of paper. The preference for projective terms over cardinal-direction terms (i.e., 'north', 'east', . . . ) has two sources: (1) in a preliminary study some subjects had major problems with understanding the east-west reference, and (2) if maps are not aligned to the north, making use of cardinal-direction terms is misleading to a high degree; the flexibility of the planned system would decrease strongly by that choice. Nevertheless, we trained the participants for the experiment, and this training included utterances that made use of the frame of reference described. (2a) to (2d) are examples of English translations of assisting utterances given for a track element. These utterances correspond to the message classes (1.2), (2.1), and (2.3). Similar sets according to the message classes identified in the corpus study were created for all objects on the map.

(2a) Abbey Road is parallel to Broadway.
(2b) Abbey Road ends to the left at an intersection with University Street, and to the right in a dead end.
(2c) Below Abbey Road, there are the town hall and the museum.
(2d) Abbey Road intersects with Baker Street.

The maps used are shown in Fig. 3. Two sets of German names for the streets ('Dorfstraße', 'Hochstraße', 'Amselweg', . . . ) and potential landmarks ('Rathaus', 'Museum', 'Anne-Frank-Schule', . . . ), applicable to both maps, were created, and verbal utterances were recorded for all objects in both maps and each set of names. The order in which the maps were used was counterbalanced, as was whether participants learned the map first with simple or extended assistance, and which set of names was used.

Procedure. First, participants were trained in general haptic interaction and in assisted map reading. Then the subjects performed the first exploration under one of the assistance conditions. They were blindfolded, and assistance was given via headphones. Subsequently, the tests described below were performed. After a ten-minute break, subjects performed the second assistance condition.
3 For a discussion of the use of frames of reference in descriptions of maps see Taylor and Tversky [28].
4 This could be an effect of the fact that the human assistant saw the visualization of the map on a vertical monitor.
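The mapping from a spatial configuration to one of the eight projective terms can be sketched as follows. Partitioning the plane into eight 45-degree sectors is our assumption about how such a fixed term set might be operationalized, not a detail reported in the paper:

    import math

    TERMS = ["right", "right above", "above", "left above",
             "left", "left below", "below", "right below"]

    def projective_term(dx: float, dy: float) -> str:
        """Map a displacement (dx, dy) in map coordinates to a projective term.

        dx points to the user's right, dy towards the top of the map (the
        'above' direction of the extrinsic frame of reference)."""
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        sector = int(((angle + 22.5) % 360.0) // 45.0)  # eight 45-degree sectors
        return TERMS[sector]

    # E.g., projective_term(0.0, -1.0) yields 'below', as in utterance (2c).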
Fig. 3. Visualizations of the tactile maps used in the experiment
Three tests were used to assess the spatial knowledge of subjects after eight minutes of assisted exploration in either condition: (1) answering questions concerning spatial relations between objects on the map, e.g., 'Is the oak tree left of the main station?', (2) sketching the map, and (3) a puzzle-like recognition test consisting of reconstructing the map by recognizing correct parts of it. For the third task, a kind of puzzle was created by splitting a visual layout of the map into quadrants. For each quadrant, there was one correct part and five wrong solutions with one or two errors each. As there were four quadrants with up to two mistakes each, overall eight mistakes were possible in the puzzle task. The orientation of the puzzle parts and the quadrant they had to be placed into were given. The sketches were evaluated by two raters in two respects, namely track knowledge and landmark knowledge, on a 1–5 Likert-type scale. In the present paper, the results are briefly reported; a detailed discussion is in preparation.

Results and Discussion. The results of the relation-question task show that participants correctly answered significantly more questions in the extended-assistance condition (M = 14.04, SE = .61, t(23) = 8.08, p < .001, r = .86) than in the simple-assistance condition (M = 8.50, SE = .43), and the effect is large. In the puzzle-recognition test, participants made significantly fewer errors in the extended-assistance condition (M = 1.92, SE = .29) than in the simple-assistance condition (M = 3.42, SE = .27, t(23) = 3.39, p < .01, r = .58). The combined ratings of the sketches (track knowledge and landmark knowledge) show that the sketches were significantly better when the map was learned with extended assistance (M = 5.37, SE = .40, t(23) = 2.79, p = .01, r = .50) than with simple assistance (M = 4.08, SE = .26). The results show that the extended assistance provides appropriate information to ease knowledge acquisition compared to the baseline scenario, in which only the names of streets and potential landmarks were given.
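The reported statistics can be reproduced from per-participant scores with a paired t-test and the effect size r = sqrt(t^2 / (t^2 + df)); the sketch below uses SciPy and assumes two equally long score lists, one per condition:

    import math
    from scipy import stats

    def paired_comparison(extended, simple):
        """Paired t-test plus effect size r for two within-subject score lists."""
        t, p = stats.ttest_rel(extended, simple)
        df = len(extended) - 1              # 23 for the 24 participants
        r = math.sqrt(t**2 / (t**2 + df))   # e.g., t(23) = 8.08 gives r = .86
        return t, p, r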
5 Virtual-Environment Tactile Map (VETM): Representing Knowledge for Verbalization
The generation of assisting utterances for a user who freely explores a map driven by his or her own information needs requires an internal representation of
the tactile map, including information that can be verbalized. Correspondingly, VETM, the knowledge base specifying the structure and content of the tactile map, is a central component in the structure of the approached system and has to serve multiple purposes. On the one hand, it specifies the spatial-geometric structure to support haptic exploration. The virtual-environment character of the interface enables the user to experience the map as an analog representation of the presented area (cf. [21]). The geometric specification is also needed to identify the exploration categories described in Sect. 3. On the other hand, the VETM has to support the representation of spatial entities that can be named and described using natural language for generating assisting utterances. The VETM is based on a hybrid representation that includes a spatial-geometric layer and a propositional layer. The propositional layer, which is in the focus of the following discussion, has to have a logical foundation, as it is meant to represent the content (or the meaning) of the map. Its structure, however, needs to support accessing information about individual entities in accordance with the current haptic focus (or the development of the haptic focus over time). The propositional layer is based on an order-sorted logic. Structuring the representation according to the formalism of referential nets [6] provides the necessary basis for linking the geometric information and accessing information corresponding to the current haptic focus of the exploration. The following sections introduce the representation formalism of referential nets, concluding with an example of representing (a small fraction of) a tactile map.

5.1 Conceptual Sorts
Formalisms based on order-sorted logic allow structuring the domain of the representation according to sorts of entities. The sorts can be ordered by inclusion. In VAVETaM, this tool is used to specify a conceptual grouping according to the role an entity plays within the map. The identification of an appropriate set of conceptual sorts for the map domain is crucial for two reasons. On the one hand, the sorts should support the specification of predicates and relations for the generation of natural language. On the other hand, they should support the specification of the haptic focus needed as input for the proper generation of situated verbal assistance. The maps we focus on are used for the acquisition of survey knowledge of urban areas. The conceptual sorts needed for the internal representation of the content of an urban map group entities that define a spatial setup with structures (such as streets) that enable locomotion. The conceptual sorts are independent of the part of the world the map represents. However, they are not independent of the purpose of the map: a map used to communicate information about the size, position, and shape of the states of Europe requires a different set of conceptual sorts, as, in this case, borders can be more relevant than streets. Figure 4 visualizes the part of the hierarchical structure of the sorts discussed in the following. The most general sort relevant for representing urban map content is called map entity. A map entity is an object in the map that can be explored haptically or that can be specified by verbal assistances. Among
map entity
    rep map entity: track segment (with sub-sort track), track configuration, landmark, rep region
    nrep map entity: frame segment (with sub-sort frame), nrep region

Fig. 4. Part of the sort hierarchy for the urban map domain
the map entities, there are two main subgroups: map entities that represent an object present in the relevant fragment of the world (rep map entity) and non-representational map entities, such as the frame of the map, that do not represent such an entity (nrep map entity). Relevant groupings of representing map entities are those representing paths of locomotion such as streets (track, track segment), those representing any kind of junction of such paths (track configuration), those representing potential landmarks such as buildings or trees (landmark), and those representing (named) areal objects such as squares or public parks (rep region). Among the non-representational map entities are the frame and its parts (frame, frame segment) and regions (nrep region) defined relative to the frame of reference used in spatial descriptions (such as 'the upper part of the map'). A track is a structure enabling locomotion (cf. [7]; a similar notion is the term 'passage', as introduced by Werner and colleagues [35]). Usually, several track segments are part of a track. For example, dead ends, landmarks, and junctions can induce track segments between and around them, which are explicitly represented if information for verbalization relating to these parts is included. Tracks are treated as a subgroup of track segments, as a simple track might consist of only one segment, and, in this case, it is not justified to introduce two distinct objects into the representation. Furthermore, track segments might be complex, consisting of several smaller segments, just as tracks are. Tracks are distinguished from other track segments in that they carry names; therefore, reference to tracks is rather simple. Analogously, the map frame is treated as a special case of frame segment. As the haptic maps are bordered by a rectangular frame, the four sides are represented as four frame segments.
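Rendered as a class hierarchy (e.g., in Python), the sorts and their inclusion ordering might look as follows; the underscored names are our substitutes for the multi-word sort labels:

    class MapEntity: ...

    class RepMapEntity(MapEntity): ...        # represents a real-world object
    class NrepMapEntity(MapEntity): ...       # e.g., the frame of the map

    class TrackSegment(RepMapEntity): ...
    class Track(TrackSegment): ...            # named tracks as special segments
    class TrackConfiguration(RepMapEntity): ...  # junctions of tracks
    class Landmark(RepMapEntity): ...
    class RepRegion(RepMapEntity): ...        # named squares, parks

    class FrameSegment(NrepMapEntity): ...
    class Frame(FrameSegment): ...            # the frame as a special segment
    class NrepRegion(NrepMapEntity): ...      # 'the upper part of the map'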
5.2 Referential Objects
A referential net consists of a set of interrelated referential objects that represent the entities of the domain of discourse [6]. As the map entities serve as anchors for grounding the verbal assistance in the multimodal setting described, the object-centered formalism is particularly well-suited as a representational framework.
In Fig. 5, plm1 is the identifier of the referential object representing a house with the name 'Mountain View', which is above another entity represented by the identifier pt1. As can be seen in Fig. 5, lines connect the identifier of the referential object with expressions on both sides. The expressions on the left are attributes and the expressions to the right are designators.

    landmark            plm1    Mountain View
    geometry_is(...)            ηx building(x)
    (...)                       ηx is_above(x, pt1)
                                (...)

Fig. 5. Referential object representing a house called 'Mountain View'
Designators are terms of the underlying order-sorted logic. Complex designators are descriptions (ηx building(x), ηx is_above(x, pt1)), whereas atomic designators are names (Mountain View). Descriptions can be constructed by using a description operator. In the present paper, only the description operators ι and η are used. Basically, the description operators indicate the definiteness of the description: ιx building(x) could be verbalized as 'the building', whereas ηx building(x) could be verbalized as 'a building'. The sort of a referential object is specified by using the sort symbol as an attribute. Compare [6] for a formal specification of the referential-net approach and [5] for a more detailed discussion of the use of referential nets as a representation formalism for natural language generation systems.
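A minimal data structure for referential objects with attributes and designators could look as follows; the string encoding of descriptions and the operator tags 'iota' and 'eta' (for ι and η) are our simplification:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Designator:
        operator: str   # 'name', 'iota' (definite), or 'eta' (indefinite)
        body: str       # e.g., 'building(x)', or the name itself

    @dataclass
    class RefO:
        ident: str                                       # e.g., 'plm1'
        attributes: List[str] = field(default_factory=list)
        designators: List[Designator] = field(default_factory=list)

    # The referential object of Fig. 5:
    plm1 = RefO(
        ident="plm1",
        attributes=["landmark", "geometry_is(...)"],
        designators=[
            Designator("name", "Mountain View"),
            Designator("eta", "building(x)"),        # 'a building'
            Designator("eta", "is_above(x, pt1)"),
        ],
    )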
5.3 Additional Attributes
In addition to the sort attributes, the referential-net formalism allows the assignment of further attributes to a referential object. For the present purpose, two additional attributes are necessary to specify the mereological relations between different referential objects. The part_of attribute is used to express that a map entity is part of another map entity, as is the case for frame segments, which are parts of the complete frame. The parts attribute is used to express that a map entity has parts, as the frame has the frame segments as parts. Furthermore, the attribute geometry_is is used to link the propositional layer to the spatial-geometric representation.

5.4 Constant Symbols, Predicate Symbols, and Relation Symbols
The logical language used for building designators provides the usual inventory of non-logical symbols. As the language is sorted, each symbol is assigned a specification regarding both logical type (arity) and sorting. Constant symbols represent names of objects. They are included to support the identification of objects corresponding to the message class (1.1). The signature underlying the logical language maps each constant symbol to the sort of
object it might name. Typically, names are assigned to streets, buildings, parks, or squares. Constant symbols corresponding to such names are mapped to the sorts track, landmark, or rep region. (Unary) predicate symbols represent properties of the represented entities. As the representing map entities inherit the properties of the real-world objects they represent, several of the predicate symbols stand for properties of the underlying real-world object that are expressed by natural-language nouns. Nominal predicates can support messages of the class definite identification (1.1) and are additionally needed to generate messages of the classes (2.1) and (2.2), which specify the ends and junctions of tracks, and (3.2), specifying regions like squares and parks. Nominal predicates allow a finer-grained specification of map entities. Thus, in the case of urban maps, predicate symbols such as building, school_building, town_hall, and church can be included and mapped to the sort landmark (cf. Table 2).

Table 2. (Nominal) predicate symbols for urban maps and the corresponding sorts

Predicate Symbols                                     Sort
street                                                track
dead_end                                              track segment
building, school_building, town_hall, church, tree    landmark
junction, t_junction, corner                          track configuration
region, square, park                                  rep region
map_frame                                             frame
left_frame, top_frame, right_frame, bottom_frame      frame segment
Some of the predicate symbols listed in Table 2 relate to a geometric specification. For example, whether a track segment is a dead end can be determined based on the specification of the tracks and their geometry. Correspondingly, we assume that descriptions based on such predicates can be determined from the geometric specification of the map in a pre-processing step and do not need to be computed at the time the assisting utterances are given. To support the acquisition of survey knowledge, spatial information should be verbalized, as discussed in Sect. 4.1. Correspondingly, further predicate and relation symbols carry mainly spatial meaning (see Table 3). Also in this case, we assume that the descriptions using these symbols can be derived in a pre-processing step from the geometric specification of the map.
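A crude sketch of such a pre-processing step is given below: vertical relations are derived by comparing centroids of axis-aligned bounding boxes. The real system would work on the full spatial-geometric layer; the bounding-box approximation and the function names are ours:

    def centroid(bbox):
        x1, y1, x2, y2 = bbox
        return (x1 + x2) / 2.0, (y1 + y2) / 2.0

    def derive_vertical_relations(entities):
        """entities: dict mapping identifiers to bounding boxes (x1, y1, x2, y2).
        Returns facts such as ('is_above', 'plm1', 'pt1')."""
        facts = []
        for a, box_a in entities.items():
            for b, box_b in entities.items():
                if a == b:
                    continue
                _, ya = centroid(box_a)
                _, yb = centroid(box_b)
                if ya > yb:                 # larger y = higher up on the map
                    facts.append(("is_above", a, b))
        return facts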
6 An Example for Selecting Information for Verbalization
In the following, we present an example of a fragment of a map (see Fig. 6), represented as a referential net using the logical language described above, to show how this representation supports the generation of assistance for the acquisition of survey knowledge (see Fig. 7). The labels in the map specify the identifiers of
Table 3. Spatial predicate and relation symbols, sort frames, meaning, and supported message class (MC)

Symbol | Sort Frame | Meaning | MC
is_above, is_left, (...), is_left_above, (...) | (map entity, map entity) | A map entity is above, is left of, (...), is left above, (...) another map entity. | (1.2)
is_between | (map entity, map entity, map entity) | A map entity is between two map entities. | (1.2)
is_leftmost, is_topmost, (...) | (map entity) | A map entity is the leftmost, topmost, (...) map entity in the map. | (1.4)
has_left_limit, has_upper_limit, (...) | (track segment, map entity) | The left, upper, (...) end of a track segment is connected to another map entity. | (2.1)
is_left_part, is_upper_left_part, (...) | (track segment, track segment) | A track segment is the left, upper left, (...) part of another track segment. | (2.1)
is_in_track_config | (track segment, track configuration) | A track segment is part of a track configuration. | (2.2)
is_parallel_to | (track segment, track segment) | Two track segments are parallel. | (2.3)
is_located_in_nrepr | (map entity, nrep region) | A map entity is located in a non-representational map region. | (3.1)
is_located_in_repr | (map entity, rep region) | A map entity is located in a representational map region. | (3.2)
the referential objects representing the map entities in the propositional map representation. The arrow indicates the movement of the user. We focus on the generation of utterance (3), which gives information of the types definite identification (1.1) and extension (2.1) of the explored track.5

(3) This is Abbey Road, which is restricted to the left by the map frame and forms a dead end to the right.

The task of the GVA component (see Fig. 1) is to select information from the VETM knowledge base and form a preverbal message that specifies the content of the natural language expression to be generated.

5 For the sake of brevity, Fig. 6 and the example referential net in Fig. 7 include little more than the information necessary to generate assistance (3). E.g., the parts of the frame, further track segments, regions, and the intersection are skipped.
Fig. 6. Example map exploration
The preverbal message is the input to the component called 'Formulator and Articulator', which produces a corresponding natural language expression and vocalizes it. We assume that the analysis of the exploration process by the MEP Observer results in the information that a trackMEP is performed on track pt1, which serves as input to the GVA component. The selection of information is mainly guided by the haptic focus. Correspondingly, the designators of pt1 are exploited. The selection of the information has to be controlled by a discourse plan for assisted tactile-map reading in combination with knowledge about what has been said already (and how often).6 The discourse plan underlying (3) favors definite-identification messages and, in the case of tracks, puts information about the extension of tracks before information about junctions and spatial relations to other map entities. Similarly, in the study described in Sect. 4, a definite-identification message was the first message selected for a map entity entering the haptic focus. Thus, the GVA component should be able to generate a definite-identification message for the object in the haptic focus. This is possible by using an individual name, such as in 'This is Abbey Road.', or by expressing another definite description. E.g., the landmark represented by plm1 can be identified by 'This is the town hall.', 'This is the topmost building in the map.', or 'This is the building above Abbey Road.' There are several reasons to prefer the selection of a proper name as the first definite-identification message. E.g., the verbalization of the name is the only means to inform the acquiring agent about the name of a map entity. Also, after having introduced the name, later messages can refer back to this map entity using the name. Correspondingly, the designator Abbey Road of pt1 is selected first. The next step in the discourse plan is to give information about the extension of pt1. Hence, the GVA component selects the descriptions ηx has_left_limit(x, pfr) and ηx is_right_part(pts11, x). As these descriptions relate to the referential objects of the map frame (pfr) and a track segment (pts11), designators have to be chosen that allow reference to these entities. In the first case, the descriptor ιx map_frame(x) allows forming a definite description ('the map frame'). In the second case, the descriptor ηx dead_end(x) is the basis for forming a simple noun phrase that carries configurational information ('a dead end'). The preverbal message produced by the GVA component can be represented as in Fig. 8.

6 For an overview of discourse planning in natural language generation see [24].
plm1 (attributes: landmark; geometry_is(...)):
    ιx town_hall(x)
    ηx building(x)
    ιx [building(x) ∧ topmost(x)]
    ηx is_above(x, pt1)
    ηx is_below(pt1, x)
    ιx [building(x) ∧ is_above(x, pt1)]
    (...)

pt1 (attributes: track; geometry_is(...); parts([pts11, pts12])):
    Abbey Road
    ηx street(x)
    ηx has_left_limit(x, pfr)
    ηx is_right_part(pts11, x)
    ηx is_below(x, plm1)
    ηx is_above(plm1, x)
    (...)

pts11 (attributes: track segment; geometry_is(...); part_of(pt1)):
    ηx dead_end(x)
    ηx is_right_part(x, pt1)
    (...)

pfr (attributes: frame; geometry_is(...); parts([pfr1, pfr2, pfr3, pfr4])):
    ιx map_frame(x)
    ηx has_left_limit(pt1, x)
    (...)

Fig. 7. Part of a referential net representing the content of the map of Fig. 6
pt1 (attribute: track):
    Abbey Road
    ηx [has_left_limit(x, ιy map_frame(y)) ∧ is_right_part(ηy dead_end(y), x)]

Fig. 8. Referential object representing the preverbal message underlying (3)
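The selection step described above can be summarized in a short sketch operating on the illustrative RefO/Designator structures from Sect. 5.2; the discourse plan is reduced to two hard-coded steps (name first, then extension), which is our simplification of the actual GVA processing:

    def select_preverbal_message(refo, mentioned_names):
        """Select designators of the focused object for verbalization."""
        selected = []
        # Step 1: definite identification, preferring an unmentioned proper name.
        for d in refo.designators:
            if d.operator == "name" and d.body not in mentioned_names:
                selected.append(d)
                break
        # Step 2: extension of the track (message class 2.1).
        for d in refo.designators:
            if d.body.startswith(("has_left_limit", "is_right_part")):
                selected.append(d)
        return selected

    # For pt1 this selects 'Abbey Road', has_left_limit(x, pfr), and
    # is_right_part(pts11, x), i.e., the content of utterance (3).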
7 Summary
Survey knowledge, which is important for solving everyday wayfinding tasks, is most efficiently acquired using maps. Tactile maps are perceivable by blind and visually impaired people. By adding assisting utterances linked to the exploration activities of the user, several limitations of tactile maps (resulting from the sequentiality of their exploration and restrictions in adding labels for names or categories) can be overcome. A goal of the development of the VAVETaM system is to generate situated assisting utterances for the free exploration of virtual-environment tactile maps of urban areas presented with haptic human-computer interfaces. Based on a corpus study, we identified a set of message classes that are systematically used by humans in the role of the assisting agent in such a haptic exploration scenario. The classified messages provide information for identifying map entities (e.g., names) or specify different types of spatial information. A first
experiment on the usefulness of providing spatial information through verbalization confirms the assumption that such utterances contribute to the acquisition of survey knowledge by users of tactile maps. To automate the generation of situated assisting utterances, a structured representation is needed that (a) underlies the presentation of the map with the haptic device, (b) enables the detection of the movements the user performs in relation to the entities on the map, and (c) provides semantic information supporting the verbalization. A hybrid representation including a spatial-geometric and a propositional layer is well-suited for this task. To support the selection of information based on the entities in the haptic focus of the user's exploration, we use the object-centered representation formalism of referential nets, which also supports linking the propositional layer to a spatial-geometric layer. We presented representations for the content of the messages used for assisting utterances and gave an example of generating preverbal messages based on referential nets.
References

1. De Felice, F., Renna, F., Attolico, G., Distante, A.: A haptic/acoustic application to allow blind the access to spatial information. In: World Haptics Conference, pp. 310–315 (2007)
2. Espinosa, M.A., Ungar, S., Ochaita, E., Blades, M., Spencer, C.: Comparing methods for introducing blind and visually impaired people to unfamiliar urban environments. Journal of Environmental Psychology 18, 277–287 (1998)
3. Giudice, N., Bakdash, J., Legge, G., Roy, R.: Spatial learning and navigation using a virtual verbal display. ACM Transactions on Applied Perception 7(1), Article 3, 3:1–3:33 (2010)
4. Golledge, R.G., Rice, M.T., Jacobson, R.D.: Multimodal interfaces for representing and accessing geospatial information. In: Rana, S., Sharma, J. (eds.) Frontiers of Geographic Information Technology, pp. 181–208. Springer, Heidelberg (2006)
5. Guhe, M., Habel, C., Tschander, L.: Incremental generation of interconnected preverbal messages. In: Pechmann, T., Habel, C. (eds.) Multidisciplinary Approaches to Language Production. Trends in Linguistics, pp. 7–52. De Gruyter, Berlin (2004)
6. Habel, C.: Prinzipien der Referentialität. Springer, Heidelberg (1986)
7. Habel, C.: Incremental generation of multimodal route instructions. In: Papers from the 2003 AAAI Spring Symposium on Natural Language Generation in Spoken and Written Dialogue, Stanford, CA, pp. 44–51 (2003)
8. Habel, C., Kerzel, M., Lohmann, K.: Verbal assistance in tactile-map explorations: A case for visual representations and reasoning. In: Proceedings of the AAAI Workshop on Visual Representations and Reasoning 2010, Atlanta, GA (2010)
9. Jacobson, R.D.: Navigating maps with little or no sight: An audio-tactile approach. In: Proceedings of the Workshop on Content Visualization and Intermedia Representations (CVIR), Montreal, pp. 95–102 (1998)
10. Jacobson, R.D.: Representing spatial information through multimodal interfaces. In: Sixth International Conference on Information Visualisation, IV 2002 (2002)
11. Kostopoulos, K., Moustakas, K., Tzovaras, D., Nikolakis, G.: Haptic access to conventional 2D maps for the visually impaired. Journal on Multimodal User Interfaces 1(2), 13–19 (2007)
12. Krueger, M.W., Gilden, D.: KnowWhere: An audio/spatial interface for blind people. In: Ballas, J. (ed.) Proceedings of the International Conference on Auditory Display (ICAD). International Community for Auditory Display, Palo Alto (1997)
13. Lederman, S.J., Klatzky, R.L.: Haptic perception: A tutorial. Attention, Perception, and Psychophysics 71(7), 1439–1459 (2009)
14. Levelt, W.: Speaking: From Intention to Articulation. The MIT Press, Cambridge (1989)
15. Lobben, A.K.: Tasks, strategies, and cognitive processes associated with navigational map reading: A review perspective. The Professional Geographer 56(2), 270–281 (2004)
16. Lohmann, K., Kerzel, M., Habel, C.: Generating verbal assistance for tactile-map explorations. In: van der Sluis, I., Bergmann, K., van Hooijdonk, C., Theune, M. (eds.) Proceedings of the 3rd Workshop on Multimodal Output Generation 2010, Dublin (2010)
17. Loomis, J.M., Klatzky, R.L., Lederman, S.J.: Similarity of tactual and visual picture recognition with limited field of view. Perception 20(2), 167–177 (1991)
18. Magnusson, C., Rassmus-Gröhn, K.: A dynamic haptic-audio traffic environment. In: Proceedings of Eurohaptics, pp. 5–7 (2004)
19. Miele, J.A., Landau, S., Gilden, D.: Talking TMAP: Automated generation of audio-tactile maps using Smith-Kettlewell's TMAP software. British Journal of Visual Impairment 24(2), 93–100 (2006)
20. Mou, W., McNamara, T.P.: Intrinsic frames of reference in spatial memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 28(1), 162–170 (2002)
21. Palmer, S.E.: Fundamental aspects of cognitive representation. In: Rosch, E., Lloyd, B.B. (eds.) Cognition and Categorization, pp. 259–303. Lawrence Erlbaum, Hillsdale (1978)
22. Parente, P., Bishop, G.: BATS: The blind audio tactile mapping system. In: Proceedings of the ACM Southeast Regional Conference, Savannah, GA (2003)
23. Parkes, D.: Audio tactile systems for designing and learning complex environments as a vision impaired person: Static and dynamic spatial information access. In: James, S., Hedberg, J. (eds.) Learning Environment Technology: Selected Papers from LETA 1994, vol. 94, pp. 219–223. AJET Publications, Canberra (1994)
24. Reiter, E., Dale, R.: Building Natural Language Generation Systems. Cambridge University Press, Cambridge (2000)
25. Roy, D., Reiter, E.: Connecting language to the world. Artificial Intelligence 167(1-2), 1–12 (2005)
26. Siegel, A.W., White, S.H.: The development of spatial representations of large-scale environments. In: Reese, H. (ed.) Advances in Child Development and Behavior, vol. 10, pp. 9–55. Academic Press, New York (1975)
27. Tappe, H., Habel, C.: Verbalization of dynamic sketch maps: Layers of representation and their interaction. In: Proceedings of the 20th Annual Meeting of the Cognitive Science Society (CogSci 1998), Madison, WI (1998)
28. Taylor, H.A., Tversky, B.: Perspective in spatial descriptions. Journal of Memory and Language 35(3), 371–391 (1996)
29. Tversky, B.: Cognitive maps, cognitive collages, and spatial mental models. In: Campari, I., Frank, A.U. (eds.) COSIT 1993. LNCS, vol. 716, pp. 14–24. Springer, Heidelberg (1993)
30. Ungar, S.: Cognitive mapping without visual experience. In: Kitchin, R., Freundschuh, S. (eds.) Cognitive Mapping: Past, Present and Future, pp. 221–248. Routledge, London (2000)
Linking Spatial Haptic Perception to Linguistic Representations
349
31. Ungar, S., Blades, M., Spencer, C.: The role of tactile maps in mobility training. British Journal of Visual Impairment 11(2), 59–61 (1993) 32. Waller, D.: Individual differences in spatial learning from computer-simulated environments. Journal of Experimental Psychology: Applied 6(4), 307–321 (2000) 33. Wang, R.: Spatial processing and view-dependent representations. In: Mast, F., J¨ ancke, L. (eds.) Spatial Processing in Navigation, Imagery and Perception, pp. 49–65. Springer, Heidelberg (2007) 34. Wang, Z., Li, B., Hedgpeth, T., Haven, T.: Instant tactile-audio map: enabling access to digital maps for people with visual impairment. In: Proceeding of the 11th International ACM SIGACCESS Conference on Computers and Accessibility, Pittsburg, PA, pp. 43–50 (2009) 35. Werner, S., Krieg-Br¨ uckner, B., Mallot, H.A., Schweizer, K., Freksa, C.: Spatial cognition: The role of landmark, route, and survey knowledge in human and robot navigation. In: Jarke, M., Pasedach, K., Pohl, K. (eds.) Informatik 1997, pp. 41–50. Springer, Berlin (1997) 36. Winter, S., Tomko, M., Elias, B., Sester, M.: Landmark hierarchies in context. Environment and Planning B: Planning and Design 35(3), 381–398 (2008) 37. Zeng, L., Weber, G.: Audio-haptic browser for a geographical information system. In: Miesenberger, K., Zagler, W., Karschmer, A. (eds.) Computers Helping People with Special Needs, Part II, pp. 466–473 (2010)
Analyzing the Spatial-Semantic Interaction of Points of Interest in Volunteered Geographic Information

Christoph Mülligann (1), Krzysztof Janowicz (2), Mao Ye (3), and Wang-Chien Lee (3)

(1) Institute for Geoinformatics, University of Münster, Germany, [email protected]
(2) Department of Geography, University of California, Santa Barbara, USA, [email protected]
(3) Department of Computer Science and Engineering, Pennsylvania State University, USA, {mxy177,wlee}@cse.psu.edu
Abstract. With the increasing success and commercial integration of Volunteered Geographic Information (VGI), the focus shifts away from coverage to data quality and homogeneity. Within the last few years, several studies have been published analyzing the positional accuracy of features, completeness of specific attributes, or the topological consistency of line and polygon features. However, most of these studies do not take geographic feature types into account. This is for two reasons. First, and in contrast to street networks, choosing a reference set is difficult. Second, we lack the measures to quantify the degree of feature type miscategorization. In this work, we present a methodology to analyze the spatial-semantic interaction of point features in Volunteered Geographic Information. Feature types in VGI can be considered special in both the way they are formed and the way they are applied. Given that they reflect community agreement more accurately than top-down approaches, we argue that they should be used as the primary basis for assessing spatial-semantic interaction. We present a case study on a spatial and semantic subset of OpenStreetMap, and introduce a novel semantic similarity measure based on the change history of OpenStreetMap elements. Our results set the stage for systems that assist VGI contributors in suggesting the types of new features, cleaning up existing data, and integrating data from different sources.
1 Introduction
The rise of Volunteered Geographic Information (VGI) as coined by Goodchild [1] is closely tied to projects such as OpenStreetMap (OSM, http://www.openstreetmap.org) or Wikimapia (http://wikimapia.org/). These projects provide open platforms for volunteers to contribute geographic data and make them accessible for others under an open license. Volunteered information
is acquired and maintained in a different style compared to data provided by professional authorities. Instead of being defined in a top-down manner, geographic feature types in OSM are the result of informal and continuous discussions within the community (see http://wiki.openstreetmap.org/wiki/Map_Features). Consequently, contributors assign different category tags to features of similar types depending on their local VGI community, previous experience, used software, personal cognition of geographic space, and changes of the OSM typing schema. Due to these factors, tags representing feature types change frequently. With the increasing success of VGI and its integration with projects such as Wikipedia or even commercial products, quality control becomes equally important to mere coverage. Several researchers have studied the quality of Volunteered Geographic Information over the last years [2,3,4]. Tools assisting users in constraint checking, attribute enrichment, or the cleaning of large data sets become more important [5,6,7]. However, most studies do not take geographic feature types into account. In contrast to assessing data quality based on street networks or buildings, choosing reference data for feature types is difficult. Commercial routing data sets can be used to discover missing, displaced, or attribute-incomplete streets. Similarly, aerial photography or topographic maps can be used as references for features such as buildings or water bodies. There is no such gold standard for Points Of Interest (POI) feature types such as Restaurant, Pub, or Theater. Arguing that a feature tagged as Pub is miscategorized because it is specified as Bar in a commercial data set is troublesome. This problem is not specific to VGI but a long-term challenge in harvesting data across multiple gazetteers. Geo-ontologies have been proposed to make the various typing schemata explicit. Besides reference data for comparison, analyzing the usage of feature types in VGI requires measures to quantify the degree of miscategorization. For instance, confusing pubs with bars is different from tagging a grocery store as a pub. Semantic similarity has been proposed as a measure to determine the difference between feature type definitions [8]. Analyzing the usage and implicit meaning of feature type tags is more than just an academic exercise. Intuitively, we expect a pub to be surrounded by other places that afford drinking alcohol, having a snack, or meeting friends, even though not all of these functions need to be offered by each facility. A waste basket, on the contrary, can be expected to be uniformly distributed within a commercial zoning area. The knowledge of which types of features clump together and which most likely do not can be used to improve VGI. A contributor uploading a fire station POI next to an existing one may be automatically notified by the user interface that this type of feature is not likely to clump together and asked to double-check. Using similarity measures, such point patterns can also span the semantic dimension. Pubs are likely to occur next to nightclubs and cafés, but rather unlikely to be grouped around nursing homes. Discovering whether a specific feature, such as a point of interest, is already present before it gets duplicated by another contributor would free the resources of many volunteer editors that constantly work on cleaning up OpenStreetMap
data. Similarly, these editors also change category tags to make them match the latest community agreements or ensure that taxonomies are not confused with partonomies. Such assistant tools and rule systems have been frequently described as the next step in understanding and making use of VGI [4,9,10]. To assist users by suggesting the most likely feature type tags, or to notify them if a similar feature already exists in the vicinity, requires an understanding of the spatial as well as the semantic patterns in OSM. In this work, we set the theoretical ground for developing such assistant tools. We present a spatial analysis methodology that identifies spatial-semantic patterns in OSM data and highlight how our approach can be used for tag recommendation and data cleaning. In contrast to existing work on geospatial semantics, we do not require a top-down ontology of geographic feature types, but derive feature type similarity bottom-up from the change history of existing OSM data. The remainder of this paper is structured as follows. In section 2 we review statistics and measures that underlie our approach. Section 3 describes the development of concept variograms and spatial-semantic point pattern analysis. Both methods require semantic similarity values between feature types. The procedure of deriving these similarities in a bottom-up fashion from OSM data is explained in section 4. After giving an overview of the data set used for the case study (section 5), we describe the results (section 6) and discuss their implications (section 7). Finally, in section 8, we conclude by summarizing our work and pointing out directions for further work.
2 Related Work
This section introduces the statistical underpinning for our spatial-semantic interaction methodology and points to related work on semantic similarity measurement relevant for the understanding of our research. The geostatistics used in our methodology are well established within the field of Geographic Information Science. In the case of VGI, however, we lack this kind of well-established approaches. Kuhn [11] uses the hot water metaphor to picture the urgent need for models specialized on VGI: VGI represents a catalyst, or hot water, to GIScience once it is well understood and handled. Both directions of research, the understanding and the handling of VGI, need to be integrated. This work is meant to be part of that integration task. In contrast to a social [12] or producer-centered [13] view on the topic, we aim at forming a computational basis for the interpretation of VGI datasets.
2.1 Spatial Analysis
Variograms plot the expected difference between the values measured at two different locations versus their spatial distance. They are used to describe the spatial dependency of continuous processes. Ahlqvist and Shortridge [14] argue that such a process can also underlie categorical variables like land use classes and introduce semantic variograms to analyze landscape heterogeneity. To do
so, they replace the differences between observed numerical variables by a look-up table containing semantic similarities (usually values within the range [0, 1]) for each pair of categorical land use values. The semantic semivariance is then defined as a function of distance (or lag) h; see equation 1:

\gamma_{SD}(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} sd^2\left[c(u_\alpha); c(u_\alpha + h)\right] \qquad (1)
where N(h) is the number of location pairs separated by spatial lag h, while sd[c(u_\alpha); c(u_\alpha + h)] is the semantic distance between the categorical land use values of points u_\alpha and u_\alpha + h from the look-up table. Even though point data as well as grids may underlie the computation of variograms, they model fields and not point patterns. Variograms are used to interpolate values between measurement points, e.g. using Kriging [15], whereas the distribution of the measurement points as such is not targeted.

Point Pattern Analysis aims to reveal whether points in a study area are, e.g., clumped, randomly, or regularly distributed. A popular model for a stationary spatial point process is Ripley's K [16]; see equation 2:

K(s) = \lambda^{-1} E \qquad (2)
where E is the number of occurrences within distance s of an arbitrary point and \lambda is the intensity, i.e., the expected number of points, overall density, or average occurrence rate, respectively [17]. The function is monotonically non-decreasing [17]. Hence the minimal increase of 0 between two distances s_1 and s_2 means that no additional points are expected when increasing the radius by s_2 - s_1 with regard to an arbitrary point. Accordingly, a strong increase up to a distance s_x indicates a clustering within this radius. There are superimposed functions that ease visual interpretation of thresholds like s_x (cp. the L index or the linearized version of Ripley's K, respectively [18]). However, for a comparison of expected versus observed occurrences this additional step is not necessary. To our best knowledge, there is no existing methodology to account for semantic aspects in addition to the spatial distribution of point patterns. However, Diggle et al. [19] introduced a second-moment spatio-temporal measure for point processes in which the spatio-temporal occurrence is called an event; see equation 3:

K(s, t) = \lambda^{-1} E \qquad (3)

where E is the number of events occurring within distance s and time t of an arbitrary event, and \lambda is the intensity, i.e., the expected number of events per unit space per unit time. Following the spatial definition of Ripley's K, the estimator \hat{K}(s, t) [19] can be computed from existing data by equation 4:

\hat{K}(s, t) = |A| \, T \, (n(n-1))^{-1} \sum_{j \neq i} w_{ij} v_{ij} \, I(d_{ij} \le s) \, I(u_{ij} \le t) \qquad (4)
where |A| is the area of a polygon enclosing the spatial domain of interest and T the analogous temporal interval. n is the number of points, I the indicator function, and d_{ij} and u_{ij} are the spatial and temporal differences. w_{ij} and v_{ij} are weights applied for the correction of edge effects. As diagnostic measures for the actual strength of \hat{K}(s, t), Diggle et al. propose the functions 5 and 6:

\hat{D}(s, t) = \hat{K}(s, t) - \hat{K}(s)\hat{K}(t) \qquad (5)

\hat{D}_0(s, t) = \frac{\hat{D}(s, t)}{\hat{K}(s)\hat{K}(t)} \qquad (6)

\hat{K}(s) and \hat{K}(t) are the independent spatial and temporal components of the underlying point process. \hat{D} describes the absolute difference between the spatio-temporal and an assumed independent spatial and temporal process, \hat{D}_0 its magnitude with regard to an expected number of occurrences in a spatial- and temporal-only process. Space-time interaction is therefore described as a space- and time-dependent factor that measures the influence of the combined point process versus the independent point processes.
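As an illustration of equations 4-6, the following Python sketch (a minimal, hypothetical implementation, not the authors' R code) computes a naive space-time \hat{K} and the \hat{D}_0 diagnostic on grids of spatial and temporal distances; the edge-correction weights w_{ij} and v_{ij} are set to 1, and the marginal estimates \hat{K}(s) and \hat{K}(t) are assumed to be computed analogously with only the spatial or temporal indicator:

import numpy as np

def khat_spacetime(xy, t, area, T, s_grid, t_grid):
    """Naive estimator of eq. 4 with w_ij = v_ij = 1 (no edge correction).
    xy: (n, 2) array of coordinates, t: (n,) array of event times."""
    n = len(t)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)  # spatial distances d_ij
    u = np.abs(t[:, None] - t[None, :])                          # temporal distances u_ij
    off = ~np.eye(n, dtype=bool)                                 # exclude pairs with i == j
    K = np.empty((len(s_grid), len(t_grid)))
    for a, s in enumerate(s_grid):
        for b, tt in enumerate(t_grid):
            K[a, b] = area * T / (n * (n - 1)) * np.sum((d <= s) & (u <= tt) & off)
    return K

def d0_diagnostic(K_st, K_s, K_t):
    """Eq. 5 and 6: D = K(s,t) - K(s)K(t); D0 = D / (K(s)K(t)).
    K_s, K_t are the purely spatial and temporal estimates on the same grids."""
    prod = np.outer(K_s, K_t)
    return (K_st - prod) / prod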
2.2 Semantic Similarity
Due to their analogy to spatial proximity functions, semantic similarity measures have been widely studied and applied in GIScience [20,21,14,22,8,23]. Most of these measures are hybrid in the sense that they combine different approaches to similarity, such as features, regions in a multi-dimensional space, or network distances. However, these approaches rely on existing ontologies or scene graphs for comparison. The OSM data set discussed in this work lacks a formal specification of feature types. It also does not support multiple types per feature, which excludes classical bag-of-words approaches. In contrast to such set-theoretic approaches, Van Eck and Waltman [24] identified probabilistic measures, the association strength in particular, as adequate for normalization purposes because they measure the deviation of observed from expected co-occurrences. Set-theoretic measures like the inclusion index or the Jaccard index return the relative overlap of two sets, still being prone to the absolute number of tags in each of the sets [24]. The association strength, proximity index, or probabilistic affinity index, respectively [25,26,27], is the ratio of the observed number of co-occurrences c_{ij} and the expected number of co-occurrences e_{ij} between tags i and j; see equations 7 and 8:

e_{ij} = \frac{s_i s_j}{m} \qquad (7)

sim_{AS} = \frac{c_{ij}}{e_{ij}} \qquad (8)

where s_i and s_j are the total numbers of occurrences for each of i and j, and m is the total number of documents or bags of words, respectively. For sim_{AS} greater than 1, the number of co-occurrences is higher than expected under assumed statistical independence, lower otherwise.
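A minimal sketch of equations 7 and 8 (the function and parameter names are ours):

def association_strength(c_ij, s_i, s_j, m):
    """Ratio of observed to expected co-occurrences (eq. 7 and 8).
    c_ij: observed co-occurrences of tags i and j; s_i, s_j: total
    occurrences of each tag; m: total number of bags of words.
    Values > 1 indicate stronger association than expected under independence."""
    e_ij = s_i * s_j / m   # expected co-occurrences (eq. 7)
    return c_ij / e_ij     # association strength (eq. 8)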
3 Two Models for Spatial-Semantic Interaction
In this section we explain the changes applied to semantic variograms and the diagnostic measure \hat{D}_0 related to spatio-temporal point processes. While semantic variograms reflect a field view on geographic space where point features are only considered measurement locations for a spatial process such as land cover, Diggle's diagnostics are based on the model of a point process where the distribution of occurrences in two dimensions is the only relevant information. Semantic variograms have been used to study land cover grids so far. The combination of semantics with variograms is reasonable in this case because land cover is present at any location. Despite the limited amount of classes, the underlying process can be considered continuous. When applying variograms to points of interest, those properties are not given anymore. Given a subset of POI, e.g. amenities, most of the space in between would be void or of no relevance to amenities. Even more, the amenities themselves are not modeled as two-dimensional features. Variograms by nature do not allow for these kinds of situations, because the process to be modeled is considered ubiquitous. Nevertheless, beyond those theoretical reservations, the computation and careful interpretation of variograms is possible and straightforward compared to the \hat{D}_0 statistic. Investigating both approaches gives us the chance to understand their limitations and which patterns they help to uncover.

Fig. 1 shows the basic steps that precede the computation of our spatial-semantic interaction models. Concept variograms and second-order analysis both require spatial and semantic distances as input data. In our work, the semantic distance is derived from a similarity matrix which also defines the POI to be selected for analysis. All computations were performed in the statistical language R (http://www.r-project.org/). In particular we modified functions from the gstat (http://www.gstat.org/) and splancs (http://www.maths.lancs.ac.uk/~rowlings/Splancs/) packages.
Fig. 1. Workflow of the spatial-semantic interaction analysis
3.1 Concept Variograms
For the characterization of a certain concept c_k, or categorical value, respectively, we aim at extracting only those spatial-semantic relationships that are relevant for c_k. We achieve this by applying the following change to the semantic variogram definition (cp. section 2):

\gamma_{SD}^{c_k}(h) = \frac{1}{2N(h, c_k)} \sum_{\alpha=1}^{N(h, c_k)} sd^2\left[c(u_\alpha); c(u_\alpha + h)\right] \qquad (9)
where N(h, c_k) is the number of point pairs separated by spatial length h and fulfilling the condition c(p_i) \vee c(p_j) = c_k for each point pair (p_i, p_j). Therefore, \gamma_{SD}^{c_k}(h) can either be considered a semantics-enabled version of Goovaerts' indicator variograms [28] or a restricted form of the semantic variogram definition by Ahlqvist and Shortridge [14].
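A direct, unoptimized Python sketch of the concept variogram in equation 9 (a hypothetical re-implementation; the authors modified the R package gstat, and the lag tolerance parameter is our assumption):

import numpy as np

def concept_variogram(xy, types, sd, ck, lags, tol):
    """Empirical concept variogram (eq. 9): semantic semivariance over point
    pairs at lag h in which at least one point has type ck.
    xy: (n, 2) coordinates; types: list of category tags; sd: nested dict of
    semantic distances in [0, 1]; lags: lag distances; tol: lag tolerance."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    n = len(types)
    gamma = []
    for h in lags:
        total, count = 0.0, 0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(d[i, j] - h) <= tol and ck in (types[i], types[j]):
                    total += sd[types[i]][types[j]] ** 2
                    count += 1
        gamma.append(total / (2 * count) if count else np.nan)
    return np.array(gamma)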
3.2 Second-Order Analysis of Spatial-Semantic Clustering
The modifications applied to the second-moment spatio-temporal measure for point processes by Diggle et al. [19] consist of replacing the temporal by a semantic component as well as restricting the point pairs contributing to \hat{K} to those that have at least one point with value c_k. We also changed the notion of E to the more neutral term occurrence, since event only applies to the temporal domain. The altered K-function is then given by:

K_{SD}^{c_k}(s, sd) = \lambda^{-1} E_{c_k} \qquad (10)

where SD is the semantic distance matrix in use, sd the semantic distance, c_k the selected concept, and E_{c_k} the number of further occurrences within spatial distance s and semantic distance sd with respect to occurrences having the categorical value c_k. Accordingly, we propose the following as an estimator for function (10):

\hat{K}_{SD}^{c_k}(s, sd) = |A| \, SD_{range} \, (n_{c_k}(n-1))^{-1} \sum_{j \neq i} w_{ij} \, I(d_{ij} \le s) \, I(sd[c(p_i); c(p_j)] \le sd) \, I(c(p_i) \vee c(p_j) = c_k) \qquad (11)
where SD_{range} is the range of semantic similarity values in the look-up table and n_{c_k} is the number of points with c(p_i) = c_k. Note that no correction of edge effects (v_{ij} in \hat{K}(s, t)) is applied to the categorical values, due to a lack of metrical spaces for this kind of pair-wise distances. Hence, the notion of an edge is not meaningful here. In that regard another modification has to be applied. In contrast to time, semantic distance has a fixed range, usually values from [0, 1]. Since we want to examine the whole semantic range of the \hat{D}_0 statistic, the case of POI having a maximum dissimilarity of 1 needs to be addressed separately. Reaching this value, all remaining POI are added to \hat{K}_{SD}^{c_k}(sd) and consequently \hat{D}_0 becomes 0. Thereby,
those values within the last similarity interval would be ignored automatically. The traditional approach does not face such an issue because the spatial or temporal dimension is seldom captured up to the maximum of the dataset. For semantic distances, however, we define SD_{range} as an open interval [0, 1). To get a better understanding of how the \hat{D}_0 statistic reflects the underlying spatial-semantic interaction, we created different simulated point patterns based on a simple four-step similarity scale for a test concept c_k. Out of those we show two patterns and their corresponding plots in fig. 2. In the following they will be called Pattern A (upper) and Pattern B (lower).
Fig. 2. \hat{D}_0 plots for two different spatial-semantic patterns. Light red dots indicate instances of the concept c_k to be examined.
Pattern A represents perfect spatial-semantic autocorrelation, perfect in the sense that spatial distance to instances of c_k (light red dots) is always relative to the semantic distance. Since semantic similarity decreases in both spatial dimensions, going out from the four c_k POI, we observe a quadratic drop in the \hat{D}_0 values in the two-dimensional \hat{D}_0 plot. In the following, we focus on the high
(spatial-semantic) interval of [0, 0], [1, 0.3], i.e., the values within spatial distance 0-1 and semantic distance 0-0.3. The expected number of co-occurrences in this interval is low. On the one hand, there is no spatial clustering around c_k POI; the distribution of all POIs is regular. On the other hand, the fraction of POIs of type c_k is only 4%. In other words, in a random distribution of the very same set of points it would be quite unlikely that c_k POI would appear right next to each other. However, in the observed distribution c_k only occurs as a single cluster. The \hat{D}_0 plot shows that this phenomenon is 20 times more likely to be characteristic than a random distribution. However, as soon as we extend the spatial-semantic interval in either dimension this factor decreases: finding c_k POI within a spatial distance of 2 is more probable than within distance 1, and finding c_k as well as similar POI (dissimilarity 0.33) is more probable than only c_k POI within distance 1.

Pattern B is a counter-example for spatial autocorrelation. First, c_k POI do not form spatial clusters themselves; second, the surrounding POI are mostly dissimilar. Hence, \hat{D}_0 has negative values in the spatial and semantic proximity of c_k POI, i.e., the independent spatial and semantic clustering is stronger than the combined point process. There are two clusters where medium-similarity POI appear only next to a c_k POI. Therefore, \hat{D}_0 is highest for POI with medium similarity. This behavior is important for our analysis with regard to VGI and Geographic Information in general. The total occurrences of feature types in a geo-dataset may vary for several reasons, e.g., different interests of voluntary mappers, heterogeneous coverage, or ground truth. The interaction signature of a geographic feature type, however, should not be biased by the number of instance occurrences because these do not play a role on the conceptual level. Pattern B shows that in the domain of a particular geo-dataset the \hat{D}_0 statistic accounts for the diagnosticity of feature types, i.e., features within a certain similarity range, here 0.3 to 0.6.

Finally, note that the \hat{D}_0 plots would look exactly the same except for the last semantic interval dropping to 0 if POI with dissimilarity 1, i.e., no similarity, had been incorporated. By not doing so, we are able to visualize the semantic interval (0.9, 1.0) and show that no change occurs there compared to the interval (0.7, 0.9]. Points of interest with no similarity still have a strong effect on \hat{D}_0 though, because they affect interaction trends of the spatial axis through \hat{K}(s) (cp. eq. 5 and 6).
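For concreteness, a hypothetical Python sketch of the estimator in equation 11 follows (again without edge correction, i.e., w_{ij} = 1). The \hat{D}_0 diagnostic is then derived from it exactly as in equations 5 and 6, with the semantic dimension in place of the temporal one:

import numpy as np

def khat_spatial_semantic(xy, types, sd, ck, s_grid, sd_grid, area, sd_range=1.0):
    """Estimator of eq. 11: counts ordered pairs (i, j), i != j, within spatial
    distance s and semantic distance sd, restricted to pairs in which at least
    one point has the selected type ck."""
    n = len(types)
    n_ck = sum(t == ck for t in types)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    K = np.zeros((len(s_grid), len(sd_grid)))
    for a, s in enumerate(s_grid):
        for b, smax in enumerate(sd_grid):
            cnt = 0
            for i in range(n):
                for j in range(n):
                    if i == j or d[i, j] > s:
                        continue
                    if sd[types[i]][types[j]] <= smax and ck in (types[i], types[j]):
                        cnt += 1
            K[a, b] = area * sd_range * cnt / (n_ck * (n - 1))
    return K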
4 Deriving Similarity from the OpenStreetMap History
Introducing semantics into geostatistical models is a key contribution of this work. However, we cannot derive similarity values from formal feature types, as these do not exist for VGI, and therefore have to assess pair-wise similarities between all types. Our case study is restricted to a particular subset of elements in OSM, namely those that have a key called amenity. A key in OSM can be considered a superconcept, its values the subconcepts. Currently the community agrees on 71
different amenity values described in the OSM wiki (http://wiki.openstreetmap.org/wiki/Map_Features#Amenity). By convention these key-value pairs are meant to be applied uniquely, i.e., an OSM node is not supposed to have more than one amenity tag. Therefore, we cannot use bag-of-words-based similarity measures between different amenity values. Instead, we obtain the history set (in the form of a bag of words) for each OSM element. OpenStreetMap offers diff files which list all elements that were subject to change, i.e., creation, modification, or deletion, within a certain time frame. We use this diff function to compute the history set for all elements. For instance, an element x that was created as a cafe, then changed to restaurant, then changed back to cafe, and finally labeled bar, would contain the tags cafe, restaurant, and bar as a bag of words. The number of changes or their sequence is not recorded. For our similarity measure, we assume that such type changes occur due to semantic confusion of types by VGI contributors. Based on the history sets, we create a co-occurrence matrix C with c_{ij} containing the number of elements that have carried both tag i and tag j during their history. The diagonal entries of C contain the total number of i / j elements. Next, we apply the association strength measure to compute the expected number of co-occurrences (cp. eq. 8). In order to get values within [0, 1], sim_{AS} is normalized following equation 12:

sim = 1 - \frac{1}{1 + sim_{AS}} \qquad (12)
Maximum dissimilarity is marked by a value of 0, while maximum similarity approaches the value 1. A value of 0.5 reflects statistical independence.
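The whole derivation of section 4 condenses to a few lines; the following sketch (our own naming, assuming the counts have already been extracted from the diff files) turns a co-occurrence matrix and tag totals into the normalized similarity of equation 12:

def history_similarity(c, s, m):
    """Normalized association strength (eq. 7, 8, and 12).
    c: dict-of-dicts co-occurrence counts from the history sets;
    s: total occurrences per tag; m: total number of elements (bags of words).
    Returns similarities in [0, 1), with 0.5 at statistical independence."""
    sim = {}
    for i in c:
        sim[i] = {}
        for j in c[i]:
            e_ij = s[i] * s[j] / m            # expected co-occurrences (eq. 7)
            sim_as = c[i][j] / e_ij           # association strength (eq. 8)
            sim[i][j] = 1 - 1 / (1 + sim_as)  # normalization (eq. 12)
    return sim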
5 Data and Case Study
This case study examines amenities in London as a spatial-semantic subset of the OpenStreetMap dataset. The semantic similarities of OpenStreetMap amenities are derived from the whole world dataset to achieve a higher degree of significance. In the following, we will describe the spatial and semantic components of amenity points of interest.

5.1 Amenity Points of Interest in OpenStreetMap
Amenity POI may be mapped as nodes or ways in OpenStreetMap. In the first case they are modeled as point features, as polygons otherwise. For our case study, the bounding box for the London dataset is set to (51.4158, -0.331), (51.6011, 0.0796). Data was retrieved from OSM's extended API via the requests
http://xapi.openstreetmap.org/api/0.6/node[amenity=*][bbox=-0.331,51.4158,0.0796,51.6011] and
http://xapi.openstreetmap.org/api/0.6/way[amenity=*][bbox=-0.331,51.4158,0.0796,51.6011].
All polygon features were converted to point features after retrieval by a centroid function. Thereby, they can be used in our methodology and we do not lose valuable information. The final dataset contains 20,765 POI, with 64 out of 71 different amenity values being present. Table 1 shows their tag counts.
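The polygon-to-centroid step can be sketched as follows (shapely is an assumed dependency; the paper does not name the tool actually used):

from shapely.geometry import Polygon

def way_to_poi(ring_coords):
    """Reduce a polygon amenity (an OSM way, given as a coordinate ring)
    to a point feature by taking its centroid, as done for the London dataset."""
    centroid = Polygon(ring_coords).centroid
    return (centroid.x, centroid.y)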
Table 1. Numbers of amenity tags in the London dataset
tag               #    | tag               #    | tag               #    | tag               #
arts_centre       50   | atm               330  | bank              464  | bar               235
bench             219  | bicycle_parking   1479 | bicycle_rental    343  | biergarten        3
bureau_de_change  20   | bus_station       20   | cafe              1273 | car_rental        16
car_sharing       600  | car_wash          15   | cinema            61   | clock             11
college           111  | community_centre  62   | courthouse        36   | crematorium       1
dentist           77   | doctors           144  | drinking_water    10   | embassy           61
fast_food         708  | ferry_terminal    19   | fire_station      68   | fountain          46
fuel              201  | grave_yard        16   | grit_bin          18   | hospital          97
kindergarten      42   | library           171  | marketplace       32   | nightclub         56
parking           1225 | pharmacy          275  | place_of_worship  1356 | police            89
post_box          3086 | post_office       284  | prison            6    | pub               2556
public_building   121  | recycling         397  | restaurant        1855 | sauna             1
school            1358 | shelter           17   | social_centre     1    | social_facility   5
stripclub         1    | studio            8    | taxi              44   | telephone         1274
theatre           111  | toilets           214  | townhall          19   | university        107
vending_machine   3    | veterinary        9    | waste_basket      225  | waste_disposal    3

5.2 Semantic Similarity of Amenities in OpenStreetMap
The total number of features tagged with one of the 71 amenity values is 3,247,409, considering the whole world (state: February 2011). Out of these, 30,538 OSM elements have been subject to tag changes. For illustration purposes, table 2 shows a subset of the resulting co-occurrence matrix. The diagonal entries show the total number of occurrences of the corresponding element. The similarity values computed based on table 2 are shown in table 3. Due to the low number of overall changes to the amenity dataset, we tested each co-occurrence for statistical significance. The test was carried out as a χ² test of the 2x2 contingency table of each tag pair. While the test statistic itself lacks features for semantic similarity, its p-value shows how reliable the raw data is. It turns out that for 25.5% of all co-occurrences the strength of association is not significant on a 95% confidence level. Therefore we assume that our similarity measure has an accuracy of at least 74%.
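The significance test can be reproduced as below; the 2x2 cell layout is our assumption, since the paper only names the test:

from scipy.stats import chi2_contingency

def cooccurrence_significant(c_ij, s_i, s_j, m, alpha=0.05):
    """Chi-square test of the 2x2 contingency table for one tag pair:
    elements carrying both tags, only tag i, only tag j, or neither."""
    table = [[c_ij, s_i - c_ij],
             [s_j - c_ij, m - s_i - s_j + c_ij]]
    chi2, p, dof, expected = chi2_contingency(table)
    return p < alpha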
Table 2. Examples for amenity co-occurrence in OpenStreetMap history sets

                   bar    cafe   cinema  community_centre  recycling  theatre  waste_basket
bar                16799  392    2       1                 1          2        0
cafe               392    57343  6       1                 10         5        3
cinema             2      6      11808   3                 0          104      0
community_centre   1      1      3       2306              0          11       0
recycling          1      10     0       0                 60309      2        122
theatre            2      5      104     11                2          11569    0
waste_basket       0      3      0       0                 122        0        19976
In general, and taking into account that these values have been automatically derived from VGI without any pre-processing, the similarity values are plausible. However, the number of completely dissimilar tags is higher than expected. For instance, the number of tags that have a similarity value lower than 0.1 is 59 for bar, 61 for cafe, 62 for cinema, 51 for community_centre, 67 for recycling, 56 for theatre, and 68 for waste_basket (from an overall number of 71 amenity values). This leads to a coarse semantic granularity with respect to similar category tags. Additionally, the results are influenced by partonomic (e.g. for parking), linguistic (e.g. bank and bench), and lexical relations (e.g. watering_place and ferry_terminal). While partonomic relations may be considered a valuable influence on semantic similarity between geographic features, we have to treat the others as errors. Fortunately, in almost all cases, the linguistic or lexical bias comes along with a strong semantic association (e.g., for theatre and cinema), or impacts tags that have a very low overall change rate (e.g. bench). Nonetheless, this shows that semantic similarities computed out of such history sets should not be applied without prior manual inspection. Finally, concept variograms and point pattern analysis require dissimilarity values. Therefore, all values were inverted by dissimilarity = 1 - similarity.

Table 3. Selected normalized association strengths of OpenStreetMap amenities

                   bar   cafe  cinema  community_centre  recycling  theatre  waste_basket
bar                1     0.57  0.03    0.08              0          0.03     0
cafe               0.57  1     0.03    0.02              0.01       0.02     0.01
cinema             0.03  0.03  1       0.26              0          0.71     0
community_centre   0.08  0.02  0.26    1                 0          0.57     0
recycling          0     0.01  0       0                 1          0.01     0.25
theatre            0.03  0.02  0.71    0.57              0.01       1        0
waste_basket       0     0.01  0       0                 0.25       0        1
6 Results
This section presents the results of applying the concept variograms as well as the spatial-semantic point pattern analysis to the OpenStreetMap POI data set for London. Concept variograms and \hat{D}_0 statistics were created for all 64 amenity tags. We selected four amenities, namely bar, cafe, post_office, and theatre, to show a spectrum of spatial-semantic interaction and possible interpretations. Fig. 3 shows a map view of all POI in a narrower bounding box, where a warmer color indicates higher similarity to a particular tag. The data set contains 235 bars, 1273 cafés, 284 post offices, and 111 theaters. Whereas bars and cafés, as well as tags similar to them, are equally prominent in the city center, theaters appear less often and only in a certain region with similar tags. Post offices are regularly spread over the whole area and rarely cluster with similar tags. Analyzing the spatial autocorrelation with respect to semantic similarities gives a first assessment of the qualitative differences between the four amenities; see fig. 4. post_office and theatre show nearly no spatial autocorrelation. An increase of semivariance can only be observed for bar and cafe up to a distance of 300 m. post_office shows the lowest overall similarity, followed by theater, cafe, and bar. These values (on the y-axis) are called the sill of a variogram. The range of all four variograms, i.e., the maximum distance up to
Fig. 3. Similarity values of four amenity tags in a subregion of the London OSM data
which spatial autocorrelation is observed, is roughly between 700 and 1000 m. Therefore, we used the latter value as a threshold for the second-order analysis of spatial-semantic interaction.

Fig. 5 shows \hat{D}_0 plots for the four selected amenities. Bars show spatial-semantic interaction on a small spatial and semantic scale, i.e., less than 300 m and below 0.4 dissimilarity. The same applies to cafés regarding their spatial component. The semantic tolerance for interaction appears to be higher than the one for bars here. Theaters show a completely different pattern. The magnitude of interaction highly correlates with spatial and semantic distance. Especially the decrease of interaction in the spatial dimension is smoother for theatres than for cafés and bars. Post offices differ from the other three amenities in showing negative spatial-semantic interaction, i.e., the independent spatial and semantic clustering is stronger than the spatial-semantic one. While dissimilar features have zero interaction with post offices at any distance, negative interaction increases for more similar and closer features. The strength of interaction in general is high for bars and theatres and comparably low for cafes and post offices.
Fig. 4. Concept variograms for four amenity tags in the London dataset
7 Discussion
In this section we discuss the results from the case study and focus on the interpretation of the introduced spatial-semantic interaction models (section 7.1). Subsequently, four different application scenarios in the scope of VGI are presented (section 7.2).

7.1 Interpretation of the Results
The examples in section 6 demonstrate that \hat{D}_0 plots have the potential of revealing more information about spatial-semantic interaction than concept variograms. They explicitly plot spatial-semantic interaction on both scales. Therefore, we can observe that, e.g., bars cluster only with very similar amenities, whereas cafés seem to appear in a more diversified environment. Post offices are regularly spaced, primarily with themselves but also with slightly dissimilar features like post boxes. This results in negative spatial-semantic interaction as it occurs in Pattern B (cp. section 3). From the \hat{D}_0 plot we can observe that it is more characteristic for post offices to be surrounded by dissimilar than by similar features; due to their public supply function they appear in all kinds of environments. Theaters seem to be clustered with similar amenities on a much higher spatial and semantic scale that is not captured by the corresponding concept variogram. Nevertheless, theaters can be considered the same interaction type as cafés and bars if examined on a smaller spatial scale; see fig. 5. Therefore, they correspond
Fig. 5. \hat{D}_0 plots for four amenity tags in the London dataset showing the magnitude of spatial-semantic interaction at different spatial and semantic scales
to the prototypical distribution of Pattern A in section 3. In contrast to cafés and bars, theatres only clump in London's city center. Also, similar amenities co-occur with theaters under high diagnosticity even for greater distances. When forming a spatial-semantic cluster of a certain size, we can assume that a geographic feature has a function that is related to the magnitude of the cluster. A cafe or bar may be important to a block or a certain street, whereas a theatre leaves its interaction traces in the whole city center. The concept variogram of theatres does not reflect the situation described above. There are too many completely dissimilar POI that hide the contribution of similar ones to a possible smaller-scale cluster. This shows the advantage of the point pattern analysis to incorporate the diagnosticity of POI within a certain semantic range. Beyond that, we cannot consider POI to have an underlying continuous spatial process of theatreness. It is rather the spatial pattern of theatres, intertwined with the spatial patterns of other amenities, that is characteristic for the geographic feature type theatre. By comparing the results of
point pattern analysis and concept variograms we are able to show that the theoretical reservations mentioned in section 3 have practical relevance.

Fig. 6. \hat{D}_0 plot of theatres at a smaller scale

The resolution of the semantic dimension was chosen according to the granularity of the similarity measure (i.e., in 4 intervals). However, the accuracy of \hat{D}_0 is likely to increase if the semantic similarities are distributed over the whole range of possible values. For example, the similarity of cafe to bbq and cafe to dentist is 0, and intuitively that could be considered reasonable in terms of their confusion possibility (which underlies our similarity measure). However, keeping in mind that 0 is the minimal possible similarity, we may want to distinguish both cases when assessing semantics in general. So far, the similarity measure applied is rather conservative. Dissimilar features are strongly penalized, which results in coarser semantics. Consequently, \hat{D}_0 plots are not as smooth as in the original spatio-temporal application proposed by Diggle.
point pattern analysis and concepts variograms we are able to show that the theoretical reservations mentioned in section 3 have practical relevance. The resolution of the semantic dimension was chosen according to the granularity of the similarity measure (i.e, in 4 intervals). However, the accuracy of 0 is likely to increase if the semantic similarities are distributed over the whole D range of possible values. For example, the similarity of cafe to bbq and cafe to dentist is 0, and intuitively that could be considered reasonable in terms of their confusion possibility (which is underlying our similarity measure). However, keeping in mind that 0 is the minimal possible similarity we may want to distinguish both cases when assessing semantics in general. So far, the similarity measure applied is rather conservative. Dissimilar features are strongly 0 plots are not as penalized, which results in coarser semantics. Consequently, D smooth as in the original spatio-temporal application proposed by Diggle. 7.2
Application of Spatial-Semantic Interaction Models in VGI
Concept variograms and second-order analysis represent the spatial-semantic interaction signature of geographic feature types in a POI dataset. In section 7.1 we argue that the \hat{D}_0 statistics reveal more information than concept variograms. Therefore, we will focus on application scenarios that use the second-order analysis. Their implementation as software remains future work. The basis for an application of interaction signatures in VGI is a plausibility measure for an individual POI with respect to its feature type. Plausibility in our terminology is different from probability in the sense that we do not aim at predicting feature types in analogy to geostatistical interpolation. It can be computed for an arbitrary location s and feature type c_k based on the comparison between the \hat{D}_0 statistic of a single POI with type c_k at s and the \hat{D}_0 statistic of
Fig. 7. Second-order analysis in action: two candidates for car wash locations in London are checked for plausibility by comparing their spatial-semantic interaction signature with the existing one. The green dot is a real car wash location extracted from Google Maps. The red dot simulates a duplicate tag. (map rendered by Quantum GIS, bounding box: 51.501,-0.171; 51.557,0.023).
all c_k in the dataset. Fig. 7 shows a real-world example of \hat{D}_0 plots. The second plot represents the case of a candidate with low plausibility, the third a case of high plausibility with regard to the first plot. Future work will focus on the numerical comparison between \hat{D}_0 statistics of individuals and feature types as well as their normalization in order to derive a meaningful plausibility measure. Taking the above methodology as a starting point, we envision the following application scenarios:

Tag recommendation. Selecting the appropriate tag is a common problem for voluntary mappers. On the one hand, they want to reuse common tags to make sure their POI will be found and rendered. Checking frequency statistics such as taginfo (http://taginfo.openstreetmap.org/) for OSM can be helpful in that regard. On the other hand, contributors
want to use tags that best describe the corresponding real world entity. This requires browsing and searching the used vocabulary and finally deciding on a tag based on its textual description. With the \hat{D}_0 statistic we can add a criterion such as "which tag is plausible at a certain location?" Plausibility values for different feature types and arbitrary locations can be ranked to suggest tags by comparing their interaction signature with the local environment. Based on the assumption that the underlying dataset is of reasonable quality, users will more likely select tags from the head of the ranking than from its tail. Hence, the second-order analysis supports mappers by reducing the semantic search space.

Data cleaning. Plausibility can also be applied for cleaning up existing data. Cases of very low plausibility may be forwarded to editors who can check for duplicates or vandalism. For example, a post office tagged next to another post office may be assigned a very low plausibility value, because of its high positive \hat{D}_0 value in the near and similar spectrum (cp. section 7.1). It is more likely that a mapper tagged the very same post office a second time. Fig. 7 depicts such an example of duplicate identification for car wash locations. However, the second-moment measure should be understood as a decision support method for users rather than machine processing. There may still be cases of close post offices that are correct, as well as duplicate bars in a neighborhood of bars and nightclubs. The identification and removal of wrong or redundant data can be guided by our measures but requires manual interaction.

Coverage recommendation. In analogy to the reduction of the semantic search space through tag recommendation, voluntary mappers can also be supported by reducing the search space in the literal sense. A tool that identifies areas in which a certain feature type is likely to occur but not present in the dataset could direct the mapping activities of volunteers to areas where coverage strongly differs by feature type. The possibility of making a valuable contribution can thereby be assessed beforehand. The scenario presented in fig. 7 can be considered the result of coverage recommendation, even though such a service would rather point to an area in the vicinity of the green dot than its exact location. Using the \hat{D}_0 statistic for coverage recommendation needs to cope with two problems though. Firstly, the influence of a feature type c_k on its own interaction signature must be eliminated. Otherwise high plausibilities can only be expected in the border regions of the area where c_k is actually present. Secondly, the second-order analysis models a point process, in contrast to the result of a coverage recommendation, which would be an area. Therefore a sampling of test locations is needed that accounts for the density of instances of c_k itself.

Uncovering implicit partonomy. Given the huge amount of data, it becomes difficult to evaluate how voluntary mappers tag specific locations in comparison to others, i.e., what users tag in contrast to what they mean, in an aggregated manner. For example, POI tagged as school could either represent school
grounds or school buildings. In the latter case it is likely that the regular spacing on the city scale is accompanied by a strong clustering on the city-block scale (because several buildings jointly form the complex which is commonly considered a school). Whereas tag usage (in this example) could solely be revealed by Ripley's K, a spatial-semantic interaction model is required as soon as different but similar types of schools, e.g., boarding school, public school, or elementary school, are present. \hat{D}_0 plots can uncover implicit partonomic assumptions that should be made explicit by either proposing a new tag to the community or providing better descriptions to be considered by mappers in the future. In the car wash example (cp. fig. 7) the \hat{D}_0 plot shows no clustering with similar POI. The red dot can be identified as a duplicate, because car wash facilities are not modeled as building complexes by VGI contributors.
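To make the tag recommendation scenario concrete, here is a deliberately schematic ranking loop; the signature-comparison step is a placeholder, since the paper explicitly leaves the numerical comparison of \hat{D}_0 statistics to future work, and all names are ours:

def recommend_tags(location, candidate_tags, dataset, d0_of_type, d0_of_single_poi):
    """Hypothetical tag recommendation: rank candidate feature types by the
    plausibility of placing one POI of that type at the given location.
    d0_of_type and d0_of_single_poi are assumed to return D0 plots as arrays."""
    scores = {}
    for ck in candidate_tags:
        global_sig = d0_of_type(dataset, ck)                 # signature of all ck
        local_sig = d0_of_single_poi(dataset, location, ck)  # signature of one ck here
        scores[ck] = -abs(global_sig - local_sig).sum()      # naive (negated) distance
    return sorted(candidate_tags, key=scores.get, reverse=True)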
8 Conclusion
In this paper we describe a methodology to characterize the spatial-semantic interaction of points of interest in OpenStreetMap. Inspired by Diggle's [19] second-moment spatio-temporal measure, we combine point pattern analysis as originally proposed by Ripley [16] with semantic similarity. The resulting spatial-semantic interaction is a measure for the likelihood of features of a certain type to co-occur within a certain semantic and spatial range. The feature type similarities required for our work are not computed from top-down geo-ontologies, but automatically generated bottom-up based on the change history of OpenStreetMap elements. Our methodology sets the theoretical ground for tools to support users in contributing and cleaning up VGI. Users contributing new features may get automatic feature type recommendations based on the location of the new POI and the spatial-semantic interaction within its vicinity. Features that are unlikely to co-occur with other features may be discovered and forwarded to editors. At the same time, our work has implications for geospatial semantics research in general and geo-ontologies specifically. Instead of aiming at top-down domain ontologies that describe feature types such as pubs by characteristics like having tables, walls, or menus, we argue for a local, bottom-up approach based on their spatial and temporal characteristics. Pubs clump together with other features such as nightclubs or cafés, and while they may have different opening hours, they are between those of cafés and nightclubs. The two approaches do not contradict each other and should be combined. However, it is rather space and time that shape our conceptualization of the world than bags of attributes [29]. As a long-term vision, by examining patterns of spatial-semantic (and temporal [30]) interaction, we aim at extracting prototypical properties of particular feature types, in order to generate unique semantic signatures. Besides integrating the temporal component as well, future work will especially focus on more formal methodologies for validating our results in terms of statistical significance (cp. Diggle's U and residual statistics [19]) and sampling distributions. Our approach can also be improved by resources such as
WordNet (http://wordnet.princeton.edu/) to disambiguate and map terms from different repositories containing user-generated bags of words, and by the inclusion of data from location-based social networks like foursquare (http://www.foursquare.com/) or whrrl (http://whrrl.com/).

Acknowledgements. The authors would like to thank Edzer Pebesma, Ashton Shortridge, Ola Ahlqvist, Mike Goodchild, and Peifeng Yin for their fruitful feedback and advice.
References

1. Goodchild, M.F.: Citizens as sensors: the world of volunteered geography. GeoJournal 69(4), 211–221 (2007)
2. Zielstra, D., Zipf, A.: A comparative study of proprietary geodata and volunteered geographic information for Germany. In: 13th AGILE International Conference on Geographic Information Science (2010)
3. Mooney, P., Corcoran, P., Winstanley, A.C.: Towards quality metrics for OpenStreetMap. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 514–517. ACM, New York (2010)
4. Goodchild, M.F., Glennon, J.A.: Crowdsourcing geographic information for disaster response: a research frontier. International Journal of Digital Earth 3(3), 231–241 (2010)
5. Scheider, S., Possin, J.: Affordance-based algorithms for categorization of road network data. Technical report, University of Münster, Germany (2010)
6. Werder, S., Kieler, B., Sester, M.: Semi-automatic interpretation of buildings and settlement areas in user-generated spatial data. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2010, pp. 330–339. ACM, New York (2010)
7. Trame, J., Keßler, C.: Exploring the lineage of volunteered geographic information with heat maps. In: GeoViz 2011, Hamburg, Germany (2011)
8. Janowicz, K., Keßler, C., Schwarz, M., Wilkes, M., Panov, I., Espeter, M., Bäumer, B.: Algorithm, implementation and application of the SIM-DL similarity server. In: Fonseca, F.T., Rodríguez, A., Levashkin, S. (eds.) GeoS 2007. LNCS, vol. 4853, pp. 128–145. Springer, Heidelberg (2007)
9. Zook, M., Graham, M., Shelton, T., Gorman, S.: Volunteered geographic information and crowdsourcing disaster relief: a case study of the Haitian earthquake. World Medical & Health Policy 2(2), 231–241 (2010)
10. O'Sullivan, D., Unwin, D.: Geographic Information Analysis. Wiley, Chichester (2010)
11. Kuhn, W.: Volunteered geographic information and GIScience. In: NCGIA, UC Santa Barbara, pp. 13–14 (2007)
12. Elwood, S.: Geographic information science: emerging research on the societal implications of the geospatial web. Progress in Human Geography 34(3), 349–357 (2010)
13. Coleman, D., Georgiadou, Y., Labonte, J.: Volunteered Geographic Information: the nature and motivation of produsers. International Journal of Spatial Data Infrastructures Research 4, 332–358 (2009)
14. Ahlqvist, O., Shortridge, A.: Characterizing land cover structure with semantic variograms. In: Progress in Spatial Data Handling, pp. 401–415 (2006)
15. Cressie, N.: Statistics for Spatial Data (Wiley Series in Probability and Statistics). Wiley-Interscience, Hoboken (1993)
16. Ripley, B.: The second-order analysis of stationary point processes. Journal of Applied Probability 13(2), 255–266 (1976)
17. Daley, D., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Springer Series in Statistics (1988)
18. Besag, J.: Contribution to the discussion of Dr. Ripley's paper. JR Stat. Soc. B 39, 193–195 (1977)
19. Diggle, P., Chetwynd, A., Häggkvist, R., Morris, S.: Second-order analysis of space-time clustering. Statistical Methods in Medical Research 4(2), 124 (1995)
20. Rodríguez, A., Egenhofer, M.: Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure. International Journal of Geographical Information Science 18(3), 229–256 (2004)
21. Li, B., Fonseca, F.: TDD - a comprehensive model for qualitative spatial similarity assessment. Spatial Cognition and Computation 6(1), 31–62 (2006)
22. Raubal, M., Adams, A.: The semantic web needs more cognition. Semantic Web Journal 1(1-2), 69–74 (2010)
23. Schwering, A., Raubal, M.: Spatial relations for semantic similarity measurement. In: Akoka, J., Liddle, S.W., Song, I.Y., Bertolotto, M., Comyn-Wattiau, I., van den Heuvel, W.J., Kolp, M., Trujillo, J., Kop, C., Mayr, H. (eds.) ER Workshops 2005. LNCS, vol. 3770, pp. 259–269. Springer, Heidelberg (2005)
24. Van Eck, N.J., Waltman, L.: How to normalize co-occurrence data? An analysis of some well-known similarity measures. Journal of the American Society for Information Science and Technology 60, 1635–1651 (2009)
25. Peters, H.P.F., van Raan, A.F.J.: Co-word-based science maps of chemical engineering. Part I: representations by direct multidimensional scaling. Research Policy 22(1), 23–45 (1993)
26. Rip, A., Courtial, J.: Co-word maps of biotechnology: an example of cognitive scientometrics. Scientometrics 6, 381–400 (1984), doi:10.1007/BF02025827
27. Zitt, M., Bassecoulard, E., Okubo, Y.: Shadows of the past in international cooperation: collaboration profiles of the top five producers of science. Scientometrics 47, 627–657 (2000), doi:10.1023/A:1005632319799
28. Goovaerts, P.: Geostatistics for Natural Resources Evaluation. Oxford University Press, USA (1997)
29. Janowicz, K.: The role of space and time for knowledge organization on the semantic web. Semantic Web Journal 1(1-2), 25–32 (2010)
30. Ye, M., Shou, D., Lee, W.C., Yin, P., Janowicz, K.: On the semantic annotation of places in location-based social networks. In: ACM SIGKDD (forthcoming, 2011)
A Model of Spatial Reference Frames in Language

Thora Tenbrink and Werner Kuhn
University of Bremen, University of Münster, Germany
[email protected], [email protected]
Abstract. We provide a systematic model of spatial reference frames. The model captures concepts underlying natural language expressions in English that represent both external and internal as well as static and dynamic relationships between entities. Our implementation in the functional language Haskell generates valid English sentences from situations and reference frames. Spatial reference frames are represented by the spatial roles of locatum, relatum, and, optionally, vantage, together with a directional system. Locatum, relatum, and vantage can be filled by entities taking on the discourse roles of speaker, addressee, and participant (grammatically expressed by first, second, and third person). Each of these roles may remain unspecified in a linguistic description. Keywords: reference frames, spatial relations, motion, conceptual modeling, natural language.
1 Introduction

Spatial descriptions represent a major challenge for natural language interpretation as well as conceptual models. The literature provides a vast range of approaches focusing on diverse aspects and pursuing a variety of aims. For example, Retz-Schmidt (1988) presents a useful account of diverse reference frames, clarifies a number of confusions and ambiguities, and provides an outline of a dialogue system utilizing the introduced distinctions. Herrmann (1990) as well as Levinson (1996) both suggest (fairly equivalent) schematic approaches capturing the most frequent types of spatial reference frames in a more systematic way than hitherto available. Both Levelt (1996) and Frank (1998) emphasize the role of perspective choice and mental rotation in their accounts. Frank (1998) in particular proposes a formalization with respect to the mental operations required to assign spatial regions to objects from various perspectives. Talmy (2000) embeds a thorough discussion of the diversity of conceptual reference frames within his comprehensive cognitive grammar theory, capturing a much wider range of spatial (and other conceptually crucial) terms than most other approaches. This list could be extended considerably. Altogether, this intricate work has provided deep insights into humans' understanding of their spatial surroundings, which as such has been shown to be at the center of human cognition in many crucial respects (e.g., Miller & Johnson-Laird, 1976; Lakoff & Johnson, 1980). The present paper adds to this body of literature by introducing a systematic conceptual model that represents conceptual reference frames underlying English language usage by employing simple spatial relationships between entities consistently.
The framework directly builds on Levinson's (1996) approach but extends it in various respects. It represents absolute reference frames consistently with intrinsic and relative frames, and it integrates the systematic difference language makes between topologically internal and external relationships. Crucially, the framework is capable of modeling dynamic spatial concepts in just the same way as the static reference frames usually focused on in most accounts. The model achieves this by separating roles (such as a vantage providing a perspective) from properties and affordances (such as having an intrinsic orientation). Furthermore, it distinguishes spatial and discourse roles and abstracts these from concrete linguistic expressions. This allows for the integration of those crucial distinctions and conceptions that have been identified in the earlier literature by various authors as just cited, and many others, in multiple ways. We thus propose a uniform and simple model that is flexible enough to account for a wide range of interrelated concepts.
2 Spatial Reference Frames

2.1 Basic Framework

Levinson (1996) proposed a systematic framework that distinguishes between three basic spatial reference frames called absolute, relative, and intrinsic. This framework is now widely adopted by researchers for the interpretation of a particular kind of spatial description, namely those that involve the high degree of conceptual complexity represented by these frames. Our model captures these three basic reference frames via a uniform set of obligatory spatial roles, namely locatum, relatum, and directional system. Additionally, there is the optional role of vantage, providing a perspective. Figure 1 introduces these basic roles in the form of symbols that will be used throughout this paper to represent roles within a situation. The conceptual reference frame that underlies a spatial description is defined by the relations between these roles. It assigns concrete entities from the situational context to the roles of locatum and relatum (and possibly vantage), as well as a set of linguistic terms to the directional system. For the latter, we will here be concerned with two options only, namely the so-called projective terms (front, back, right, left) and compass terms (north, west, south, east). Both of these sets partition a spatial plane using four different directions, which our symbols represent as an abstract cross. Introducing abstract roles supports the identification of (and differentiation between) implicit and explicit conceptual participants of a linguistic description. This enables a consistent identification of reference frames underlying spatial descriptions. In the case of a fully specified (explicit) description, this results in a direct association between the description and one specific type of reference frame. However, natural discourse often leaves conceptual participants implicit, resulting in an underspecified spatial description. In this case, our model allows for the identification of the range of reference frames that are compatible with a description. The spatial roles of locatum, relatum, and vantage are filled by entities (which can be objects, people, or places). These spatial role fillers can also fill discourse roles, namely those of speaker, addressee, and participant. These three distinct discourse
roles correspond grammatically to first person (speaker: "I"), second person (addressee: "you"), and third person (an entity other than speaker or addressee: "he, she, it") (Herrmann, 1990). This three-fold distinction has systematic repercussions on the linguistic expression of the underlying reference frames, as will be shown below.
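To make this three-way distinction concrete, the following minimal Haskell sketch (ours; it is not part of the implementation excerpted in the Appendix) encodes the discourse roles and their grammatical realization:

data DiscourseRole = Speaker | Addressee | Participant

-- Grammatical person, represented here by a typical pronoun.
pronoun :: DiscourseRole -> String
pronoun Speaker     = "I"          -- first person
pronoun Addressee   = "you"        -- second person
pronoun Participant = "he/she/it"  -- third person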
Fig. 1. Depictions for spatial roles as basic elements in our model. The circle represents the relatum, the square the locatum, the cross the directional system, the arrow the perspective, and the triangle another entity within a scene (filling the role of vantage, for example). The three roles represented on the left are obligatory for intrinsic, relative, and absolute reference frames, while the two on the right are optional.
Apart from the basic differentiation between absolute, relative, and intrinsic reference frames, which will be discussed in Sections 2.2 through 2.4, further options emerge. While the standard situation in Levinson's approach is that objects are spatially separate (external reference frames, see below), they may also be located inside of one another (internal reference frames, Section 2.5). Further linguistic and conceptual options arise from various motion concepts (Section 2.6). In the static case (without the involvement of motion), the role of locatum is filled by the entity that is currently being described (which may itself, as a shorthand, be referred to as locatum); and the role of relatum is filled by another entity in relation to which the locatum is being described. In intrinsic and relative reference frames, another entity may provide the basis for determining the perspective by filling the (optional) spatial role of vantage. Alternatively, a perspective may be conveyed by motion. In each case, the perspective provides a vector that determines the assignment of projective terms (e.g., which side should be referred to as left) within the directional system, independently of whether the (actual or potential) vantage of a person is involved. The entities filling the roles of locatum, relatum, and vantage need not be individual objects or persons. Tenbrink & Moratz (2003), for instance, discuss the role of a group of similar objects filling the role of relatum in a static external relative reference frame. This specific case exemplifies the increasing complexity, which would be multiplied if other roles and reference frames were affected in this way. We therefore restrict our current discussion to individual entities. Furthermore, our model assumes the simplest spatial extension possible for the spatial roles, namely point-like except where some extension is needed. For example, in an internal reference frame (Section 2.5), the relatum needs to be extended in order to contain the locatum. In practice, of course, the entities filling the spatial roles are never point-like. This, again, leads to further complexities in the assignment of reference frames, as shown, for instance, by Herskovits (1986) and Eshuis (2003). The exact position of the locatum relative to the relatum (for instance, whether an object is conceived of as being directly or rather diagonally in front of another, or how close it is) depends on a variety of factors including the (functional) relationships
between objects (e.g., Coventry & Garrod, 2004; Carlson-Radvansky et al., 1999), their size (Talmy, 2000), and the situational context (Bateman et al., 2007). This is true for all types of reference frames. In the following, representations will reflect prototypical or "ideal" spatial relationships (Herskovits, 1986): front and north are associated with 0° relative to the relatum, right and east with 90°, back and south with 180°, and left and west with 270°. In actual discourse, this is almost never precisely true, but this association provides a suitable abstraction of the relevant qualitative distinctions. More precise spatial distinctions have been modelled, for instance, by Freksa (1992), Regier and Carlson (2001), Moratz and Tenbrink (2006), and Moratz and Ragni (2008), focusing on different psychological, formal, or discourse-related aspects. Concerning the distance of objects to each other, it can be observed that the use of projective terms typically implies a direct (uninterrupted) adjacency relationship between locatum and relatum (Talmy, 2000; Pribbenow, 1991). For example, if object A is described as left of object B, there is no further object between A and B. In contrast, this is not the case for spatial descriptions using compass directions. Apart from this qualitative effect of proximity with projective terms, there are no further constraints on spatial distance. Having clarified the general properties of our framework, we will now introduce the specific cases, starting with Levinson's three basic reference frames: intrinsic, relative, and absolute. All of these represent static external situations.
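This idealized association of terms with angles can be captured in a few lines of Haskell (our sketch; the two lists anticipate the projective and compass directional systems declared in the Appendix):

import Data.List (elemIndex)

projective, compass :: [String]
projective = ["front", "right", "back", "left"]
compass    = ["north", "east", "south", "west"]

-- The "ideal" angle, in degrees clockwise from the reference direction,
-- associated with a direction name; e.g., idealAngle projective "back"
-- evaluates to Just 180, and idealAngle compass "west" to Just 270.
idealAngle :: [String] -> String -> Maybe Int
idealAngle system term = (* 90) <$> elemIndex term system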
Fig. 2. Basic reference frames, represented schematically. a: Intrinsic case. b: Relative case (in which the perspective is provided by the vantage, depicted by a triangle). c: Absolute case.
2.2 Static External Intrinsic Reference Frames

In the (static) intrinsic case, the relative position of the locatum with respect to the relatum is described by referring to the relatum's intrinsic properties such as front or back. Therefore, one can say:

(1) There is a box in front of me.
This example represents a case in which the relatum is the speaker and the locatum is an entity other than speaker or addressee (a box). The perspective is supplied by the speaker's front or view direction, i.e., the speaker provides the vantage. In Figure 2a, this idea is represented by the arrow coinciding with the relatum, with the directional system imposed on both. The front direction is thus provided by the speaker's view direction, the right direction by the speaker's right, and so forth, yielding the order front-right-back-left in clockwise direction. Any entity with the potential to provide a direction may serve as relatum in an intrinsic reference frame, including objects with
functional parts (chairs or cars) and the like (Herrmann, 1990).1 The other options for filling the roles are as follows.

(2) There is a box in front of you.
(3) There is a box in front of the chair.
(4) I am in front of you.
(5) You are in front of me.

Together these examples illustrate the three distinct cases of relatum (first person: (1); second person: (2); third person: (3)), as well as the three distinct cases of locatum (first person: (4); second person: (5); third person: (1)). Since the relatum coincides with the vantage, there are no additional options for filling this role within an intrinsic reference frame.

2.3 Static External Relative Reference Frames

Unlike the intrinsic case, the relative case is based on a different entity (other than the relatum) providing a perspective. In (6) the relatum (the ball) does not possess an intrinsic front. To interpret such an utterance, the underlying perspective needs to be identified, based on the speaker's or the addressee's vantage, or on a different entity that provides a basis for a view direction (see Figure 2b). The perspective allows for the assignment of a directional system to the relatum, i.e., determines where the front, left, back, and right sides of the relatum will be.

(6) There is a box to the right of the ball (from my vantage (point)2).
The other options for filling the roles of relatum, locatum, and vantage can be spelled out as follows. While (6) shows first person as vantage, (7) exemplifies second person, and (8) third person, respectively. (9) provides the case of first person as locatum, (10) gives second person as locatum, and (6) third person as locatum. The relatum is represented by the first person in (11), second person in (12), and third person in (6), respectively.

(7) There is a box to the right of the ball (from your vantage).
(8) There is a box to the right of the ball (from the chair's vantage).
(9) I am to the right of the ball (from your vantage).
(10) You are to the right of the ball (from my vantage).

1 Not all objects provide all directions (front, back, right, and left), even if they are asymmetric, such as pencils whose tip may provide a "front" to some speakers. Furthermore, Tyler & Evans (2003) point out that even entities which have no inherent orientation at all can sometimes be used for reference of this kind (i.e., without an external observer), as in Sarah stood in front of the tree. Quite exceptionally, the front side of the locatum (rather than the relatum) is used here to determine the direction. Tyler and Evans trace this phenomenon back to what Clark (1973) called the "canonical encounter", i.e., a face-to-face interaction transferred, in this case, to the tree. This seems likely since the effect only appears with the front direction; the locatum's back, left, and right sides cannot be used in this way.

2 In our model, "vantage" is the technical term for a particular spatial role. In natural language, speakers would be more likely to refer to it as "vantage point" or "point of view", if they chose to specify it at all, which is only rarely the case (Herrmann & Grabowski, 1994).
(11) There is a box to my right (from your vantage).
(12) There is a box to your right (from my vantage).

As these examples demonstrate, not all conceivable ways of filling the roles are equally likely to occur in natural discourse. Example (6) is natural, since the speaker uses their own vantage, which is common practice. Using the addressee's vantage, as in (7), is also natural; which of these two options is chosen depends on various discourse factors such as the relationship between the people involved (Herrmann & Grabowski, 1994; Schober, 1993). In contrast, describing a scene from another entity's vantage as in (8) may require a particular reason. Moreover, it is untypical for a speaker to describe their own position from the addressee's vantage as in (9), or vice versa as in (10), or to describe the location of an object in relation to one's own body from the addressee's vantage as in (11), or vice versa as in (12). However, discourse situations exist in which these kinds of descriptions become relevant and might be used, since they belong to the general repertory available to speakers. In the following, we will restrict our account to a subset of possible cases out of the general system in which the roles of locatum, relatum, and vantage can theoretically be filled by all three options of speaker, addressee, or participant.

There is one further complication worth mentioning. With the front-back axis used in example (13) below, relative reference frames are somewhat ambiguous. As Hill (1982) demonstrates, two conceptual alternatives are conceivable (see Figure 3). In English, the relation in front of usually expresses that the locatum is closer to the vantage than the relatum, yielding the order front-left-back-right in clockwise direction – notably, the inverse of the order reported above for intrinsic reference frames. However, the opposite may also be the case. In other languages such as Hausa, the opposite is the preferred interpretation (Hill, 1982). Then the locatum is further away from the vantage than the relatum, and the same order (front-right-back-left) is maintained as with intrinsic reference frames. In the following we will assume inverse ordering of directions for relative reference frames as a default, which is generally accepted as the more typical interpretation in English, leaving the alternatives implicit.

(13) There is a box in front of the ball (from my vantage).
Fig. 3. Two possible interpretations of the front-back axis in a relative reference frame: With in front of, the locatum (box) may be (a) closer to or (b) more distant from the vantage than the relatum
2.4 Static External Absolute Reference Frames

In the absolute case, ubiquitous orientation systems provide a culturally shared basis for determining the directional system (Levinson, 1996). These include compass directions (north, east, south, west, established in clockwise order as shown on maps) as well as, in other languages, environmental features (uphill, downhill, upriver, downriver, which may be less stably established). For example, if the north direction is towards the top of the page, the following is consistent with the depiction in Figure 2c:

(14) There is a box east of the ball.
Since absolute reference frames presuppose a directional system that is already present within the discourse context via its anchoring in the culture, no further perspective is needed to establish an assignment of directions.

2.5 Internal Relationships

Levinson's framework is geared toward (and typically applied to) external relationships, i.e., relations between objects that are spatially separate, as in the examples given so far. Does it equally account for cases in which the locatum is positioned inside of the relatum, yielding an internal relationship? Language sometimes distinguishes between these two topological cases grammatically (Miller & Johnson-Laird, 1976; Talmy, 2000), as seen from the distinction between the external example (15) and the internal relationship expressed in (16):

(15) The box is in front of the car.
(16) The box is in the front of the car.
In internal relationships, the relatum is conceptually divided into parts that are described by projective terms, sometimes explicitly so by referring to sides (such as "on the left/right side", Carroll, 1993:30). As with external relationships, the directional system underlying such a description can be assigned in different ways. In (16), represented by Figure 4a, the directional system is based on the relatum's intrinsic parts (or perspective); this yields a clear internal intrinsic case in which the relatum encompasses the locatum. Again, as with external intrinsic reference frames, directions are assigned as front-right-back-left in clockwise order. Internal relative cases are based on an observer's vantage. For instance, if the relatum room in example (17) has no intrinsic parts of its own (e.g., a room with several doors), a perspective may be derived from the speaker looking into the room, imposing a directional system on the room. If Figure 4b is taken to represent example (17), the relatum corresponds to the room and the locatum to the box. The exact position of the vantage is not reflected linguistically in internal relative reference frames; it may be located inside or outside the relatum (or at the borderline, standing in the door, for example). Since intrinsic sides are typically ascribed to objects by the way humans interact with them (Herrmann, 1990), internal relative reference frames may sometimes not be distinguishable from internal intrinsic ones.

(17) The box is in the back of the room.
The interpretation in terms of a relative internal reference frame entails the assignment of regions in the same way as with external relative frames, namely front-left-back-right in clockwise order (where front corresponds to the region closest to the vantage). Furthermore, regions may also be partitioned into internal (relative) sections by adopting a global perspective (Carroll, 1993). The observed region can be a specific assembly of objects that are perceived as belonging together or being relevant for the discourse situation (Gorniak & Roy, 2004), or any other kind of region that is within the limits of perception. For example, in German it is possible to say:

(18) Dort hinten steht eine Kiste. [lit., "There in the back stands a box."]
Here, the visual field is partitioned into regions in relation to the position of the speaker. Then, the area close to the observer is referred to as vorne (front), and the area more distant from the speaker within the visual field is referred to as hinten (back) (see Tenbrink, 2007, for discussion of syntactic patterns). Paralleling example (17), example (18) also corresponds to the situation in Figure 4b if the circle (relatum) represents the speaker's visual field, the square represents the box, and the view direction (the arrow) is derived from the speaker. Finally, the internal absolute case is straightforward, as it employs a ubiquitous directional system, both within and outside of any relatum. In example (19), the town is the locatum (represented by the square in Figure 4c) and the country is the relatum (represented as the big circle).

(19) The town is in the east of the country.
Fig. 4. Internal relationships: the relatum (represented by the big circle) is large enough to contain the smaller locatum (the square). a: Intrinsic case. b: Relative case, with the entity providing a perspective (i.e., vantage) positioned either inside or outside of the relatum. c: Absolute case.
2.6 Motion

So far, the discussion has focused on static relationships between objects, which have been described as conceptually primary (Svorou, 1994:22). When the entities in question are in motion, several distinct effects emerge. Motion can be expressed by a range of spatial terms, some of which are semantically dynamic, while others resemble static expressions (Miller & Johnson-Laird, 1976); for instance, projective terms may be used dynamically just as well as statically (Retz-Schmidt, 1988:102). Motion can provide an independent perspective (Svorou, 1994; Fillmore, 1997), and
motion descriptions can reflect the same three types of reference frames as static descriptions (Levinson, 2003:96f.). However, depending on which object (or role in the present framework) is affected by the motion event there may be quite different effects. For example, an object may undergo change with respect to its own former position or extension (Brugman & Lakoff, 1988). To our knowledge, these observations have not been integrated comprehensively in any framework, nor have the effects of motion on reference frames been explored in much detail. We propose that the introduction of motion allows for the roles of locatum and relatum to be filled by different entities at different times, resulting in a system of reference frame options that is far more complex than the static situation reveals, yet utilizes the same underlying conceptual patterns. The following account first addresses motion as perspective; then the various effects of motion on the roles of relatum and locatum will be spelled out.

2.6.1 Motion (or Sequence) as Perspective

Directed movement may in some cases provide a perspective for intrinsic and relative reference frames. Then the directional system is imposed on the relatum not on the basis of perception (a view direction), but on the basis of the direction of movement. In such cases the roles of relatum and locatum can be specified in the same manner as with static situations, since the movement does not affect their relative position (see also Talmy, 2000). Example (20) represents an (external) intrinsic case that is schematically illustrated by Figure 5a below. Here the relatum (ball) and the locatum (mouse) remain in a stable spatial relationship to each other, without requiring an additional vantage, as the described movement provides a basis for the directional system, i.e., the interpretation of "in front of".

(20) The mouse is running in front of a ball rolling down the hill.
(21) The wheel is rolling towards the box placed to the right of the ball.
Example (21) represents a relative case, shown in Figure 5b below. It involves two spatial concepts: a movement (of the wheel) towards the box, and the location of the box (locatum) to the right of the ball (relatum). The movement description of the first spatial concept provides the basis for assigning a directional system for the second spatial concept. In other words, the direction of movement within the scene fills the role of perspective in the description of a static spatial relationship. Another possibility for a relative reference frame is that the direction of movement encompasses both relatum and locatum (Figure 5c)3. In example (22) both the relatum (ball) and the locatum (box) are floating at the same speed, and therefore remain in a stable relationship to each other. As before, the direction of movement within the scene fills the role of perspective in the description of a static spatial relationship. Similarly, concepts of sequence (with or without movement) may provide a (functional) direction; example (23) appears to be valid no matter how Peter and Mary are currently oriented, and thus conceptually equivalent to example (22).

(22) The box is floating in the river, in front of the ball.
(23) Peter is in front of Mary in the queue.

3 This is one way of interpreting this specific movement case. See Tenbrink (2011) for a different interpretation within a slightly modified model.
Fig. 5. Movement inducing a perspective for intrinsic and relative reference frames. The direction of movement is indicated by a thin arrow. a: Intrinsic case; the locatum is in front of the relatum, which is currently moving and therefore capable of providing a perspective. b: Relative case with external movement; the right side of the relatum is assigned by the "perspective" of another entity in motion. c: Relative case with surrounding movement (or sequence).
2.6.2 Motion from Anywhere to Locatum: All Reference Frames

Spatial terms sometimes refer to the destination point (or region) of a motion trajectory, as in the following examples:

(24) The box should be placed in front of me.
(25) Put the box to the right of the ball.
(26) Put the box to the east of the ball.
(27) Place the box in the front of the car.
(28) Place the box in the back of the room.
(29) Place the box in the east area of the town.
Similar to example (21), all of these descriptions involve two spatial concepts. Here, the first concept a) concerns a movement of the box, starting from an unknown position, and the second b) concerns the definition of the future position of the box relative to a relatum. In such cases, the entity in focus (the box) no longer continually represents the locatum; the reference frame underlying spatial concept b) only holds at time t1 after completing the movement trajectory of a), but not at time t0 before or while the motion occurs. At time t1, reference frames are established that are equivalent to the static reference frames described above; the difference is due to the nature of the verb (dynamic rather than static). All three kinds of basic reference frames can be used in this way, both externally and internally. After completing the movement, example (24) can be interpreted in terms of a dynamic external intrinsic reference frame, with the new location of the box representing the role of locatum as defined by its relation to the relatum (the speaker). (25) depends on an external perspective (which the context will provide), yielding a dynamic external relative reference frame. The dynamic case of an absolute reference frame is shown in (26). (27) and (28) are examples for intrinsic and relative dynamic internal reference frames, and (29) gives the dynamic internal absolute case. All of these cases are straightforwardly represented by the schemata depicted in Figures 2 and 4, showing in this case the end position of the movement at time t1. The start position of the moving object and the trajectory of movement are irrelevant in each of these cases, since the
relatum and perspective (if any) are defined independently of the motion event, and the locatum is defined only by the end point of the trajectory.

2.6.3 Motion from Vantage to Locatum in a Dynamic Relative Reference Frame

Example (30) is similar to the examples just discussed in that, again, two spatial concepts are involved: a) a movement (by the speaker to the box), and b) the definition of the position of the box as being to the right of the ball in a relative reference frame (as in example (25)). However, in this example, the speaker is also a likely vantage4 for the perspective used in b), given at time t0, prior to the motion event described in a). Then the motion event a) starts from the vantage position at time t0. The two other objects remain unaffected by the motion in a) and can thus straightforwardly (and without considerations of time) be described as relatum (ball) and locatum (box). This situation is represented in Figure 6, which shows how the entity providing a perspective at time t0 moves towards the position of the locatum.

(30) I will go to the box to the right of the ball.
Fig. 6. Dynamic relative reference frame: Movement from vantage to locatum
Now consider the following, describing basically the same situation except that the locatum is a place (the Aristotelian notion of a location with the potential to be occupied by an object) rather than an object:

(31) I'm going to a place to the right of the ball.
(32) I'm going to the right of the ball.
Again, the perspective can only be defined from an external position, for example the speaker's position at time t0, prior to movement, yielding a dynamic relative reference frame. The end point of the trajectory – the place to the right of the ball – at time t1 corresponds to the role of locatum, as in example (25) above. In example (31) this place is linguistically represented explicitly, but the implicit case in (32) appears to be pragmatically equivalent and perhaps more natural. Again, the trajectory of the entity that provides the perspective leads from the position of vantage to that of the locatum as depicted in Figure 6. Note that the moving entity may change its orientation during the movement without changing the definition of the goal location (locatum); the perspective relies on the position of the oriented entity at the time of the description (t0).

4 Alternatively, the direction of movement itself may provide the perspective as in example (22) above.
2.6.4 Motion from Relatum and Vantage to Locatum in an External Intrinsic Reference Frame

So far, all examples contained an explicit relatum, rendering the underlying spatial relationship unambiguous (except for underspecification of perspective). However, neither in static nor in dynamic spatial descriptions does this have to be the case. In the examples of static relationships described in Sections 2.2 and 2.3, the relatum could unproblematically remain implicit as in example (33) below, without changing the intended reference frame. But how can the dynamic examples (34) and (35) be interpreted in the present model of reference frames?

(33) There is a box on the right.
(34) I'm going to the right.
(35) I'm going right.
Conceivably, the spatial relation underlying a description like (34) is the same as in example (32), using the dynamic version of a relative reference frame, and omitting the relatum (ball). A more likely explanation, however, may be that no additional relatum is intended at all, and the utterance merely expresses a case of self-movement towards a right direction – equivalent to example (35), which can only be interpreted in this sense. This can be modelled as the dynamic version of an external intrinsic reference frame: the relatum is reflexive (cf. Brugman & Lakoff, 1988) and corresponds to the vantage, i.e., the speaker's position at time t0, providing the direction of movement. This idea can be best illustrated by starting with the front direction as illustrated in Figure 7 (a and b). The schema in Figure 7a shows the static intrinsic case; the locatum (square) is described with respect to the relatum (circle) which also provides the perspective (big arrow). A corresponding description is example (1) above, repeated here for convenience:

(36) There is a box in front of me.
This is directly mirrored by (37) and – if the end position of the movement is not defined by an object but simply a place – also by (38) (schematically depicted by Figure 7b). Again, these two utterances involve two spatial descriptions each: the goal of the speaker's movement is specified by a noun ("the box" in (37); "a position" in (38)), and the location of these goals is then defined by a static spatial description ("in front of me"). However, essentially the same spatial situation as in (38) can in English be addressed in a shorter form, namely by (39) using an expression that is semantically dynamic (also called directional, cf. Winterboer et al., in press), leaving the end point of the trajectory implicit. Figure 7c shows the situation for a movement towards the right with respect to the start position of the mover, as in examples (34) and (35) above.

(37) I'm going to the box in front of me.
(38) I'm going to a position in front of me.
(39) I'm going forward.
Movement from the vantage and relatum as described so far may or may not involve a re-orientation of the moving entity. Example (40), in contrast, gives an explicit description of a re-orientation; this is expressed by the verb turn. Here, the situation is reversed in that the re-orientation may or may not also imply a movement to a new position. If uttered in a route context, it usually expresses re-orientation combined with a continued movement straight on, yielding a trajectory resembling a quarter of a circle.

(40) I'm turning (to the) right.
Fig. 7. Intrinsic case: Movement from start position (vantage and relatum) to end position (locatum). a: Static intrinsic reference frame (for comparison). b: Forward movement. c: Movement to the right of the view direction at the start of the movement.
2.6.5 Motion from Relatum (not Vantage) to Locatum: Dynamic Relative and Absolute Reference Frames

Another kind of dynamic relative reference frame (distinct from the kind described in Section 2.6.3 above) emerges if directionals are used to describe the movement of objects relative to their own previous position as described from an external vantage. In example (41), both the vantage and the relatum are unspecified and need to be derived from the context (see Jörding & Wachsmuth, 2002, for an inspiring study exploiting this underdeterminacy). The context may provide possible interpretations for a relatum similar to examples (25) and (27) above. However, the object's original position may also serve this role; then the object is moved to the right of its own position at time t0. As for perspective, it is perhaps most likely that the speaker is using their own vantage (which remains unchanged through the time of the movement), which then yields a situation as depicted in Figure 8a. Other vantages are equally possible. The end position of the movement at time t1 then again corresponds to the role of locatum.

(41) Move the box to the right.
If a non-oriented entity is moved in a forward direction as in example (42), the moved object (the box) might move from its own position at time t0 (the relatum) to a position (the locatum at time t1) forward (or: in front) of the relatum, using a perspective provided by a different entity (possibly the speaker in example (42)), as shown in Figure 8b.

(42) Move the box forward.
Fig. 8. Dynamic relative reference frames. a. Movement of an object from the relatum (start point) to the locatum (end point). b. The object is moved forward with respect to its own earlier position, using an external vantage determining the directional system.
Alternatively, the perspective (which in this case determines the direction of movement) may be provided by an externally defined type of sequence or movement, as in Figure 5c, example (22) above. Example (43) illustrates that, in this case, no further entity (such as the speaker) is required for interpretation of the direction of forward movement. The box moves to a new position that is further in the front of the ordered sequence or conveyor belt than its previous position (cf. Figure 9).

(43) The box is moved forward in the ordered sequence / on the conveyor belt.
Fig. 9. Relative reference frame providing a direction of movement. An entity is moved from the position of the relatum (its own earlier position, which the new position is related to) to that of the locatum, based on the encompassing perspective given by external movement or sequence.
However, as with the lateral axis, other interpretations are available as well, filling the lexically unspecified roles of relatum and perspective in different ways. Imagine, for instance, a situation in which objects are arranged in order to be photographed. Then an instruction to move the box forward could be interpreted to mean moving the object towards the area in front of the camera, with the camera filling the roles of vantage and relatum, yielding a dynamic intrinsic reference frame similar to example (24) above. The end position of the movement then again becomes the locatum at time t1.

2.7 Summary of Spatial Reference Frames

Spatial reference frames have been distinguished in the present framework along the following lines:
• intrinsic, relative, or absolute concepts
• external or internal relationships between entities
• static or dynamic situations
• For dynamic situations:
  o Movement direction as perspective
  o Movement from anywhere to locatum
  o Movement from vantage to locatum
  o Movement from relatum to locatum
The distinctions can be combined almost without restriction. Further complexities arise from the choice of axis (frontal vs. lateral) as well as perspective (speaker, addressee, or other) and type of relatum (an object or person, a group of objects, etc.). Each of these kinds of variability deserves attention in its own right, as reflected in the vast research literature in this area (see Tenbrink, 2007 for an overview). For instance, if the relatum consists of several objects (such as a group of same-class objects), this may have several repercussions on the language used (cf. Tenbrink & Moratz, 2003).
3 Implementation

The goal of our implementation of the model is a simulation that generates valid sentences in English (and ultimately other languages) from spatial situations and discourse roles, using appropriate types of reference frames from the available set of options. Alternatively, one might look for implementations that generate possible reference frames from given situations and linguistic descriptions, or that generate possible situations from linguistic descriptions along with reference frames. A suitable formalization tool for our current purposes is the functional language Haskell (see www.haskell.org), which has been used successfully for simulations of other phenomena, such as transportation (Kuhn 2007) and observation (Kuhn 2009). Haskell can capture the role-based nature of our model particularly well, as it allows for distinguishing between types of entities and the roles they fill. The lack of this distinction in existing models of reference frames motivated the work presented here. In order to test the completeness and adequacy of the role-based model, we have as a first step implemented the simulation, before producing a more refined ontology. The simulation consists of a small set of rules to produce English sentences from situations described as role assignments, with associated discourse roles. Situations are records of locatum, relatum, and optionally vantage. Discourse roles are assignments of entities to the roles of speaker, addressee, and participant. Each role slot is filled by one or more entities, which are described as records with noun, position, footprint, heading, and motion direction. The geometric properties are represented internally in simple raster coordinates local to situations. Somewhat surprisingly, the only analytical procedure required is a simple (one line of code) function to determine the direction from relatum to locatum as seen from the vantage of a situation. The only other interesting rule is the one to determine the preposition (such as "in front of") from this relative direction, using the frame of
reference type. All sentences can be generated as a field (“There is a box in front of…”) or object (“The box is in front of…”) representation. Our implementation reproduces sentences 1 to 24 of the examples in this paper, a few of them with minor grammatical variations (such as “to the right of me” rather than “to my right”). The spatial referencing for the locatum of all remaining dynamic situations (25 to 43) is also correctly reproduced, though no effort has been made to capture their dynamic verb phrases (involving put, place, move, go, turn, etc.), as these are independent of spatial referencing. The main point about these examples is that a movement direction can supply a perspective, that spatial roles can be defined at certain times (prior to or after movement), and that they can be filled by abstract places rather than objects or people. Excerpts from the simulation code are given in the Appendix. The current version of the complete code can be inspected and downloaded from http://musil.uni-muenster.de/resources.
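The Entity record itself is not among the code excerpts in the Appendix. A plausible sketch (ours, assuming the field names described earlier in this section and the Position, Footprint, and Direction types declared in the Appendix) might read:

data Entity = Entity
  { noun      :: String     -- e.g., "box", "ball"
  , position  :: Position   -- raster cell, local to the situation
  , footprint :: Footprint  -- cells occupied by the entity
  , heading   :: Direction  -- view direction; (0,0) if unoriented
  , motion    :: Direction  -- motion direction; (0,0) if static
  }

The (0,0) conventions are inferred from the perspective rule shown in the Appendix, which falls back from motion to heading when the motion vector is zero.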
4 Conclusions

In this paper we have extended widely used accounts of spatial reference frames by integrating dynamic cases and some further fundamental distinctions made in language. By using abstract roles that are filled by entities in a discourse context, our model consistently captures a wider range of spatial descriptions than earlier approaches. Moreover, we have proposed an implementation in the form of a simulation generating our example sentences. Various applications of this framework are conceivable. Natural language generation systems can profit from our approach just as well as computational implementations of spatial descriptions. Moreover, a range of controversies in the literature on this complex topic may be reconciled by realizing the diversity of spatial concepts (static and dynamic, non-projective and projective, etc.) that may potentially support temporal descriptions as outlined in Tenbrink (2011). This is true for the well-researched English language, which is the basis for the current framework, but also for other cultures and languages, which have only partly been explored so far with respect to their spatiotemporal conceptualizations. For future work, we therefore target an extension of the simulation to other languages, but also to more than four directions, and to three-dimensional as well as temporal situations. The roles will be generalized to allow for multiple fillers, such as several relata or addressees. The simulation will also be lifted to an ontology of spatial referencing, tied into an upper-level ontology like DOLCE and/or GUM. Apart from the theoretical insights to be gained from this, it will provide a backbone to models of spatial referencing in areas like robotics, indoor navigation, or choreography, where resorting to geodetic reference systems is often impractical or insufficient. Rather than representing an account of spatial referencing per se, this framework is intended as a basis for further exploration. One major purpose is to facilitate further discussion by providing a comprehensible toolbox for research within the domain of space, based on a more flexible and integrative representation of spatial relationships than has been available before. This toolbox may be employed and further explored also for those cases that are not currently directly represented by the available models.
It supports systematic explorations concerning the extent to which particular spatial models are transferred in a language to the temporal domain (cf. Tenbrink, 2011), highlighting universal as well as idiosyncratic principles in cross-linguistic research. As research progresses and further cognitively relevant distinctions are revealed, these can be incrementally incorporated using the proposed roles and relations as basic ingredients. Finally, beyond the description of general principles of conceptualization, the framework can be used as a tool for analysis of discourse expressing concepts of space and, furthermore, of time, contrasting speakers' pragmatic choices in actual language usage with the generally available repertory of a language. Acknowledgements. Funding by the DFG to the first author, project I5-[DiaSpace], SFB/TR 8 Spatial Cognition, and to the second author, speaker of the IRTG Semantic Integration of Geospatial Information, is gratefully acknowledged. Joana Hois has provided invaluable advice in the development of the conceptual framework. Comments from four anonymous reviewers and many colleagues in the SFB/TR 8 and IRTG helped us improve the model and its presentation.
Appendix

Without further explanation of Haskell syntax (which, at this level, is largely self-explanatory), we present illustrative excerpts from our simulation code. They constitute more than half of the entire code (not counting the example data). First, we list the main declarations.

Positions are cells and directions are vectors in a simple raster:

type Position = (Int, Int)
type Footprint = [Position]
type Direction = (Int, Int)
Directional systems are ordered lists of direction names:

projective = ["front", "right", "back", "left"]
inverse = ["back", "right", "front", "left"]
compass = ["north", "east", "south", "west"]
Reference frames have a type and an associated directional system:

-- DirectionalSystem is assumed to be a list of direction names;
-- its declaration is not part of the printed excerpts.
type DirectionalSystem = [String]

data Frame = Intrinsic DirectionalSystem
           | Relative DirectionalSystem
           | Absolute DirectionalSystem
Spatial situations assign spatial roles to entities:

data Situation = Situation Locatum Relatum (Maybe Vantage)
type Locatum = Entity
type Relatum = Entity
type Vantage = Entity
Secondly, we show the small set of computations. Directions (as unit vectors) can be computed from Positions:

fromTo :: Position -> Position -> Direction
fromTo p1 p2 = (signum (fst p2 - fst p1), signum (snd p2 - snd p1))
A direction seen from another is obtained by a vector rotation:

rotate :: Direction -> Direction -> Direction
rotate d1 d2 = (snd d2 * fst d1 - fst d2 * snd d1,
                fst d2 * fst d1 + snd d2 * snd d1)
A situation is internal if the locatum is contained in the relatum:

internal (Situation locatum relatum vantage) =
  (position locatum) `elem` (footprint relatum)
The perspective defines the direction of the first element of the directional system. It is taken from the heading or motion of the relatum or vantage:

perspective (Situation locatum relatum Nothing) =
  if (motion relatum) == (0,0) then heading relatum else motion relatum
perspective (Situation locatum relatum (Just vantage)) =
  if (motion vantage) == (0,0) then heading vantage else motion vantage
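The direction and quadrant functions used below are not among the printed excerpts (the complete code is available at the URL given in Section 3). Assuming the declarations above, one plausible reconstruction (ours; the published code may differ) is:

-- Direction from relatum to locatum, as seen from the perspective of the
-- situation; rotating by the perspective vector aligns "front" with it.
direction :: Situation -> Direction
direction s@(Situation locatum relatum _) =
  rotate (fromTo (position relatum) (position locatum)) (perspective s)

-- Index into a directional system, inferred from the preposition rule below,
-- where (0,1) yields "front"/"north", (1,0) "right"/"east", and so on.
quadrant :: Direction -> Int
quadrant (0,1)  = 0
quadrant (1,0)  = 1
quadrant (0,-1) = 2
quadrant (-1,0) = 3
quadrant _      = error "diagonal directions omitted in this sketch"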
Finally, the preposition of a sentence is computed from a situation and reference frame as follows:

preposition situation frame =
  if internal situation
  then case frame of
    (Absolute directionalSystem) ->
      "in the " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
    (Intrinsic directionalSystem) -> case fst (direction situation) of
      0 -> "in the " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
      1 -> "on the " ++ directionalSystem!!(quadrant (direction situation)) ++ " side of "
    (Relative directionalSystem) -> case fst (direction situation) of
      0 -> "in the " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
      1 -> "on the " ++ directionalSystem!!(quadrant (direction situation)) ++ " side of "
  else case frame of
    (Absolute directionalSystem) ->
      directionalSystem!!(quadrant (direction situation)) ++ " of "
    (Intrinsic directionalSystem) -> case (direction situation) of
      (0,1)  -> "in " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
      (1,0)  -> "to the " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
      (0,-1) -> "behind " ++ directionalSystem!!(quadrant (direction situation))
      (-1,0) -> "to the " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
    (Relative directionalSystem) -> case (direction situation) of
      (0,1)  -> "behind "
      (1,0)  -> "to the " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
      (0,-1) -> "in " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
      (-1,0) -> "to the " ++ directionalSystem!!(quadrant (direction situation)) ++ " of "
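As a hypothetical end-to-end usage (ours, combining the Entity record sketched in Section 3 with the reconstructions above), example (14) can be reproduced as follows:

-- A ball (relatum) at the origin, with its heading aligned to grid north,
-- and a box (locatum) one cell to its east:
ball = Entity "ball" (0,0) [(0,0)] (0,1) (0,0)
box  = Entity "box"  (1,0) [(1,0)] (0,0) (0,0)

-- preposition (Situation box ball Nothing) (Absolute compass)
--   evaluates to "east of ", yielding example (14):
--   "There is a box east of the ball."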
References

1. Bateman, J., Hois, J., Ross, R.J., Tenbrink, T.: A Linguistic Ontology of Space for Natural Language Processing. Artificial Intelligence 174, 1027–1071 (2010)
2. Bateman, J., Tenbrink, T., Farrar, S.: The Role of Conceptual and Linguistic Ontologies in Discourse. Discourse Processes 44(3), 175–213 (2007)
3. Brugman, C., Lakoff, G.: Cognitive topology and lexical networks. In: Small, S.L., Cottrell, G.W., Tanenhaus, M.K. (eds.) Lexical Ambiguity Resolution, pp. 477–508. Morgan Kaufmann, San Mateo (1988)
4. Carlson-Radvansky, L.A., Covey, E.S., Lattanzi, K.M.: "What" effects on "where": Functional influences on spatial relations. Psychological Science 10, 516–521 (1999)
5. Carroll, M.: Changing place in English and German: language-specific preferences in the conceptualization of spatial relations. In: Nuyts, J., Pederson, E. (eds.) Language and Conceptualization, pp. 137–161. Cambridge University Press, Cambridge (1997)
6. Clark, H.H.: Space, time, semantics, and the child. In: Moore, T.E. (ed.) Cognitive Development and the Acquisition of Language, pp. 27–63. Academic Press, New York (1973)
7. Coventry, K.R., Garrod, S.C.: Saying, seeing and acting: The psychological semantics of spatial prepositions. Psychology Press, Hove (2004)
8. Eshuis, R.: Memory for Locations Relative to Objects: Axes and the Categorization of Regions. In: van der Zee, E., Slack, J. (eds.) Representing Direction in Language and Space, pp. 226–254. Oxford University Press, Oxford (2003)
9. Fillmore, C.J.: Lectures on Deixis. Indiana, Bloomington (1997)
10. Frank, A.U.: Formal Models for Cognition - Taxonomy of Spatial Location Description and Frames of Reference. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial Cognition 1998. LNCS (LNAI), vol. 1404, pp. 293–312. Springer, Heidelberg (1998)
11. Freksa, C.: Using Orientation Information for Qualitative Spatial Reasoning. In: Frank, A.U., Campari, I., Formentini, U. (eds.) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, pp. 162–178. Springer, Berlin (1992)
12. Gorniak, P., Roy, D.: Grounded Semantic Composition for Visual Scenes. Journal of Artificial Intelligence Research 21, 429–470 (2004)
13. Habel, C., Eschenbach, C.: Abstract Structures in Spatial Cognition. In: Freksa, C., Jantzen, M., Valk, R. (eds.) Foundations of Computer Science - Potential - Theory - Cognition, pp. 369–378. Springer, Berlin (1997)
14. Halliday, M.A.K., Matthiessen, C.M.I.M.: Construing experience: A language-based approach to cognition. Continuum, London (1999)
15. Herrmann, T.: Vor, hinter, rechts und links: das 6H-Modell. Psychologische Studien zum sprachlichen Lokalisieren. Zeitschrift für Literaturwissenschaft und Linguistik 78, 117–140 (1990)
16. Herrmann, T., Grabowski, J.: Sprechen. Psychologie der Sprachproduktion. Spektrum, Heidelberg (1994)
17. Herskovits, A.: Language and spatial cognition. Cambridge University Press, Cambridge (1986)
18. Hill, C.: Up/down, front/back, left/right. A contrastive study of Hausa and English. In: Weissenborn, J., Klein, W. (eds.) Here and There. Cross-linguistic Studies on Deixis and Demonstration, pp. 13–42. Benjamins, Amsterdam (1982)
19. Jörding, T., Wachsmuth, I.: An Anthropomorphic Agent for the Use of Spatial Language. In: Coventry, K.R., Olivier, P. (eds.) Spatial Language: Cognitive and Computational Aspects, pp. 69–85. Kluwer, Dordrecht (2002)
20. Kuhn, W.: An Image-Schematic Account of Spatial Categories. In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS, vol. 4736, pp. 152–168. Springer, Heidelberg (2007)
21. Kuhn, W.: A Functional Ontology of Observation and Measurement. In: Janowicz, K., Raubal, M., Levashkin, S. (eds.) GeoS 2009. LNCS, vol. 5892, pp. 26–43. Springer, Heidelberg (2009)
22. Lakoff, G., Johnson, M.: Metaphors we live by. University of Chicago Press, Chicago (1980)
23. Langacker, R.W.: Grammar and Conceptualization. Mouton de Gruyter, Berlin (1999)
24. Levelt, W.J.M.: Perspective Taking and Ellipsis in Spatial Descriptions. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M.F. (eds.) Language and Space, pp. 77–107. MIT Press, Cambridge (1996)
25. Levinson, S.C.: Frames of reference and Molyneux's question: Crosslinguistic evidence. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M.F. (eds.) Language and Space, pp. 109–169. MIT Press, Cambridge (1996)
26. Levinson, S.C.: Space in Language and Cognition. Cambridge University Press, Cambridge (2003)
27. Miller, G.A., Johnson-Laird, P.N.: Language and Perception. Cambridge University Press, Cambridge (1976)
28. Moratz, R., Ragni, M.: Qualitative spatial reasoning about relative point position. Journal of Visual Languages and Computing 19, 75–98 (2008)
29. Moratz, R., Tenbrink, T.: Spatial reference in linguistic human-robot interaction: Iterative, empirically supported development of a model of projective relations. Spatial Cognition and Computation 6(1), 63–106 (2006)
30. Pederson, E.: How many reference frames? In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition III: Routes and Navigation, Human Memory and Learning, Spatial Representation and Spatial Learning, pp. 287–304. Springer, Berlin (2003)
31. Pribbenow, S.: Zur Verarbeitung von Lokalisierungsausdrücken in einem hybriden System. Dissertation, Fachbereich Informatik der Universität Hamburg (1991)
32. Regier, T., Carlson, L.: Grounding spatial language in perception: An empirical and computational investigation. Journal of Experimental Psychology: General 130(2), 273–298 (2001)
33. Retz-Schmidt, G.: Various views on spatial prepositions. AI Magazine 9(2), 95–105 (1988)
34. Schober, M.F.: Spatial Perspective-Taking in Conversation. Cognition 47, 1–24 (1993)
35. Svorou, S.: The Grammar of Space. Benjamins, Amsterdam (1994)
36. Talmy, L.: Toward a Cognitive Semantics. MIT Press, Cambridge (2000)
37. Tenbrink, T.: Space, Time, and the Use of Language. Mouton de Gruyter, Berlin (2007)
38. Tenbrink, T.: Reference frames of space and time in language. Journal of Pragmatics 43(3), 704–722 (2011)
39. Tenbrink, T., Moratz, R.: Group-based Spatial Reference in Linguistic Human-Robot Interaction. In: Proceedings of EuroCogSci 2003: The European Cognitive Science Conference, Osnabrück, Germany, September 10-13, pp. 325–330 (2003)
40. Tyler, A., Evans, V.: The Semantics of English Prepositions: Spatial Scenes, Embodied Meaning, and Cognition. Cambridge University Press, Cambridge (2003)
41. Winterboer, A., Tenbrink, T., Moratz, R.: Spatial Directionals for Robot Navigation. In: Dimitrova-Vulchanova, M., van der Zee, E. (eds.) Motion Encoding in Spatial Language. Oxford University Press, Oxford (in press)
Universality, Language-Variability and Individuality: Defining Linguistic Building Blocks for Spatial Relations

Kristin Stock and Claudia Cialone

Centre for Geospatial Science, University of Nottingham, UK
{Kristin.Stock,Claudia.Cialone}@nottingham.ac.uk
Abstract. Most approaches to the description of spatial relations for use in spatial querying attempt to describe a set of spatial relations that are universally understood by users. While this method has proved successful for expert users of geographic information, it is less useful for non-experts. Furthermore, while some work has implied the universal nature of spatial relations, a large amount of linguistic evidence shows that many spatial relations vary fundamentally across languages. Natural Semantic Metalanguage (NSM) is a body of linguistic research that has identified the few specific spatial relations that are universal across languages. We show how these spatial relations can be used to describe a range of more complex spatial relations, including some from non-Indo-European languages that cannot readily be described with the usual spatial operators. Thus we propose that NSM is a tool that may be useful for the development of the next generation of spatial querying tools, supporting multilingual environments with widely differing ways of talking about space. Keywords: linguistics, spatial relations, multilingualism, Natural Semantic Metalanguage, spatial query.
1 Introduction

Research into cognitive and linguistic aspects of spatial relations has been carried out for a number of years with the aim of developing tools for effective spatial querying. While some work acknowledges and explores potential language differences [1] [2], most nevertheless assumes either directly or implicitly that a set of spatial relations exists that are (or can become) universally understood across languages and across individuals who speak the same language [3] [4]. This paper argues that there is a very small set of spatial relations that are universal in their use across languages and by individuals, and a much larger (potentially infinite) set of spatial relations that are linguistically-variable and/or individual. Most of the spatial relations that have been studied in the context of spatial querying fall into the latter category. Furthermore, we assert that most of the fixed sets of spatial relations used for spatial querying (for example, the Oracle spatial operators1) are not universally understood, unless users are taught to understand them in a particular way.

1 http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14255/sdo_operat.htm
In addition to presenting the argument for a small set of universal spatial relations that can support a much larger set of linguistically-variable and individually-variable spatial relations that are dynamically user-described, this paper shows how a set of semantic primitives (referred to as Natural Semantic Metalanguage or NSM), identified in a linguistic research programme spanning 40 years, first theorized by the linguist Andrzej Bogusławski [5] and developed further empirically by the linguists Anna Wierzbicka and Cliff Goddard [6], can be used to define more complex spatial relations, as supported by our experiments and examples included herein. The definitions can also incorporate different spatial cognitive models (some of which are also linguistically-variable), such as bird's-eye views vs. interactive (moving through) views of the landscape, and different reference systems. By linguistic variability, we mean that a spatial relation term in one language either has no equivalent term in some languages (for example, the lack of egocentric, projective spatial relations such as to the left and to the right in some non-Indo-European languages of the South Pacific and America [7]) or has a roughly equivalent term in another language, but varies slightly in its meaning (e.g., the Russian term for crosses, perekhodit'2, as better illustrated in Section 5.2 below). By individual variability, we mean that a spatial relation term may mean different things to different people, possibly only in some contexts. For example, the well-studied phrase the road crosses the park can be interpreted as meaning goes in from one side, crosses the centre and goes out of the other side (as is generally the case in the experiments by Mark and Egenhofer [1] [2] [3] [4]) but has also been interpreted as zig-zags by one of our colleagues. Such individual variations depend on background, experience, and education. For example, these spatial relations are often understood similarly by geospatial professionals (even across languages), but less so by naïve users. The motivation for our work is the development of multilingual natural language spatial querying interfaces that are intuitive and assume no expert knowledge. We aim to define a launching pad for the development of an approach that reflects human thinking (including its vagueness, context-sensitivity, individuality, and linguistic variability). In this paper, we discuss language-variability, rather than cultural-variability. Our work is based on linguistics, and the languages that people use to interact with each other and describe the world, not specifically on their culture, although in many cases there may be a relationship between the two. We also do not make any claims about the relationship between language and thought as discussed in the Sapir-Whorf Hypothesis [8] [9]. Our work explores the ways in which people use language to describe particular spatial relations; while their culture may make them more or less likely to be aware of particular spatial relations, our work has not explored this. In this research, we are not concerned with the semantics of geographic features (such as hydrographic features like seas and rivers, or physiographic features like mountains, hills and deserts), but with spatial relations. Spatial relations include a wide range of referential expressions, prepositions, verbs of motion, deixes, and adverbs, intended in a general sense to express relations between figures (2D and 3D
The Cyrillic alphabet, where necessary, has been transliterated following the library of Congress Charts except for the examples in 5.7 where the NSM is left in Cyrillic. http://www.loc.gov.
Universality, Language-Variability and Individuality
393
objects and people) and between these and the surrounding space or spatial context. Examples of such relations are many, and include right, left, in front of, through, in, on, on the other side of, here, inside, close, far, surrounds, alongside. The research proposed here is the result of a combination of geospatial work, and linguistic investigation. So far difficulties have been identified in defining spatial queries in the geographic field that are understood and processed by machines without cross-linguistic research underneath. This is why the work described in this paper stems from the research of well-recognized linguists and applies it in a spatial context. The research in this paper does not attempt to defend the linguistic thesis of NSM but intends to bridge the gap between the way humans express themselves and the way machines process information. This paper is structured as follows. Section 2 begins by summarising work that provides evidence of the universality of spatial relations, or that conducts work in which such an assumption is implicit. Section 3 provides evidence against the universality of some spatial relations, with examples of spatial relations that do not exist across a range of languages, or that exist with different semantics. In Section 4, we summarise linguistic work on NSM that includes the identification of a set of linguistically universal spatial relations, and in Section 5, we illustrate how these NSM primitives may be used to express the non-universal spatial relations, with empirical evidence. In Section 6 we discuss the work and future direction.
2 Arguments for the Universality of Spatial Relations Most current approaches to spatial querying provide a fixed set of spatial operators conceived to be universal and unambiguous. For example, Oracle provides spatial operators such as ANYINTERACT, COVERED BY, OVERLAPBYDISJOINT etc.3, ESRI tools provide a wide range of spatial relation operators including RELATE, INTERSECTS, CROSSES4, and the Open Geospatial Consortium Spatial Filter Encoding Standard [10] together with those mentioned for the ESRI tools, also includes BEYOND. These spatial operators have their roots in various theoretical research endeavours underlying universality in cognition, imagination and language ([11],[12]). Computational approaches have focused on unambiguous spatial operators by combining spatial primitives thought to be universal (e.g. in and on) [13] [14] [15]. Much of the research on this issue recognises that metric (quantitative) approaches to spatial relations are difficult for users to work with, and instead proposes either a balance between metrics and topology [16] or simply the use of topological spatial relations, [17] which may be manipulated using qualitative spatial reasoning as in Talmy [18]. The most prominent examples include the Regional Connection Calculus [19] and the 9-Intersection Model [20] [21]. Both theories are defined as cognitively
3
4
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e11830/sdo_locator.htm# CFACCEEG http://webhelp.esri.com/arcgisserver/9.3/java/index.htm#geodatabases/spatial_relationships. htm
394
K. Stock and C. Cialone
adequate [22] for Geographic Information Systems (GIS), linguistics, computing science and related fields. Mark and Egenhofer [3] [4] support their theoretical work with a series of experiments, and although they do not claim linguistic universality their work aims to determine whether the 9-Intersection Model adequately represents intuitive ways of thinking about spatial relations (for example, by grouping similar types of relations together). Also, their experiments adopt statistical measures that aim to find a common understanding of the types of spatial relations that can be linguistically expressed and reflected in the 9-Intersection Model. Specifically, this work shows two things. First of all, that individuals within the linguistic samples examined demonstrate a broad similarity in semantic groupings of spatial relations; and secondly that the relations grouped by the 9-Intersection Model correspond well with the ones most often chosen by the subjects. The 9-Intersection work was later extended topologically as in [23] [24] but also considering metric parameters, including angles and precise distances, thus allowing descriptive refinement of spatial relations [25] [26]. However, although metrics can be an important factor when expressing information concerning direction, the 9Intersection Model could not be suitable for the representation of all of the spatial concepts if not extended to include orientation [27], which is itself crucial from a cross-linguistic point of view. What is more, previous work has proved useful in establishing a generic set of spatial relations that are suitable for use in spatial querying likely (from a statistical perspective) to be understood in a common way by many users. However, the general approach of adopting a common set of spatial relations, even if there is evidence that they are commonly understood to some degree, has limitations in that it does not allow individual differences in interpretation to be expressed, and also does not cater for the spatial relations that appear in some languages and not others, or that have subtly different meanings in different languages. Furthermore, previous approaches emphasize what it is possible to say under certain topological conditions, which is important, but not what is instinctually and habitually said in everyday dialogue.
3 Evidence of Language Dependence and Individuality of Spatial Relations In contrast to the usual spatial querying approaches that focus on an attempt to develop a common set of spatial operators, other research in cognitive science and linguistics has shown that there are variations in spatial relation concepts. 3.1 Linguistic Relativity and Spatial Relations Much work on linguistic relativity investigates the notions of grammar and syntax and asserts that they are of dubious universality across languages [28] [29]. This is further extended by some criticisms [30] of Chomsky’s theories on generative and universal grammar [31]. However, we do not discuss the Chomskian case further in our work, since grammar and syntax are not our focus in this work.
Universality, Language-Variability and Individuality
395
The issue of linguistic relativity (related to spatial relations or not) has been supported by significant theoretical and empirical work. The psycholinguist Levinson [7] [32] is a leading thinker in this area, and provides an extensive range of linguistic and non-linguistic examples, supplemented by controlled cognitive experimentations [33]. For example, he investigated the extent to which speakers of different languages think about or remember spatial relations differently. The results confirm his initial intuitions of cognitive dependence on linguistic and cultural constraints. Levinson claims that high level concepts, packaged in lexical meanings, may vary from language to language; but that these may be unpacked into their low level components, which are candidates as universal concepts [7]. 3.2 Reference Systems An important area of linguistic variation occurs in the reference systems that people use to describe spatial relations in the world. Levinson, notwithstanding further complications related to this issue, distinguishes between three main reference systems a) intrinsic, related to an object’s properties, such as the house is in front of the building; b) absolute, related to geo-cardinal coordinates such as the hill to the north of the house; c) relative, viewer-centred such as the house to my left. While most of the Indo-European languages, including those used by Western cultures and also Indian and Iranian sub-branches use a combination of intrinsic and relative reference systems, many non-Indo-European languages tend to use the absolute system. In this system, orientation is expressed in cardinal terms (for example north and south) always having a fixed spatial schema in mind without using body-centred expressions such as to the right of or in front of and at the back of [7] [32].5 These spatial relations change according to the rotation of the figure object (in case of binary relation, e.g., ‘me and the cat’) or the viewpoint (in case of a ternary relation, e.g. ‘me/the viewer, the cat and the tree’). This means that either saying ‘the cat is on my left’ or, ‘the cat is to the left of the tree’ I will always be meaning, if no intrinsic specification is given for a third object, that ‘from this view point, the cat is further left in the visual field than the tree’. If you move your view point or the object then the cat is to the right of me or of the tree. For example, Aboriginal Australian languages such as the Guugu Yimithirr spoken in the Hopevale community (in Queensland) tend to use only the absolute reference system, independent of the viewer position (allocentric) [7] [32]. Tzeltal, the Mexican aboriginal language spoken in the Tenejapa community in Highland Chiapas and Belhare, a particular Tibeto-Burman language (in Eastern Nepal) exploits two interchangeable reference systems: the absolute and the intrinsic. The absolute system involves an idealized plane with fixed points: uphill, downhill and across (the latter meaning in the middle or at the same level). It is suggested that some communities do this because their language makes them constantly aware of their external environment, this being the focus of their rural activities and cultural
5
To the left, to the right etc are also projective spatial relations but this Section focuses on the frame of reference used, rather than the division between topological and projective spatial relations.
396
K. Stock and C. Cialone
traditions [34]. Sometimes the intrinsic and the relative systems can be used as well but this may be more related to the overture to the Western cash economies [7] [32]. Balinese, the Malayo-Polynesian language spoken on the Island of Bali seems to follow this approach too. They combine the more traditional absolute and intrinsic systems with sporadic use of the relative system, and this is thought to derive from contact with the Indonesian language. Some experiments showed that very young children (4-5 years old) preferred absolute referential frames, whereas older ones developed an egocentric view [35]. Therefore, users of absolute reference systems might find the common sets of spatial relations used for spatial querying limiting and counter-intuitive. Also, the use of absolute reference systems is dynamic, depending on the position of the person (for example, that table that was to the North of me when I was talking to my friend), and therefore have a temporal and situational element as well [9]. The handling of such reference systems requires a radical rethink of current spatial querying approaches if they are to operate in an environment with multiple natural languages. 3.3 Topology and Spatial Relations In addition to the variations in reference systems across languages, there is also evidence of relativity in the use of topology. For example, in Korean, it is possible to describe tight and loose containment. This means that attention is not given to the surface (the ground object) but rather to the verb itself (the motion), Korean being a verb-framed language [18] [36]. There is no equivalent concept in many languages, including English (further investigated in section 5.4). Tzeltal has on the one hand an all-purpose ‘positional’ preposition ta used to express location at a generic place which has a number of English corresponding propositions such as at, in, on, from, over, below. On the other hand, Tzeltal encodes the specificity of shape in the predicate and topology becomes more complex. For example single terms in Tzeltal exist to express spatial static concepts such as mounted on top of (Kajal), hanging down from (Jo’kol), and inserted inside of (tik’il) [37].6 As for the concept of containment in its own right (for example, the English in/inside of), Brown shows that to indicate in with the meaning of inside a container in Tzeltal, one is compelled to use a variable dispositional predicate (pachal, waxal, chepel) whose specific use depends on the attributes of the ground-container [38]. The examples above show clear divergence between spatial relations. There are also spatial relations that seem similar across languages, but are subtly different. The English spatial relation alongside (see section 5.2) seems to mean for English speakers (geospatially non-experts), simply static vicinity or better mere standing side-by-side in one point (as next to or side by side). However, this can be translated in Italian by non-experts in a number of ways of which some seem to express more dynamicity, probably because the Italian word itself obliges the speaker/viewer to move his/her eye along the surface of the object considered (e.g., lungo) and not to focus on a fixed position. 6
Levinson (2003), describes the Mayan Guugu Yimithirr as possessing more than 300 predicate roots indicating spatial configuration.
Universality, Language-Variability and Individuality
397
The English spatial relation (or spatial verb) crosses apparently corresponds to the Russian perekhodit’ and a number of other terms (see section 5.2). However, the Russian term, seems to focus on a range of topological possibilities of motion between a figure and a ground that the English crosses does not consider. Finally, there are examples of differences in semantics for spatial relations within one language. In English (as well as in Russian) there are vague interpretations of the meaning of traditional spatial relations such as intersects and crosses but also of other spatial relations such as surrounds (see section 5.3) among non-experts [38]. From the current cross-disciplinary debate two issues emerge: on the one hand there are topological models based on natural language, which seem to support the hypothesis of universalism; on the other hand there are systems of orientation and their different natural language expressions which appear to challenge the previous assumptions. However close linguistic analysis, together with the general difficulty in finding research reporting a firm example of natural language similarity across pairs of languages able to justify universalism, seems to support the view that natural languages inevitably tend to vary in regard to their use of a range of spatial relations.
4 A Set of Universal Spatial Relations The previous two Sections have explored the question of whether spatial relations are universal or linguistically and individually variable . However, we assert that it is not the case that all spatial relations are conceptually equal, but that some are universal while others are not. In fact, a significant body of linguistic research corroborated by linguistic experiments worldwide provides evidence for the view that only a few spatial relations are indeed universal, while the rest are not. This work is referred to as Natural Semantic Metalanguage (NSM) [5] [39] [40] [41], and does not attempt to constrain into formal models the peculiarities of different natural languages or to define cognitive models of people’s thought. NSM aims to identify the basic semantic primitives (not just spatial ones) of any natural language. Hypotheses have been advanced that venture into a definition of NSM as a lingua mentalis [42] (a representation of language as it s thought) as opposed to linguae vocales, or linguae gentium (a representation of language as it is spoken). By Lingua Mentalis it is then intended an outwardly spoken representation of our ‘inner thoughts’. Wierzbicka refers to the latter as deep as well as immensely complex ‘internal representations’ that should ‘correspond fairly closely to the surface form of the sentence’ [sic] [42]. However, linguistic analysis of these ‘internal representations’ at the level of language as it is spoken (a surface and culturallybound language) is not enough. The latter in fact needs to broken down to the level of language-independent atomic universal indefinibilia, which means it inevitably needs to be brought to the level of a lingua mentalis, to better grasp the semantic kernel of these ‘internal representations’. Yet, no claim is made on the morphological similarity of brain structures as the NSM’s research is not a neuro-scientific study of the brain rather a semantic and empirical analysis of the mind (and in fact the very Latin definition lingua mentalis refers to the language of mind not to the structures of brain) nor it is claimed easy to represent any complex concept (such as overlaps) in simple
398
K. Stock and C. Cialone
terms. But the empirical evidence demonstrates that a number of languages do have these atomic semantic units as building blocks to generate any other more complex linguistic expression [42]. The languages analyzed come from a range of linguistic families, from Mandarin Chinese to Niger-Congo; from Papuan to Tangkic; to the most common IndoEuropean languages such as English, Italian and Spanish [6].7 In addition to appearing in all languages studied, these semantic primitives were not further reducible to other concepts. Table 1 shows the 638 NSM semantic primitives in English. Table 1. NSM semantic primes9 Substantives: Relational substantives: Determiners: Quantifiers: Evaluators: Descriptors: Mental predicates: Speech: Actions, events, movement, contact: Location, existence, possession, specification: Life and death: Time:
Space: "Logical" concepts: Intensifier: Similarity:
I, YOU, SOMEONE, PEOPLE,SOMETHING/THING, BODY KIND, PART THIS, THE SAME, OTHER/ELSE ONE, TWO, SOME, ALL, MUCH/MANY GOOD, BAD BIG, SMALL THINK, KNOW, WANT, FEEL, SEE, HEAR SAY, WORDS, TRUE DO, HAPPEN, MOVE, TOUCH BE(SOMEWHERE),THERE IS / EXIST, HAVE, BE(SOMEONE/ SOMETHING) LIVE, DIE WHEN/TIME, NOW, BEFORE, AFTER, A LONG TIME, A SHORT TIME, FOR SOME TIME, MOMENT WHERE/PLACE, HERE, ABOVE, BELOW, FAR, NEAR, SIDE, INSIDE NOT, MAYBE, CAN, BECAUSE, IF VERY, MORE LIKE/AS
Specifically, the semantic primitives that can be considered spatial in nature include: BE(SOMEWHERE), MOVE, TOUCH, WHERE/PLACE, HERE, ABOVE, BELOW, FAR, NEAR, SIDE, INSIDE. These spatial relation primitives were studied in detail cross-linguistically to establish whether they were indeed universal [6] [41] [42] [43] 7
8
9
SM studies include English, Russian, French, Spanish, Italian, German, Portuguese, Polish, Danish, Ewe, Amharic, Malay, Japanese, Chinese, Korean, Mangaaba-Mbula (PNG), East Cree, Yankunytjatjara, Arrernte, Maori, Lao and others. On the NSM official website these seem to be now 64. However, no published work refers to the last added prime LITTLE/FEW. So we conform to the publications on the issue up to 2008/2009. The primes separated by a forward slash are linguistic allolexes or semantic variants (depending on the context etc.,e.g. THERE IS is a variation of EXIST; whereas the primes in brackets are specifications of the prime functor, e.g., BE(SOMEONE), BE (SOMEWHERE).
Universality, Language-Variability and Individuality
399
[44] [45] [46] and applied to different disciplines for example to media and technology in order to simplify analysis of spatial integration in digital images [47]. In the last few decades the search for universal lexical primitives has been at the foreground of semantic investigation in various fields [48] [49]. Moreover, there is a plethora of investigations more focused on lexical spatial primitives and the fuzzy ways these are understood cognitively and expressed linguistically [1] [2] [3] [4] [13] [14] [15] [50]. However these are not sufficiently tested over a reasonably large amount of languages to be defined as universal. In contrast, the NSM spatial relations are empirically proved to be multilingual. NSM uses semantic explications to describe the meaning of a word using universal primitives, allowing the meaning to be determined on the basis of those primitives. By relying on a specified, minimal metalanguage, the approach aims to maximize explicitness, clarity and translatability. NSM explications can, however, incorporate vagueness and subjectivity [6]. NSM is based on a number of assumptions or premises: a) that semantic analysis must be conducted in natural language, rather than technical formalisms, because the latter must usually be explained in terms of the former to be sufficiently clear to humans in any case; b) that the full meaning of any semantically complex expression can be expressed in terms of its equivalent paraphrase composed only of simpler meanings than the original, and that every language must have an irreducible core of semantic primitives and a syntax governing how they can be combined; c) that the NSMs of all human languages are essentially isomorphic, meaning that they share the same set of semantic primitives and that the exponents share a common set of combinatorial properties. NSM has been subject to criticism, mainly by those who link it to objectivism, logical atomism, positivism, abstractness and the issue of reductive paraphrasing that occupied linguists and philosophers for centuries as described by Goddard in [39]. However, these criticisms have been extensively addressed as follows: a) The NSM semantic primitives are linguistically embodied conceptual primes, but are not intended to reflect any objective reality; b) The NSM primitives respond to concrete word meanings in ordinary language not abstract; c) The NSM position does not define how people acquire semantic primitives; d) There is no claim that people compose their linguistic thinking from semantic primitives, or that comprehension involves real-time decomposition down to the level of semantic primitives; e) reductive paraphrasing inevitably leads to a point in which no further substitution is possible and only primitives remain, leaving the problem of how to define the primitives themselves [39]. The conclusion given is that further definition is not possible and ‘we necessarily have to stop at primitive terms which are undefined’ [51]. The use of NSM in the current context may also be criticized on the basis that it uses language (symbols) to represent semantics, and these may not reflect conceptualizations or referents, as illustrated in the well-known semantic triangle [52]. However, the use of language is at least as valid as other methods proposed (that use different types of symbols), and it could be argued that language is more likely to reflect human conceptualizations than the more formal representations often used as alternatives.
400
K. Stock and C. Cialone
5 Using NSM Spatial Relation Primitives as Building Blocks for Non-universal Spatial Relations Our thesis is that the NSM semantic primitives can be used as building blocks to allow non-expert users to describe spatial relations in a way that is intuitive (confirmed by our experiments [38]); that allows them to express what the spatial relation they choose means in their terms, and does not require them to adhere to a particular meaning of a term or set of terms (other than the semantic primitives, which have been found to be universal); and that can be applied in a multilingual environment. The semantic primitives are available in a number of different languages, and can be applied in any language to describe spatial relations that are not in the set of semantic primitives. This paper does not report on our experiments themselves. The reader is referred to [38] for more information about those experiments. The explications contained herein are examples that we have developed to illustrate the use of NSM for a range of different spatial relations, including some that are complex, and others that are widely linguistically variable. Nevertheless, this work is informed by our experiments, involving: • •
For English: a group of second year English-speaking GIS students, a group of PhD students with varying disciplinary backgrounds and a small number of other individuals with varying professional background. For Italian and Russian: some native speakers with other backgrounds.
During the course of this work, lack of agreement and difficulty expressing spatial relations was found, often leading to the use of diagrams or images, further supporting the lack of clarity in the meaning of spatial relation terms. In this Section, we demonstrate how NSM can be used to describe a range of different and complex spatial relations, including many of those presented in the previous Sections as being different from the typical examples from Indo-European languages. Our goal in doing this is to show that NSM may be a tool to develop a radically different approach to spatial querying that allows the query to accommodate different models of space and different understandings of spatial relations, and that therefore may be more suitable for naïve users in a multilingual environment. We do not confine our conception of spatial querying to the traditional use in desktop GIS, but instead consider its potential for a range of applications including web-based geospatial portals; hand held devices and satellite navigation systems. Some of these applications require the consideration of movement and height, as well as the more traditional spatial relations that consider a 2 dimensional, static, birds-eye view. The current approach to NSM is not a way to provide clarity in the definition of spatial relations. The current work is one of the possible approaches to assist nonexpert users in formulating geospatial queries in a way that is more congenial to their own mind and for the machine to understand and process these to return an efficient result. In the example explications shown, we include references to non-NSM concepts. The reason for this is that the semantics of many spatial relation terms depend on the geographic features to which they apply [18] [53] [54], and it seems that the most
Universality, Language-Variability and Individuality
401
practical way to include these geographic features is to embed appropriate ontologies/thesauri concepts in the NSM explications. NSM explications that define geographic features are long and difficult to write (often involving the use of molecules, which are NSM concepts that are not considered semantic primitives, but are still important across many languages [41] [55]), so the use of NSM in this way is not considered viable. However, NSM is useful for describing actions, processes and relationships. We thereby combine the use of NSM semantic primitives with multilingual ontologies/thesauri (for example, GEMET10). In the examples below, terms that could come from an appropriate thesaurus are marked with < > symbols. It is considered valid in NSM to use different conjugations and related forms of the semantic primitives (allolexes) to express variations in case, subject, object etc., in order to make the explications more readable. This is because languages vary widely in their handling of these grammatical aspects, but this does not invalidate the nature of the semantic primitives [6] [41]. 5.1 The Road Crosses the Park We begin with the popular example from Mark and Egenhofer [1] [2] [3] [4], further explored by Riedemann [56]. We provide 2 examples of explications that were written by real, non-expert people who were participants in our experiments ((1) and (2)), to illustrate the kinds of explications that can be written by non-experts with only a 5 minute explanation of NSM. These user explications do not necessarily describe the meaning of crosses that conforms to the dominant view discovered by Mark and Egenhofer. There might be, in fact, a degree of conceptual variation in the interpretation of ‘crosses’ that is proper for different individuals or communities. The user defined explications are followed by an interpretation that conforms to the more traditional spatial querying model (3). (1)
IN THIS PLACE THIS IS INSIDE THIS THIS SAME TOUCHES THIS SAME ON TWO SIDES.
(2)
FOR SOME TIME PART OF THIS IS IN THIS PLACE THIS PLACE WHERE THIS SAME EXISTS IS THIS SAME
Example (2) illustrates a common tendency for non-experts to interchange time and space in describing spatial relations using NSM. While conventional spatial querying approaches usually take a bird’s eye view and exclude temporality in the direct description of spatial relations, natural language often describes spatial relations in terms of a person moving through the landscape [18]. This example shows how NSM may be used to describe spatial relations using time, space or a mixture of the two, and many of our experiments showed a combined approach. This is thought to be emphasized by the lack of available vocabulary in NSM, in that sometimes the only way to describe a relation is through time and movement. However, this might also happen because some spatial and temporal primes are found to be analogous in human conceptualization (e.g. in some languages ABOVE with BEFORE and BELOW with 10
http://www.eionet.europa.eu/gemet
402
K. Stock and C. Cialone
AFTER; A LONG TIME with FAR, A SHORT TIME with NEAR) [6]. For example, Evans and Green attest that in Mandarin Chinese, the concept before is always conceptualized spatially as upper (so, ABOVE according to our arguments) while after as lower (so, BELOW) [57].11 (3)
PART OF THIS IS INSIDE THIS PART OF THIS SAME IS NOT INSIDE THIS SAME ON THIS SIDE PART OF THIS SAME IS NOT INSIDE THIS SAME ON THE OTHER SIDE
Example (3) provides only one possible way to describe the spatial relation crosses in the way geospatial experts might think of it. There are many other ways. These examples include the use of deixes (in which the spatial relation requires contextual information, as in THIS SIDE, OTHER SIDE), another common way of expressing spatial relations using natural language. 5.2 Spatial Concepts That Differ Subtly across Languages: Crosses, Alongside Across languages the concept of crosses may vary subtly. An example12 is the corresponding term in Russian for the English crosses. In English crosses gives some topological understanding of going through an area (see Section 5.1 above). In Russian the meaning of going through is refined by the type of motion: a) if the movement is not continuous (not bidirectional and habitual) but finished (unidirectional not habitual) and without transportation: pereidti; b) continuous and without transportation: perekhodit’;13 c) finished and by some means of transport: pereekhat’; d) continuous and by some means of transport: pereezdit’. In NSM these subtleties can be expressed in the cases below: a) A person (you) crossing the park walking in a finished movement: (4)
11
ТЫ НА ОДНОЙ СТОРОНE ЭТОГО <ПАРКА> ТЫ ДВИГАЕШЬСЯ НА ДРУГУЮ СТОРОНУ ЭТОГО САМОГО <ПАРКА> ТЫ НЕ ВНУТРИ ЭТОГО САМОГО <ПАРКА> СЕЙЧАС ПОСЛЕ ЭТОГО ТЫ НЕ ДВИГАЕШЬСЯ ВНУТРИ ЭТОГО САМОГО<ПАРКА> ДРУГОЕ ВРЕМЯ14
In Italian to make another example, Early Middle Ages (to have a common reference) is found in history books as Alto (high) Medioevo, and Late Middle Ages as Basso (low) Medioevo. 12 The Russian and Italian examples listed below come from one of the authors of this paper and are based on her knowledge of these languages and on unofficial experiments conducted on native Russian and Italian speakers and on some document corpora as the cross-linguistic research on spatial relations started after its English experimentation, but further official cross-linguistic user testing is due. 13 Subtle differences exist within the semantic structures of Russian itself where crosses is also expressed by non-experts as peresekat’, whose correct English version should be intersects. 14 English Versions: YOU ARE ON ONE SIDE OF THIS /YOU MOVE TO THE OTHER SIDE OF THIS SAME / YOU ARE NOT INSIDE THIS SAME NOW/ AFTER YOU DO NOT MOVE INSIDE THIS SAME OTHER TIMES.
Universality, Language-Variability and Individuality
403
b) A person crossing the park walking in a continuous movement: (5)
ТЫ НА ОДНОЙ СТОРОНE ЭТОГО <ПАРКА> ТЫ ДВИГАЕШЬСЯ НА ДРУГУЮ СТОРОНУ ЭТОГО САМОГО <ПАРКА> ТЫ НЕ ВНУТРИ ЭТОГО САМОГО <ПАРКА> СЕЙЧАС ПОСЛЕ ЭТОГО ТЫ ДВИГАЕШЬСЯ ВНУТРИ ЭТОГО САМОГО <ПАРКА> ДРУГОЕ ВРЕМЯ15
c) A person crossing the park by some means of transport in a finished movement: (6)
ТЫ ВНУТРИ ЧЕГО-ТО ЭТА ВЕЩЬ ДВИГАЕТСЯ ТЫ НА ОДНОЙ СТОРОНE ЭТОГО <ПАРКА> ВНУТРИ ЭТОЙ САМOЙ ВЕЩИ ТЫ ДВИГАЕШЬСЯ НА ДРУГУЮ СТОРОНУ ЭТОГО САМОГО <ПАРКА> ВНУТРИ ЭТОЙ САМOЙ ВЕЩИ ТЫ НЕ ВНУТРИ ЭТОГО САМОГО <ПАРКА> СЕЙЧАС ПОСЛЕ ЭТОГО ТЫ НЕ ДВИГАЕШЬСЯ ВНУТРИ ЭТОЙ САМOЙ ВЕЩИ ВНУТРИ ЭТОГО САМОГО <ПАРКА> ДРУГОЕ ВРЕМЯ16
d) A person crossing the park by some means of transport in a continuous movement: (7)
ТЫ ВНУТРИ ЧЕГО-ТО ЭТA ВЕЩЬ ДВИГАЕТСЯ ТЫ НА ОДНОЙ СТОРОНE ЭТОГО <ПАРКА> ВНУТРИ ЭТОЙ САМOЙ ВЕЩИ ТЫ ДВИГАЕШЬСЯ НА ДРУГУЮ СТОРОНУ ЭТОГО САМОГО <ПАРКА> ВНУТРИ ЭТОЙ САМOЙ ВЕЩИ ТЫ НЕ ВНУТРИ ЭТОГО САМОГО <ПАРКА> СЕЙЧАС ПОСЛЕ ЭТОГО ТЫ ДВИГАЕШЬСЯ ВНУТРИ ЭТОЙ САМOЙ ВЕЩИ ВНУТРИ ЭТОГО САМОГО <ПАРКА> ДРУГОЕ ВРЕМЯ17
The spatial relation alongside is another example of cross-linguistic subtle semantic difference. In English alongside is more static. In Italian there are different ways nonexperts can translate it, e.g. vicino a (expressing static vicinity), or lungo (which can 15
16
17
YOU ARE ON ONE SIDE OF THIS / YOU MOVE TO THE OTHER SIDE OF THIS SAME / YOU ARE NOT INSIDE THIS SAME NOW/ AFTER YOU MOVE INSIDE THIS SAME OTHER TIMES. YOU ARE INSIDE SOME THING/ THIS THING MOVES/ YOU ARE ON ONE SIDE OF THIS INSIDE THIS SAME THING/ YOU MOVE INSIDE THIS SAME THING TO THE OTHER SIDE OF THIS SAME / YOU ARE NOT INSIDE THIS SAME NOW/ AFTER YOU DO NOT MOVE INSIDE THIS SAME THING INSIDE THIS SAME OTHER TIMES. YOU ARE INSIDE SOME THING/ THIS THING MOVES/ YOU ARE ON ONE SIDE OF THIS INSIDE THIS SAME THING/ YOU MOVE TO THE OTHER SIDE OF THIS SAME INSIDE THIS SAME THING / YOU ARE NOT INSIDE THIS SAME NOW/ AFTER YOU MOVE INSIDE THIS SAME THING INSIDE THIS SAME OTHER TIMES.
404
K. Stock and C. Cialone
express conceptual dynamicity). In traditional spatial querying, the only way to express alongside would be by using next to, more similar to the Italian vicino a, thus losing dynamicity. Alongside and next to in English have a distinctive trace, though, that is of temporal nature. Alongside in fact for some speakers conveys, semantically, a prolonged adjacency of objects one by the side of the other, one of which is likely to be linear in nature; whereas next to conveys the immediacy of two objects that are side by side.18 For this reason the use of the time phrase FOR SOME TIME could be used to clarify the meaning of alongside with respect to the other similar spatial relation. Using NSM we are applying an approach that is flexible enough to be valid for a number of languages and possible queries expressed by non-experts. Considering the example, the river is alongside the park possible explications are shown in (8) to (9): (8)
THIS EXISTS NEAR THIS , MAYBE PART OF THIS TOUCHES ONE SIDE OF THIS SAME FOR SOME TIME MAYBE ALL THIS TOUCHES ONE SIDE OF THIS SAME FOR SOME TIME MAYBE PART OF THIS TOUCHES ONE OTHER SIDE OF THIS SAME FOR SOME TIME MAYBE ALL THIS TOUCHES ONE OTHER SIDE OF THIS SAME FOR SOME TIME
In Italian NSM, the dynamic shade would be expressed as: (9)
QUESTO SI MUOVE VICINO A QUESTO FORSE PARTE DI QUESTO SI MUOVE SU UN LATO DI QUESTO STESSO PER UN PO’ DI TEMPO FORSE TUTTO QUESTO SI MUOVE SU UN LATO DI QUESTO STESSO PER UN PO’ DI TEMPO FORSE PARTE DI QUESTO SI MUOVE SU UN ALTRO LATO DI QUESTO STESSO PER UN PO’ DI TEMPO FORSE TUTTO QUESTO SI MUOVE SU UN ALTRO LATO DI QUESTO STESSO PER UN PO’ DI TEMPO 19
This dynamicity could be expressed in English if predicate EXISTS is substituted with MOVES. Many other ways to express this are possible.
18
19
Not by chance in discourse analysis other uses of next to imply immediacy and not prolongation in succession, e.g. next of kin (a family relative immediately close), next week (as opposed to in two/three weeks). English version: THIS MOVES NEAR THIS , MAYBE PART OF THIS MOVES ON ONE SIDE OF THIS SAME FOR SOME TIME, MAYBE ALL THIS MOVES ON ONE SIDE OF THIS SAME FOR SOME TIME, MAYBE PART OF THIS MOVES ON ONE OTHER SIDE OF THIS SAME FOR SOME TIME, MAYBE ALL THIS MOVES ON ONE OTHER SIDE OF THIS SAME FOR SOME TIME.
Universality, Language-Variability and Individuality
405
5.3 Ambiguous Spatial Concepts in English: Surrounds Dolbear and Hart [58] discuss the difficulty of the semantics involved in the spatial relation surrounds, in the example the wall surrounds the field, because while the field may be described as inside the wall using a conventional system of spatial operators, breaks in the wall are possible without the spatial relation becoming invalid, but the traditional (mathematically formal) inside spatial relation would not permit this. The same problem does not arise in NSM. (10) THIS IS INSIDE THIS <WALL> MAYBE THIS SAME <WALL> TOUCHES THIS SAME ON MANY SIDES MAYBE THIS SAME <WALL> IS VERY NEAR THIS SAME ON MANY SIDES
Another, more egocentric way of expressing this relation is shown in (5). (11) I WAS NOT MOVING BEFORE FOR SOME TIME, I MOVE NEAR THIS <WALL> WHEN I MOVE THIS SAME <WALL> IS ON ONE SIDE OF ME AT THE SAME TIME THIS IS ON THE OTHER SIDE OF ME MAYBE FOR SOME VERY SHORT TIME THIS SAME <WALL> DOES NOT EXIST AFTER SOME TIME, I AM IN THE SAME PLACE AS I WAS BEFORE
Explication (11) also includes the use of deixes, temporality and movement to describe the spatial relation. While this may seem unusual from a traditional spatial querying perspective, it was a common approach found in our experiments with nonexperts. The fifth line encapsulates the notion of breaks in the wall, and while it could be criticized on the basis that it could literally mean that for some of the time the entire wall actually ceases to exist, the use of EXISTS in this way fits with the egocentric view of the explication. In this context, in fact the example works very well to express the semantics of surrounds, as a notion of distance between the wall and field is not necessarily part of the definition (for example, the wall could be right next to the field, or it could be miles away, but it would still surround it if a person travelled along it and returned to the same place. 5.4 Spatial Concepts That Differ Remarkably: Tight and Lose Containment In Korean there is a verb indicating tight fitting Kkita, one indicating loose fitting Nehta. In contrast, English only distinguishes between containment in/inside and support on. Thus, the Korean concept refers not only to placing objects in and on but also around and together, as long as the manner of placing is tight [24][58][59]. NSM explications can be used to express either simple containment and support or more complex fitting arguments. For example the English something is inside something else can be expressed simply as below ((12) to (14)), and alternatives are possible: (12) SOMETHING IS INSIDE SOME OTHER THING
406
K. Stock and C. Cialone
Although other alternatives are possible (involving TOUCH for example). The concept something is lose fit something else could be expressed too: (13) SOMETHING IS INSIDE SOME OTHER THING ALL THE SIDES OF THIS SOMETHING ARE NOT VERY NEAR ALL THE SIDES OF THIS OTHER THING
Or something is tight fit something else can be expressed as: (14) SOMETHING IS INSIDE THIS OTHER THING ALL THE SIDES OF THIS SOMETHING ARE VERY NEAR ALL THE SIDES OF THIS OTHER THING
5.5 Intrinsic and Relative Reference Systems Let us consider the example from Section 3.2 of the house is in front of the building. The example expresses an intrinsic reference system that involves defining the conceptual properties of the object (e.g., shape, motion, use etc) and so their front (in this case the building’s front) and defines a binary relation (2 objects). It could also be said the building is in front of me and the relation would still be binary (1 object and I) and the system would still be intrinsic [7]. Sometimes in natural language this generates confusion. With NSM the example above can be expressed unambiguously as follows: (15) IN THIS PLACE THERE IS A IN THIS SAME PLACE THERE IS A THERE IS ONE SIDE OF THIS SAME WHERE PEOPLE BEFORE WERE NOT INSIDE, AFTER PEOPLE CAN MOVE INSIDE THIS SAME ON THIS SAME SIDE THIS SAME IS ON THIS SAME SIDE OF THIS SAME
However, if it is not possible to define precisely what the objects’ front is as in the ball is in front of the pillar (or still a building but without a distinctive side declared as its front), then the relation requires an external perspective (ternary relation) to be described (e.g., the front with respect to me or another landmark) and the system becomes a relative reference system. In this case the NSM expression is tougher to express. One of the possible ways to express this is as below. (16) THERE IS A FAR FROM ME THERE IS A NEAR ME I CAN SEE THIS SAME BECAUSE OF THIS I CANNOT SEE PART OF THIS
5.6 Absolute Reference Systems The following subsection includes more complex examples ((17) to (20)) of absolute reference systems. Let us consider the example the city is to the north of the valley and its NSM explication:
Universality, Language-Variability and Individuality
407
(17) BEFORE I WAS IN THIS I MOVED MORE NEAR 20 NOW I AM IN THIS
Writing an NSM explication to reflect an absolute reference system based on cardinal directions (north, south, east and west) proves difficult. In this example, we include the concepts of north, south, east and west in the ontologies that support the NSM definitions. In the previous examples, these have been confined to geographic features but, a geographic location ontology (gazetteer) would be required to support the use of NSM for spatial querying. Such a gazetteer would include actual place names, and would allow specific places to be included in NSM spatial queries (for example, see explication (17)). We anticipate the extension of this geographic location ontology to include the concepts of north, south, east and west (only) to support absolute reference systems. However, these cardinal directions would not be used as spatial relations (as in THE NORTH OF THE , because this violates the universal nature of the spatial relations that are allowed in NSM. Different studies have defined a reasoning model for the cardinal directions that is quite efficient as in [60] and [61]. Others have conducted experiments in the field of vernacular geography21 to determine which measures are involved in the use of cardinal directions [62]. However, it should be recognized that the use of cardinal directions in NSM is not to be interpreted mathematically precisely, since natural language understanding of such directions has not found to conform to such mathematical precision. This example shows how rivers west of and near Nottingham might be expressed: (18) I AM IN NOW I MOVE AFTER I MOVE I AM NEAR <WEST> THESE ARE NEAR ME THESE ARE NEAR .
As already discussed in section 3.2, some languages (for example, Belhare or Tzeltal) use an absolute reference system in which a static abstract plane is used where directions are defined as uphill, downhill and across (not to be mistaken with the previous ‘crosses’, this indicates movement from and to respectively fixed East and West, rather than relative to the shape of the geographic feature). The example the well is uphill of the house can be written as: (19) THERE IS A PLACE, NOW, BEFORE, AFTER THIS <WELL> EXISTS IN THIS PLACE THIS EXISTS IN THE SAME PLACE THIS EXISTS IN THIS SAME PLACE THIS SAME <WELL> TOUCHES THIS SAME 20
21
Although the expression MORE NEAR NORTH sounds odd, it is the simplest way to express the idea of ‘more further north’ in NSM. https://gis.cs.cardiff.ac.uk/content/vernacular-geography-0.
408
K. Stock and C. Cialone THIS SAME TOUCHES THIS SAME THIS SAME <WELL> IS ABOVE THIS SAME
Or, to say that the river goes across the hill one can say: (20) THERE IS A THERE IS A THIS TOUCHES THIS IN THIS PLACE THIS TOUCHES THIS IN THIS OTHER PLACE THIS OTHER PLACE IS NOT ABOVE THIS PLACE WHERE THIS TOUCHES THIS THIS OTHER PLACE IS NOT BELOW THIS PLACE WHERE THIS TOUCHES THIS
6 Discussion and Further Work This paper has asserted that NSM may be useful for defining spatial relations in a widely varying multilingual environment for non-expert users, and has shown how various spatial relations can be described using NSM, combined with multilingual ontologies/thesauri, and particularly shows the flexibility of NSM in describing spatial relations from a number of different perspectives. Our work did identify some limitations in the application of NSM to the expression of geospatial relations. Firstly, NSM explications for absolute reference systems using cardinal directions proved difficult to compose without the support of a locationontology. We propose an approach that uses a very limited location-ontology to resolve this (see Examples 17 and 18). Secondly, many explications written by nonexperts turn out to be vague, and not to fully describe the spatial relation that they hold in their minds (on the basis of our experiments, described in detail in [38]). However, there was little evidence that they did not have a precise spatial relation in mind, since they were usually able to draw diagrams that expressed them clearly and to say what was not a valid example. Our view is that if users express a very vague spatial relation and get back a large set of results, they may then expand their query to be more precise in an iterative fashion. Despite these limitations, we believe that NSM offers a useful tool for the expression of spatial relations in a widely linguistically variable, environment; offering an alternative to the standardisation approach and allowing individuals and language groups to retain their own ways of expressing themselves. The ontologies and thesauri that are included in the NSM explications must be multilingual to ensure that the entire approach works across languages. In previous steps of our geo-semantic work we have used the GEMET Thesaurus and another vocabulary –that has developed from the GEOSS Societal Benefit Areas aligned in a flexible geospatial architecture [60] [61]. Following on from the theoretical work described in this paper, we are in the early stages of applying this approach to the development of a natural language, multilingual spatial querying tool that will allow users to write NSM explications combining NSM semantic primitives with multilingual ontologies and thesauri, and submit them as query expressions. We are developing a process that will map these NSM explications to the more conventional
Universality, Language-Variability and Individuality
409
spatial query operators (or often a series of such) in order to submit them to a database or web service to retrieve data. This process involves firstly removing differences of expression in the explications (these may be conceived of as noise) with the use of equivalence rules [37]; applying inference based on properties of the NSM explications (for example, transitivity) and then finally applying spatiallyspecific inference (for example to express how a collection of combined spatial and temporal primitives may be used to infer another kind of spatial relation, expressed in a different way; or to convert the use of deixis into a form that can be processed in an automated way). Finally, we are working on a multilingual NSM grammar validator to validate NSM explications entered by users.
7 Conclusions This paper has described some of the shortcomings of current approaches to spatial querying for non-expert, multilingual users in terms of the ability to describe spatial relations that are complex or linguistically dependent. The paper has proposed Natural Semantic Metalanguage (NSM) as an alternative to the use of a generic set of spatial relation operators that are not well understood by non-experts. It has shown the spatial operators’ limitations in expression in comparison with natural language. The paper has also shown how NSM can be used to describe more complex spatial relations, and has been experimentally supported by previous work which shows that NSM is intuitive and can be used with little training. This demonstration of the ability of NSM to describe spatial relations from various languages and of varying complexity represents a first effort in our ongoing geo-semantic work towards development of a natural language spatial querying tool based around NSM. Acknowledgements. The research described in this paper was accomplished as part of the EU funded EuroGEOSS project and the funder’s support is gratefully acknowledged as are the helpful comments received from the reviewers.
References 1. Mark, D., Egenhofer, M.J.: Modeling Spatial Relations Between Lines and Regions: Combining Formal Mathematical Models and Human Subjects Testing. Cartography and Geographic Information Systems 21(4), 195–212 (1994) 2. Mark, D., Egenhofer, M.: Calibrating the Meanings of Spatial Predicates form Natural Language: Line-Region Relations. In: Sixth International Symposium on Spatial Data Handling, Edimburg, Scotland, pp. 538–553 (1994) 3. Egenhofer, M.J., Mark, D.: Topology of Prototypical Spatial Relations Between lines and Regions in English and Spanish. In: National Center for Geographic Information and Analysis, pp. 245–254 (1995) 4. Mark, D., Comas, D., Egenhofer, M.J., Freundschuh, S.M., Gould, M.D., Nunes, J.: Evaluating and refining Computational Models of spatial Relations through crosslinguistic Human Subject Testing. In: National Center for Geographic Information and Analysis, pp. 553–568 (1995)
410
K. Stock and C. Cialone
The Semantics of Farsi be: Applying the Principled Polysemy Model

Narges Mahpeykar and Andrea Tyler

Department of Linguistics, Georgetown University, Washington, D.C. 20057-1051
{nm352,tyleran}@georgetown.edu
Abstract. Recent research in cognitive linguistics has shown that a good deal of systematicity is involved in the semantics of prepositions once the many meanings of a single preposition are treated as motivated categories. The aim of this research is to apply a cognitive linguistic analysis to the semantics of a preposition in Farsi. The study takes Tyler and Evans' [1] approach to polysemy as a means for developing the semantic network of the Farsi preposition be, whose central meaning is similar to that of English to. For this purpose a large amount of corpus data was analyzed and the semantics of the preposition were studied. The results of the analysis showed that Tyler and Evans' Principled Polysemy Model can be successfully applied to the Farsi preposition be. The analysis sheds light on the semantics of this preposition and highlights the model's potential for providing a systematic account of prepositions in languages other than English.

Keywords: categorization, cognitive linguistics, meaning, polysemy, preposition, prototype, spatial configuration.
1 Introduction

Interacting with objects in physical space is one of the most basic aspects of human experience and as such pervades human speech. Adequate computational modeling of humans' natural use of spatial language requires an accurate account of the semantic complexities of spatial language. As Rodríguez and Egenhofer [2] note, a purely mathematical approach, which does not recognize the human element (or how humans think about spatial relations), cannot adequately capture how people understand and talk about space. Along the same lines, a number of studies have shown that geometry alone is not sufficient for understanding spatial relations (e.g., [2], [3], [4], [5]). For instance, Garrod and Sanford [6] and Coventry and Garrod [3] carried out a series of experiments investigating English speakers' understanding of the spatial relationship labeled by in. The participants first saw a prototypical example of the spatial relation labeled by in: a focus element (or, in Langacker's terms, the Trajector) surrounded by a ground element (or, in Langacker's terms, the Landmark). The scene was of an apple resting at the bottom of a transparent bowl. Participants then saw a sequence of pictures, each of which added more apples to the bowl. In the final pictures, the pile of apples exceeded the rim of the bowl such that several of the
apples were no longer surrounded by the bowl. Subjects consistently labeled all these representations of the apples (trajectors or TR) as being in the bowl (the landmark or LM). The researchers concluded that a TR being in a LM involves more than simply being enclosed by the LM. They argued that in order to adequately describe the relation labeled by in, a more complex geometry, one which also involves a functional relationship between the TR and LM, is needed. This experimental evidence is consistent with Tyler and Evans' [1, 7] cognitive linguistic (CL) account of the semantics of English prepositions, which emphasizes that the central meaning of each preposition involves the spatial configuration between a TR and a LM as well as a functional element; the functional element represents the humanly salient consequences of the TR and LM being in that particular spatial configuration. In the case of in, the functional element is containment, which involves a number of consequences, including locational control of the TR by the LM. For instance, if the apples and the bowl are in an in relationship and someone moves the bowl, the apples (even those not physically surrounded by the sides of the bowl) move with it. An additional element of complexity in the natural use of prepositions stems from the fact that all prepositions have developed a complex set of extended meanings. Many spatial operators have extended meanings which are spatial but which no longer match the original spatial configuration coded by the preposition. To get a glimpse of this complexity, consider the following set of meanings for the English preposition over:

1a. The woman nailed a board over the hole in the ceiling: covering
1a'. The woman put her hands over her eyes.
1b. Arlington is over the river from the White House: on the other side
1c. The roofer came over to my house this morning: transfer or moved from point A to point B

There is general agreement that the central spatial relation denoted by over involves a TR vertically elevated in relation to the LM. Note that in the use of over in sentence (1a), the board (the TR) is vertically lower than the LM. In sentence (1a'), there is a horizontal spatial relationship between the TR (hands) and the LM (eyes). In sentence (1b), over indicates a spatial relation in which the TR (Arlington) is located on the other side of a boundary in relation to the LM (the White House). In sentence (1c), over denotes a spatial relation involving motion along a path, in which the TR's (the roofer) motion starts at point A (unknown) and ends at point B (my house). In addition, prepositions typically develop non-spatial meanings, such as:

2a. The bartender overserved the college students: beyond (the limit)
2b. There are over 100,000 soldiers deployed in the Middle East: more than
2c. The President chose negotiation over force: preference

At first glance, the range of meanings appears to have little in common and seems to form an arbitrary list. Moreover, the many meanings are not obviously related to over's central spatial sense, i.e., a TR vertically elevated in relation to the LM. Recent advances in CL (e.g., [1], [8], [9]) have demonstrated that polysemy networks are
governed by systematic processes of meaning extension. Virtually all the meanings can ultimately be traced back to the original TR-LM + functional element configuration. These theoretical insights have provided us with conceptual tools with which to more fully understand how humans think about spatial relations, the interaction between spatial relations and language, and the systematic processes by which spatial terms develop multiple meanings. Due to the complexities of spatial language, computational linguists and researchers in geographic information systems (GIS) have been developing qualitative representations of space using different tools such as spatial relation algebras and metric details for the understanding of spatial relations [2, 10, 11]. In addition, in the search for alternatives, researchers in GIS have been inspired by cognitive and linguistic theories about space (e.g., [12]). CL offers tools that more fully describe spatial language and thinking, among which are image schemas, metaphorical relations, and multiple construals of a scene. Image schemas are conceptual structures that emerge from our everyday physical embodied experience, such as container, path, and balance. They can be extended and metaphorically projected to give rise to new meanings. In their study, Rodríguez and Egenhofer [2] used image schemas as a foundation for constructing a cognitively plausible spatial-relation algebra. The spatial relations associated with the behaviors of the two image schemas containers and surfaces were analyzed and specified in terms of a relation algebra. Human interaction with a set of objects of different sizes in a room space was taken as the basis of the analysis. The algebra provided an inference mechanism for analyzing spatial relations from combinations of image schemas in addition to the composition of individuals, representing a sound definition of the behavior of spatial relations. Along the same lines, Regier's [12] analysis used a path image schema for analyzing a number of closed-class terms from a range of languages including English, Russian and Mixtec. Based on the analysis, he offered a connectionist model for the linguistic systems of those languages. The model contributes to the search for universals in the domain of the human perceptual system, providing a preliminary model of the human capacity for categorizing spatial relations. Although both these models moved our ability to model spatial relations forward, they used only a small subset of basic image schemas, i.e., a beginning point, a path, an end point, surface, and container, for describing spatial relations. One of the key purposes of the current study is to demonstrate a fuller range of concepts from a CL-based analysis of spatial language which we believe may be useful tools for computational modeling of how humans think and talk about space and spatial relations. A second purpose is to demonstrate that the Principled Polysemy Model [7], which was originally developed for English prepositions, can be applied to a distantly related language, Farsi, and thus shed light on possible universal attributes of spatial thinking and spatial language. Finally, this study aims to take the CL approach to preposition analysis forward by basing the analysis on more rigorous and quantifiable linguistic evidence obtained through the analysis of naturally occurring, representative data.
Most studies in this area have relied on analysts' intuition or manual data analyses, and only recently have corpus data been used for identifying sense distinctions [13, 14]. CL holds that much of language is motivated rather than either arbitrary or fully predictable [15]. In particular, when a single phonological form is associated with
multiple meanings, CL argues there is generally a systematic explanation for form-meaning connections. In the CL view, it is suggested that the multiple meanings associated with a word are derived from a central or prototypical sense, forming a principled polysemy network rather than a list of arbitrary meanings [16]. According to Tyler and Evans [17], "The mental lexicon is not organized like a dictionary in which each meaning associated with the same phonological form represents an unrelated word. Rather, lexical items are better understood as forming natural categories that participate in organized semantic networks" (p. 259). As Tyler and Evans [1] explain, identifying the core or central sense of the word makes it possible to motivate the particular instances of use, including the peripheral spatial senses and figurative senses. For example, the comparison of the core meanings of the two near-synonyms over and above shows that the difference lies in the potential for contact between the TR and LM. This potential represents the functional element. Tyler and Evans [7] argue that over has a proximal functional element which allows for contact, while above has a distal functional element which requires a relation of separation or distance. The possible contact relation in over gives rise to extensions such as subtle variations in describing hierarchical interpersonal relations. Consider the following:

3a. Ms. Jones is above me at the company
3b. Ms. Jones is over me at the company

In sentence (3a), the speaker is indicating that Ms. Jones's position in the company is probably several rungs higher than the speaker's and that there is probably not a lot of interpersonal contact between the two. In contrast, sentence (3b) suggests that Ms. Jones is likely to be in a supervisory role vis-à-vis the speaker, one which likely involves personal interaction between the two. Among the CL approaches to the semantics of prepositions, Tyler and Evans' [1] model of polysemy networks stands out for providing a replicable methodology for identifying the central meaning, as well as an articulated set of principles accounting for meaning extension; they refer to their model as the Principled Polysemy Model. The model provides one of the most comprehensive sets of tools in the CL approach to preposition analysis. It relies on recognized principles of language use and cognitive processing such as pragmatic inferencing, knowledge of force dynamics, and experiential correlation. Experiential correlation together with pragmatic inferencing implies that a tight correlation between two events in everyday experience can lead to the association of a new meaning with a particular lexical form. The association occurs through the continued use of the form in particular contexts in which the implicature occurs. For instance, in our daily experience we encounter a recurring correlation between quantity and vertical elevation, such as when objects are added to a pile or liquid is added to a container. In these instances an increase in quantity correlates with an increase in height. The ubiquitous observation of these two co-occurring phenomena results in a strong cognitive association between the two. This conceptual association, in turn, is reflected in language such as 'the prices have gone up', which refers to an increase in amount rather than a literal increase in vertical elevation [18]. In this model, a spatial scene is the abstract representation of real-world spatial configurations as reflected by human conceptual processing.
The conceptualization consists of both configurational and functional elements. The configurational element consists of the TR and the LM. In most construals, the TR is smaller and more movable than the LM, which is typically larger and static. The functional element denotes the humanly relevant, interactive relation between the TR and the LM in a particular spatial configuration. For example, in the phrase 'the water in the pan', the preposition in denotes a particular spatial relation between the TR ('the water') and the LM ('the pan'). The spatial relation described by in designates a relation in which the TR is enclosed by the LM. The functional element which arises from this spatial configuration is that of containment. Containment is understood to involve several typical properties, such as locating the entities being contained and obscuring the contained elements from an external viewer. Although the Principled Polysemy Model focused exclusively on English prepositions, Tyler and Evans [1] suggest that since the principles of the model inherently highlight basic properties of human cognition, the model is likely to be applicable to languages other than English. Studies such as Shakhova and Tyler [9] have taken this hypothesis forward and provided insight into the semantics of the Russian preposition za. In their study, they also investigated how the model could be flexibly augmented when applied to a language with a highly complex system of case marking. The results of the study showed that Tyler and Evans' [1] Principled Polysemy Model can be successfully applied to other languages. The findings also shed light on some of the confusing aspects of the distribution of instrumental and accusative case in Russian, which are crucial in spatial descriptions. The current paper extends the investigation of the universality of the Principled Polysemy Model by applying it to be, one of the most highly polysemous prepositions in Farsi. In order to establish the many meanings associated with be, a corpus of 20,000 words was searched, 1000 instances of be were analyzed, and 15 different uses of the preposition were identified. The following sections first report on the methodology used in the study and continue by explaining the semantic analysis of be. The complete network of be and the various senses associated with this preposition are presented at the end of the analysis. The final section of the paper includes a short summary of the results as well as the limitations and implications for future research in the field.
2 Methodology

2.1 The Preposition be

There are a number of reasons why the study focuses on only one preposition. A fine-grained analysis of the semantics of a single preposition is labor-intensive and requires a considerable amount of time to conduct and space to describe. Hence, the study focuses on a systematic, thorough inspection of a single preposition. Different senses of the preposition are studied and categorized in detail in order to justify the various meanings and to increase awareness of the spatial configuration associated with the preposition used in context. Another intention is to discover whether this in-depth study would raise important questions for an extended future project. Most importantly, the study explores the potential of combining corpus research with an
in-depth theoretical cognitive analysis by looking at interesting instances of prepositional use identified during the analysis. The preposition be in Farsi is associated with a broad range of prepositions in English. The Aryanpour Persian-English dictionary [19] lists up to twelve different translations of the preposition in English, including to, for, at, by, into, and against. The examples below demonstrate part of the challenge, representing the closest meanings of be and their translations, taken from the Aryanpour Persian-English Dictionary.

Table 1. English translations of be
English | Farsi
The river runs into the sea. | رودخانه به دریا میریزد
He was injured by a knife. | او به وسیله چاقو زخمی شد
Please give the book to Pari. | کتاب را به پری بده
He threw the ball against the wall. | توپ را به دیوار بزن
Due to the wide range of English equivalents for this preposition, it has been a challenge for Farsi speakers to use the corresponding English prepositions and particles correctly. So far, few studies have been conducted on the semantics of be as a preposition (e.g., [20], [21]) and no study has looked at Farsi prepositions from a CL perspective. Thus, the focus of the study is first to identify how Farsi speakers conceptualize space in terms of the contexts in which be is used and second to test the applicability of the Principled Polysemy Model to Farsi.

2.2 Identifying the Semantic Network for be

As mentioned above, this study takes Tyler and Evans' [1] Principled Polysemy Model for identifying the semantic network associated with Farsi be. According to the model, a key component of the precise analysis of all polysemy networks is to identify the central or prototypical sense, referred to as the proto-scene. The extended meanings are then derived from the established proto-scene. In this model several steps are proposed for identifying the proto-scene, of which the following four are used in this study [1]:

1. Considering the etymological roots of the word, including the earliest attested meaning.
2. Studying the spatial configuration of the TR and LM in the various senses in which the spatial particle is used.
3. Considering contrastive sets of spatial particles. A contrastive set can involve aspects of the spatial configuration of the TR and LM, such as the contrast between the English prepositions over versus under. It can also involve variations in functional elements, such as English over versus above: the functional element of over involves proximity while that of above involves distance.
4. Taking into account the predominance of the sense in the polysemy network. This entails that most of the complex senses should derive from the sense which constitutes the proto-scene.
2.3 Corpus Analysis

A large part of the data for analysis was extracted from Bijan-khan, the online corpus of Farsi [22]. The corpus, however, was not accompanied by any software for selecting specific lexical categories. Therefore, the first one thousand concordance lines including be were selected and moved to Excel spreadsheets. The lines were then sorted in alphabetical order according to the first word following the preposition. This was an appropriate choice for organizing the sentences because the frequency of word patterns and lexical chunks including the preposition was more evident and could lead to further statistical data analysis [13]. The Farsi meanings of be suggested by the Aryanpour Persian dictionary were taken as a starting point for the analysis of the senses in the first one hundred lines of the corpus. The analysis then led to a primary understanding of the senses in the semantic network of be. Following the polysemy model, the different spatial configurations of the TR and LM in each example were examined, which contributed to a more precise analysis of the senses. Analyzing one thousand corpus lines and applying the polysemy model led to the identification of fifteen distinct senses of be under five individual clusters.
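To make the sorting procedure concrete, the sketch below (our illustration, not the authors' code) mimics the concordance step: lines containing the preposition are collected, the first one thousand are kept, and they are sorted by the first word following the preposition. The file name and whitespace tokenization are assumptions; the actual Bijan-khan corpus has its own format, and Persian text would require proper tokenization.

```python
# Illustrative sketch of the concordance-sorting step (not the authors' code).
# Assumes a plain-text corpus file and whitespace tokenization, both hypothetical.

PREP = "به"  # the preposition be

def first_word_after(line, prep=PREP):
    """Return the first word following the preposition, or '' if absent."""
    words = line.split()
    for i, word in enumerate(words):
        if word == prep and i + 1 < len(words):
            return words[i + 1]
    return ""

with open("bijankhan_sample.txt", encoding="utf-8") as f:  # hypothetical file name
    lines = [line.strip() for line in f if PREP in line.split()]

concordance = lines[:1000]               # the first one thousand lines, as in the study
concordance.sort(key=first_word_after)   # sort by the word following the preposition

for line in concordance[:10]:            # inspect the first few sorted lines
    print(line)
```

Note that Python's default string sort orders Persian words by Unicode code point, which only approximates alphabetical order; a replication concerned with exact Persian collation would use a locale-aware sort.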
3 Analysis and Discussion

3.1 The Semantic Network for be

The diagram below illustrates the main senses of the Farsi preposition be. In the following sections each sense will be analyzed and explained individually. A complete illustration of the different senses in the semantic network appears at the end of the analysis.
Fig. 1. Polysemy network for be
3.2 The Proto-Scene

The proto-scene of be involves a spatial configuration in which the TR is oriented with respect to a highlighted LM. The TR is thus conceptualized as asymmetrical, having a
front and back, and the LM is considered to be in front of the TR. The functional element, or the interactive relationship between the TR and LM, is 'goal'. The orientation element is represented by a 'nose' on the TR, pointing towards the LM. The dotted lines around the LM indicate that the LM is highlighted and thus conceptualized as a goal. The following diagram illustrates the proto-scene posited for the central sense of be:
Fig. 2. Proto-scene of be
The central sense of be is similar to the central sense denoted by English to. According to Tyler and Evans [1], the proto-scene of to designates a relation in which the TR is oriented with respect to a highlighted LM. Previous approaches to the analysis of to entailed the assumption that motion is inherent to the spatial scene of this preposition. Tyler and Evans [1] claim that, unlike these former analyses, path is not a part of the central meaning of to and only orientation is denoted. The orientation sense of to, which does not involve motion, is illustrated by the following examples:

1) The church is facing to the west.
2) The compass needle points to the north.

Similarly, in Farsi the proto-scene of be involves orientation towards the highlighted LM and does not denote the conceptualization of path. The following sentence is an example of the proto-scene of be (the written form is right to left in Farsi):

او به ستاره ها اشاره میکند.
he be the stars is pointing
He is pointing to the stars.
This sense of be contrasts with the Farsi preposition az, which has a meaning similar to English from and denotes a sense of the TR orienting away from the LM, as in the following sentence:

او از کتابخانه آمد.
he az the library came
He came from the library.
Examining a number of naturally occurring instances of be, we discovered some interesting examples of orientation involving be which seem to be metaphorically related to this sense. In these contexts, the TR is oriented or motivated towards a non-physical LM. The examples below are extracted from the Farsi corpus and involve the metaphorical sense of orientation. Metaphor is a concept in CL which represents a
common cognitive process regularly used in extending meanings [23]. Metaphor is a type of conceptual thinking, and conceptual metaphor maps structure from the source domain to the target domain in a unidirectional way. For instance, in the idiomatic expression 'I was boiling with anger', the target domain ANGER is structured in terms of the source domain HOT FLUID. The following sentence illustrates be in a metaphorical sense of orientation:

به دلیل علاقه اش به ادبیات نامش را عوض کرد.
due to his interest be literature his nickname changed
Due to his interest in literature he changed his nickname.
In English, the word 'interest' collocates with the preposition in, which denotes containment; the state of being interested is understood as being surrounded by the object of interest (in this case literature). In Farsi, however, the conceptualization involves the state of 'interest' (the TR) being oriented towards a goal (in this case literature). One motivation for this analysis of be is that the Farsi preposition dar, which clearly denotes containment, is not used in contexts such as the one above. Instead, be seems to occupy the role of orientation similar to English to (e.g., 'interest to' instead of 'interest in'). Other examples of the extension of the orientation sense are illustrated below:

آنها به خدا اعتقاد ندارند.
they be God don't believe
They don't believe in God.

In the above sentence, the people (TR) believe to/towards God (LM).
وفاداران به امام این روز را جشن میگیرند.
the faithful be Imam this day celebrate
The faithful to the Imam celebrate this day.
In the above sentence, the people (TR) are faithful towards the Imam or leader (LM).

3.3 Extended Network for be

Proximity. A large number of examples of be in the corpus appear to be related in some way to the notion of proximity. In this sense of be, the configuration of the TR and LM is similar to the proto-scene, except that the TR and LM are proximal to each other. In many instances, the TR (whether oriented or not) is located in the vicinity of the highlighted LM, so TRs and LMs are commonly observed as being proximal to each other. The following sentence illustrates the proximity sense of be:

او به کنار دختر نشست.
he be kenar [next to] the girl sat
He sat next to the girl.
The senses associated with Proximity are very close to the different senses denoted by English to, including Locational, Contact, Attachment, and Comparison. Figure 3 illustrates the proximity sense for be. The spatial scene represented in this diagram is very similar to the central sense except that the orientation of the TR is no longer an element.
Fig. 3. Proximity sense of be
Contact Sense. This sense, which is closely connected to the proximity sense, designates a spatial configuration of contact between the TR and LM. According to Tyler and Evans [1], the contact sense arose from the experiential correlation between the achievement of a particular goal and contact (or very near contact). The examples below demonstrate this sense of be:

داوطلبان شانه به شانه برای نجات معدنچیان کار میکردند.
the volunteers shoulder be shoulder to rescue the miners worked
The volunteers worked shoulder to shoulder to rescue the miners.
دخترک نی را به آرامی به لبانش نزدیک کرد و تمام آب سیب را نوشید.
the girl the straw slowly be her lips brought and all the apple juice drank
The girl brought the straw slowly to her lips and drank all the apple juice.
Like English, Farsi uses expressions equivalent to 'hand to hand', 'toe to toe', and 'cheek to cheek', all implying some sort of contact and closeness between two proximal entities. Figure 4 illustrates this sense for be.
Fig. 4. Contact sense of be
Attachment Sense. This sense is closely related to the contact sense. When a TR makes contact with a LM, it is not unusual for the TR to become attached to the LM. We observe the same notion at play with English to in sentences such as 'Nail the board to the tree' or 'Staple the notice to the wall'. This sense of be denotes that the TR is joined or attached to the LM in such a way that it becomes part of, or is contiguous with, the LM. The examples below demonstrate this sense of be.
او یک کتاب را به دسته کتاب روی میز اضافه کرد.
he a book be the pile of books on the table added
He added a book to the pile of books on the table.

تخته سفید به دیوار نصب شده.
the whiteboard be the wall is affixed
The whiteboard is affixed to the wall.

The corpus included a number of metaphorical senses which seem to originate from the attachment sense. In these sentences, be denotes a sense of possession or belonging, which can be understood in terms of our physical experience of holding objects: being in physical control of an object entails the idea of being in control of, or closely associated with, that object. The following sentence is an example of the metaphorical sense of be:

کتابها به کتابخانه تعلق دارند.
the books be the library belong
The books belong to the library.
The books (TR) ‘belonging’ to the library (LM) denotes a metaphorical sense of attachment.
Fig. 5. Attachment sense of be
Support Sense. Another sense closely related to the contact sense is the support sense. Here, be denotes a spatial scene in which the TR leans against a LM which acts as a support for the TR. When a TR is attached to a LM, we understand that the LM provides some support for the TR; this has become a separate sense associated with be in Farsi. In English this sense is usually denoted by the prepositions against and on. The following examples illustrate the Farsi usage of this sense:

پسرک به دیوار تکیه کرده بود و میگریست.
the boy be the wall was leaning and crying
The boy was leaning against the wall and crying.
او در تمام طول ترم به یک کتاب درسی اکتفا کرده بود.
she for the whole semester be the same textbook was relying
She was relying on the same textbook for the whole semester.

Figure 6 illustrates this sense for be.
Fig. 6. Support sense of be
Comparison Sense. This sense designates a relation in which the TR is compared to the LM. According to Tyler and Evans' [1] analysis of English to, this sense arises from the act of comparison, which is in some ways related to the proximity sense: it derives from our physical comparison of objects in the real world. When we compare two objects in our hands, we usually bring them close together and compare their features. The following example from the corpus demonstrates this sense of be:
امسال هزینه های مواد غذائی نسبت به هزینه های دیگر به طور چشم گیری افزایش یافت.
this year food expenses compared be other expenses dramatically rose
This year, food expenses rose dramatically compared to other expenses.
Figure 7 illustrates the comparison sense for be. The spatial scene represented in this diagram is similar to the proto-scene except that the vantage point (represented by the eye) has shifted from off-stage to on-stage.
Fig. 7. Comparison sense of be
Intended Recipient Sense. This is a distinct sense of be, derived from the proto-scene, in which be introduces the recipient of a particular action. It is similar to the sense denoted by English to in the prepositional dative construction, where a possession is transferred from person A to person B and the preposition marks the recipient of the action.

English: Maryam gave a book to Ali.
Farsi: Maryam a book be Ali gave.
Farsi is an SOV language and the verb occupies the final position. In this sentence, the object 'a book' is transferred from the TR, the element in focus ('Maryam'), to the LM, the element in the background ('Ali'). The preposition be marks Ali, the recipient of the action. Figure 8 illustrates this sense of be. The black circle represents the object and the dotted line represents the act of transfer from the oriented TR to the LM.
Fig. 8. Intended recipient sense of be
In addition to the physical sense of transfer denoted by the verb, some examples in the corpus illustrated a more metaphorical sense involving the conduit metaphor. The conduit metaphor described by Reddy [24] involves the conceptualization of communication as an object transferred from the speaker or the stimulus to the listener; the listener understands the message upon 'receiving' it. This is why we use sentences such as 'He received the news from his teacher' or 'He got the information from Mary'. The following sentences illustrate the recipient sense of be in a metaphorical way:

رهبر پیامش را به مردم فرستاد.
the leader his message be the people sent
The leader sent his message to the people.

کودک به زندگی مریم امید آورد.
the child be Maryam's life hope brought
The child brought hope to Maryam's life.
In the above sentences, ‘the message’ and ‘hope’ are considered as objects that can be transferred from the subject (leader/child) to the object (people/ Maryam). Limit Sense. This sense is derived from the proto-scene and denotes that the TR is reaching a limit or end point of an action. Similar to the proto-scene, the TR is oriented towards an entity which in this case seems to be a pre-established limit or end line. The LM is no more a physical goal but an end point conceptualized in form of a scale. In this construal, the LM acts as a scale for which the TR is measured with respect to it. The following examples from the corpus demonstrate this sense of be: .رﺳﻴﺪ درﺟﻪ٢٢ ﺑﻪ دﻣﺎﯼ ﻧﻴﻮ ﻳﻮرﮎ reached 22degrees be New York’s temperature. New York’s temperature reached (got to) 22 degrees.
تنها ۱۰ دقیقه به بازی مانده است.
there's only 10 minutes be the game left
There's only 10 minutes left to the game.
In both sentences the TR is expressed in relation to a LM or limit point. In sentence (a), the TR 'New York's temperature' is described in relation to the LM '22 degrees'; the LM (thermometer or scale) acts as a basis for measurement and is in focus. Similarly, in sentence (b), the TR 'the time left' is measured against the time limit, the LM (the starting point of the game), and be denotes the boundary for the starting point. Figure 9 illustrates this sense of be:
Fig. 9. Limit sense of be
Purpose Sense. This sense derives from the proto-scene and denotes that the purpose or intention associated with the TR is highlighted and in focus. This sense of be is similar to the Purpose sense designated by English for. Tyler and Evans' [1] analysis of for claims that specific construals of the proto-scene can result in the LM (or ultimate goal) of the action no longer being in focus. In these construals, the TR undergoes a particular action in order to achieve some sort of goal or intention. The diagram below illustrates this sense for be. As previously mentioned, the dotted lines around an object indicate that the entity is in focus. The highlighted entity in Figure 10 is the intermediate LM, or the designated action; the ultimate LM, or the goal associated with the action, is not in focus here and therefore is not highlighted.
Fig. 10. Purpose sense of be
The examples below demonstrate this sense of be in different contexts of use:

حضار هنرمندان را به خاطر اجرای عالیشان تشویق کردند.
the audience the artists be khatere their great performance applauded
The audience applauded the artists for their great performance.

پخش برنامه مذهبی ممکن است به خاطر نشان دادن چهره بهتری از اجتماع باشد.
broadcasting the religious programs might be intended be khatere showing a better picture of the society
Broadcasting religious programs might be intended for showing a better picture of the society.

In both examples the TR performs a specific action and is motivated towards achieving some sort of purpose. In sentence (a) the audience (TR) applauds in order to honor the artists' great performance. In sentence (b) the religious programs (TR) are shown for the purpose of constructing a particular image of society.

Enclosed Path Sense. This is a distinct sense derived from the proto-scene. As in the proto-scene, the TR is oriented towards the highlighted LM. However, in this sense the space between the TR and LM is conceptualized as being contained rather than an open environment (the dots indicate containment). This presents an interesting
contrast with respect to the way potential trajectory or path is denoted in English. English speakers denote trajectory in two different ways: they use to for marking the environment as conceptually open and through for denoting the trajectory as conceptually closed. The following examples demonstrate these two different construals in English:

1. John walked to school yesterday = the path between John and school is open.
2. John walked through the tunnel all day = the path between John and the tunnel is contained or enclosed.

In Farsi, be denotes both conceptualizations: open and contained path. When designating a contained path, be accompanies other prepositions such as mian, meaning among. Figure 11 illustrates this sense of be:
Fig. 11. Enclosed path sense of be
The example below demonstrates this sense of be in a specific context of use:

علی به میان سربازان رفت و آنها را آرام کرد.
Ali be mian the guards went and calmed them down
Ali went among the guards and calmed them down.
The preposition among indicates that the TR is surrounded by the LM. The LM is multiplex, meaning that it consists of several individual entities. In the example above, Ali, the TR, is surrounded by the guards, which constitute a multiplex LM, and Ali is conceptualized as being in some way contained.

Means Sense. This sense derives from the enclosed path sense and denotes that the activity associated with the TR is achieved through the particular role associated with the LM. It is similar to the sense denoted by the prepositions in and through in English, as demonstrated in the sentences below:

او به کمک مترجم موفق به صحبت کردن شد.
he be the help of an interpreter managed to speak
He managed to speak through the help of an interpreter.

متن به انگلیسی نوشته شده بود.
the text be English was written
The text was written in English.
As explained by Tyler and Evans [1], the spatial configuration of through involves the functional element of path. Consequently, on many occasions a path provides an important means for reaching a particular goal. Thus the relationship between following a path and reaching a particular goal gives rise to the conceptualization of means associated with path, i.e., the path provides the means for reaching a goal. This
inference is apparent in various contexts and, through pragmatic strengthening, has become conceptualized as a distinct sense called the Means Sense. Like English through, in certain contexts be involves a means/path sense. Figure 12 illustrates this sense for be. The box between the TR and LM represents the path:
Fig. 12. Means sense of be
Manner Sense. This sense is closely related to the means sense. While in the means sense be designates a path/means to a certain goal, in this use be introduces a manner or quality associated with the TR for obtaining a certain goal. No such sense has yet been associated with any of the English prepositions. Many instances of be denote this sense, mainly in cases where be is followed by adverbs of manner or adjectives describing manner. The following examples demonstrate the contexts in which this sense of be is used:

او بعد از چند هفته تمرین به راحتی راه رفت.
she after a few weeks of practice be easily walked
She easily walked after a few weeks of practice.

او به سرعت نامش را عوض کرد.
he be quickly his name changed
He quickly changed his name.
Figure 13 demonstrates this sense for be. The curved line between the TR and LM represents the manner.
Fig. 13. Manner sense of be
Container and Boundary-Crossing Sense. The container sense is also related to the enclosed path sense and denotes that the TR is surrounded by a bounded LM. In order to be used in this context, be combines with other prepositions, including darune and dakhel, which clearly denote containment and are both translated as in in English. This combination (be + darune/dakhel) leads to another sense closely associated with the containment sense: the boundary-crossing sense. According to Tyler and Evans' [1] analysis of containment senses in English, a number of prepositions, including into, entail a spatial configuration in which the TR is located on the exterior of the bounded LM while being oriented towards the LM.
There is also a functional element of goal associated with this type of preposition, as evident in the following example:

پسرک به درون استخر پرید و تمام راه به سوی خط پایان شنا کرد.
the boy be darune the pool jumped and swam all the way up to the finish line
The boy jumped into the pool and swam all the way up to the finish line.
The boy (TR) crosses the pool's (LM) exterior boundary; the boy's intention or goal is to jump into the pool in order to swim in it. The Farsi prepositions be darune and be rooye denote a sense of boundary crossing similar to the spatial scenes associated with into and onto in English. The following examples from the Farsi dictionary demonstrate this use of be:

او به درون چاله رفت.
he be darune the hole went
He went into the hole.
دختر کوچک از صندلی به روی زمین لغزید.
the little girl from the chair be rooye the floor slipped
The little girl slipped from the chair onto the floor.
In both examples above, the TR crosses the exterior boundaries defined for the LM (Ex. 1 the hole and Ex. 2 the floor). In the second example the TR crosses two boundaries: the boundaries of the source object (the chair) and the boundaries of the goal (the floor). Figure 14 illustrates this sense for be. The dotted arrow pointing from the TR to the LM represents crossing the boundaries of the LM and entering its interior region.
Fig. 14. Boundary-Crossing sense for be
Transformation Sense. This sense is closely related to the previous sense as it also involves crossing the boundaries of the LM. Here, be denotes that the TR is crossing the boundaries of the contained LM through the transformation of its own boundaries. The exterior boundary of the TR is thus transformed into the exterior boundary of the LM. The following examples demonstrate this sense of be:
دختر به یک قوی زیبا تبدیل شد.
the girl be a beautiful swan turned
The girl turned into a beautiful swan.
به خاطر غیبت مریم تمام جشن به یک واقعه ناگوار تبدیل شد.
because of Maryam's absence the whole party be a tragedy turned
The whole party turned into a tragedy because of Maryam's absence.

In example 2, the party (TR) is conceptualized as a contained entity with enclosed boundaries which is transformed into a tragedy (also conceptualized as bounded) as a result of Maryam's absence. Figure 15 illustrates the transformation sense for be.
Fig. 15. Transformation sense for be
Come into Perception Sense. This sense is derived from the boundary-crossing sense and denotes that the TR is coming into the perception of the observer. According to Tyler and Evans [1], the TR is perceptually accessible to the observer when the vantage point is located within the boundaries of the LM. As a result of the location of the vantage point, certain perceptual limits are defined in terms of the observer's visual accessibility: entities within the boundary are highly accessible, while entities outside the boundary are beyond perceptual reach. For instance, being located in a three-dimensional space such as a room, the observer is unable to access a larger scene outside the room. Relying on this conceptualization, Farsi has developed a number of constructions including 'be + eye', with be implying some sort of perceptual accessibility. The following examples demonstrate this sense of be:

لباسی که پوشیده به چشم نمی آید.
the dress she is wearing is not 'coming + be + eye'
The dress she is wearing is not eye-catching.

در خیابان آدمهای زیادی به چشم نمیخورند.
in the streets not many people 'contact + be + eye'
You cannot see many people in the streets.
In the first sentence, the construction 'come + be + eye', in the literal sense of the verb, denotes that the dress is not located in the perceptual domain of the observer. This conceptualization then leads to the metaphorical meaning of the verb, 'eye-catching', or appealing to the speaker's visual sense. Similarly, in the second sentence, the construction 'contact + be + eye' implies that the entities, in this case 'the people', are not perceptually accessible to the observer and thus cannot be seen. Figure 16 illustrates this sense for be.
Fig. 16. Come into perception sense for be
3.4 The Complete Network for be

The different senses of be are demonstrated in Figure 17. As explained in the previous sections, the semantic network for be consists of a prototypical sense from which five distinct senses (Proximity, Enclosed Path, Limit, Intended Recipient, and Purpose) are derived. Two of the senses (Proximity and Enclosed Path) form clusters of meaning, constituting a larger network.
Fig. 17. The complete semantic network for be
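As a reading aid for Figure 17, the network can be rendered as a small directed graph in which each sense points to the senses derived from it. The sketch below is our own encoding of the figure (the edge set reflects our reading of the derivation links described above), not a data structure from the paper.

```python
# Our illustrative encoding of Fig. 17: each sense maps to the senses derived from it.
SENSE_NETWORK = {
    "proto-scene (orientation)": ["proximity", "enclosed path", "limit",
                                  "intended recipient", "purpose"],
    "proximity": ["contact", "comparison"],
    "contact": ["attachment", "support"],
    "enclosed path": ["means/path", "containment/boundary-crossing"],
    "means/path": ["manner"],
    "containment/boundary-crossing": ["transformation", "come into perception"],
}

def derived_senses(sense, network=SENSE_NETWORK):
    """Return every sense reachable from `sense` via derivation links."""
    reached = []
    for child in network.get(sense, []):
        reached.append(child)
        reached.extend(derived_senses(child, network))
    return reached

# The proto-scene plus everything derived from it yields the fifteen senses.
print(["proto-scene (orientation)"] + derived_senses("proto-scene (orientation)"))
```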
4 Conclusion

This study demonstrates the possibilities that a CL analysis provides for a fuller representation of how speakers think about spatial relations and use spatial language. The complex semantics of the Farsi preposition be were investigated using the Principled Polysemy Model, which draws on a rich set of CL conceptual tools for analyzing space. In order to identify the semantic network for be, 1000 instances of prepositional usage were carefully coded and analyzed. The results of the analysis
showed that Tyler and Evans’ Principled Polysemy Model can be successfully applied to languages other than English. By applying CL principles, we have been able to provide a systematic, fine-grained analysis of a highly polysemous preposition in Farsi.The analysis showed that many of the meanings associated with this preposition bear similarities with the English preposition to, however, not all the extended meanings share the same spatial scene with this preposition. A number of extended meanings associated with be, showed similarities with a wide range of English prepositions for, on, into, onto, through in their specific senses. One spatial scene associated with be was found to be unique to Farsi ‘the Manner Sense’, as no English preposition is associated with this sense. Moreover, the analysis revealed some interesting comparisons between Farsi and English prepositions. In certain scenes involving a path, English speakers denote an environmental LM by using different prepositions; in contrast, in Farsi these conceptualizations are expressed through using one single preposition. English speakers use to for marking the environmental LM as being conceptually open or unbounded and through for denoting the LM which is traversed as being conceptually bounded. On the other hand, Farsi speakers denote both LM environments using a single preposition be. When designating a contained or bounded path, be accompanies other prepositions such as mian meaning among. The interpretation of the two types of conceptualizations (open and contained path) is determined by the context in which the preposition occurs. Generally, due to the nature of categorization, the model proposed in this study and other similar studies is partly subjective; more research carried out in this area could minimize researcher bias. In addition, a larger sample of corpus data would provide a wider range of uses of the preposition, leading to a more reliable semantic analysis. In spite of these limitations, the current study offers the basis for further studies in the field of cognitive science and computational linguistics. Incorporating in the notion of a functional element as part of a preposition’s central meaning and articulating a constrained methodology to account for meaning extension, the study provides insights for semantic analysis of spatial language in languages other than English. It should not be overlooked that this study and other related studies on cognitive aspects of prepositions can potentially provide insights into other aspects of metaphor research and verb-particle comprehension. The study presented here, combines both an application of a proposed categorization system and the use of corpora for identifying different semantic categories of a particular preposition. Similar case studies of prepositions from other languages have the potential to assist computational linguists in modeling naturally occurring spatial language.
References

1. Tyler, A., Evans, V.: The Semantics of English Prepositions: Spatial Scenes, Embodied Meaning and Cognition. Cambridge University Press, Cambridge (2003)
2. Rodríguez, M., Egenhofer, M.: A Comparison of Inferences about Containers and Surfaces in Small-Scale and Large-Scale Spaces. Journal of Visual Languages and Computing 11, 639–662 (2000)
3. Coventry, K.R., Garrod, S.C.: Saying, Seeing and Acting: The Psychological Semantics of Spatial Prepositions. Essays in Cognitive Psychology Series. Psychology Press, Hove (2004)
4. Vandeloise, C.: Force and function in the acquisition of the preposition in. In: Carlson, L. (ed.) Functional Features in Language and Space: Insights from Perception, Categorization, and Development, pp. 219–229. Oxford University Press, NY (2005)
5. Hernández, D.: Qualitative Representation of Spatial Knowledge. LNCS (LNAI), vol. 804. Springer, Heidelberg (1994)
6. Garrod, S.C., Sanford, A.J.: Discourse models as interfaces between language and the spatial world. Journal of Semantics 6, 17–170 (1989)
7. Tyler, A., Evans, V.: Reconsidering prepositional polysemy networks: the case of over. Language 77(4), 724–765 (2001)
8. Nerlich, B., Clark, D.D.: Polysemy and flexibility: Introduction and overview. In: Polysemy: Flexible Patterns of Meaning in Mind and Language. Trends in Linguistics: Studies and Monographs, vol. 142, pp. 3–30. Mouton de Gruyter, Berlin (2003)
9. Shakhova, D., Tyler, A.: Taking the Principled Polysemy Model of Spatial Particles beyond English: the case of Russian za. In: Evans, V., Chilton, P. (eds.) Language, Cognition and Space. Equinox, London (2010)
10. Egenhofer, M., Shariff, R.: Metric Details for Natural-Language Spatial Relations. ACM Transactions on Information Systems 16(4), 295–321 (1998)
11. Mark, D., Egenhofer, M.: Modeling Spatial Relations Between Lines and Regions: Combining Formal Mathematical Models and Human Subjects Testing. Cartography and Geographic Information Systems 21(3), 195–212 (1994)
12. Regier, T.: A Model of the Human Capacity for Categorizing Spatial Relations. Cognitive Linguistics 6(1), 63–88 (1995)
13. Gries, S.T.: Corpus-based methods and cognitive semantics: The many senses of to run. In: Gries, S.T. (ed.) Corpora in Cognitive Linguistics: Corpus-Based Approaches to Syntax and Lexis, pp. 57–100. Walter de Gruyter, Berlin (2006)
14. Deignan, A.: Metaphor and Corpus Linguistics. John Benjamins, Philadelphia (2005)
15. Langacker, R.W.: Foundations of Cognitive Grammar. Theoretical Prerequisites, vol. 1. Stanford University Press, Stanford (1987)
16. Evans, V., Green, M.: Cognitive Linguistics: An Introduction. Edinburgh University Press, Edinburgh (2006)
17. Tyler, A., Evans, V.: Applying Cognitive Linguistics to pedagogical grammar: the case of over. In: Achard, M., Niemeier, S. (eds.) Cognitive Linguistics, Second Language Acquisition and Foreign Language Teaching, pp. 257–280. Mouton de Gruyter, New York (2004)
18. Hopper, P.J., Traugott, E.C.: Grammaticalization, 2nd edn. Cambridge University Press, Cambridge (2003)
19. Aryanpour Persian-English dictionary, http://www.wdgco.com/dic
20. Abolhassani Chime, Z.: An account for compound preposition in Farsi. In: Proceedings of COLING/ACL 2006, pp. 113–119 (2006)
21. Parsafar, P.: Spatial prepositions in modern Persian. PhD thesis, Yale University (1996)
22. Bijan-khan: The online corpus of Farsi, http://ece.ut.ac.ir/dbrg/Bijankhan
23. Lakoff, G.: Women, Fire and Dangerous Things: What Categories Reveal About the Mind. University of Chicago Press, Chicago (1987)
24. Reddy, M.: The conduit metaphor: a case of frame conflict in our language about language. In: Ortony, A. (ed.) Metaphor and Thought. Cambridge University Press, Cambridge (1979)
On the Explicit and Implicit Spatiotemporal Architecture of Narratives of Personal Experience

Blake Stephen Howald1 and E. Graham Katz2

1 Ultralingua, Inc., 1313 5th St. SE, Suite 108, Minneapolis, MN 55414-4533
[email protected]
2 Georgetown University, Department of Linguistics, Washington, DC 20057-1051
[email protected]
Abstract. Expanding on recent research into the predictability of explicit linguistic spatial information relative to features of discourse structure, we present the results of several machine learning studies which leverage rhetorical relations, events, temporal information, text sequence, and both explicit and implicit linguistic spatial information in three different corpora of narrative discourses. On average, classifiers predict figure, ground, spatial verb and preposition, and frame of reference to 75% accuracy, rhetorical relations to 72% accuracy, and events to 76% accuracy (all values are statistically significant above majority-class baselines). These results hold independent of the number of authors, subject matter, length, and density of spatial and temporal information. Consequently, we argue for a generalized model of spatiotemporal information in narrative discourse, which not only provides a deeper understanding of the semantics and pragmatics of discourse structure, but also alternative robust approaches to analysis.
1
Introduction
A central organizing feature of a discourse is its rhetorical structure. The rhetorical structure of a discourse is based on the array of rhetorical relations that hold among the clauses of the discourse [1, 2]. These relations characterize the semantic and pragmatic relations that hold among these clauses and have clear interpretive implications, one of the most prominent of which concerns temporal interpretation [3, 4]. For example, the fact that the narration relation holds between the clauses in (1a-c) implies that there is an iconic temporal sequencing between the events described by each of the clauses (e.g., (1a) happened before (1b) and (1b) happened before (1c). (1)
a. Pascale rolled over. b. She pushed herself up. c. And crawled around.
Although it has been claimed that the narration relation also has consequences as to the spatial interpretation (e.g., Asher and Lascarides [2]), there is a clear M. Egenhofer et al. (Eds.): COSIT 2011, LNCS 6899, pp. 434–454, 2011. c Springer-Verlag Berlin Heidelberg 2011
The Spatiotemporal Architecture of Narrative Discourse
435
contrast between explicit temporal and spatial information in relation to rhetorical structure. In particular, if overt temporal information is included in (1a-c), as in example (2), there can be a disruption to the temporal sequence of narrative events. (2)
a. Pascale rolled over at 12:00. b. She pushed herself up at 11:45. c. And crawled around at 11:00.
However, if overt spatial information is included, as in example (3), there is less disruption (if at all) compared to (1). Overt spatial information appears to have inconsequential effects on rhetorical structure. (3)
a. Pascale rolled over on the mat. b. She pushed herself up off of the floor. c. And crawled around to Cati.
On the local linguistic surface of the clause, where semantic and pragmatic relationships are constructed, temporal, rather than spatial, information appears to be the governing factor in discourse. Recent research has, however, suggested a more pervasive structural contribution of spatial information by demonstrating systematic relationships between explicit spatial information and the rhetorical and event structure of narrative discourse [5–7]. In this paper, we consider both explicit and implicit spatial information as they relate to elements of discourse structure. The central hypothesis tested is whether or not a pervasive spatiotemporal event structure exists in narrative discourses despite the perceived limited contribution of spatial information. In order to detect patterns in spatial information, 75 narratives from three different corpora were annotated with explicit spatial information (figure, ground, verb, preposition and frame of reference), rhetorical relations, text sequence, event and temporal information (tense, aspect, explicit reference - following certain elements of the TimeML specification [4]). These annotations were then extended to clauses without explicit spatial information and used in several supervised machine learning experiments to demonstrate the high predictability of, and consequent systematic relationship between, types of spatial information and elements of discourse structure. The results of the machine learning experiments facilitate the representation of spatiotemporal information that is argued to be generalizeable to all narrative discourses. The remainder of this paper is organized as follows. Section 2 provides background research on spatial information and discourse structure and the motivation for considering implicit spatial information. Section 3 discusses the details of the analyzed data, the spatial and temporal information coding scheme (including inter-rater reliability), and the patterning of annotated elements relative to text sequence. Section 4 briefly reviews supervised machine learning and presents the results of the machine learning tasks. Section 5 discusses these results relative to the role of implicit space, text sequence and the construction of a general model of spatiotemporality in narrative. Section 6 concludes with implications of this model for current and future linguistic and interdisciplinary research.
436
2
B.S. Howald and E.G. Katz
Background
Despite the limited contribution of spatial information to discourse structure, attention to spatial information in linguistic discourse is nothing new (e.g., “descriptive modes” of discourse having a spatial progression [8], responses to describing apartment layouts being tour rather than map-like [9]). In narrative discourse, space typically plays a role in “setting the scene” for the narrative actions (e.g. Orientations [10]). However, while it is logical that discourses of experience are spatiotemporal (events in the past happend at some time at some place), the small amount of spatial information on the linguistic surface requires that such a structure be intuited rather than empirically demonstrated. Research by Herman [5] has more closely evaluated the role of spatial information in discourse structure by focusing on narrative discourses that are rich in explicit spatial reference - ghost story narratives that require the narrator to describe the environment in detail in order to locate super-natural entities. Herman argues that the spatial information carves out narrative domains which group narrative events. Herman focuses on linguistic spatial information, theoretically based in spatial cognition and perception research, and indicates several discourse cues that contribute to the structure of spatially rich narrative discourses. These discourse cues include figure, ground and path (manner) relationships [11]; frames of reference [12]; and deictic shifts [13]. Building on Herman, Howald [6], using a corpus of serial killer first person narratives, also rich in spatial reference, relied on three spatial features (figure, verb, ground) to predict spatially-defined Pre-Crime, Crime and Post-Crime events to a 82% accuracy (40% above baseline) in machine learning experiments. Focusing on spatially rich and non-spatially rich narrative discourses, Howald and Katz [7] predicted the narration and background / elaboration rhetorical relations in machine learning experiments to 84% (16% above baseline) and 57% (22% above baseline) respectively based on five spatial features (figure, ground, verb, preposition and frame of reference). Howald and Katz also demonstrated that the same five spatial features of narrative event clauses were predictable to a 55% average (12% above baseline), but boosted to 71% (28% above baseline) when implicit spatial information was considered. It is the use of implicit spatial information that will be further explored in this paper. 2.1
Spatial, Rhetorical, Event and Temporal Information
Linguistic reference to physical space involves spatial prepositions and verbs which create figure and ground relationships [11]. The types of relationships can vary qualitatively depending on the semantics of the prepositions and verbs. Further, the perspective taken on the spatial relationship and the level of detail can also be linguistically encoded. Considering (4), and restricting the discussion of spatial information to English, figure and ground relationships are indexed by prepositions - Pascale in Minnesota (4a), Pascale on the couch (4b) - and verbs - she left there (4c). Further, the prepositions and verbs can indicate not only static but dynamic figure and ground relationships - she crawled around me (4d).
The Spatiotemporal Architecture of Narrative Discourse
(4)
a. b. c. d. e. f.
437
Pascale lives in Minnesota. Pascale is on the couch. She left there. Then she crawled around me. She pulled some candy out of her pocket. She pressed her face against the north window.
Spatial verbs and prepositions can be modeled mereotopologically (e.g., defining spatial relationships in terms of regions and connections (RCC-8 [14])) giving rise to a number of useful analytical categories.1 For prepositions, we follow Asher and Sablayrolles [17] who classify prepositions based on the position (Position - at, Initial Direction - from, Medial Position - through, Final Position - to) and contact (Inner - in, Contact - against, Outer - along, and Outer-Most - beyond) between two regions (figure and ground). For verbs, Muller [18] proposes six mereotopological classes: Reach, Leave, Internal, External, Hit, and Cross. Pustejovsky and Moszkowicz [19] mapped these classes to FrameNet and VerbNet and expanded these classes into ten general classes of motion (Move, Move-External, Move-Internal, Leave, Reach, Detach, Hit, Follow, Deviate, Stay). The perspective used to describe spatial relationships can vary as well. For this paper, perspective takes two forms, granularity of spatial description and frame of reference. Granularity refers to the level of detail in spatial description and is linguistically linked to the ground expression in the figure/ ground dyad. Montello [20] indicates four spatial granularities based on the cognitive organization of spatial knowledge. These are illustrated in (4). (4e) is a Figural granularity her pocket describing space smaller than the human body. (4b-c) are Vista granularities (the couch, there) describing space from a single point of view. (4d) is an Environmental granularity ((around) me) describing space larger than the body with multiple (scanning) point(s) of view. (4a) is a Geographic granularity (Minnesota) describing space even larger than the body and is learned by symbolic representation. Frames of reference provide different ways of describing spatial relationships and are based on preposition and ground information. (4a-c) are non-coordinated frames of reference (Named Location, Contiguity, Deictic, respectively) as they relate just the figure (Pascale, she) and ground. (4d-f) are coordinated frames of reference (Intrinsic, Relative, Absolute, respectively), relating the figure (she, some candy, her face) to an additional entity within the ground (me, her pocket, the north window). In terms of temporal information, we used elements of the TimeML specification [4]. In particular, explicit references to times and dates - December 31, 2010, 23:15 GMT, 6 minutes, every Thursday (Dates, Times, Durations and Sets, respectively) - and tense and aspect - past, present, future, past perfect, present progressive, etc. We also relied on TimeML’s categorization of events. Events are 1
Albeit true that there are many metrics available to capture and model the natural language of spatial information in formal terms, some much more robust than RCC-8 alone (e.g., [15, 16]), we are relying primarily on insights from literature in linguistics for purposes of consistency with the underlying discourse theories employed.
438
B.S. Howald and E.G. Katz
indicated by the semantics of the verb and include Reporting (said, cited); Perception (saw, heard); Aspectual (start, restart, stop, complete, continue) (This is a consolidation of five classes: initiation, reinitiation, termination, culmination and continuation.); Intensional Action (attempt, avoid, claime, defer); Alternate Worlds (hope, demand, believe); States; and Occurrences (events not falling into one of the delineated categories).2 2.2
Rhetorical Relations and Spatial Persistence
Rhetorical relations describe the role that one clause plays with respect to another in a text and contributes to a text’s coherence [21]. As such, these relations are pragmatic features of a text. We rely on the inventory of rhetorical relations in Segmented Discourse Representation Theory (SDRT) [2]. This inventory includes the following relations, illustrated by example: narration: elaboration: background: explanation: result: consequence: alternation: continuation:
Pascale got up. She walked to the kitchen. Pascale got her aardvark. It was under the crib. Pascale got her aardvark. It was dirty. The aardvark was dirty. Pascale had dropped it in a puddle. Pascale dropped the aardvark in a puddle. It got really dirty. If Pascale dropped the aardvark in the puddle, then it got dirty. Pascale got her aardvark or her stuffed bunny. Pascale got her aardvark. Grimsby got his rawhide.
The spatial information mentioned in one clause can frequently be taken to apply to subsequent clauses, a phenomenon we term the persistence of spatial reference. For example, in (5), it is arguably the case that (5b) and (5c) occur in the same location indicated in (5a) (the kitchen). It could be the case that (5b) and (5c) occur in locations other than the kitchen, but the lack of explicit spatial information creates the expectation that the kitchen is the location for the entirety of (5). (5)
a. Grimsby entered the kitchen. b. He drank some water. c. Then he begged for a biscuit.
The persistence of spatial information across clauses is clearly closely related to the rhetorical relations holding between the clauses. The narration relation has defined spatiotemporal consequences, but for other relations it is less clear. There does seem to be a range of rhetorical relations that, to varying degrees, support persistence. For example, implicit spatial information can be extended to elaboration and background clauses - the fact that the aardvark was under the crib and that the aardvark was dirty exists in the same location as where Pascale got the aardvark. explanation, consequence and result can be considered this way as well. The events of the second clause in consequence 2
It should be noted that we did not code events consistent with TimeML’s guidelines, which require that the event be additionally anchored to some time or duration.
The Spatiotemporal Architecture of Narrative Discourse
439
and result occur in the same location as the first clause. The reverse is true for explanation. alternation and continuation are less straightforward because these relations tend to obtain between clauses that contrast some information. However, the persistence of spatial information is possible in the example clauses. Although, for continuation, the event described in the second clause (assuming it is temporally simultaneous with the first) suggests the possibility that the location is different. In the absence of explicit updating of the spatial information, extension of spatial information is acceptable if not underspecified. The implicit spatial information will, most likely, be underspecified in terms of actual location; for example, in (5), the drinking of water and the begging for biscuits arguably occur in different, separate locations within the kitchen. The idea behind considering persistent space comes from the temporal inertia that is pragmatically resolved from text progression - as the text moves forward, the progression gives the impression of time moving forward inertially relative to the content of a given utterance (see generally, [8, 22]). We suggest here that similar principles apply to spatial information. Again, it should be noted that the narration relation is unique in that the extension of implicit space is germane to the underlying theory, extension to non-narration relations is not. While the present examples are constructed in such a way as to facilitate the extension of implicit spatial information, and assuming an analysis within experiential narrative discourses, this may not always be the case. Sensitivity to this issue was considered in coding the data. In particular, if the content of the utterance precluded persistence then information was not extended.
3
Methodology
3.1
Data
25 narratives from three corpora were selected for analysis: (1) oral narratives emerging in sociolinguistic interviews from the American National Corpus Charlotte Narrative and Conversation Collection [23] (CNCC); (2) written narratives describing adventure travel to latitude and longitude intersections from the Degree Confluence Project (DCP); and (3) oral and written confession statements and guilty pleas from criminals (CRI). These narratives describe a central unique chain of events - as opposed to discourses which, for example, describe habitual past actions. The data is summarized in Table 1. Table 1. Summary of data Corpus CNCC (n=25) DCP (n=25) CRI (n=25) Total (N=75) Total Clauses 870 608 1,938 3,416 Spatial Clauses 348 340 828 1,516 Average 40.00 55.92 42.72 44.37
440
B.S. Howald and E.G. Katz
“Clauses” refers to independent clauses and “Spatial Clauses” refers to those independent clauses that contain an explicit reference to physical space. There is a total of 3,416 clauses, 1,516 of which contain explict spatial information (44.37% average, ranging from 11.76% to 81.25% with similar standard deviations for each corpus: CNCC = 14.13%; DCP = 13.28%; CRI = 10.85%).3 3.2
Spatial and Temporal Information
Based on the discussion in Section 2.1, the following spatial information was annotated in the 75 narratives: – FIGURE - an indication of grammatical person or a non-person entity (1 = I, my; 2 = you, your; 3 = he, her; 4 = we, our; 5 = you, your; 6 = they, their; NP = the purse, three cars; AREA = the park, the clearing; EVENT = a conversation); – VERB - four mereotopological classes (State = stay, was sitting; Move = run, go; Outside = follow, pass; Hit = attach, strike); – PREPOSITION - sixteen mereotopological classes ([Positional, Initial, Medial, Final] x [Inner, Contact, Outer, Outer-Most]); and 0; – GROUND - four granularities (Figural, Environmental, Vista, Geographic); – FRAME - six frames of reference (Deictic, Contiguity, Named Location, Relative, Intrinsic, Absolute). Based on Sections 2.1 and 2.2, the following rhetorical, event and temporal information was annotated in the 75 narratives: – SDRT rhetorical relation (narration, background, elaboration, alternation, continuation, result, consequentce, explanation); – EXPLICIT TIME (Date, Time, Duration, Set); – TENSE (Past, Present, Future, Infinitive); – ASPECT (Progressive, Perfect, Perfect Progressive, None); – SEQUENCE (clause position normalized to the unit interval); – EVENT (Reporting, Perception, Aspectual, Intensional Action, Alternate Worlds, States, Occurrences). The distribution of the spatial, rhetorical, event and temporal elements is summarized in Table 2. The spatial codings are presented in the left five columns 3
Consistent with Herman [5] and the discussion of (4) above, the type of spatial language attended to in this paper is based on spatial cognition and perception. Consequently, clauses indicating metaphorical (The thought crossed my mind), reported (She told me to drive to the lake), negative (We didn’t fly to Iowa) or alternate world space (I always wanted to go to Japan) were not considered. There were 219 total clauses that fell into these categories: idiom or metaphor = 14; reported speech = 110; alternate worlds = 75; and negation = 20. This indicates that there were 1735 total clauses with a linguistic spatial construction and 1516 (87.37%) of these were physical spatial constructions.
The Spatiotemporal Architecture of Narrative Discourse
441
Table 2. Per corpus distribution of spatial, rhetorical, event and temporal attributes Coding Figure 1 NP 3 4 Area 6 Event Verb State Motion Hit Outside Preposition Position Final 0 Medial Initial Ground Vista Enviro. Fig. Geo. Frame Name L. Cont. Int. Deictic Abs. Rel.
CNCC DCP CRI TOTAL 53 107 66 68 13 29 12
43 39 9 158 65 5 21
354 212 201 23 14 20 4
450 358 276 249 92 54 37
176 111 58 3
150 104 66 20
215 304 297 12
541 519 421 35
161 87 51 33 16
132 71 89 25 23
319 252 70 91 96
612 410 210 149 135
154 128 52 14
102 122 8 108
336 211 191 90
592 461 251 212
164 98 3 58 0 25
174 21 82 13 47 3
328 193 159 119 8 21
666 312 244 190 55 49
Coding Rhetorical nar back elab con exp res alt Time None Date Time Duration Set Tense Past Present Future Infinitive Aspect None Prog. Perf. Perf. Prog. Event Asp. State Alt.W. Occ. Rep. Per. Inten.
CNCC DCP CRI TOTAL 358 233 197 17 25 19 1
263 155 107 30 15 13 0
1032 486 229 115 31 20 19
1653 874 533 162 71 52 20
824 21 19 10 1
532 28 25 21 2
1859 35 36 6 3
3215 84 80 37 6
615 214 5 6
465 110 7 5
1813 111 5 4
2893 435 17 15
767 106 0 2
550 51 6 1
767 106 0 2
2084 263 6 5
248 267 127 26 119 64 24
246 164 62 52 10 24 50
628 398 287 302 165 84 74
1122 829 476 380 294 172 148
and the rhetorical, event and temporal codings in the right five columns. For ease of presentation, We arbitrarily collapsed the preposition codings along the positional (Positional, Initial, Medial and Final), rather than the contact dimension. Sequence is considered in Section 3.4. In general, the distribution of spatial elements is roughly similar for each corpus. There is a balance between person and non-person figures, verbs are largely either motion or states, prepositions are either positional or final (prepositions that were a part of particle verb constructions (run up, drove out) were treated as part of the prepositional phrase), granularities are singular or multiple points of view, and frames of reference are more non-coordinated than coordinated. The analyzed narrative discourses consist predominately of narration, background and elaboration relations. The narratives are largely in the past
442
B.S. Howald and E.G. Katz
tense with little aspect and explicit temporal reference. In terms of events, the corpora are balanced between aspectual and state types. Variations in this trend can be accounted for based on the subject matter of the narratives - for example, there are more Figural granularities and Hit verbs in the CRI narratives (narrating interpersonal violent criminal activity) and there are more Area figures and Geographic granularities in the DCP narratives (narrating global travel with GPS devices). Machine learning algorithms will gravitate toward certain statistical distributions in a given set of data. For example, figures in the CRI corpus are largely 1 whereas figures in the DCP corpus are largely 4 and NP in the CNCC corpus. Further, Motion and Hit verbs occur in the larger proportions in the CRI corpus than Motion and State in the CNCC and DCP corpora; this is discussed in more detail in Section 4. The next section presents the distribution of coded elements relative to inter-rater reliability and suggestions for consolidations of the coding scheme are considered. 3.3
Inter-rater Reliability and Consolidation of Coding
The data was coded by one of the authors. For inter-rater reliability statistics (Cohen’s Kappa [24]), an additional individual coded a subset of the data 15 narratives, 5 from each corpus, totaling 610 clauses, 320 of which contained explicit spatial information. The coder was given the data already segmented into independent clauses and guidelines that included the categories to be coded and examples of those categories. The coder could take as much time as needed and no other instruction was given. Agreement and kappa statistics are summarized in Table 3. Table 3. Agreement and kappa statistics for annotation Coding Agreement (%) Kappa (κ) Col. Agr. Col. Kappa Tense 99.65 .9945 N/A N/A Aspect 99.30 .9937 N/A N/A Event 73.62 .5004 86.15 .6445 Rhetorical 69.38 .5670 81.02 .6361 Figure 90.51 .8105 N/A .N/A Verb 75.00 .5126 87.50 .7229 Preposition 76.29 .5262 83.62 .6875 Ground 74.56 .5126 83.62 .6875 Frame 76.29 .5262 84.05 .6093
Agreement and kappa statistics are a good measure for how intuitive an annotation scheme is. They also serve as a way to improve the annotation scheme by potentially collapsing certain elements. Starting with the rhetorical, event and temporal elements, both tense and aspect proved to be highly consistent. There is nothing to improve upon with these categories. Kappa statistics closer to 1.0 are best, but statistics above .60 are taken to be acceptable [25]. There
The Spatiotemporal Architecture of Narrative Discourse
443
are no specific comparisons of these numbers to other research. However, human agreement for temporal information such as this is typically high (e.g. [26]). More variability is seen in the event codings. Largest confusions were between Aspectual and Intensional Action events, but we see no way to collapse these categories. The discrepancy will have to stand for now. There are no specific comparisons of human performance on TimeML event classifications. There is research where humans are asked to classify the underlying verbs [27] or relative durations of events [28], which are typically high, but not the events themselves. For rhetorical relations, the largest confusions were between background and elaboration and narration and continuation. While similar, narration and continuation are theoretically different and cannot be collapsed. Consistent with Howald and Katz, background and elaboration, which exhibit a subtle temporal distinction between temporal overlap and inclusion respectively, were collapsed into one category. This improved accuracy by 12% and is consistent with previously reported performances (e.g. Agreement = 71.25/ κ = 61.00 [29]; Agreement = 71.97/ κ = 60.27) [7]). For the spatial elements, with the exception of figures, the average agreement between the two coders is low with kappa falling into an unacceptable range. This indicates that the coding scheme may be too complex or too nuanced. The results suggest that humans do not necessarily make judgments about language by thinking mereotopologically or fully in terms of the granularity and frame of reference schemes proposed by the coding. Consequently, we sought ways to improve performance by collapsing the coding where possible. For verbs, Outside and Hit verbs are subtypes of motion and can be subsumed under the Motion category. Similarly, prepositions group with verbs based on stasis or motion. The preposition category can be collapsed into positional (Position) and motion (Initial, Medial, Final) as well (with a third category for no preposition). For frame of reference, collapsing the six categories into complex (Intrinsic, Relative, Absolute) and non-complex (Named Location, Contiguity, Deixis) is consistent with Levinson [12]. For granularity, it was difficult for the coders to distinguish between Vista and Environmental. However, collapsing these two - essentially two types of singular points of view improved agreement. Overall, the collapsed agreement and kappa statistics are consistent with Howald and Katz and actually improve previous reported accuracies for preposition (Agreement = 78.35/ κ = 56.70) and frame (Agreement = 69.38/ κ = 38.76) [7]. 3.4
Discourse Sequence
Progression through discourse indicates continuous movement forward through speech time and, depending on the temporal makeup of a given clause, movement forward through narrative time. Further, discourse sequence has been shown to improve machine learning performance in tasks similar to those presented here [6, 7]. Based on the collapsed codings, and averaging all corpora, reasons why sequence boosts performance are illustrated in the following series of graphs.
444
B.S. Howald and E.G. Katz
Figures 1 and 2 represent the rhetorical relation and event architecture of the analyzed narratives. We have focused on narration and background/ elaboration consistent with the distribution of these relations in the data (cf. Table 2). We have also focused on aspectual (A) and stative (S) events which, like the rhetorical relations, appear to be in complementary distribution. Both figures demonstrate shifts from background/ elaboration relations and State events in the first 20% of the text to narration and Aspectual events, peaking between 30 and 50% of the text, and then attenuating in percentage distribution to the end of the text (possibly crossing back over - rhetorical relations).
Fig. 1. Rhetorical relation architecture
Fig. 2. Event architecture
Fig. 3. Tense architecture
Fig. 4. Aspect architecture
Figures 3-5 represent the annotated temporal information. Because of the dominant trends in this information - i.e., the narratives are almost exclusively past tense with little aspect and very little explicit reference to time - the patterns exhibited are weaker. Nonetheless, trends do emerge. In particular, present tense is in higher proportion in the first third of the data, giving way to past tense, and then increasing in the final third (Figure 3). The progressive aspect remains stable throughout with a slight peak at the beginning of the final third of the data (Figure 4). Explicit reference to time occurs in the first 20% of the discourse and then disappears (Figure 5).
The Spatiotemporal Architecture of Narrative Discourse
Fig. 5. Time architecture
445
Fig. 6. Spatial verb architecture
Against this temporal backdrop, similar trends are noted in the spatial elements. Figures 6 and 7 represent the percentage distributions of Motion (M) and State (S) spatial verbs and Motion (M) and Positional (P) spatial prepositions. In particular, State spatial verbs and Positional spatial prepositions are highest in the first 20% of the discourse and transition to Motion spatial verbs and prepositions, again peaking between 30 and 50% of the text, and then attenuating in the last third of the text.
Fig. 7. Spatial preposition architecture
Fig. 8. Figure architecture
This same pattern is seen for figures - self (1) vs. other (including NP) (3) (Figure 8); frames of reference - complex (C) vs. non-complex (NC) (Figure 9); and granularity - Figural (F) vs. Geographic (G) (Figure 10). Figures delay the crossing over until 30-40% but then conform to the demonstrated pattern. The pattern for frames of reference is more subtle and does not involve crossing over, but rather tendencies toward an inversion of the distribution as the discourse progresses. The pattern in granularity indicates that described space moves from larger to smaller to larger scales (Vista is excluded from Figure 10 as it is uniformly high). In sum, Figures 1-10 present the spatiotemporal event and rhetorical structural profile narrative discourses. In taking some proportional slice of the
446
B.S. Howald and E.G. Katz
Fig. 9. Frame architecture
Fig. 10. Granularity architecture
discourse, the suggestion emerges that attention to where we are in a given narrative text correlates to the type of spatial, temporal, event and rhetorical information that is statistically more likely to emerge. This suggestion will be explored further below in Sections 4.2-4.3. But first, consider a sample narrative from the DCP corpus. (6)
a. Oct-2009 –To reach the CP we have created a combined unit, consisting of a pointman (ex military guy), a como specialist and a interpreter-MRE bearer. b. We started our advance in Hanoi at 13-42. c. And hit our RON site in SonLa at 19-06. d. 31/10/2009 we left our vehicle on the road at a distance of 1.2 km to the C point. e. Our unit began to climb on the crest of the mountain. f. Our pointman quickly cut down the trail. g. And after 2,5 hours we reached the CP. h. The C point is located on the hillside, thickly overgrown with jungle.
(6) indicates a prototypical narrative that conforms to the sequence profiles indicate in Figures 1 - 10. (6) shifts from background / elaboration relations (6a) to narration (6b-c). (6) also exhibits expected shifts in tense without aspect (present (6a) to past (6b-e, 6g) to present (6h)); events (state (6a) to aspectual (6b-c); and temporal reference (6a-d). In terms of space, (6) also exhibits predicted shifts in granularity (larger (6b-c) to smaller (6d-e, 6g-h)); verb and preposition (motion (6b-c), state (6d) motion (6e, 6g) state (6h)); figure (self (6b), other (6c-d), self (6e), self (6g), other (6h)); and frame of reference (non-complex (6b-c), complex (6d-e), non-complex (6g-h)). Of course, not every narrative is going to be a perfect fit to the sequential profile. The idea is simply that of all of the narratives analyzed, there was no significant departure from this profile. There was no deviation or significant variation based on number of authors, contextual paramaters or degree of spatial density.
The Spatiotemporal Architecture of Narrative Discourse
4
447
Results and Analysis
Three experiments using supervised machine learning methods were constructed to test the posed hypothesis of whether or not a pervasive spatiotemporal event structure exists in narrative discourses despite the perceived limited contribution of spatial information. The first experiment is based only on explicit spatial information coded in the corpus. The second experiment is based on both implicit and explicit spatial information. The third experiment is based on both implicit and explicit spatial information as collapsed based on the results of the inter-rater reliability. Each experiment creates seven classifiers - five spatial features (Figure: 7-way classifier; Verb: 4-way classifier; Ground: 4-way classifier; Preposition: 17-way classifier; and Frame: 6-way classifier) and a rhetorical (7way classifier) and an event classifier (7-way classifier). We utilized the WEKA toolkit [30] and analyzed information within single clauses for spatial, temporal and event information and dual clauses for rhetorical relations. To illustrate, consider (7): (7)
a. Grimsby brought his Kong into the kitchen. b. NP, MOTION, FI, VIS, NL, .33, 0, ASP c. He barked. d. ?, ?, ?, ?, ?, .66, NAR, OCC e. NP, MOTION, FI, VIS, NL, .66, NAR, OCC f. Then he ran to the window g. 3, MOTION, FC, ENV, NL, 1, NAR, ASP
(7a) is coded with figure, verb, preposition, ground and frame (7b) NP, MOTION, FI, VIS, NL) consistent with the guidelines discussed above. (7f) is coded similarly ((7g) 3, MOTION, FC, ENV, NL). In addition to the spatial information, the clause’s proportional position is included, the rhetorical relation, and event. This string of information (henceforth “vector”) is then used to predict a given piece of information (henceforth “class”) based on the remaining seven pieces of information (henceforth “attributes”) - e.g., predicting a 3 figure (class) based on MOTION, FC, ENV, NL, 1 NAR, ASP. (7a-b) and (7f-g) contain explicit information whereas (7c) does not. In the first group of machine learning experiments, which focus on explicit spatial information, (7c) would simply not be included despite non-spatial information existing - .66, NAR, OCC (7d). In the second group of machine learning experiments, which focus on implicit spatial information, this clause would receive the same spatial coding as the previous clause (7b), with the sequence, rhetorical relation and event being updated (7e). The K*, Na¨ıve Bayes and J48 classifiers performed the best on our experiments. K* is a vector-based classifier. The K* classifier considers the differences between vectors in a given class as a distance. This distance is the Shannon Entropy [31] of the summed probability of all transformations of a vector in a given class to the vector of another class [32]. The Na¨ıve Bayes classifier utilizes Bayes Theorem which determines the conditional probability of two events in a given
448
B.S. Howald and E.G. Katz
data set. This classifier is “na¨ıve” as it only computes conditional probabilities between vector attributes and the class of that vector (attributes are assumed to be independent). A decision rule then selects the maximum probability class for a given series of attributes [33]. The J48 classifier is WEKA’s version of the C4.5 Decision Tree [34]. C4.5 relies on information entropy to decide which attributes should serve as nodes in a tree. The attributes selected are those that lead to the most effective split in classes as determined by the maximum gain in entropy. To be clear, what is being “predicted” is a specific type (class) of information (e.g., figure type, event type, etc. depending on the classifier) in an individual clause (or pair of clauses in the case of rhetorical relations), based the remaining information in an individual clause (attributes). 4.1
Experiment 1: Explicit Spatial Information
The results presented here are based on all three corpora combined. The total results, run at 10-fold cross-validation, mirror those found in the CNCC, DCP and CRI corpora individually. Table 4 presents the results of classifier performance based only on those clauses with explicit spatial information (and associated tense, aspect, time and sequence information). Table 4. Explicit spatial information classification performance (n = 1516) Coding NBayes J48Tree Figure 41.75 39.77 Verb 70.58 70.91 Preposition 32.45 31.33 Ground 57.05 53.56 Frame 51.91 54.61 Event 75.35 78.10 Rhetorical* 60.75 67.92
KStar NB(Seq) J48(Seq) K*(Seq) MC 42.21 42.54 39.51 40.69 29 70.38 57.51 70.05 68.60 35 33.70 34.36 31.86 30.73 16 58.64 57.51 54.41 55.73 39 52.70 52.63 54.41 47.95 43 77.55 74.54 78.46 73.12 61 58.84 60.00 67.73 57.86 54
The Na¨ıve Bayes, J48 and K* classifiers all perform similarly when classifying explicit spatial information only (no tense, aspect, time or sequence information) (NBayes, J48Tree and KStar columns). In particular, the K* Figure classifier performs up to 13% above the majority class baseline (MC); the J48 Verb classifier performs up to 35% above MC; the K* Preposition classifier performs up to 17% above MC; the K* Ground classifier performs up to 19% above MC; the J48 Frame classifier performs up to 11% above MC; the J48 Rhetorical classifier (*based on vector pairs) performs 13% above MC; and the J48 Event classifier performs up to 17% above MC. These differences between accuracy and MC are statistically significant (χ2 = 76.87, d.f. = 6, p ≤ .001). The inclusion of temporal (tense, aspect, time) and sequence information (NB(Seq), J48(Seq) and K*(Seq) columns) does not improve performance. Despite performances being above baseline across the board, overall accuracies, with the exception of verb and event, are low. F-scores (a measure of precision
The Spatiotemporal Architecture of Narrative Discourse
449
and recall) are similar to prediction accuracies - indicating that the prediction accuracies are not skewed. 4.2
Experiment 2: Explicit and Implicit Spatial Information
The results in Table 5 present classifier performance based on both explicit and implicit spatial information. As compared to Table 4, including implicit spatial information greatly increases the performance of, particularly, the J48 and K* classifiers. The J48 Figure classifier performs up to 15% above MC; the K* Verb classifier performs up to 37% above MC; the J48 Preposition classifier performs up to 27% above MC; the J48 Ground classifier performs up to 28% above MC; the J48 Frame classifier performs up to 20% above MC; the J48 Rhetorical classifier performs 18% above MC; and the K* Event classifier performs up to 24% above MC. These differences between accuracy and MC are statistically significant (χ2 = 142.57, d.f. = 6, p ≤ .001). Table 5. Explicit and implicit classification performance (n = 3421) Coding NBayes J48Tree Figure 42.32 57.01 Verb 65.30 73.65 Preposition 35.80 48.62 Ground 57.64 70.17 Frame 54.83 65.96 Event 52.38 54.89 Rhetorical 64.07 66.28
KStar NB(Seq) J48(Seq) K*(Seq) MC 54.86 42.70 68.65 72.25 32 75.85 65.06 77.57 84.12 38 46.91 37.70 63.42 65.68 21 64.01 57.05 77.42 79.50 42 64.01 55.21 74.85 76.52 45 56.12 52.32 55.10 51.06 32 58.42 64.49 66.76 55.02 48
These results are boosted further when considering temporal and sequence information. The K* Figure classifier performs up to 40% above MC; the K* Verb classifier performs up to 46% above MC; the K* Preposition classifier performs up to 44% above MC; the K* Ground classifier performs up to 37% above MC; and the K* Frame classifier performs up to 31% above MC. The rhetorical and event classifiers do not improve with the inclusion of temporal and sequence information; the J48 Rhetorical classifier performs 18% above MC; and the K* Event classifier performs up to 23% above MC. Nonetheless, all of these differences between accuracy and MC are statistically significant (χ2 = 275.15, d.f. = 6, p ≤ .001). F-scores remain similar to prediction accuracies. 4.3
Experiment 3: Explicit and Implicit Spatial Information (Collapsed Coding)
To provide a more “normalized” view of the implicit spatial information results, we ran the same classifiers with coded elements collapsed relative to inter-rater reliability performance. The collapsed codings change the following classifiers:
450
B.S. Howald and E.G. Katz
Verb: 2-way classifier; Preposition: 3-way classifier; Ground: 3-way classifier, Frame: 2-way classifier; and Rhetorical: 6-way classifier. These results are summarized in Table 6. Table 6. Collapsed explicit and implicit classification performance (n = 3421) Coding NBayes J48Tree Figure 44.15 47.39 Verb 74.26 77.80 Preposition 62.33 64.06 Ground 71.25 71.84 Frame 82.04 83.21 Event 52.52 53.34 Rhetorical 75.90 77.43
KStar NB(Seq) J48(Seq) K*(Seq) MC 48.50 46.16 56.95 61.87 32 78.09 76.08 79.59 84.64 61 65.14 63.04 68.50 73.65 45 72.77 71.75 77.25 79.38 45 83.12 81.95 83.85 86.25 81 53.17 52.99 53.49 54.85 32 68.99 76.56 77.70 64.65 48
The accuracy is increased for the K* and sequence information, but the performance over the majority class is lessened as compared to Table 5. The K* Figure classifier performs up to 29% above MC; the K* Verb classifier performs up to 23% above MC; the K* Preposition classifier performs up to 28% above MC; the K* Ground classifier performs up to 34% above MC; the K* Frame classifier performs up to 5% above MC; the Rhetorical classifier performs 29% above MC; and the Event classifier performs up to 22% above MC. Nonetheless, the results remain statistically significant (χ2 = 111.01, d.f. = 6, p ≤ .001). F-scores remain similar to prediction accuracies. The Rhetorical Relation classifier improves with the collapse of the background and elaboration relations. This makes sense as the background and elaboration relations are two of the three majority relations. Not having to distinguish between these two relations appears to improve performance.
5
Discussion
To restate, the central hypothesis tested is whether or not a pervasive spatiotemporal event structure exists in narrative discourses despite the perceived limited contribution of spatial information. The results based on explicit spatial information (Section 4.1), while modest, are consistent with previous research [6, 7] and suggest that spatial information demonstrates patterns relative to the structure of narrative discourse. These results extend previous research by considering temporal (tense, aspect, explicit time) and event information and suggest a more robust systematic relationship between spatial information and structural elements of narrative discourses. The results based on explicit and implicit spatial information (Section 4.2) are similar to previously reported results in Howald and Katz [7]. However, those results only considered implicit space for narrative event clauses. The results here appear to provide similar improvement to other clauses in narrative discourse. Because the implicit spatial information is a representation of maintaining explicit spatial information, which, in and of themselves, represent shifts in
The Spatiotemporal Architecture of Narrative Discourse
451
information, the high accuracy and statistically significant above baseline performance of the classifiers indicate that spatial, rhetorical and event information indeed pattern relative to temporal elements of discourse structure. Consequently, there does appear to be a generalized spatiotemporal event and rhetorical structure to narrative discourses, relative to sequence information, that exists despite spatial information failing to satisfy traditional diagnostics for determining the structural status of certain elements. However, temporal information in these tasks is largely based on textual sequence, rather than tense, aspect and explicit reference. Consequently, a question remains as to why the performance of the J48 and K* classifiers are better than the Na¨ıve Bayes classifier and why sequence information increases J48 and K* performance so dramatically. We suggest several reasons that are informative about the observed spatiotemporal phenomenon (Figures 1-10). First of all, J48 and K* rely on entropy of information for statistical calculations [31]. This suggests that information theoretic distributions of the attributes are somehow more useful, especially against a sequential backdrop, as opposed to other types of distributions (e.g. Bayesian). However, beyond simply observing this state of affairs, little more can be said at this point as to why this is the case (i.e., the success of modeling linguistic phenomena in information theoretic terms). Second, the J48 and K* classifiers make use of relationships between attributes and not just relationships between attributes and classes. This suggests that the co-occurrence of all attributes is relevant rather than just independent attributes. These relationships against text sequence (Figures 1-10) are also supported by Spearman’s rank correlation coefficient (ρ). In particualr, using 10% sections of discourse as the ”rank”, for ρ ≥ .794 (corresponding to a p ≤ .01) narration is negatively correlated with background/ elaboration relations, States are negatively correlated with Aspectual events, Aspectual events are negatively correlated with background/ elaboration relations, figures 1 and 3 are negatively correlated with granularities Figural and Geographic, Spatial verbs are negatively correlated with Motion prepositions, and Present tense is positively correlated with Stative verbs. Consequently, it is possible to model certain aspects of the spatiotemporal structure of narratives discourses generally. Based on a given clause’s position in a narrative discourse, we can say, with some discernible accuracy, what the temporal, rhetorical, event and spatial details are. Albeit true that the representation is underspecified along theoretical and methodological grounds - e.g. self vs. other, motion vs. state, large vs. small, complex vs. non-complex - it nonetheless appears to be feasible and not restricted to the occurrence of explicit space nor subject matter. The results further demonstrate how apparent spatiotemporality is as a backdrop in narrative discourses generally. There is, of course, going to be variations in the patterns in any given narrative - dependent on a number of different factors (e.g. context, subject matter), but we suggest that, certainly based on the collapsed codings, the spatiotemporality of narrative discourses will pattern similarly.
452
6
B.S. Howald and E.G. Katz
Conclusion
We have restricted the discussion here to the formal linguistics of narrative discourse. However, narratives are a fairly ubiquitous form of discourse that is studied in many different fields focusing on common elements. The results here suggest a spatiotemporal event profile based on sequence that is potentially useful for a number of inference, knowledge representation, GIS and artificial intelligence tasks. Specifically, the present research indicates that knowing which item in an information series within a narrative discourse allows for, broadly, the prediction of the type of spatial, temporal, event and rhetorical information expected to be seen. Further, because the type of linguistic information attended to bears a close relationship with artificial intelligence and knowledge representation theories and is consistent with other standardizations (e.g. GUM Space, ISO Space). A relatively coarse level of specification is represented here. However, the annotation scheme is applicable to interdisciplinary tasks. Overall, there appears to be some discursive constraints on the type of linguistic spatial information. The relationship between these and other cognitive constraints remains to be seen, but could have a profound impact on the interpretation of verbal components of spatial informaiton systems. There are a number of avenues of future research. First, we would like to determine the spatiotemporal model of non-narrative discourses (e.g. news text). It could be that the theoretical link between the occurrence of spatial information and narrative discourses being based on spatial cognitive encodings restricts the applicability of the methodological approach. Second, we would like to further explore the use of spatial information as an additional methodological tool for natural language processing discourse research. In particular, similar to artificial intelligence and knowledge representation applications, utilizing template information for information retrieval, discourse inference and resolving informaiton in Q/A systems. Third, we would like to determine if the patterns exhibited in spatial and temporal information are relative to other types of semantic information or something more abstract such as specificity of detail (granularity). Lastly, there is more variation in the non-collapsed codings and interesting trends emerge in the individual corpora. Future analysis will delve deeper into the nature of the variation. Acknowledgements. We would like to thank our inter-rater Evan Leibowitz. Thank you to Heidi Hamilton, David Herman, James Pustejovsky and Laure Vieu for insightful discussions. Thank you also to four anonymous reviewers for beneficial comments.
References 1. Mann, W., Thompson, S.: Rhetorical Structure Theory: A Framework for The Analysis of Texts. International Pragmatics Association Papers in Pragmatics 1, 79–105 (1987)
The Spatiotemporal Architecture of Narrative Discourse
453
2. Asher, N., Lascarides, A.: Logics of Conversation. Cambridge University Press, Cambridge (2003) 3. Partee, B.: Nominal and Temporal Anaphora. Linguistics and Philosophy 7(3), 243–286 (1984) 4. Pustejovsky, J., Casta˜ no, J., Ingria, R., Saur, R., Gaizauskas, R., Setzer, A., Katz, G.: TimeML: Robust Specification of Event and Temporal Expressions in Text. In: Proceedings of the IWCS-5, Fifth International Workshop on Computational Semantics (2003) 5. Herman, D.: Spatial Reference in Narrative Domains. Text 2(4), 515–541 (2001) 6. Howald, B.: Linguistic Spatial Classifications of Event Domains in Narratives of Crime. Journal of Spatial Information Science 1, 75–93 (2010) 7. Howald, B., Katz, G.: The Exploitation of Spatial Information in Narrative Discourse. In: Bos, J., Pulman, S. (eds.) Proceedings of the Ninth International Workshop on Computational Semantics, pp. 175–184 (2011) 8. Smith, C.: Modes of Discourse: The Local Structure of Texts. Cambridge University Press, New York (2003) 9. Linde, C., Labov, W.: Spatial Networks as a Site for the Study of Language and Thought. Language 51, 924–939 (1975) 10. Labov, W.: The Transformation of Experience in Narrative Syntax. In: Labov, W. (ed.) Language in the Inner City, pp. 354–396. University of Pennsylvania Press, Philadelphia (1972) 11. Talmy, L.: Toward a Cognitive Semantics, vol. 1. The MIT Press, Cambridge (2000) 12. Levinson, S.: Language and Space. Annual Review of Anthropology 25(1), 353–382 (1996) 13. B¨ uhler, K.: The Deictic Field of Language and Deictic Words. In: Jarvella, R., Wolfgang, K. (eds.) Speech, Place and Action: Studies in Deixis and Related Topics, pp. 9–30. John Wiley and Sons, Chichester (1982) 14. Randell, D., Cui, Z., Cohn, A.: A Spatial Logic Based on Regions and Connection. In: Proceedings of KR 1992, pp. 394–398. Morgan Kaufmann, Los Altos (1992) 15. Rashid, A., Shariff, B., Egenhofer, M.: Natural-Language Spatial Relations Between Linear and Areal Objects: The Topology and Metric of English Language Terms. International Journal of Geographical Information Science 12(3), 215–246 (1998) 16. Hois, J., Tenbrink, T., Ross, R., Bateman, J.: GUM-Space The Generalized Upper Model Spatial Extension: A Linguistically-Motivated Ontology For The Semantics of Spatial Language. OntoSpace Technical Report 09 (2009) 17. Asher, N., Sablayrolles, P.: A Typology and Discourse Semantics for Motion Verbs and Spatial PPs in French. Journal of Semantics 12(2), 163–209 (1995) 18. Muller, P.: Topological Spatio-temporal Reasoning and Representation. Computational Intelligence 18(3), 420–450 (2002) 19. Pustejovsky, J., Moszkowicz, J.: Integrating Motion Predicate Classes with Spatial and Temporal Annotations. In: COLING 2008, pp. 95–98 (2008) 20. Montello, D.: Scale and Multiple Psychologies of Space. In: Campari, I., Frank, A. (eds.) COSIT 1993. LNCS, vol. 716, pp. 312–321. Springer, Heidelberg (1993) 21. Hobbs, J.: On The Coherence and Structure of Discourse. CSLI Technical Report, 85–37 (1985) 22. Rapaport, W., Segal, E., Shapiro, S., Zubin, D., Bruder, G., Duchan, J., Almeida, M., Daniels, J., Galbraith, M., Wiebe, J., Yuhan, A.: Deictic Centers and the Cognitive Structure of Narrative Comprehension. Technical Report No. 89-01. SUNY Buffalo Department of Computer Science, Buffalo, NY (1994)
454
B.S. Howald and E.G. Katz
23. Ide, N., Suderman, K.: The Open American National Corpus (OANC), http://www.AmericanNationalCorpus.org/OANC 24. Cohen, J.: A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20(1), 37–46 (1960) 25. Carletta, J.: Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics 22(2), 249–254 (1996) ¨ 26. Wiebe, J., O’Hara, T., Ohrstr¨ om-Sandgren, T., McKeever, K.: An Empirical Approach to Temporal Reference Resolution. In: Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP-1997), pp. 174– 186. Association for Computational Linguistics SIGDAT, Providence (1997) 27. Puscasu, G., Mititelu, V.: Annotation of WordNet Verbs with TimeML Event Classes. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odjik, J., Piperidis, S., Tapias, D. (eds.) Proceedings of the Sixth International Language Resources and Evaluation (LREC 2008). European Language Resources Association (2008) 28. Pan, F., Mulka, R., Hobbs, J.: Extending TimeML with Typical Durations of Events. In: Proceedings of Annotating and Reasoning about Time and Events workshop at the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pp. 38–45 (2006) 29. Sporleder, C., Lascarides, A.: Exploiting Linguistic Cues to Classify Rhetorical Relations. In: Proceedings of Recent Advances in Natural Language Processing (RANLP-2005), pp. 532–539 (2005) 30. Witten, I., Frank, E.: Data Mining Practical Machine Learning Tools and Techniques with Java Implementation. Morgan Kaufmann, San Francisco (2002) 31. Shannon, C.: A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423, 623–656 (1948) 32. Cleary, J., Trigg, L.: K*: An Instance-based Learner Using an Entropic Distance Measure. In: Prieditis, A., Russel, S. (eds.) Proceedings of the 12th International Conference on Machine Learning, pp. 108–113. Morgan Kaufmann, San Francisco (1995) 33. Zhang, H.: The Optimality of Naive Bayes. In: Barr, V., Markov, Z. (eds.) Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS). AAAI Press, Menlo Park (2004) 34. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Author Index
Abarbanell, Linda Asadzadeh, Parvin
245 168
Bhatt, Mehul 210 Bruny´e, Tad T. 231 Chipofya, Malumbo 20 Cialone, Claudia 391 Claramunt, Christophe 148 Cohn, Anthony G. 110 De Felice, Giorgio 188 Del Mondo, G´eraldine 148 Duckham, Matt 126 Eschenbach, Carola
283, 328
Fabrikant, Sara Irina 1 Fogliaroni, Paolo 188 Gagnon, Stephanie A. Guan, Lin-Jie 126
231
Habel, Christopher 328 Hamhoum, Fathi 40 Hirtle, Stephen C. 73 Hogg, David C. 110 Howald, Blake Stephen 434 Ishikawa, Toru
90
Janowicz, Krzysztof
350
Katz, E. Graham 434 Kray, Christian 40 Kuhn, Werner 304, 371 Kulik, Lars 168
Lee, Jae Hee 210 Lee, Wang-Chien 350 Li, Peggy 245 Lindner, Felix 283 Lohmann, Kris 328 Maddox, Keith B. 231 Mahpeykar, Narges 413 Montana, Rachel 245 Montello, Daniel R. 264 M¨ ulligann, Christoph 350 Scheider, Simon 304 Schultz, Carl 210 Schwering, Angela 20 Shirabe, Takeshi 57 Sridhar, Muralikrishna 110 Stell, John 148 Stock, Kristin 391 Takemiya, Makoto 90 Tanin, Egemen 168 Taylor, Holly A. 231 Tenbrink, Thora 73, 371 Thibaud, Remy 148 Timpf, Sabine 73 Tyler, Andrea 413 Wallgr¨ un, Jan Oliver 188 Wang, Jia 20 Wang, Qi 231 Wilkening, Jan 1 Wirth, Anthony 168 Xiao, Danqing Ye, Mao
350
264